The scale problem first: Pipedream has ~10,000 actions.
Full catalog = 750K tokens. GPT-4o context = 128K.
The LLM literally cannot load the tools.
We inverted the architecture.
LLM runs once, offline, at build time — generates the
many ways a human might phrase each intent. 22,614
exemplars compiled into an 8.5MB HDC vector space.
At runtime: pure math, no LLM, 7ms.
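For the curious, a minimal sketch of the idea in Python — random bipolar hypervectors bundled into per-action prototypes at build time, then a pure dot-product lookup at runtime. The action names, exemplar phrasings, and dimensionality here are simplified stand-ins, not the production pipeline (that's in the paper):

```python
import numpy as np

DIM = 10_000                      # illustrative dimensionality
rng = np.random.default_rng(42)
_token_vecs = {}                  # token -> random bipolar hypervector

def token_vec(token):
    if token not in _token_vecs:
        _token_vecs[token] = rng.choice([-1, 1], size=DIM)
    return _token_vecs[token]

def encode(text):
    """Bundle (elementwise sum, then sign) token hypervectors into one vector."""
    return np.sign(sum(token_vec(t) for t in text.lower().split()))

# Build time: compile LLM-generated exemplar phrasings into per-action prototypes.
exemplars = {
    "slack.send_message": ["post a message to slack", "send a slack message"],
    "gmail.send_email":   ["email this report to bob", "send an email via gmail"],
}
prototypes = {action: np.sign(sum(encode(p) for p in phrases))
              for action, phrases in exemplars.items()}

# Runtime: no LLM, just a similarity score against every prototype.
def classify(query):
    q = encode(query)
    return max(prototypes, key=lambda a: float(q @ prototypes[a]))
```

The runtime step scales to thousands of prototypes because it collapses into a single matrix-vector product — which is where the single-digit-millisecond latency comes from.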
Results across 85,125 test queries:
First-pass action accuracy: 89.6%
Action accuracy (with ASK): 100%
App accuracy: 100%
Silent errors: 0%
Latency p95: <13ms
Tokens per query: 0
Model size: 8.5MB, no GPU
Improves with use: Yes
The 10.4% that trigger ASK aren't failures. The system
asks rather than guesses, the correct action is always
in the candidate set, and every resolved ASK strengthens
the model via Hebbian reinforcement. No retraining.
No labeling pipeline. The production model is the
learning model.
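A hedged sketch of how confidence gating plus a Hebbian update could fit together — toy Gaussian prototypes and a made-up margin threshold, not the actual gate or update rule from the paper:

```python
import numpy as np

DIM = 10_000
rng = np.random.default_rng(0)

# Hypothetical prototype store: action name -> real-valued prototype vector.
prototypes = {
    "slack.send_message": rng.normal(size=DIM),
    "gmail.send_email":   rng.normal(size=DIM),
}

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify_or_ask(query_vec, margin=0.05):
    """Confidence gate: answer only when top-1 clearly beats top-2,
    otherwise return an ASK with the candidate set."""
    ranked = sorted(prototypes,
                    key=lambda a: cosine(query_vec, prototypes[a]),
                    reverse=True)
    gap = (cosine(query_vec, prototypes[ranked[0]])
           - cosine(query_vec, prototypes[ranked[1]]))
    if gap >= margin:
        return ranked[0], None
    return None, ranked[:5]          # ASK: let the user disambiguate

def hebbian_update(query_vec, resolved_action, lr=0.1):
    """After the user resolves an ASK, nudge the chosen prototype toward
    the query vector — an in-place update, no retraining pass."""
    prototypes[resolved_action] = prototypes[resolved_action] + lr * query_vec
```

An ambiguous query (roughly equidistant from two prototypes) triggers ASK; once resolved, the reinforced prototype wins outright the next time the same phrasing shows up.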
GPT-4o hits 98.5% accuracy — when given a pre-filtered
shortlist of 200 actions and human-readable action keys.
It can't do app selection across 3,146 apps. HDC does
the whole thing in 7ms and gets better with every use.
We benchmarked honestly — full methodology in the paper
including where GPT-4o wins.
Patent pending: US 63/969,729
Covers: build-time LLM→HDC pipeline, confidence gating,
Hebbian self-improvement without retraining.
White paper + benchmark + Docker quickstart: https://github.com/glyphh-ai/model-pipedream