Hey folks! I’m Matt, CEO at Instant Domain Search. Quick summary: we distilled LLM judgments into a 22.7M-parameter embedding model and optimized CPU inference to deliver sub-10ms latency for semantic domain matches (≈0.87 correlation with GPT-4’s judgments).
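For anyone curious what "distilling LLM judgments" can mean concretely: this is not our actual code, just a minimal numpy sketch of one common setup, where a small student embedder is trained so that its cosine similarities regress onto scalar relevance scores produced by the LLM teacher. All names here (`distill_loss`, the toy vectors) are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two batches of vectors."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def distill_loss(student_q, student_d, teacher_scores):
    """MSE between the student's cosine similarities and the
    teacher LLM's scalar judgments -- the distillation target."""
    return float(np.mean((cosine_sim(student_q, student_d) - teacher_scores) ** 2))

# Toy example: two (query, domain) embedding pairs.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))   # student query embeddings
d = rng.normal(size=(2, 8))   # student domain embeddings
teacher = cosine_sim(q, d)    # pretend the teacher agrees exactly
print(distill_loss(q, d, teacher))  # → 0.0 (loss vanishes when student matches teacher)
```

In practice you would backprop this loss through the student model; the point is just that the supervision signal is the LLM's score, not human labels.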
The post walks through our training signal, distillation choices, quantization, index layout, and production latency/CPU learnings.
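To make the quantization/index part concrete: the post describes their own pipeline, but a generic version of the idea is per-vector symmetric int8 quantization of the embedding index, with search done as a dot product over the (de)quantized codes. The code below is an illustrative sketch, not their implementation; `quantize_int8` and `search` are hypothetical names.

```python
import numpy as np

def quantize_int8(vecs):
    """Symmetric per-vector int8 quantization: each row stores
    int8 codes plus a single float32 scale (4x smaller than float32)."""
    scales = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    codes = np.round(vecs / scales).astype(np.int8)
    return codes, scales

def search(query, codes, scales, k=3):
    """Brute-force dot-product search over the dequantized index;
    returns the indices of the top-k highest-scoring vectors."""
    approx = codes.astype(np.float32) * scales
    scores = approx @ query
    return np.argsort(-scores)[:k]

# Tiny handcrafted index so the ranking is obvious.
index = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7, 0.7]], dtype=np.float32)
codes, scales = quantize_int8(index)
q = np.array([1.0, 0.0], dtype=np.float32)
print(search(q, codes, scales, k=3))  # → [0 2 1]
```

Per-vector scales bound the reconstruction error at half a quantization step per element, which is why the ranking survives quantization here.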
We’re a small team of 4 engineers building free, wicked fast search tools. AMA or feedback welcome!