I hypothesized that this isn't a retrieval limit, but a compression limit.
I built Numen, a retrieval engine based on high-dimensional sparse-dense n-gram hashing (32k dimensions) rather than learned embeddings.
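To make the idea concrete, here is a minimal sketch of hashed n-gram encoding into a 32k-dimensional space. The specifics (character trigrams, the blake2b hash, L2 normalization, and the function names `encode`/`score`) are my assumptions for illustration, not Numen's actual pipeline; the real implementation lives in numen.ipynb. The sketch also materializes the vector densely for simplicity, whereas a sparse representation would store only the nonzero buckets.

```python
import numpy as np
from hashlib import blake2b

DIM = 32_768  # 32k-dimensional hashed feature space, per the description above

def ngrams(text: str, n: int = 3):
    """Yield character n-grams (trigram size is an assumption)."""
    text = text.lower()
    return (text[i:i + n] for i in range(len(text) - n + 1))

def encode(text: str) -> np.ndarray:
    """Hash n-grams into a fixed 32k-dim count vector (dense here for simplicity)."""
    v = np.zeros(DIM, dtype=np.float32)
    for g in ngrams(text):
        h = int.from_bytes(blake2b(g.encode(), digest_size=8).digest(), "big")
        v[h % DIM] += 1.0
    # L2-normalize so a dot product equals cosine similarity (assumption)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def score(query: str, docs: list[str]) -> np.ndarray:
    """Rank documents by cosine similarity in the hashed space."""
    q = encode(query)
    D = np.stack([encode(d) for d in docs])
    return D @ q
```

Because no dimension is learned, the geometry is fixed by the hash rather than squeezed through a low-rank embedding, which is the point the results below test.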
The Results (on the LIMIT test set):

- BM25 (baseline): 93.6%
- E5-Mistral: 8.3%
- GritLM 7B: 12.9%
- Numen (my implementation): 93.9%

Numen edges out BM25 while keeping a vector architecture, sidestepping the geometric bottleneck of dense models entirely.
The benchmark notebook (numen.ipynb) is in the repo for reproduction.