I built an open-source system called Horaculo that analyzes coordination and divergence across financial news sources.
The goal is to quantify narrative alignment, entropy shifts, and historical source reliability.
Pipeline
1. Fetch 50–100 articles (NewsAPI)
2. Extract claims (NLP preprocessing)
3. Generate sentence embeddings (HuggingFace)
4. Compute cosine similarity in C++ (AVX2 + INT8 quantization)
5. Cluster narratives
6. Compute entropy and coordination metrics
7. Weight results by historical source credibility
8. Output structured JSON signals
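To make the data flow concrete, here is a minimal sketch of how the stages chain together. Every function is a hypothetical stub standing in for the real component named in the comment (NewsAPI, HuggingFace embeddings, the C++ kernel); only the shape of the pipeline mirrors the list above, and the toy data is invented.

```python
def fetch_articles(query):          # stub for the NewsAPI stage
    return [{"source": "A", "text": "supply cut"},
            {"source": "B", "text": "supply cut"},
            {"source": "C", "text": "demand rises"}]

def extract_claims(articles):       # stub for NLP preprocessing
    return [a["text"] for a in articles]

def embed(claims):                  # stub: bag-of-words instead of HuggingFace
    vocab = sorted({w for c in claims for w in c.split()})
    return [[float(w in c.split()) for w in vocab] for c in claims]

def cluster(embeddings):            # stub for cosine-similarity clustering
    groups = {}
    for i, e in enumerate(embeddings):
        groups.setdefault(tuple(e), []).append(i)
    return list(groups.values())

def analyze(query):                 # entry point producing a (reduced) signal
    claims = extract_claims(fetch_articles(query))
    clusters = cluster(embed(claims))
    return {"narratives": len(clusters)}

print(analyze("oil"))               # two distinct narratives in the toy data
```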
Example Output (query: “oil”)
```json
{
  "verdict": {
    "winner_source": "Reuters",
    "intensity": 0.85,
    "entropy": 1.92
  },
  "psychology": {
    "mood": "Fear",
    "is_trap": true,
    "coordination_score": 0.72
  }
}
```
What it measures
Intensity → narrative divergence
Entropy → informational disorder
Coordination score → cross-source alignment
Credibility weighting → historical consensus accuracy per source
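One plausible reading of these definitions (the exact formulas in Horaculo may differ): entropy as Shannon entropy over the narrative-cluster size distribution, and coordination as mean pairwise cosine similarity across source embeddings. A small sketch under those assumptions:

```python
import math
from itertools import combinations

def shannon_entropy(cluster_sizes):
    """Shannon entropy (bits) of the narrative-cluster size distribution."""
    n = sum(cluster_sizes)
    return -sum((s / n) * math.log2(s / n) for s in cluster_sizes if s)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def coordination_score(embeddings):
    """Mean pairwise cosine similarity across all source embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Evenly split clusters maximize entropy; identical vectors maximize coordination.
print(shannon_entropy([2, 2]))                       # 1.0
print(coordination_score([[1, 0], [1, 0], [1, 0]]))  # 1.0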
Performance
1.4s per query (~10 sources)
~100 queries/min
~150MB memory footprint
Python-only baseline: ~12s per query
C++ optimizations:
INT8 embedding quantization (4x size reduction)
AVX2 SIMD vectorized cosine similarity
PyBind11 integration layer
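The INT8 trick can be illustrated in NumPy: scale float32 embeddings into int8 (a 4x size reduction), accumulate the dot product in 32-bit integers, and rescale. This mirrors what the AVX2 kernel does with SIMD intrinsics; the per-vector max-abs scaling scheme here is an assumption, not necessarily Horaculo's.

```python
import numpy as np

def quantize(v):
    """Map float32 vector into int8 with a per-vector max-abs scale."""
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale).astype(np.int8), scale

def int8_cosine(a, b):
    """Cosine similarity computed from quantized vectors, then rescaled."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    dot = np.dot(qa.astype(np.int32), qb.astype(np.int32)) * sa * sb
    return dot / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a = rng.standard_normal(384).astype(np.float32)
b = rng.standard_normal(384).astype(np.float32)

exact = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
approx = float(int8_cosine(a, b))
print(abs(exact - approx) < 0.01)   # quantization error stays small
```

In the real kernel the int32 accumulation is what AVX2 vectorizes; the rescale happens once per pair, so the per-element work stays integer-only.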
Storage
SQLite by default (local store)
Optional Postgres backend
Each source builds a rolling credibility profile:
```json
{
  "source": "Reuters",
  "total_scans": 342,
  "consensus_hits": 289,
  "credibility": 0.85
}
```
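A hypothetical update rule consistent with that profile: credibility is the fraction of scans where the source agreed with the cross-source consensus, recomputed after each scan. Field names match the JSON above; the unsmoothed ratio is an assumption.

```python
def update_credibility(profile, hit):
    """Roll the profile forward after one scan; hit = agreed with consensus."""
    profile["total_scans"] += 1
    profile["consensus_hits"] += int(hit)
    profile["credibility"] = round(
        profile["consensus_hits"] / profile["total_scans"], 2)
    return profile

p = {"source": "Reuters", "total_scans": 342,
     "consensus_hits": 289, "credibility": 0.85}
update_credibility(p, hit=True)
print(p["total_scans"], p["credibility"])  # 343 0.85
```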
Open Source (MIT)
GitHub: https://github.com/ANTONIO34346/HORACULO
I'm particularly interested in feedback on:
The entropy modeling approach
Coordination detection methodology
Whether FAISS would be a better fit than the current SIMD engine
Scalability strategies for 100k+ embeddings