The system combines both approaches on every query: it pulls text matches via TF-IDF, retrieves structural relationships from Neo4j, then feeds both contexts to OpenAI so the answer covers content and structure together.
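To make the flow concrete, here is a minimal sketch of the per-query pipeline, not the project's actual code. The section texts and reference edges are illustrative placeholders (not the statute's wording), the in-memory edge list stands in for Neo4j, and the function names are hypothetical. The final OpenAI call is left out; the sketch stops at the combined context that would go into the prompt.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative placeholder corpus: section id -> short description.
SECTIONS = {
    "80C": "Deduction for life insurance premium, provident fund contributions and similar investments.",
    "80CCC": "Deduction for contributions to certain pension funds.",
    "80CCE": "Limit on total deductions under sections 80C, 80CCC and 80CCD.",
}

# Illustrative (source, target) reference edges; a stand-in for the Neo4j graph.
REFERENCES = [("80CCE", "80C"), ("80CCE", "80CCC")]


def tfidf_top_k(query, sections, k=2):
    """Rank sections by TF-IDF cosine similarity to the query."""
    ids = list(sections)
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform([sections[i] for i in ids])
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    # Drop sections that share no terms with the query.
    return [(sid, float(score)) for sid, score in ranked[:k] if score > 0]


def referencing_sections(target, edges):
    """Return the ids of sections that point at `target`."""
    return sorted(src for src, dst in edges if dst == target)


def build_context(query, target):
    """Merge content matches and graph structure into one prompt context."""
    lines = ["Content matches:"]
    for sid, score in tfidf_top_k(query, SECTIONS):
        lines.append(f"- Section {sid} (score {score:.2f}): {SECTIONS[sid]}")
    lines.append(f"Sections referencing Section {target}:")
    for sid in referencing_sections(target, REFERENCES):
        lines.append(f"- Section {sid}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context("pension funds deduction", "80C"))
```

The string returned by `build_context` is what would be passed to the LLM alongside the user's question, so the model sees both what the sections say and how they are connected.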
I used the Indian Income Tax Act as test data, since legal documents have a natural graph structure: sections cross-reference each other. A query like "What sections reference Section 80C?" returns both the reference network and an explanation of the content.
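For the structural half of a question like the 80C one, the graph lookup could look roughly like the sketch below. The schema is an assumption on my part for illustration: a `Section` label, an `id` property, and a `REFERENCES` relationship may not match the project's actual model. The function takes any object with a `run(query, **params)` method, which is how a session from the official `neo4j` Python driver is called.

```python
# Cypher for "which sections reference a given section?".
# Label, property, and relationship names are assumed, not taken from the project.
REFERENCING_QUERY = """
MATCH (src:Section)-[:REFERENCES]->(dst:Section {id: $section_id})
RETURN src.id AS id
ORDER BY id
"""


def sections_referencing(session, section_id):
    """Return ids of sections that reference `section_id`.

    `session` is expected to behave like a neo4j driver session:
    run() returns an iterable of records indexable by column name.
    """
    result = session.run(REFERENCING_QUERY, section_id=section_id)
    return [record["id"] for record in result]
```

With the real driver, this would be called inside something like `with driver.session() as session:`; passing the section id as a query parameter rather than formatting it into the string keeps the query safe and cacheable.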
Full transparency: this includes some AI-assisted code, since I was learning Neo4j and graph concepts as I went, but the hybrid architecture and problem framing are mine.
Tech stack: Python, Neo4j, OpenAI API, scikit-learn (TF-IDF), numpy. Docker + Makefile for easy setup.
Would love feedback on this pattern for other structured documents.