Here's what it looks like:
import sqlite3
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)  # extension loading is off by default
conn.load_extension("./libgraph.so")
conn.execute("CREATE VIRTUAL TABLE graph USING graph()")
# Create a social network
conn.execute("""SELECT cypher_execute('
CREATE (alice:Person {name: "Alice", age: 30}),
(bob:Person {name: "Bob", age: 25}),
(alice)-[:KNOWS {since: 2020}]->(bob)
')""")
# Query the graph with relationship patterns
conn.execute("""SELECT cypher_execute('
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.age > 25
RETURN a, r, b
')""")
The interesting part was building the complete execution pipeline - lexer, parser, logical planner, physical planner, and an iterator-based executor using the Volcano model. All in C99 with no dependencies beyond SQLite.

What works now:

- Full CREATE: nodes, relationships, properties, chained patterns (70/70 openCypher TCK tests)

- MATCH with relationship patterns: (a)-[r:TYPE]->(b) with label and type filtering

- WHERE clause: property comparisons on nodes (=, >, <, >=, <=, <>)

- RETURN: basic projection with JSON serialization

- Virtual table integration for mixing SQL and Cypher
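For anyone unfamiliar with the Volcano model: each operator in the plan exposes a next() that pulls one row at a time from its child, so plans compose without materializing intermediates. A toy Python sketch (illustrative only - the real executor is C99, and these class names are mine, not the extension's):

```python
# Toy Volcano-model pipeline: each operator pulls rows one at a time
# from its child via next(). Illustrative only, not the actual C99 code.

class Scan:
    """Leaf operator: yields rows from an in-memory node list."""
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)  # None signals end-of-stream

class Filter:
    """WHERE operator: passes through rows matching a predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

class Project:
    """RETURN operator: maps each row through a projection function."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn
    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)

# Rough equivalent of: MATCH (a:Person) WHERE a.age > 25 RETURN a.name
nodes = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
plan = Project(Filter(Scan(nodes), lambda r: r["age"] > 25),
               lambda r: r["name"])
out = []
while (row := plan.next()) is not None:
    out.append(row)
print(out)  # ['Alice']
```

The nice property is that adding an operator (say, LIMIT) is just another next() wrapper - which is presumably why it maps well onto incremental Cypher feature support.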
Performance:

- 340K nodes/sec inserts (consistent up to 1M nodes)

- 390K edges/sec for relationships

- 180K nodes/sec scans with WHERE filtering
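If you want to sanity-check numbers like these yourself, here's a rough harness sketch. Note this uses plain SQL inserts as a stand-in since the extension isn't loaded here, so it measures a SQLite baseline rather than cypher_execute() itself:

```python
import sqlite3
import time

# Rough throughput harness. Plain SQL inserts stand in for
# cypher_execute() here - this is a SQLite baseline, not a
# measurement of the graph extension.
def insert_rate(n=100_000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
    start = time.perf_counter()
    with conn:  # one transaction - batching dominates insert throughput
        conn.executemany(
            "INSERT INTO nodes (name) VALUES (?)",
            ((f"n{i}",) for i in range(n)),
        )
    elapsed = time.perf_counter() - start
    return n / elapsed  # nodes per second

print(f"{insert_rate():,.0f} nodes/sec")
```

Whether the inserts run in one transaction or one-per-statement changes the result by orders of magnitude, so any comparison against the numbers above should hold that constant.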
Current limitations (alpha):

- Only forward relationships (no `<-[r]-` or bidirectional `-[r]-`)

- No relationship property filtering in WHERE (e.g., `WHERE r.weight > 5`)

- No variable-length paths yet (e.g., `[r*1..3]`)

- No aggregations, ORDER BY, or property projection in RETURN

- Must use double quotes for strings: {name: "Alice"}, not {name: 'Alice'}
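The forward-only restriction is workable for single-hop patterns, since `(a)<-[r]-(b)` is equivalent to `(b)-[r]->(a)`. A toy helper showing the rewrite (my own sketch, not part of the extension, and it only handles the simple single-hop case):

```python
import re

# (a)<-[r:KNOWS]-(b) can be expressed as (b)-[r:KNOWS]->(a).
# Toy rewriter for the single-hop case only - not part of the extension.
def flip_reverse(pattern):
    m = re.fullmatch(r"\((\w+):?(\w*)\)<-\[(.*?)\]-\((\w+):?(\w*)\)", pattern)
    if not m:
        return pattern  # already forward, or too complex to rewrite
    a, a_label, rel, b, b_label = m.groups()
    left = f"({b}:{b_label})" if b_label else f"({b})"
    right = f"({a}:{a_label})" if a_label else f"({a})"
    return f"{left}-[{rel}]->{right}"

print(flip_reverse("(a)<-[r:KNOWS]-(b)"))  # (b)-[r:KNOWS]->(a)
```

Bidirectional `-[r]-` has no such rewrite short of a UNION of both directions, which is presumably why it's on the roadmap rather than a workaround.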
This is alpha - API may change. But core graph query patterns work! The execution pipeline handles CREATE/MATCH/WHERE/RETURN end-to-end.
Next up: bidirectional relationships, property projection, aggregations. Roadmap targets full Cypher support by Q1 2026.
Built as part of Agentflare AI, but it's standalone and MIT licensed. Would love feedback on what to prioritize.
GitHub: https://github.com/agentflare-ai/sqlite-graph
Happy to answer questions about the implementation!
leetrout•1h ago
gwillen85•1h ago
My theory: models are heavily trained on HTML/XML and many use XML tags in their own system prompts, so they're naturally fluent in that syntax. Makes nested structures more reliable in our testing.
Structured output endpoints help JSON a lot though.
gwillen85•1h ago
But you're right that structured output endpoints make JSON generation more reliable, so supporting both formats long-term makes sense.