Here's what it looks like:
import sqlite3
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)  # extension loading is off by default
conn.load_extension("./libgraph.so")
conn.execute("CREATE VIRTUAL TABLE graph USING graph()")
# Create a social network
conn.execute("""SELECT cypher_execute('
CREATE (alice:Person {name: "Alice", age: 30}),
(bob:Person {name: "Bob", age: 25}),
(alice)-[:KNOWS {since: 2020}]->(bob)
')""")
# Query the graph with relationship patterns
conn.execute("""SELECT cypher_execute('
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE a.age > 25
RETURN a, r, b
')""")
The interesting part was building the complete execution pipeline - lexer, parser, logical planner, physical planner, and an iterator-based executor using the Volcano model. All in C99 with no dependencies beyond SQLite.

What works now:

- Full CREATE: nodes, relationships, properties, chained patterns (70/70 openCypher TCK tests)

- MATCH with relationship patterns: (a)-[r:TYPE]->(b) with label and type filtering

- WHERE clause: property comparisons on nodes (=, >, <, >=, <=, <>)

- RETURN: basic projection with JSON serialization

- Virtual table integration for mixing SQL and Cypher
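For anyone unfamiliar with the Volcano model: each operator in the plan exposes a next() that pulls one row at a time from its child, so plans compose without materializing intermediates. A toy Python sketch (illustrative only - the real executor is C99, and these class names are mine, not the extension's):

```python
# Toy Volcano-model pipeline: each operator pulls rows one at a time
# from its child via next(). Illustrative only, not the actual C99 code.

class Scan:
    """Leaf operator: yields rows from an in-memory node list."""
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)  # None signals end-of-stream

class Filter:
    """WHERE operator: passes through rows matching a predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

class Project:
    """RETURN operator: maps each row through a projection function."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn
    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)

# Rough equivalent of: MATCH (a:Person) WHERE a.age > 25 RETURN a.name
nodes = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
plan = Project(Filter(Scan(nodes), lambda r: r["age"] > 25),
               lambda r: r["name"])
out = []
while (row := plan.next()) is not None:
    out.append(row)
print(out)  # ['Alice']
```

The nice property is that adding an operator (say, LIMIT) is just another next() wrapper - which is presumably why it maps well onto incremental Cypher feature support.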
Performance:

- 340K nodes/sec inserts (consistent up to 1M nodes)

- 390K edges/sec for relationships

- 180K nodes/sec scans with WHERE filtering
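If you want to sanity-check numbers like these yourself, here's a rough harness sketch. Note this uses plain SQL inserts as a stand-in since the extension isn't loaded here, so it measures a SQLite baseline rather than cypher_execute() itself:

```python
import sqlite3
import time

# Rough throughput harness. Plain SQL inserts stand in for
# cypher_execute() here - this is a SQLite baseline, not a
# measurement of the graph extension.
def insert_rate(n=100_000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
    start = time.perf_counter()
    with conn:  # one transaction - batching dominates insert throughput
        conn.executemany(
            "INSERT INTO nodes (name) VALUES (?)",
            ((f"n{i}",) for i in range(n)),
        )
    elapsed = time.perf_counter() - start
    return n / elapsed  # nodes per second

print(f"{insert_rate():,.0f} nodes/sec")
```

Whether the inserts run in one transaction or one-per-statement changes the result by orders of magnitude, so any comparison against the numbers above should hold that constant.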
Current limitations (alpha):

- Only forward relationships (no `<-[r]-` or bidirectional `-[r]-`)

- No relationship property filtering in WHERE (e.g., `WHERE r.weight > 5`)

- No variable-length paths yet (e.g., `[r*1..3]`)

- No aggregations, ORDER BY, or property projection in RETURN

- Must use double quotes for strings: {name: "Alice"}, not {name: 'Alice'}
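The forward-only restriction is workable for single-hop patterns, since `(a)<-[r]-(b)` is equivalent to `(b)-[r]->(a)`. A toy helper showing the rewrite (my own sketch, not part of the extension, and it only handles the simple single-hop case):

```python
import re

# (a)<-[r:KNOWS]-(b) can be expressed as (b)-[r:KNOWS]->(a).
# Toy rewriter for the single-hop case only - not part of the extension.
def flip_reverse(pattern):
    m = re.fullmatch(r"\((\w+):?(\w*)\)<-\[(.*?)\]-\((\w+):?(\w*)\)", pattern)
    if not m:
        return pattern  # already forward, or too complex to rewrite
    a, a_label, rel, b, b_label = m.groups()
    left = f"({b}:{b_label})" if b_label else f"({b})"
    right = f"({a}:{a_label})" if a_label else f"({a})"
    return f"{left}-[{rel}]->{right}"

print(flip_reverse("(a)<-[r:KNOWS]-(b)"))  # (b)-[r:KNOWS]->(a)
```

Bidirectional `-[r]-` has no such rewrite short of a UNION of both directions, which is presumably why it's on the roadmap rather than a workaround.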
This is alpha - API may change. But core graph query patterns work! The execution pipeline handles CREATE/MATCH/WHERE/RETURN end-to-end.
Next up: bidirectional relationships, property projection, aggregations. Roadmap targets full Cypher support by Q1 2026.
Built as part of Agentflare AI, but it's standalone and MIT licensed. Would love feedback on what to prioritize.
GitHub: https://github.com/agentflare-ai/sqlite-graph
Happy to answer questions about the implementation!
leetrout•1h ago
gwillen85•1h ago
My theory: models are heavily trained on HTML/XML and many use XML tags in their own system prompts, so they're naturally fluent in that syntax. Makes nested structures more reliable in our testing.
Structured output endpoints help JSON a lot though.
gwillen85•1h ago
But you're right that structured output endpoints make JSON generation more reliable, so supporting both formats long-term makes sense.