Show HN: HelixDB – Open-source vector-graph database for AI applications (Rust)

https://github.com/HelixDB/helix-db/
83•GeorgeCurtis•4h ago
Hey HN, we want to share HelixDB (https://github.com/HelixDB/helix-db/), a project a college friend and I are working on. It’s a new database that natively intertwines graph and vector types, without sacrificing performance. It’s written in Rust and our initial focus is on supporting RAG. Here’s a video runthrough: https://screen.studio/share/szgQu3yq.

Why a hybrid? Vector databases are useful for similarity queries, while graph databases are useful for relationship queries. Each stores data in a way that’s best for its main type of query (e.g. key-value stores vs. node-and-edge tables). However, many AI-driven applications need both similarity and relationship queries. For example, you might use vector-based semantic search to retrieve relevant legal documents, and then use graph traversal to identify relationships between cases.
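
To make the two-step pattern concrete, here is a minimal sketch in plain Python. It is not HelixDB code or HelixQL: the toy embeddings, the `CITES` adjacency map, and the `hybrid_query` helper are all made up for illustration, and a real system would use an approximate index rather than a full scan.

```python
import math

# Hypothetical toy data: one embedding per legal case, plus a citation graph.
EMBEDDINGS = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.8, 0.2, 0.1],
    "case_c": [0.0, 0.1, 0.9],
}
CITES = {
    "case_a": ["case_c"],
    "case_b": [],
    "case_c": [],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hybrid_query(query_vec, k=2):
    """Step 1: rank cases by similarity. Step 2: follow citation edges."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda doc: cosine(query_vec, EMBEDDINGS[doc]),
        reverse=True,
    )
    return {doc: CITES[doc] for doc in ranked[:k]}
```

Calling `hybrid_query([1.0, 0.0, 0.0])` returns the two most similar cases together with the cases each one cites. With two separate databases, the second step would be a second round trip to a different system.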

Developers of such apps are stuck building on top of two different databases (a vector one and a graph one), and then have to link them together and keep the data in sync. Even then, the two databases aren't designed to work together: for example, there’s no native way to perform joins or queries that span both systems, so you have to handle that logic at the application level.

Helix started when we realized that there are ways to integrate vector and graph data that are both fast and suitable for AI applications, especially RAG-based ones. See this cool research paper: https://arxiv.org/html/2408.04948v1. After reading that and some other papers on graph and hybrid RAG, we decided to build a hybrid DB. Our aim was to make something better to use from a developer standpoint, while also making it fast as hell.

After a few months of working on this as a side project, our benchmarking shows that we are on par with Pinecone and Qdrant for vectors, and our graph is up to three orders of magnitude faster than Neo4j.

Problems where a hybrid approach works particularly well include:

- Indexing codebases: you can vectorize code-snippets within a function (connected by edges) based on context and then create an AST (in a graph) from function calls, imports, dependencies, etc. Agents can look up code by similarity or keyword and then traverse the AST to get only the relevant code, which reduces hallucinations and prevents the LLM from guessing object shapes or variable/function names.

- Molecule discovery: Model biological interactions (e.g., proteins → genes → diseases) using graph types and then embed molecule structures to find similar compounds or case studies.

- Enterprise knowledge management: you can represent organisational structure, projects, and people (e.g., employee → team → project) in graph form, then index internal documents, emails, or notes as vectors for semantic search and link them directly to employees/teams/projects in the graph.

I naively assumed when learning about databases for the first time that queries would be compiled and executed like functions in traditional programming. Turns out I was wrong: the usual approach is to send the whole written query, compile it at run time, and then execute it, which creates unnecessary latency. With Helix, you write the queries in our query language (HelixQL), which is then transpiled into Rust code and built directly into the database server, where you can call a generated API endpoint.
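
The general idea of building queries into the server ahead of time can be sketched in a few lines of Python. This is only an analogy, not how HelixDB or HelixQL works internally: the `query` decorator, the `QUERIES` registry, the `FOLLOWS` data, and `handle_request` are all hypothetical names.

```python
import json

# Registry of queries that are "built into" the server at startup.
QUERIES = {}


def query(fn):
    """Register a function as a named, pre-built query endpoint."""
    QUERIES[fn.__name__] = fn
    return fn


# Hypothetical toy graph: who follows whom.
FOLLOWS = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []}


@query
def get_followers_of(user: str):
    # The traversal logic already lives in the server; nothing is parsed here.
    return sorted(u for u, outs in FOLLOWS.items() if user in outs)


def handle_request(name: str, body: str) -> str:
    """Dispatch by endpoint name; the client sends only JSON parameters."""
    params = json.loads(body)
    return json.dumps(QUERIES[name](**params))
```

A client would call `handle_request("get_followers_of", '{"user": "carol"}')` and get JSON back. The request carries an endpoint name and parameters rather than query text, so there is nothing to parse or plan per request.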

Many people have a thing against “yet another query language” (doubtless for good reason!) but we went ahead and did it anyway, because we think it makes working with our database so much easier that it’s worth a bit of a learning curve. HelixQL takes from other query languages such as Gremlin, Cypher and SQL with some extra ideas added in. It is declarative while the traversals themselves are functional. This allows complete control over the traversal flow while also having a cleaner syntax. HelixQL returns JSON to make things easy for clients. Also, it uses a schema, so the queries are type-checked.

We took a crude approach to building the original graph engine as a way to get an MVP out, so we are now working on improving the graph engine by making traversals massively parallel and pipelined. This means data is only ever decoded from disk when it is needed, and parts of reads are all processed in parallel.
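
As a rough illustration of the "only decode when needed" part, here is a lazy pipeline built from Python generators. It shows pipelining only, not parallelism, and it is not the HelixDB engine: `run_pipeline` and `decode` are hypothetical, and `decode` merely stands in for reading a node from disk.

```python
from itertools import islice


def run_pipeline(node_ids, min_age, limit):
    """Scan -> filter -> limit, decoding nodes only as results are pulled."""
    decoded = []  # records which node ids were actually decoded

    def decode(node_id):
        # Stand-in for reading and deserializing a node from disk.
        decoded.append(node_id)
        return {"id": node_id, "age": node_id * 10}

    scanned = (decode(i) for i in node_ids)                 # lazy scan
    filtered = (n for n in scanned if n["age"] >= min_age)  # lazy filter
    results = list(islice(filtered, limit))                 # stop early
    return results, decoded
```

Running `run_pipeline(range(1, 1000), 30, 2)` decodes only as many nodes as are needed to produce two matches, rather than all 999.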

If you’d like to try it out in a simple RAG demo, you can follow this guide and run our Jupyter notebook: https://github.com/HelixDB/helix-db/tree/main/examples/rag_d...

Many thanks! Comments and feedback welcome!

Comments

sync•4h ago
Looks nice! Are you looking to compete with https://www.falkordb.com or do something a bit different?
GeorgeCurtis•4h ago
Pretty much, our biggest focus is on Graph and Hybrid RAG. They seem to have really honed in on Graph RAG since the last time I checked their website.

One of the problems I know people experience with them is that they're super slow at bulk reading.

Oh also, they aren't built in Rust haha

esafak•4h ago
How does it compare with https://kuzudb.com/ ?
GeorgeCurtis•3h ago
Kuzu don't support incremental indexing on the vectors. The vector index is completely separate and decoupled from the graph.

I.e: You have to re-index all of the vectors when you make an update to them.

SchwKatze•3h ago
Super cool!!! I'll try it this week and come back with feedback.
GeorgeCurtis•3h ago
I look forward to it :)
hbcondo714•3h ago
Congrats! Any chance HelixDB can be run in the browser too, maybe via WASM? I'm looking for a vector db that can be pre-populated on the server and then be searched on the client so user queries (chat) stay on-device for privacy / compliance reasons.
GeorgeCurtis•3h ago
Interesting, we've had a few people ask about this. So essentially you'd call the server to retrieve the HNSW and then store it in the browser and use WASM to query it?

Currently the roadblock for that is the LMDB storage engine. We have our own storage engine on our roadmap, and we want it to include WASM support. If you wanna talk about it, reach out on my Twitter: https://x.com/georgecurtiss

J_Shelby_J•3h ago
How do you think about building the graph relationships? Any special approaches you use?
GeorgeCurtis•3h ago
Pretty much the same way you would with any graph DB, with the added benefit of being able to treat a vector as a node by creating those explicit relationships between them.

Does that answer your question properly?

carlhjerpe•3h ago
Nice "I'll have this name" when there's already the helix editor :)
GeorgeCurtis•3h ago
First I'm hearing of it. The Beatles must've been super pissed when Apple took their name :(
carlhjerpe•3h ago
https://crates.io/search?q=Helix

I'm surprised no one on the team searched crates.io once before picking the name. Good luck!

GeorgeCurtis•2h ago
we just started off as a side project and thought the name fitted well. With the strands, graph type structure, connections...

We didn't think of getting people to use it until we found it was solving a real pain point for people, so we weren't worried about trademarks or names. There was no other helix db so that was good enough for us at the time.

carlhjerpe•2h ago
It's not the end of the world, just me being a bit grumpy. I mean it when I say good luck! :)
GeorgeCurtis•1h ago
Thank you :)
tavianator•1h ago
> There was no other helix db

https://en.wikipedia.org/wiki/Helix_(database)

GeorgeCurtis•1h ago
There was no active one. We saw this and thought it would be a nice nod to history. We've actually spoken to some developers at Apple who thought this was really neat :)
itishappy•1h ago
I don't think `helix-editor` is even on crates.io, just placeholders.

https://github.com/helix-editor/helix/discussions/7038

That being said, when I saw `helix-db` I was thrown too. "What's a text editor doing writing a vector-graph database, I thought they were working on plugins?"

bbatsell•2h ago
I can't tell if this is droll sarcasm, but just in case not...

https://en.wikipedia.org/wiki/Apple_Corps_v_Apple_Computer

cormullion•2h ago
perhaps it’s a homage to the famous Helix database (see Wikipedia)
GeorgeCurtis•2h ago
well noted
javierluraschi•3h ago
What is the max number of dimensions supported for a vector?
GeorgeCurtis•3h ago
There is currently no cap. We will probably impose a cap similar to Qdrant's or Pinecone's some time soon (~64k). There's obviously a performance trade-off as you go up, but we hope to massively offset this by doing binary quantisation within the next couple of months.
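
For anyone unfamiliar with binary quantisation, one common form keeps only the sign of each dimension, so every dimension costs one bit and distance becomes a Hamming distance. The sketch below is generic and assumes that sign-based scheme. It says nothing about how HelixDB will implement it, and `binarize` and `hamming` are hypothetical names.

```python
def binarize(vec):
    """Pack a float vector into an int, one bit per dimension (1 if x > 0)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions where two quantised vectors differ."""
    return bin(a ^ b).count("1")


# Two 4-dimensional vectors that differ in sign on one dimension only.
code_a = binarize([0.5, -0.2, 0.1, -0.9])
code_b = binarize([0.4, -0.1, -0.3, -0.8])
distance = hamming(code_a, code_b)
```

A vector of 32-bit floats shrinks by a factor of 32 this way, at the cost of some recall, which is why it is often paired with a re-ranking pass over the full-precision vectors.
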
huevosabio•3h ago
Can I run this as an embedded DB like sqlite?

Can I sidestep the DSL? I want my LLMs to generate queries and using a new language is going to make that hard or expensive.

GeorgeCurtis•2h ago
Currently you can't run us embedded and I'm not sure how you could sidestep the DSL :/

We're working on adding our grammar to llama.cpp so that it only outputs grammatically correct HelixQL (HQL). But even without that, it shouldn't be hard or expensive to do. I wrote a Claude wrapper that had our docs in its context window, and it did a good job of writing queries most of the time.

tough•17m ago
you could refactor your claude-wrapper into a mcp-server maybe

how does llama.cpp's special sauce work to enforce output syntax?

elpalek•2h ago
What method/model are you using for sparse search?
GeorgeCurtis•1h ago
We're going to use BM25. Currently it is just dense search. Coming very soon
elpalek•1h ago
have you thought about SPLADE models? ex: https://arxiv.org/abs/2109.10086
GeorgeCurtis•1h ago
Looks really interesting, I'll have a proper read. What would be your reasoning to incorporate this if we already have vector functionality and semantic search?
mdaniel•2h ago
> so much easier that it’s worth a bit of a learning curve

I think you misspelled "vendor lock in"

GeorgeCurtis•1h ago
You can literally use us for free haha. There's not a language that properly encapsulates graph and vector functionality, so we needed to make our own. Also, we thought it was dumb that query languages weren't type-safe... So we changed that
basonjourne•1h ago
why not surrealdb?
GeorgeCurtis•27m ago
General consensus is it's really slow, I like the concept of surreal though. Our first, and extremely bare bones, version of the graph db was 1-2 orders of magnitude faster than surreal (we haven't run benchmarks against surreal recently, but I'll put them here when we're done)
Attummm•1h ago
It sounds very intriguing indeed. However, the README makes some claims. Are there any benchmarks to support them?

> Built for performance we're currently 1000x faster than Neo4j, 100x faster than TigerGraph

GeorgeCurtis•1h ago
Those were actual benchmarks that we ran, we didn't get a chance to write them out before posting. I'll get on it now and notify by replying to this comment when they're on the readme :)
rohanrao123•41m ago
Congrats on the launch! I'm one of the authors of that paper you cited, glad it was useful and inspiring for building this :) Let me know if we can support in any way!
GeorgeCurtis•15m ago
Wow! I enjoyed reading it a lot and it was definitely inspiring for this project!

Would love to talk to you about it and make sure we capture all of the pain points if you're open to it? :)

tmpfs•28m ago
This is very interesting, are there any examples of interacting with LLMs? If the queries are compiled and loaded into the database ahead of time the pattern of asking an LLM to generate a query from a natural language request seems difficult because current LLMs aren't going to know your query language yet and compiling each query for each prompt would add unnecessary overhead.
raufakdemir•28m ago
How can I migrate neo4j to this?
GeorgeCurtis•25m ago
We can build an ingestion engine for you :)

We've built SQL and PGVector ones already, just waiting for someone who could make use of other ones before we build them.

Let us know! Twitter in my bio

lennertjansen•7m ago
how did you get it 3 OOMs faster than neo4j?
