Show HN: I wrote a full text search engine in Go

47•novocayn•3h ago

Comments

kdawkins•2h ago

This is very cool! Your readme is interesting and well written - I didn't know I could be so interested in the internals of a full text search engine :)

What was the motivation to kick this project off? Learning or are you using it somehow?

novocayn•2h ago

I’m learning the internals of FTS engines while building a vector database from scratch. Needed a solid FTS index, so I built one myself :)

It ended up being a clean, reusable component, so I decided to carve it out into a standalone project

The README is mostly notes from my Notion pages, glad you found it interesting!

n_u•1h ago

What are you building a vector database from scratch for?

novocayn•1h ago

Mostly wanted a refresher on GPU accelerated indexes and Vector DB internals. And maybe along the way, build an easy on-ramp for folks who want to understand how these work under the hood

add-sub-mul-div•2h ago

Why did you create this new account if there's already 3 existing accounts promoting your stuff and only your stuff?

novocayn•2h ago

Because running a three-account bot‑net farm is fun :D Okay, jk, please don’t mod me out.

One’s for browsing HN at work, the other’s for home, and the third one has a username I'm not too fond of.

I’ll stick to this one :) I might have some karma on the older ones, but honestly, HN is just as fun from everywhere

wolfgarbe•2h ago

Great work! Would be interesting to see how it compares to Lucene performance-wise, e.g. with a benchmark like https://github.com/quickwit-oss/search-benchmark-game

novocayn•1h ago

Thanks! Honestly, given it was hacked together in a weekend, I'm not sure it’d measure up to Lucene/Bleve in any serious way.

I intended this to be an easy on-ramp for folks who want to get a feel for how FTS engines work under the hood :)

llllm•49m ago

Not _that_ long ago Bleve was also hacked together over a few weekends.

I appreciate the technical depth of the readme, but I’m not sure it fits your easy on-ramp framing.

Keep going and keep sharing.

n_u•1h ago

Cool project!

I see you are using a positional index rather than doing bi-word matching to support positional queries.

Positional indexes can be a lot larger than non-positional. What is the ratio of the size of all documents to the size of the positional inverted index?

novocayn•1h ago

Observation is spot on. Biword matching would definitely ease this. Stealing bi-word matching for a future iteration, tysm :D

n_u•52m ago

Well bi-word matching requires that you still have all of the documents stored to verify the full phrase occurs in the document rather than just the bi-words. So it isn't always better.

For example the phrase query "United States of America" doesn't occur in the document "The United States is named after states of the North American continent. The capital of America is Washington DC". But "United States", "states of" and "of America" all appear in it.

There's a tradeoff because we still have to fetch the full document text (or some positional structure) for the filtered-down candidate documents containing all of the bi-word pairs. So it requires a second stage of disk I/O. But as I understand it, most practitioners assume you can get away with fewer IOPS than with a positional index, since that info only has to be fetched for a much smaller filtered-down candidate set rather than for the whole posting list.

But that's why I was curious about the storage ratio of your positional index.
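
To make the example above concrete, here is a minimal Go sketch of the two-stage idea: a bi-word filter that admits the document as a candidate, followed by a positional check that rejects it. It is an illustration only, not code from the project being discussed; the function names (`tokenize`, `biwordCandidate`, `phraseMatch`) and the simple lowercase, split-on-punctuation tokenizer are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on any rune that is not a letter or digit.
// (Hypothetical tokenizer for this sketch; real engines do more, e.g. stemming.)
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// biwords returns the set of adjacent token pairs.
func biwords(tokens []string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+1 < len(tokens); i++ {
		set[tokens[i]+" "+tokens[i+1]] = true
	}
	return set
}

// biwordCandidate reports whether every bi-word of the phrase occurs somewhere
// in the document. This is only a filter: it can return true for documents
// that do not contain the full phrase.
func biwordCandidate(doc, phrase string) bool {
	docPairs := biwords(tokenize(doc))
	for pair := range biwords(tokenize(phrase)) {
		if !docPairs[pair] {
			return false
		}
	}
	return true
}

// positions maps each term to the list of token offsets where it occurs.
func positions(tokens []string) map[string][]int {
	pos := make(map[string][]int)
	for i, t := range tokens {
		pos[t] = append(pos[t], i)
	}
	return pos
}

// containsInt reports whether xs contains x.
func containsInt(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// phraseMatch verifies with positional information that the phrase terms
// occur at consecutive offsets in the document.
func phraseMatch(doc, phrase string) bool {
	terms := tokenize(phrase)
	if len(terms) == 0 {
		return false
	}
	pos := positions(tokenize(doc))
	for _, start := range pos[terms[0]] {
		matched := true
		for k := 1; k < len(terms); k++ {
			if !containsInt(pos[terms[k]], start+k) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

func main() {
	doc := "The United States is named after states of the North American continent. " +
		"The capital of America is Washington DC"
	phrase := "United States of America"

	fmt.Println("bi-word candidate:", biwordCandidate(doc, phrase)) // true
	fmt.Println("exact phrase:", phraseMatch(doc, phrase))          // false
}
```

The first stage passes because "united states", "states of" and "of america" each appear somewhere in the document; the second stage fails because those pairs never line up at consecutive positions.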

eudoxus•1h ago

Would love to hear how this compares to another popular go based full text search engine (with a not too dissimilar name) https://github.com/blevesearch/bleve?

novocayn•1h ago

Bleve is an absolute beast! Built with <3 at Couchbase. Fun fact: the folks who maintain it sit right across from me at work.

Copenjin•1h ago

Did you vibe code this? A few things here and there are a bit of a giveaway imho.

fatty_patty89•42m ago

What makes you think so?

niux•27m ago

Probably the commit history.

novocayn•21m ago

Yayiee, the “can’t prove it” Doakes Dexter meme, making it to HN

novocayn•25m ago

On my way to make a Dexter meme on this

When you think OP vibe-coded the project but can’t prove it yet

https://x.com/FG_Artist/status/1974267168855392371

haute_cuisine•12m ago

I put Overview section from the Readme into an AI content detector and it says 92% AI. Some comment blocks inside codebase are rated as 100% AI generated.

novocayn•9m ago

Claude: "You're absolutely right" :D

ge96•3m ago

Another possible tell (not saying this is vibe coded) is when every function is documented, with almost too many comments

oldgregg•29m ago

looks great! would love to see benchmark with bleve and a lightweight vector implementation.

Xeoncross•8m ago

I really liked the README, that was a good use of AI.

If you're interested in the idea of writing a database, I recommend you check out https://github.com/thomasjungblut/go-sstables which includes sstables, a skiplist, a recordio format and other database building blocks like a write-ahead log.

Also https://github.com/BurntSushi/fst which has a great blog post explaining its compression (and has been ported to Go). It is really helpful for autocomplete/typeahead when recommending searches to users or doing spelling correction for search inputs.

Provides a more advanced collection of components to build your own database.
