Implementing HNSW (Hierarchical Navigable Small World) Vector Search in PHP

https://centamori.com/index.php?slug=hierarchical-navigable-small-world-hnsw-php&lang=en
35•centamiv•2h ago

Comments

centamiv•2h ago
OP here. I wrote this implementation to deeply understand the mechanics behind HNSW (layers, entry points, neighbor selection) without relying on external libraries. While PHP isn't the typical choice for vector search engines, I found it surprisingly capable for this use case, especially with JIT enabled on PHP 8.x. It serves as a drop-in solution for PHP monoliths that need semantic search features without adding the complexity of a separate service like Qdrant or Pinecone. If you want to jump straight to the code, the open-source repo is here: https://github.com/centamiv/vektor Happy to answer any questions about the implementation details!
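For readers unfamiliar with the "layers" part: in the usual description of HNSW, each new node is assigned a random top layer drawn from an exponentially decaying distribution, so most nodes live only on layer 0 and a few act as long-range entry points higher up. The sketch below is written in Python for brevity and is not taken from the linked repo; the function name `random_level` and the choice `m = 16` are illustrative assumptions.

```python
import math
import random


def random_level(m: int, rng: random.Random) -> int:
    """Draw a top layer as floor(-ln(U) * mL), with mL = 1 / ln(m).

    `m` is the target number of neighbors per node (assumed value, not
    read from any particular library's configuration).
    """
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # shift into (0, 1] so log(0) can never occur
    return int(-math.log(u) * m_l)


# Seeded generator so the experiment is repeatable.
rng = random.Random(42)
levels = [random_level(16, rng) for _ in range(10_000)]

# Count how many nodes land on each layer.
counts = {}
for lvl in levels:
    counts[lvl] = counts.get(lvl, 0) + 1
```

With `m = 16`, a node reaches layer 1 or above with probability 1/16, so roughly 94% of nodes should stay on layer 0.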
hu3•36m ago
Great writeup. Thanks for taking the time to organise and share.

It's tempting to use this in projects that use PHP.

Is it usable with a corpus of, say, 1,000 3 KB Markdown files? And 10,000 files?

Can I also index PHP files so that searches include function and class names? Perhaps comments?

How much RAM and disk space would we be talking about?

And the speed?

My first goal would be to index a PHP project and its documentation so that an LLM agent could perform semantic search using my MCP tool.

centamiv•24m ago
I tested it myself with 1k documents (about 1.5M vectors) and performance is solid (a few milliseconds per search). I haven't run more aggressive benchmarks yet.

Since it only stores the vectors, the actual size of the Markdown document is irrelevant; you just need to handle the embedding and chunking phases carefully (you can use a parser to extract code snippets).
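As a rough illustration of the chunking phase, here is a minimal word-based splitter with overlap, written in Python. It is only a sketch: the function name `chunk_words` and the default sizes are assumptions, and a real pipeline would more likely split on Markdown headings or code boundaries before embedding each chunk.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words, each sharing
    `overlap` words with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
```

Each chunk would then be embedded separately, which is why the number of vectors depends on the chunking settings rather than on the number of files.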

RAM isn't an issue because I aim for random data access as much as possible. This avoids saturating PHP, since it wasn't exactly built for this kind of workload.

I'm glad you found the article and repo useful! If you use it and run into any problems, feel free to open an issue on GitHub.

Random09•20m ago
The only small thing you forgot to mention: it requires the use of AI, OpenAI to be specific. I got baited.
centamiv•13m ago
Apologies if it felt that way! I used OpenAI in the examples just because it's the quickest 'Hello World' for embeddings right now, but the library itself is completely agnostic.

HNSW is just the indexing algorithm. It doesn't care where the vectors come from. You can generate them using Ollama (locally), HuggingFace, Gemini...

As long as you feed it an array of floats, it will index it. The dependency on OpenAI is purely in the example code, not in the engine logic.
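To make the "array of floats" point concrete, here is a minimal brute-force cosine search in Python. It is a stand-in for illustration only: it is not HNSW and not the library's API, and the names `cosine_similarity` and `nearest` are hypothetical. The point is simply that the search side only ever sees plain float vectors, whatever model produced them.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k vectors most similar to the query (exhaustive scan)."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 2-dimensional "embeddings"; real ones would have hundreds of dimensions.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
top_two = nearest([0.9, 0.1], vectors, k=2)
```

An HNSW index answers the same kind of query approximately, without scanning every vector.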

fithisux•43m ago
It makes perfect sense to implement it in a high-level language that keeps it understandable.

Very good contribution.

centamiv•39m ago
Thank you! That was exactly the goal. Modern PHP turned out to be surprisingly expressive for this kind of 'executable pseudocode'. Glad you appreciated it!
rvnx•9m ago
Cool blog post, smart guy, very thoughtful and not a copy-paste of Python code like 99% of folks. Nice to see
centamiv•2m ago
Thank you, really appreciate that

Show HN: Tiny Diffusion – Minimal diffusion LM in 364 lines

https://github.com/nathan-barry/tiny-diffusion
1•nathan-barry•1m ago•0 comments

Show HN: VLLora MCP – Debug agent traces and let your coding agent fix the code

https://vllora.dev/blog/introducing-vllora-mcp-server/
1•mrun1729•3m ago•0 comments

The true year of Linux

https://old.reddit.com/r/Fedora/comments/1q0lq7n/the_true_year_of_linux/
1•sipofwater•4m ago•0 comments

Exceptionally Gifted Children

https://www.educationprogress.org/p/exceptionally-gifted-children
1•stared•8m ago•0 comments

The Post-American Internet

https://pluralistic.net/2026/01/01/39c3/#the-new-coalition
2•HotGarbage•11m ago•0 comments

Using the Corne Split Keyboard for Half a Year

https://rugu.dev/en/blog/corne/
1•birdculture•12m ago•0 comments

You Are Not Dumb, You Just Lack the Prerequisites

https://lelouch.dev/blog/you-are-probably-not-dumb/
2•sebg•16m ago•0 comments

MakerHub – an on-the-go companion for focus, wellbeing, creativity

https://www.makerhub.app
1•tanyaZai•16m ago•0 comments

NameCheap revokes a domain dedicated to hosting footage from Gaza

https://twitter.com/receipts_lol/status/2006732606164152651
2•rnmmrnm•17m ago•1 comments

Show HN: I built a tool that turns prompts into full-stack web and mobile apps

1•genvibe•18m ago•0 comments

Understanding DuckLake's Metadata Tables

https://thefulldatastack.substack.com/p/understanding-ducklakes-metadata
1•nhemerson•19m ago•0 comments

Show HN: HumanMark – open-source AI content detection (self-hosted, offline)

https://github.com/vinpatel/humanmark
1•mindtrades•19m ago•0 comments

Show HN: Vect AI – An execution-focused AI platform for marketing automation

https://www.google.com/search?q=site%3Avect.pro&oq=&gs_lcrp=EgZjaHJvbWUqCQgAECMYJxjqAjIJCAAQIxgnG...
1•afrazullal•20m ago•0 comments

Traffic Analytics for WordPress Forms

https://snapforms.tech/articles/traffic-analytics-for-wordpress-forms/
2•spectreflow•20m ago•1 comments

Dodging sketchy browser extensions in 2026

https://wardblog.substack.com/p/dodging-sketchy-browser-extensions
1•bennydog224•21m ago•1 comments

How plants create mitraphylline, a natural compound linked to anticancer effects

https://www.sciencedaily.com/releases/2025/12/251227082728.htm
1•QueensGambit•21m ago•0 comments

Experimental Nvidia Driver (Turing+) for Haiku OS

https://discuss.haiku-os.org/t/haiku-nvidia-porting-nvidia-driver-for-turing-gpus/16520?page=8
3•Tiberium•24m ago•1 comments

Show HN: Cistern, a macOS menu bar tool that shows CircleCI builds

https://github.com/atombender/cistern
1•atombender•26m ago•0 comments

MCP Chat Studio – A Postman-Like UI for Testing MCP Servers

https://github.com/JoeCastrom/mcp-chat-studio
1•JoeCastrom•27m ago•1 comments

The man taking over the Large Hadron Collider

https://www.theguardian.com/science/2025/dec/31/large-hadron-collider-head-of-cern-mark-thomson
2•naves•32m ago•0 comments

Cameras and Lenses

https://ciechanow.ski/cameras-and-lenses/
6•sebg•34m ago•0 comments

You Will Be OK

https://www.lesswrong.com/posts/fwQburGDyGoSSweT9/you-will-be-ok
2•sebg•35m ago•0 comments

No iPhone 18 Launch This Year

https://www.macrumors.com/2026/01/01/no-iphone-18-launch-this-year/
2•mfiguiere•36m ago•0 comments

Ask HN: Which cloud service to use for overpass API?

1•nasaeclipse•37m ago•0 comments

Meta enables chronological timelines in the Netherlands after court ruling

https://nltimes.nl/2026/01/01/meta-adjusts-facebook-instagram-timelines-court-ruling-changes-missing
3•giuliomagnifico•37m ago•0 comments

Ex_acv_fast review: "water" fasted 6 days, new record

https://www.exfatloss.com/p/ex_acv_fast-review-water-fasted-6
1•paulpauper•38m ago•0 comments

Taxation in a Strong AI World

https://marginalrevolution.com/marginalrevolution/2026/01/taxation-in-a-strong-ai-world.html
1•paulpauper•38m ago•0 comments

Decision Trees vs. Boosting: The One Expert vs. the Committee

https://mateolafalce.github.io/2026/Decision%20Trees%20vs.%20Boosting_%20The%20One%20Expert%20vs....
1•lafalce•39m ago•0 comments

Autism hasn't increased

https://marginalrevolution.com/marginalrevolution/2026/01/autism-hasnt-increased.html
23•paulpauper•39m ago•6 comments

Layoutz – Simple, beautiful CLI output for Haskell

https://flora.pm/packages/@hackage/layoutz
1•PaulHoule•40m ago•0 comments