Valori – A Python-native Vector Database I built from scratch

9•varshith17•2mo ago

I’ve been working on a project called Valori, a Python-native vector database I built from the ground up — not by reinventing every algorithm, but by wiring together efficient, well-known indexing and search techniques into a cohesive, hackable framework.

The idea came from my frustration with existing vector DBs that were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible — so I built it.

What it does:

- Lets you store, index, and search high-dimensional vectors
- Supports multiple indices (Flat, HNSW, IVF, LSH, Annoy)
- Has memory, disk, and hybrid storage backends
- Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)
- Offers quantization, persistence, and plugin-based extensibility
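
Most of the items above name standard techniques rather than Valori-specific ones. The simplest index, Flat, is exact brute-force search. Here is a minimal pure-Python sketch that assumes cosine similarity as the metric. `FlatIndex`, `add`, and `search` are hypothetical names, not Valori's actual API. A NumPy version would replace the inner loop with a single matrix-vector product.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot index a zero vector")
    return [x / norm for x in vec]


class FlatIndex:
    """Hypothetical exact index: every query is compared against every stored vector."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: Dict[str, List[float]] = {}

    def add(self, key: str, vector: Sequence[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._vectors[key] = normalize(vector)  # store unit vectors once, at insert time

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = normalize(query)
        # O(n * dim) scan; heapq.nlargest keeps only the k best (score, key) pairs.
        scored = ((sum(a * b for a, b in zip(q, v)), key) for key, v in self._vectors.items())
        return [(key, score) for score, key in heapq.nlargest(k, scored)]


index = FlatIndex(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [0.0, 1.0, 0.0])
index.add("c", [0.9, 0.1, 0.0])
print(index.search([1.0, 0.05, 0.0], k=2))  # "a" ranks first, "c" second
```

Approximate indices such as HNSW, IVF, LSH, and Annoy exist to avoid this full scan once collections get large.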

All written in Python, integrated with NumPy, and production-tested with logging and monitoring built in.
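
The post does not say which quantization scheme Valori uses. As an assumption, here is the simplest common one: per-vector 8-bit scalar quantization, sketched in pure Python. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_uint8(vec: Sequence[float]) -> Tuple[bytes, float, float]:
    """Map each float to an integer code 0..255 over the vector's own [min, max] range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # constant vectors: every code is 0
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def dequantize_uint8(codes: bytes, lo: float, scale: float) -> List[float]:
    """Approximate reconstruction; each value is off by at most about scale / 2."""
    return [lo + c * scale for c in codes]


vec = [0.12, -0.48, 0.33, 0.91]
codes, lo, scale = quantize_uint8(vec)
approx = dequantize_uint8(codes, lo, scale)
print(len(codes), max(abs(a - b) for a, b in zip(vec, approx)))
```

Storing one byte per dimension instead of a 4-byte float32 shrinks storage about 4x in exchange for bounded rounding error. FAISS offers both scalar and product quantizers for the same trade-off.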

Install:

pip install valori

GitHub: https://github.com/varshith-Git/valori

PyPI: https://pypi.org/project/valori

I’d love to hear your thoughts —

What’s missing for you in current vector DBs?

If you’ve built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?

Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more “build-it-yourself” style?
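
In the "build-it-yourself" style, the chunking stage of a document pipeline like the one described above can be very small. This sketch uses fixed-size word windows with overlap, a common RAG default. `chunk_words` and its defaults are illustrative assumptions, not Valori's implementation, which may chunk by tokens or sentences instead.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("the quick brown fox jumps over the lazy dog", size=4, overlap=1))
# ['the quick brown fox', 'fox jumps over the', 'the lazy dog']
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk.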

Feedback, criticism, or collaboration ideas are all welcome. – Varshith (varshith.gudur17@gmail.com)

Comments

varshith17•2mo ago

PyPI: https://pypi.org/project/valori/

GitHub: https://github.com/varshith-Git/valori

https://valori-python-vector-db.lovable.app/

bendtb•2mo ago

What’s the advantage of this being in Python?

steffann•2mo ago

I think the “simple, modular, and extensible” part makes this interesting, and for those qualities, it being written in Python is relevant.

varshith17•2mo ago

Exactly. Python makes the whole stack composable instead of compiled shut. That’s where the fun (and flexibility) lives.

varshith17•2mo ago

The point isn’t raw speed; it’s hackability. You can plug in new models or indexing layers in minutes without dropping to C++.

redskyluan•2mo ago

dude you already missed the window.

nothing is better than SQLite as a library, and don't use high performance as your value proposition for a Python product

varshith17•2mo ago

SQLite’s perfect if you’ve got rows and tables. Valori’s for when you’ve got embeddings and chaos.
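
To make the SQLite comparison concrete, embeddings can be stored in SQLite as float32 BLOBs using only the standard library. What plain SQLite (without an extension) lacks is a vector index, so the search below is a full scan scored in Python. The table and function names are illustrative assumptions.

```python
import math
import sqlite3
import struct
from typing import List, Sequence, Tuple


def to_blob(vec: Sequence[float]) -> bytes:
    """Pack floats as little-endian float32, 4 bytes per dimension."""
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


conn = sqlite3.connect(":memory:")  # pass a file path instead for on-disk persistence
conn.execute("CREATE TABLE embeddings (id TEXT PRIMARY KEY, vec BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO embeddings VALUES (?, ?)",
    [("doc1", to_blob([1.0, 0.0])), ("doc2", to_blob([0.0, 1.0])), ("doc3", to_blob([0.7, 0.7]))],
)


def search(query: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
    # No vector index here: every row is loaded and scored in Python.
    rows = conn.execute("SELECT id, vec FROM embeddings").fetchall()
    scored = [(doc_id, cosine(query, from_blob(blob))) for doc_id, blob in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


print(search([1.0, 0.1]))  # doc1 first, then doc3
```

This works for small collections. The scan grows linearly with row count, which is the gap a dedicated vector index fills.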

mattfrommars•2mo ago

how much of this was vibe coded? looks cool but it's too much for me to digest.

where did you get the original mental model to begin building it?

varshith17•2mo ago

It’s definitely dense, but not as wild as it looks. The mental model was: take the core building blocks from FAISS and Milvus, make them composable in Python, and expose everything clearly.
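
LSH is one of the listed indices and is also available in FAISS (`IndexLSH`), which makes it a good example of such a building block. Below is a minimal random-hyperplane sketch for cosine similarity. It assumes a single hash table with no multi-probe. `HyperplaneLSH` is a hypothetical name, not Valori's or FAISS's API.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors with a small angle between them tend to share a bit signature."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hyperplane is a random Gaussian normal vector; the sign of the dot product is one bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[int, List[str]] = defaultdict(list)

    def signature(self, vec: Sequence[float]) -> int:
        sig = 0
        for plane in self.planes:
            sig = (sig << 1) | (sum(p * x for p, x in zip(plane, vec)) >= 0)
        return sig

    def add(self, key: str, vec: Sequence[float]) -> None:
        self.buckets[self.signature(vec)].append(key)

    def candidates(self, query: Sequence[float]) -> List[str]:
        """Keys in the query's bucket; a full index would re-rank these with exact scores."""
        return list(self.buckets.get(self.signature(query), []))


lsh = HyperplaneLSH(dim=4, n_bits=6, seed=42)
lsh.add("v", [1.0, 2.0, 3.0, 4.0])
print(lsh.candidates([2.0, 4.0, 6.0, 8.0]))  # ['v']: positive scaling never flips a hyperplane sign
```

More bits make buckets smaller and faster to scan but lower recall. Production LSH typically uses several tables to win some recall back.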

The “vibe” part came from trying to make it feel like a system that could run in production, not just a toy. So yeah, it’s a little heavy, but it earned the vibe honestly.

luke-stanley•2mo ago

I am very picky and hard to please, but from a quick look at the README, I'd say the API on display seems like the right level of abstraction for dealing with messy reality.

Since you're asking for feedback:

- perhaps some of the document-type-specific dependencies could be optional?

- could there be LESS config surface?

- I noticed the GitHub Actions CI run shows a red cross (a failing check).
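
One common way to make document-type-specific dependencies optional is to import them lazily inside each parser and pair that with packaging extras. This sketch rests on a few assumptions: `pypdf` stands in for whatever PDF library is actually used, and `require_optional` and the `pdf` extra name are hypothetical.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import a dependency only when a feature needs it, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; install the optional extra, "
            f"e.g. 'pip install <package>[{extra}]'"
        ) from exc


def parse_pdf(path: str) -> str:
    # Imported at call time, so users who never parse PDFs never need the library installed.
    pypdf = require_optional("pypdf", extra="pdf")
    reader = pypdf.PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

The base install stays small, and each extra pulls in only the parsers a user actually needs.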

It's good to document how to use it with Astral's "uv" these days, especially for anything that might pull in PyTorch dependency hell, which uv has mostly solved if used correctly!

Nice work!

varshith17•2mo ago

Love this kind of feedback, thank you. You nailed it on optional deps and config sprawl; I’m trimming both. The CI cross is just coverage noise, and I’ll add uv setup notes, since uv really cleans up the PyTorch mess. Glad the API felt right: getting “just enough abstraction” was the hardest part.
