The interesting features are: 1> I used JSON RAG with real-time embeddings, so for a handful of specs and bits of info we don't need to set up a whole ingestion pipeline (rough sketch just below).
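A minimal sketch of what I mean, assuming sentence-transformers; the model name and the specs.json layout are illustrative, not the exact ones from the repo:

```python
# On-the-fly JSON RAG: embed a small spec file at startup instead of
# running a full ingestion pipeline. Model and JSON layout are illustrative.
import json
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# specs.json: [{"name": "...", "text": "..."}, ...]
with open("specs.json") as f:
    specs = json.load(f)

corpus_emb = model.encode([s["text"] for s in specs], convert_to_tensor=True)

def retrieve(query: str, k: int = 3):
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=k)[0]
    return [(specs[h["corpus_id"]]["name"], h["score"]) for h in hits]

print(retrieve("what is the battery capacity?"))
```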
I have already built a Hierarchical Agentic RAG with hybrid search (knowledge graph + vector search); you can view it on my profile. I'm actively trying to share as much as possible about it, but that project is linked to a huge set of files, around 693k data points on pgvector + Postgres. Give it a visit and you'll get a much better idea from that (a minimal vector-search sketch follows).
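For flavor, a minimal sketch of just the vector-search half against pgvector + Postgres (the knowledge-graph side is omitted); the connection string, table, and column names are hypothetical, not the repo's actual schema:

```python
# Vector half of a hybrid search over pgvector/Postgres.
# Table/column names are hypothetical.
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
conn = psycopg.connect("dbname=rag user=postgres")
register_vector(conn)  # lets us pass numpy arrays as vector params

def vector_search(query: str, k: int = 5):
    emb = model.encode(query)
    # <=> is pgvector's cosine-distance operator
    return conn.execute(
        "SELECT id, content FROM chunks ORDER BY embedding <=> %s LIMIT %s",
        (emb, k),
    ).fetchall()
```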
2> I tried every sort of Whisper model: faster-whisper, turbo, anything you can think of, even with a custom C++ engine, but that architecture itself was hallucination-prone. So I moved to Parakeet TDT with Silero VAD, and not Parakeet RNNT, for better speed and optimization; the repo has further details, and there's a rough handoff sketch below.
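A rough sketch of that VAD-to-ASR handoff, assuming Silero VAD via torch.hub and Parakeet TDT via NeMo; the file paths and checkpoint name are placeholders:

```python
# Silero VAD trims silence, then Parakeet TDT (via NeMo) transcribes
# only the voiced audio. Paths/checkpoint are placeholders.
import torch
import soundfile as sf
import nemo.collections.asr as nemo_asr

vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, collect_chunks = utils

asr = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

wav = read_audio("input.wav", sampling_rate=16000)
speech = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
voiced = collect_chunks(speech, wav)  # drop the silent stretches before ASR

sf.write("voiced.wav", voiced.numpy(), 16000)
print(asr.transcribe(["voiced.wav"]))
```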
3> I built a fine-tuning dataset from Anthropic's RLHF data, processed it with spaCy and GLiNER, and converted it into a clean training set for Llama 3.2 3B (sketch of the idea below). I'll attach the dataset if you need it, or upload it to Hugging Face if you want to use it yourself.
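A hedged sketch of that dataset pass, assuming Anthropic's hh-rlhf on Hugging Face and the public GLiNER API (spaCy preprocessing omitted); the label set and output format are illustrative:

```python
# Tag entities in Anthropic's RLHF data with GLiNER and emit a JSONL
# that can feed a Llama 3.2 3B fine-tune. Labels/format are illustrative.
import json
from datasets import load_dataset
from gliner import GLiNER

ds = load_dataset("Anthropic/hh-rlhf", split="train[:100]")
ner = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
labels = ["person", "organization", "product"]  # illustrative label set

with open("llama_ft.jsonl", "w") as out:
    for row in ds:
        text = row["chosen"]
        ents = ner.predict_entities(text, labels)
        out.write(json.dumps({"text": text,
                              "entities": [e["text"] for e in ents]}) + "\n")
```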
4> I attached phonetic correctors to both the Parakeet output and the Llama output so the TTS behaves better (illustrative version below).
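One plausible way such a corrector can work (my sketch, not necessarily the repo's exact method): map any word whose Metaphone code matches a known vocabulary term back to its canonical spelling, so the TTS pronounces it right. The vocabulary is illustrative:

```python
# Phonetic corrector: normalize ASR/LLM spellings of known terms via
# Metaphone codes. A real version would also guard against collisions.
import jellyfish

CANONICAL = ["Kokoro", "Parakeet", "pgvector", "Llama"]
PHONETIC = {jellyfish.metaphone(w): w for w in CANONICAL}

def correct(text: str) -> str:
    words = text.split()
    return " ".join(PHONETIC.get(jellyfish.metaphone(w), w) for w in words)

print(correct("cocoro said lama is ready"))  # -> "Kokoro said Llama is ready"
```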
5> I used SetFit to route queries, plus confidence-based semantic search, to be as fast and accurate as possible (sketch below).
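One plausible way to combine those two pieces (my reading of it): the SetFit router's confidence gates a semantic-search fallback. The model path, route names, and threshold are placeholders:

```python
# Confidence-gated routing: trust the SetFit router only above a
# threshold, otherwise fall back to semantic search. Names are placeholders.
from setfit import SetFitModel

router = SetFitModel.from_pretrained("my-org/query-router")  # hypothetical
ROUTES = ["device_specs", "chitchat", "command"]

def route(query: str, threshold: float = 0.7) -> str:
    probs = router.predict_proba([query])[0]
    best = int(probs.argmax())
    if probs[best] >= threshold:
        return ROUTES[best]
    return "semantic_search"  # low confidence: let retrieval decide
```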
6> I'm using sherpa-onnx and have queued the TTS, the STT, and everything else; as an experiment I've also had Llama generating responses and Kokoro processing them as a batch, with the whole thing working end to end, all on my laptop (toy pipeline sketch below).
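A toy version of the queued handoff, pure stdlib: each stage runs in its own thread and passes work through a Queue, so capture never blocks on generation. The stt/llm/tts stubs stand in for the sherpa-onnx, Llama, and Kokoro calls in the repo:

```python
import queue
import threading
import time

# Stubs standing in for sherpa-onnx STT, the Llama model, and Kokoro TTS;
# only the queue/thread plumbing is the point here.
def stt(chunk: bytes) -> str:
    return "what's the battery capacity"

def llm(text: str) -> str:
    return f"answering: {text}"

def tts(text: str) -> bytes:
    return b"\x00" * 16000  # fake PCM audio

stt_out: queue.Queue = queue.Queue()
tts_in: queue.Queue = queue.Queue()

def llm_worker():
    while True:
        tts_in.put(llm(stt_out.get()))

def tts_worker():
    while True:
        audio = tts(tts_in.get())
        print(f"synthesized {len(audio)} bytes")  # playback would go here

threading.Thread(target=llm_worker, daemon=True).start()
threading.Thread(target=tts_worker, daemon=True).start()

stt_out.put(stt(b"raw-audio-chunk"))  # feed one fake utterance through
time.sleep(0.5)  # let the daemon threads drain the queues before exit
```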
7> Along with all of this, my frontend relies heavily on three.js and 3D view files, but I applied optimizations there so it runs smoothly together with everything else on the laptop.
8> I also glued interaction memory onto the LLM: a FIFO of the last 5 interactions, stored for future fine-tuning and phonetic-word additions (sketch below).
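A sketch of that memory, assuming a deque(maxlen=5) for the rolling context and a JSONL log for the fine-tuning side; the file name is illustrative:

```python
# 5-interaction FIFO memory: the deque feeds the LLM context window, and
# every turn is also appended to a JSONL log for future fine-tuning /
# phonetic word mining. File name is illustrative.
import json
from collections import deque

memory: deque = deque(maxlen=5)  # oldest interaction drops out automatically

def record(user: str, assistant: str, log_path: str = "interactions.jsonl"):
    turn = {"user": user, "assistant": assistant}
    memory.append(turn)
    with open(log_path, "a") as f:
        f.write(json.dumps(turn) + "\n")

def context_prompt() -> str:
    return "\n".join(f"User: {t['user']}\nAssistant: {t['assistant']}"
                     for t in memory)
```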
Please give it a visit and let me know if there's something new I should learn.
One kind note: as an enthusiast who has spent a lot of energy on these things, I have taken help from AI for the MD files and for expanding the explanations in the code, to better help every single person.
shubham-coder•4h ago
That unexpected load actually helped me find a few bugs in the setup script (specifically with the pgvector config on Windows), which I've just patched. If anyone else hits memory issues on 4GB cards, let me know; I'm actively optimizing the quantization now.