The interesting features are: 1> I used JSON RAG with real-time embeddings, so for a handful of specs and bits of info we don't need to set up a whole ingestion pipeline (rough sketch just below).
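A minimal sketch of what I mean, assuming sentence-transformers; the model name and the specs.json layout are illustrative, not the exact ones from the repo:

```python
# On-the-fly JSON RAG: embed a small spec file at startup instead of
# running a full ingestion pipeline. Model and JSON layout are illustrative.
import json
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# specs.json: [{"name": "...", "text": "..."}, ...]
with open("specs.json") as f:
    specs = json.load(f)

corpus_emb = model.encode([s["text"] for s in specs], convert_to_tensor=True)

def retrieve(query: str, k: int = 3):
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=k)[0]
    return [(specs[h["corpus_id"]]["name"], h["score"]) for h in hits]

print(retrieve("what is the battery capacity?"))
```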
I have already built a Hierarchical Agentic RAG with hybrid search (knowledge graph + vector search); you can view it on my profile. I'm actively trying to share as much as possible about it, but that project is linked to a huge set of files, around 693k data points on pgvector + Postgres. Give it a visit and you'll get a much better idea from that (a minimal vector-search sketch follows).
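For flavor, a minimal sketch of just the vector-search half against pgvector + Postgres (the knowledge-graph side is omitted); the connection string, table, and column names are hypothetical, not the repo's actual schema:

```python
# Vector half of a hybrid search over pgvector/Postgres.
# Table/column names are hypothetical.
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
conn = psycopg.connect("dbname=rag user=postgres")
register_vector(conn)  # lets us pass numpy arrays as vector params

def vector_search(query: str, k: int = 5):
    emb = model.encode(query)
    # <=> is pgvector's cosine-distance operator
    return conn.execute(
        "SELECT id, content FROM chunks ORDER BY embedding <=> %s LIMIT %s",
        (emb, k),
    ).fetchall()
```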
2> I tried every sort of Whisper model: faster-whisper, turbo, anything you can think of, even with a custom C++ engine, but that architecture itself was hallucination-prone. So I moved to Parakeet TDT with Silero VAD, and not Parakeet RNNT, for better speed and optimization; the repo has further details, and there's a rough handoff sketch below.
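A rough sketch of that VAD-to-ASR handoff, assuming Silero VAD via torch.hub and Parakeet TDT via NeMo; the file paths and checkpoint name are placeholders:

```python
# Silero VAD trims silence, then Parakeet TDT (via NeMo) transcribes
# only the voiced audio. Paths/checkpoint are placeholders.
import torch
import soundfile as sf
import nemo.collections.asr as nemo_asr

vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, collect_chunks = utils

asr = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

wav = read_audio("input.wav", sampling_rate=16000)
speech = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
voiced = collect_chunks(speech, wav)  # drop the silent stretches before ASR

sf.write("voiced.wav", voiced.numpy(), 16000)
print(asr.transcribe(["voiced.wav"]))
```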
3> I built a fine-tuning dataset from Anthropic's RLHF data, processed it with spaCy and GLiNER, and converted it into a clean training set for Llama 3.2 3B (sketch of the idea below). I'll attach the dataset if you need it, or upload it to Hugging Face if you want to use it yourself.
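A hedged sketch of that dataset pass, assuming Anthropic's hh-rlhf on Hugging Face and the public GLiNER API (spaCy preprocessing omitted); the label set and output format are illustrative:

```python
# Tag entities in Anthropic's RLHF data with GLiNER and emit a JSONL
# that can feed a Llama 3.2 3B fine-tune. Labels/format are illustrative.
import json
from datasets import load_dataset
from gliner import GLiNER

ds = load_dataset("Anthropic/hh-rlhf", split="train[:100]")
ner = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
labels = ["person", "organization", "product"]  # illustrative label set

with open("llama_ft.jsonl", "w") as out:
    for row in ds:
        text = row["chosen"]
        ents = ner.predict_entities(text, labels)
        out.write(json.dumps({"text": text,
                              "entities": [e["text"] for e in ents]}) + "\n")
```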
4> I attached phonetic correctors to both the Parakeet output and the Llama output so the TTS behaves better (illustrative version below).
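One plausible way such a corrector can work (my sketch, not necessarily the repo's exact method): map any word whose Metaphone code matches a known vocabulary term back to its canonical spelling, so the TTS pronounces it right. The vocabulary is illustrative:

```python
# Phonetic corrector: normalize ASR/LLM spellings of known terms via
# Metaphone codes. A real version would also guard against collisions.
import jellyfish

CANONICAL = ["Kokoro", "Parakeet", "pgvector", "Llama"]
PHONETIC = {jellyfish.metaphone(w): w for w in CANONICAL}

def correct(text: str) -> str:
    words = text.split()
    return " ".join(PHONETIC.get(jellyfish.metaphone(w), w) for w in words)

print(correct("cocoro said lama is ready"))  # -> "Kokoro said Llama is ready"
```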
5> I used SetFit to route queries, plus confidence-based semantic search, to be as fast and accurate as possible (sketch below).
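One plausible way to combine those two pieces (my reading of it): the SetFit router's confidence gates a semantic-search fallback. The model path, route names, and threshold are placeholders:

```python
# Confidence-gated routing: trust the SetFit router only above a
# threshold, otherwise fall back to semantic search. Names are placeholders.
from setfit import SetFitModel

router = SetFitModel.from_pretrained("my-org/query-router")  # hypothetical
ROUTES = ["device_specs", "chitchat", "command"]

def route(query: str, threshold: float = 0.7) -> str:
    probs = router.predict_proba([query])[0]
    best = int(probs.argmax())
    if probs[best] >= threshold:
        return ROUTES[best]
    return "semantic_search"  # low confidence: let retrieval decide
```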
6> I'm using sherpa-onnx and have queued the TTS, the STT, and everything else; as an experiment I've also had Llama generating responses and Kokoro processing them as a batch, with the whole thing working end to end, all on my laptop (toy pipeline sketch below).
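A toy version of the queued handoff, pure stdlib: each stage runs in its own thread and passes work through a Queue, so capture never blocks on generation. The stt/llm/tts stubs stand in for the sherpa-onnx, Llama, and Kokoro calls in the repo:

```python
import queue
import threading
import time

# Stubs standing in for sherpa-onnx STT, the Llama model, and Kokoro TTS;
# only the queue/thread plumbing is the point here.
def stt(chunk: bytes) -> str:
    return "what's the battery capacity"

def llm(text: str) -> str:
    return f"answering: {text}"

def tts(text: str) -> bytes:
    return b"\x00" * 16000  # fake PCM audio

stt_out: queue.Queue = queue.Queue()
tts_in: queue.Queue = queue.Queue()

def llm_worker():
    while True:
        tts_in.put(llm(stt_out.get()))

def tts_worker():
    while True:
        audio = tts(tts_in.get())
        print(f"synthesized {len(audio)} bytes")  # playback would go here

threading.Thread(target=llm_worker, daemon=True).start()
threading.Thread(target=tts_worker, daemon=True).start()

stt_out.put(stt(b"raw-audio-chunk"))  # feed one fake utterance through
time.sleep(0.5)  # let the daemon threads drain the queues before exit
```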
7> Along with all of this, my frontend relies heavily on three.js and 3D view files, but I applied optimizations there so it runs smoothly together with everything else on the laptop.
8> I also glued interaction memory onto the LLM: a FIFO of the last 5 interactions, stored for future fine-tuning and phonetic-word additions (sketch below).
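A sketch of that memory, assuming a deque(maxlen=5) for the rolling context and a JSONL log for the fine-tuning side; the file name is illustrative:

```python
# 5-interaction FIFO memory: the deque feeds the LLM context window, and
# every turn is also appended to a JSONL log for future fine-tuning /
# phonetic word mining. File name is illustrative.
import json
from collections import deque

memory: deque = deque(maxlen=5)  # oldest interaction drops out automatically

def record(user: str, assistant: str, log_path: str = "interactions.jsonl"):
    turn = {"user": user, "assistant": assistant}
    memory.append(turn)
    with open(log_path, "a") as f:
        f.write(json.dumps(turn) + "\n")

def context_prompt() -> str:
    return "\n".join(f"User: {t['user']}\nAssistant: {t['assistant']}"
                     for t in memory)
```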
Please give it a visit and let me know if there's something new I should learn.
One kind note: as an enthusiast who has spent a lot of energy on these things, I have taken help from AI for the MD files and for expanding the explanations in the code, to better help every single person.
shubham-coder•4h ago
That unexpected load actually helped me find a few bugs in the setup script (specifically with the pgvector config on Windows), which I've just patched. If anyone else hits memory issues on 4GB cards, let me know; I'm actively optimizing the quantization now.