That is, there is nothing here that one could not easily write without a library.
Ingestion + Agentic Search are two areas that we're focused on in the short term.
The only place I see that actually operates on chunks does so by fetching them from Redis, and AFAICT nothing in the repo actually writes to Redis, so I assume the chunker is elsewhere.
https://github.com/agentset-ai/agentset/blob/main/packages/j...
What does query generation mean in this context? It’s probably not SQL queries, right?
One of the key features in Claude Code is "Agentic Search" aka using (rip)grep/ls to search a codebase without any of the overhead of RAG.
Sounds like even RAG pipelines use a similar technique (query generation). A sketch of what that looks like is below.
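A minimal sketch of query generation, assuming an OpenAI-style chat API: the model rewrites the user's question into a handful of short search queries before anything hits the index. The model name and prompt here are illustrative, not from any of the linked posts:

```python
from openai import OpenAI

client = OpenAI()

def generate_queries(question: str, n: int = 3) -> list[str]:
    """Ask the model to rewrite a user question into short search queries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any cheap model works
        messages=[
            {"role": "system",
             "content": f"Rewrite the user's question as {n} short search queries, one per line."},
            {"role": "user", "content": question},
        ],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

# Each generated query is then run against the index (ripgrep, BM25, or a
# vector store) and the hits are merged before the answering model sees them.
```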
https://jakobs.dev/learnings-ingesting-millions-pages-rag-az...
The big LLM-based rerankers (e.g. Qwen3-reranker) are what you always wanted your cross-encoder to be, and I highly recommend giving them a try. Unfortunately they're also quite computationally expensive.
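For concreteness, here's a minimal reranking sketch using sentence-transformers' CrossEncoder. The model id is a common public cross-encoder; you'd swap in an LLM-based reranker like Qwen3-reranker (which has its own loading recipe) where quality justifies the compute:

```python
from sentence_transformers import CrossEncoder

# A small public cross-encoder; LLM-based rerankers follow the same
# query+document scoring pattern at much higher cost and quality.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Score each (query, chunk) pair and return the top_k chunks."""
    scores = reranker.predict([(query, c) for c in chunks])
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:top_k]]
```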
Your metadata/tabular data often contains basic facts that a human takes for granted, but which aren't repeated in every text chunk - injecting it can help a lot in making the end model seem less clueless.
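A sketch of what that injection can look like: just prepend document-level facts to every chunk before embedding (the field names are made up for illustration):

```python
def chunk_with_metadata(chunk: str, meta: dict[str, str]) -> str:
    """Prepend document-level facts to each chunk so both retrieval
    and the answering model see them."""
    header = " | ".join(f"{k}: {v}" for k, v in meta.items())
    return f"[{header}]\n{chunk}"

print(chunk_with_metadata(
    "Revenue grew 12% quarter over quarter.",
    {"title": "Q3 2024 report", "department": "Finance", "date": "2024-10-15"},
))
# [title: Q3 2024 report | department: Finance | date: 2024-10-15]
# Revenue grew 12% quarter over quarter.
```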
The point about queries that don't work with simple RAG (like "summarize the most recent twenty documents") is very important to keep in mind. We made our UI very search-oriented and deemphasized the chat, to try to communicate to users that search is what's happening under the hood - the model only sees what you see.
The difference is that this feature explicitly isn't designed to do a whole lot, which is still the best way to build most LLM-based products: keep the LLM's job small and sandwich it between non-LLM stuff.
This, combined with a subsequent reranker, basically eliminated any of our issues on search.
One thing I’m always curious about is whether you could simplify this and get as-good or better results using SPLADE. The v3 models look really good and seem to strike a good balance between semantic and lexical retrieval. (Sketch below.)
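For reference, SPLADE encoding is only a few lines on top of a masked-LM. A minimal sketch, assuming the naver/splade-v3 checkpoint on Hugging Face (verify the exact id and license against the model card):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "naver/splade-v3"  # assumed HF id; check the model card
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

def splade_encode(text: str) -> dict[str, float]:
    """Return a sparse {token: weight} vector via SPLADE pooling:
    log(1 + ReLU(logits)), max-pooled over the sequence."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)
    acts = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    weights = acts.max(dim=1).values.squeeze(0)
    idx = weights.nonzero().squeeze(-1).tolist()
    return {tok.convert_ids_to_tokens(i): weights[i].item() for i in idx}
```

The sparse vectors drop straight into an inverted index, which is where the lexical/semantic balance comes from.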
What is re-ranking in the context of RAG? Why not just show the code if it’s only 5 lines?
Here's sample code: https://docs.cohere.com/reference/rerank
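And it really is only about five lines. A sketch against Cohere's Python SDK (the model name may have moved on since this was written):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")
response = co.rerank(
    model="rerank-english-v3.0",  # check Cohere's docs for current model names
    query="What was Q3 revenue growth?",
    documents=["chunk one ...", "chunk two ...", "chunk three ..."],
    top_n=2,
)
for hit in response.results:
    print(hit.index, hit.relevance_score)  # indices into `documents`, best first
```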
Once Bedrock KB backed by S3 Vectors is released from beta, it'll eat everybody's lunch.
I'm correcting you less out of pedantry, and more because I find the correct term to be funny.
manishsharan•2h ago
Chunking strategy is a big issue. I got acceptable results by shoving large texts to Gemini Flash and having it summarize and extract chunks, instead of whatever text splitter I tried. I use the method published by Anthropic (https://www.anthropic.com/engineering/contextual-retrieval), i.e. I include the full summary along with the chunks for each embedding.
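A minimal sketch of that contextual-retrieval recipe. The commenter does this in Clojure against Gemini Flash; this uses the Anthropic Python SDK purely for illustration, and the prompt wording is adapted from Anthropic's post:

```python
import anthropic

client = anthropic.Anthropic()

PROMPT = """<document>
{doc}
</document>
Here is a chunk from that document:
<chunk>
{chunk}
</chunk>
Give a short context that situates this chunk within the overall document,
to improve search retrieval of the chunk. Answer with only the context."""

def contextualize(doc: str, chunk: str) -> str:
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # any cheap model works here
        max_tokens=150,
        messages=[{"role": "user", "content": PROMPT.format(doc=doc, chunk=chunk)}],
    )
    # Embed the generated context together with the chunk itself.
    return msg.content[0].text + "\n\n" + chunk
```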
I also created a tool to enable the LLM to do vector search on its own.
I do not use LangChain or Python. I use Clojure plus the LLMs' REST APIs.
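The "let the LLM search on its own" part is plain tool calling. A hypothetical tool definition in Anthropic's schema style; names and fields are illustrative, and the commenter's actual Clojure implementation will differ:

```python
# Hypothetical tool schema: the model decides when to call vector_search;
# your code runs the query against the vector store and returns the hits.
vector_search_tool = {
    "name": "vector_search",
    "description": "Search the document index for passages relevant to a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural-language search query"},
            "top_k": {"type": "integer", "description": "Number of passages to return"},
        },
        "required": ["query"],
    },
}
```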
esafak•1h ago
manishsharan•1h ago
Not sensitive to latency at all. My users would rather have well researched answers than poor answers.
Also, I use batch-mode APIs for chunking; it is so much cheaper.
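A sketch of the batch pattern, using OpenAI's batch API as the example (the commenter's provider differs, but the shape is similar): submit a JSONL file of requests and collect results asynchronously at a large per-token discount:

```python
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one chat-completion request per line, each with a custom_id
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # trade latency for a large cost discount
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```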