J.P. Morgan's OpenAI loan is strange

https://marketunpack.com/j-p-morgans-openai-loan-is-strange/
58•vrnvu•24m ago•21 comments

Claude Code on the Web

https://www.anthropic.com/news/claude-code-on-the-web
110•adocomplete•1h ago•48 comments

Production RAG: what I learned from processing 5M+ documents

https://blog.abdellatif.io/production-rag-processing-5m-documents
202•tifa2up•4h ago•53 comments

BERT is just a single text diffusion step

https://nathan.rs/posts/roberta-diffusion/
278•nathan-barry•5h ago•66 comments

AWS Multiple Services Down in us-east-1

https://health.aws.amazon.com/health/status?ts=20251020
1264•kondro•12h ago•1616 comments

Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system

https://www.tomshardware.com/tech-industry/semiconductors/alibaba-says-new-pooling-system-cut-nvi...
240•hd4•7h ago•170 comments

Space Elevator

https://neal.fun/space-elevator/
1329•kaonwarb•15h ago•296 comments

DeepSeek OCR

https://github.com/deepseek-ai/DeepSeek-OCR
794•pierre•13h ago•206 comments

AWS outage shows internet users 'at mercy' of too few providers, experts say

https://www.theguardian.com/technology/2025/oct/20/amazon-web-services-aws-outage-hits-dozens-web...
176•evolve2k•2h ago•108 comments

TernFS – an exabyte scale, multi-region distributed filesystem

https://www.xtxmarkets.com/tech/2025-ternfs/#posix-shaped
51•kirlev•2h ago•4 comments

x86-64 Playground – An online assembly editor and GDB-like debugger

https://x64.halb.it/
35•modinfo•2h ago•2 comments

Dutch spy services have restricted intelligence-sharing with the United States

https://intelnews.org/2025/10/20/01-3416/
165•Refreeze5224•2h ago•75 comments

How to stop Linux threads cleanly

https://mazzo.li/posts/stopping-linux-threads.html
124•signa11•5d ago•47 comments

Optical diffraction patterns made with a MOPA laser engraving machine [video]

https://www.youtube.com/watch?v=RsGHr7dXLuI
76•emsign•6d ago•10 comments

The Great Horse Manure Crisis of 1894: predictions of 9 feet of manure in cities

https://en.wikipedia.org/wiki/Great_horse_manure_crisis_of_1894
26•SweetSoftPillow•6d ago•29 comments

Chess grandmaster Daniel Naroditsky has passed away

https://old.reddit.com/r/chess/comments/1obnbmu/grandmaster_daniel_naroditsky_has_passed_away/
191•ntnbr•2h ago•50 comments

Servo v0.0.1

https://github.com/servo/servo
386•undeveloper•7h ago•109 comments

The longest baseball game took 33 innings to win

https://www.mlb.com/news/the-longest-professional-baseball-game-ever-played
7•mooreds•5d ago•9 comments

Docker Systems Status: Full Service Disruption

https://www.dockerstatus.com/pages/incident/533c6539221ae15e3f000031/68f5e1c741c825463df7486c
303•l2dy•12h ago•120 comments

Entire Linux Network stack diagram (2024)

https://zenodo.org/records/14179366
527•hhutw•16h ago•45 comments

Automate all the things with Swift Subprocess

https://blog.jacobstechtavern.com/p/swift-subprocess
25•jakey_bakey•1w ago•3 comments

Show HN: Playwright Skill for Claude Code – Less context than playwright-MCP

https://github.com/lackeyjb/playwright-skill
108•syntax-sherlock•8h ago•34 comments

Peanut Allergies Have Plummeted in Children

https://www.nytimes.com/2025/10/20/well/peanut-allergy-drop.html
48•JumpCrisscross•1h ago•35 comments

Qt Group Buys IAR Systems Group

https://www.qt.io/stock/qt-completes-the-recommended-public-cash-offer-to-the-shareholders-of-iar...
53•shrimp-chimp•7h ago•31 comments

Modeling Others' Minds as Code

https://arxiv.org/abs/2510.01272
54•PaulHoule•6h ago•38 comments

iOS 26.1 lets users control Liquid Glass transparency

https://www.macrumors.com/2025/10/20/ios-26-1-liquid-glass-toggle/
9•dabinat•23m ago•0 comments

Pointer Pointer (2012)

https://pointerpointer.com
212•surprisetalk•1w ago•27 comments

Speaking of Amazon, here's a fresh post from an engineer who just quit

https://nekrolm.github.io/blog.html
16•souvlakee•2h ago•4 comments

The Peach meme: On CRTs, pixels and signal quality (again)

https://www.datagubbe.se/crt2/
73•zdw•1w ago•32 comments

Fractal Imaginary Cubes

https://www.i.h.kyoto-u.ac.jp/users/tsuiki/icube/fractal/index-e.html
49•strstr•1w ago•7 comments

Production RAG: what I learned from processing 5M+ documents

https://blog.abdellatif.io/production-rag-processing-5m-documents
198•tifa2up•4h ago

Comments

manishsharan•3h ago
Thanks for sharing. TIL about rerankers.

Chunking strategy is a big issue. I got acceptable results by sending large texts to Gemini Flash and having it summarize and extract chunks, rather than using any of the text splitters I tried. I use the method published by Anthropic, https://www.anthropic.com/engineering/contextual-retrieval, i.e. include the full summary along with each chunk for embedding.

I also created a tool to enable the LLM to do vector search on its own.

I do not use LangChain or Python. I use Clojure + the LLMs' REST APIs.
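
A minimal sketch of the "summary plus chunk" idea described above, written in Python for illustration only (the commenter uses Clojure). The `Chunk` type, the function names, and the exact text layout are assumptions, not the commenter's or Anthropic's implementation; the summaries are assumed to come from an earlier LLM step that is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Chunk:
    """Hypothetical container for one extracted chunk."""
    doc_id: str
    text: str


def contextualize(chunk: Chunk, doc_summary: str) -> str:
    """Return the string to embed: the document summary, then the chunk text."""
    return f"Document summary: {doc_summary.strip()}\n\nChunk: {chunk.text.strip()}"


def build_embedding_inputs(
    chunks: Sequence[Chunk], summaries: Dict[str, str]
) -> List[str]:
    """Pair every chunk with the summary of the document it came from."""
    return [contextualize(c, summaries[c.doc_id]) for c in chunks]


if __name__ == "__main__":
    chunks = [Chunk("d1", "Refunds are issued within 14 days.")]
    summaries = {"d1": "Customer-facing refund policy for the web shop."}
    for text in build_embedding_inputs(chunks, summaries):
        print(text)
```

The strings returned here would be passed to whatever embedding model is in use, while the original chunk text is what gets stored and shown to the user.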

esafak•3h ago
Have you measured your latency, and how sensitive are you to it?
manishsharan•2h ago
>> Have you measured your latency, and how sensitive are you to it?

Not sensitive to latency at all. My users would rather have well researched answers than poor answers.

Also, I use batch-mode APIs for chunking; it is so much cheaper.

jascha_eng•3h ago
I have a RAG setup that doesn't work on documents but other data points that we use for generation (the original data is call recordings but it is heavily processed to just a few text chunks). Instead of a reranker model we do vector search and then simply ask GPT-5 in an extra call which of the results is the most relevant to the input question. Is there an advantage to actual reranker models rather than using a generic LLM?
tifa2up•3h ago
OP here. rerankers are finetuned small models, they're cheap and very fast compared to an additional GPT-5 call.
jascha_eng•2h ago
It's an async process in my case (custom, deep-research-like), so speed is not that critical.
esafak•3h ago
They say the chunker is the most important part, but theirs looks rudimentary: https://github.com/agentset-ai/agentset/blob/main/packages/e...

That is, there is nothing here that one could not easily write without a library.

tifa2up•3h ago
OP here. We've been working on agentset.ai full-time for 2 months. The product now gets you something working quite well out of the box. Better than most people with no experience in RAG (I'd say so with confidence).

Ingestion + Agentic Search are two areas that we're focused on in the short term.

teraflop•2h ago
I'm not sure there is a chunker in this repo. The file you linked certainly doesn't seem to perform any chunking, it just defines a data model for chunks.

The only place I see that actually operates on chunks does so by fetching them from Redis, and AFAICT nothing in the repo actually writes to Redis, so I assume the chunker is elsewhere.

https://github.com/agentset-ai/agentset/blob/main/packages/j...

alexchantavy•3h ago
> What moved the needle: Query Generation

What does query generation mean in this context? It's probably not SQL queries, right?

daemonologist•3h ago
It's described in the remainder of the point - they use an LLM to generate additional search queries, either rephrasings of the user's query or bringing additional context from the chat history.
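
A rough sketch of that kind of query generation, assuming the LLM is reachable through a plain `llm(prompt) -> str` callable. The callable, the prompt wording, and the helper names are all hypothetical and not taken from the article.

```python
import re
from typing import Callable, List, Sequence

# Strips list markers such as "1.", "2)", "-", "*" or "•" from the start of a line.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def build_prompt(user_query: str, chat_history: Sequence[str], n: int) -> str:
    """Assemble a single prompt that asks for n varied search queries."""
    history = "\n".join(chat_history) if chat_history else "(none)"
    return (
        f"Chat history:\n{history}\n\n"
        f"User query: {user_query}\n\n"
        f"Write {n} diverse search queries, one per line, that rephrase the "
        "user query or add missing context from the chat history."
    )


def expand_queries(
    user_query: str,
    chat_history: Sequence[str],
    llm: Callable[[str], str],  # hypothetical: takes a prompt, returns raw text
    n: int = 3,
) -> List[str]:
    """Return the original query followed by up to n distinct generated queries."""
    raw = llm(build_prompt(user_query, chat_history, n))
    queries = [user_query]
    seen = {user_query.casefold()}
    for line in raw.splitlines():
        candidate = _BULLET.sub("", line).strip()
        if candidate and candidate.casefold() not in seen:
            seen.add(candidate.casefold())
            queries.append(candidate)
        if len(queries) == n + 1:
            break
    return queries
```

Each returned query would then be run through the normal retrieval step, and the result lists merged afterwards.
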
goleary•2h ago
Here's an interesting read on the evolution beyond RAG: https://www.nicolasbustamante.com/p/the-rag-obituary-killed-...

One of the key features in Claude Code is "Agentic Search" aka using (rip)grep/ls to search a codebase without any of the overhead of RAG.

Sounds like even RAG approaches use a similar approach (Query Generation).

smokel•1h ago
The article raises several interesting points, but I find its claim that Claude Code relies primarily on grep for code search unconvincing. It's clear that Claude Code can parse and reason about code structure, employing techniques far beyond simple regex matching. Since this assumption underpins much of the article's argument, it makes me question the overall reliability of its conclusions a bit.

Or am I completely misunderstanding how Claude Code works?

andreasgl•2h ago
I think they mean query expansion: https://en.wikipedia.org/wiki/Query_expansion
nextworddev•3h ago
Exactly what kind of processing was done? Your pipeline is a function of the use case, lest you overengineer…
js98•3h ago
Similar writeup I did about 1.5 years ago for processing millions of (technical) pages for RAG. Lots has stayed the same it seems

https://jakobs.dev/learnings-ingesting-millions-pages-rag-az...

winstonp•1h ago
I also built a RAG system about a year back for technical search, everything seems the same!
daemonologist•3h ago
I concur:

The big LLM-based rerankers (e.g. Qwen3-reranker) are what you always wanted your cross-encoder to be, and I highly recommend giving them a try. Unfortunately they're also quite computationally expensive.

Your metadata/tabular data often contains basic facts that a human takes for granted, but which aren't repeated in every text chunk - injecting it can help a lot in making the end model seem less clueless.

The point about queries that don't work with simple RAG (like "summarize the most recent twenty documents") is very important to keep in mind. We made our UI very search-oriented and deemphasized the chat, to try to communicate to users that search is what's happening under the hood - the model only sees what you see.

thethimble•2h ago
I wish there was more info in the article about actual customer usage - particularly whether it improved process efficiency. It's great to focus on the technical aspects of system optimization but unless this translates to tangible business value it's all just hype.
agentcoops•1h ago
I agree completely with your point, especially about the difficulty of developing the user's mental model for what's going on with context and the need to move away from chat UX. It's interesting that there are still few public examples of non-chat UIs that make context management explicit. It's possible that the big names tried this and decided it wasn't worth it -- but from comments here it seems like everyone that has built a production RAG system has come to the opposite conclusion. I'm guessing the real reason is different: for consumer apps, controlling context and inference time (especially for free users) is likely one of the main levers for cost management at scale. Private RAGs, on the other hand, are more concerned with maximizing result quality and minimizing the time an employee spends on a particular problem, with cost per query much less of a concern -- that's been my experience at least.
leetharris•2h ago
Embedding based RAG will always just be OK at best. It is useful for little parts of a chain or tech demos, but in real life use it will always falter.
sgt•2h ago
What do you recommend? Query generation?
esafak•2h ago
Compared with what?
leetharris•50m ago
Full text agentic retrieval. Instead of cosine similarity on vectors, parsing metadata through an agentic loop.

To give a real world example, the way Claude Code works versus how Cursor's embedded database works.

charcircuit•2h ago
Most of my ChatGPT queries use RAG (based on the query, ChatGPT will decide if it needs to search the web) to get up-to-date information about the world. In real life it's effective, and it's why every large provider supports it.
underlines•2h ago
RAG will be pronounced dead again and again. It has its use cases. We moved to agentic search with RAG as one tool, while the other retrieval strategies we added use real-time search in the sources, often skipping ingested and chunked sources. Large context windows allow for putting almost whole documents into one request.
phillipcarter•2h ago
Not necessarily? It's been the basis of one of the major ways people would query their data since 2023 on a product I worked on: https://www.honeycomb.io/blog/introducing-query-assistant

The difference is this feature explicitly isn't designed to do a whole lot, which is still the best way to build most LLM-based products and sandwich it between non-LLM stuff.

mediaman•2h ago
The point about synthetic query generation is good. We found users had very poor queries, so we initially had the LLM generate synthetic queries. But then we found that the results could vary widely based on the specific synthetic query it generated, so we had it create three variants (all in one LLM call, so that you can prompt it to generate a wide variety, instead of getting three very similar ones back), do parallel search, and then use reciprocal rank fusion to combine the list into a set of broadly strong performers. For the searches we use hybrid dense + sparse bm25, since dense doesn't work well for technical words.

This, combined with a subsequent reranker, basically eliminated any of our issues on search.
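
For readers who have not used it, reciprocal rank fusion can be sketched in a few lines. This is a generic version with the commonly used constant k=60, not the commenter's code; the document IDs in the usage example are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID so the output
    is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # One result list per synthetic query variant (IDs are illustrative).
    per_query_results = [
        ["a", "b", "c"],
        ["b", "a", "d"],
        ["b", "c", "a"],
    ]
    print(reciprocal_rank_fusion(per_query_results))
```

Because only ranks are used, the lists can come from searches with incomparable scores, for example a dense vector search and a BM25 search.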

avereveard•1h ago
A final tip is to also show the LLM's interpretation of the search to the user on the other side, so they can check whether the LLM's understanding was correct.
deepsquirrelnet•1h ago
> For the searches we use hybrid dense + sparse bm25, since dense doesn't work well for technical words.

One thing I’m always curious about is if you could simplify this and get good/better results using SPLADE. The v3 models look really good and seem to provide a good balance of semantic and lexical retrieval.

siva7•1h ago
Boy, that should not be the concern of the end user (developer) but of those implementing RAG solutions as a service at Amazon, Microsoft, OpenAI and so on.
n_u•2h ago
> Reranking: the highest value 5 lines of code you'll add. The chunk ranking shifted a lot. More than you'd expect. Reranking can many times make up for a bad setup if you pass in enough chunks. We found the ideal reranker set-up to be 50 chunk input -> 15 output.

What is re-ranking in the context of RAG? Why not just show the code if it’s only 5 lines?

tifa2up•2h ago
OP. A reranker is a specialized LLM that takes the user query and a list of candidate results, then reorders them based on which ones are more relevant to the query.

Here's sample code: https://docs.cohere.com/reference/rerank
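
To show the shape of that step without depending on any particular vendor API, here is a sketch in which the relevance model is passed in as a `score_fn(query, doc) -> float` callable. The callable and the word-overlap scorer are stand-ins invented for this example; in practice `score_fn` would wrap a real reranker model. The default of 15 mirrors the 50 -> 15 setup quoted above.

```python
from typing import Callable, List, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],  # hypothetical relevance scorer
    top_n: int = 15,
) -> List[str]:
    """Score every candidate against the query and keep the top_n best.

    Candidates with equal scores keep their original retrieval order.
    """
    scored = [(score_fn(query, doc), i, doc) for i, doc in enumerate(candidates)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for _, _, doc in scored[:top_n]]


def toy_overlap_score(query: str, doc: str) -> float:
    """Toy scorer: fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


if __name__ == "__main__":
    retrieved = [
        "cats sleep a lot",
        "how to reset a password",
        "password policy",
    ]
    print(rerank("reset password", retrieved, toy_overlap_score, top_n=2))
```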

yahoozoo•1h ago
What is the difference between reranking versus generating text embeddings and comparing with cosine similarity?
tifa2up•1h ago
Text similarity finds items that closely match. Reranking may select items that are less semantically "similar" but are more relevant to the query.
derefr•5m ago
My understanding:

If you generate embeddings (of the query, and of the candidate documents) and compare them for similarity, you're essentially asking whether the documents "look like the question."

If you get an LLM to evaluate how well each candidate document follows from the query, you're asking whether the documents "look like an answer to the question."

An ideal candidate chunk/document from a cosine-similarity perspective, would be one that perfectly restates what the user said — whether or not that document actually helps the user. Which can be made to work, if you're e.g. indexing a knowledge base where every KB document is SEO-optimized to embed all pertinent questions a user might have that "should lead" to that KB document. But for such documents, even dumb fulltext search on the user's query will surface them; LLMs aren't gaining you any ground here.

An ideal candidate chunk/document from a re-ranking LLM's perspective, would be one that an instruction-following LLM (with the whole corpus in its context) would spit out as a response, if it were prompted with the user's query. E.g. if the user asks a question that could be answered with data, a document containing that data would rank highly. That's probably what you want!
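
The embedding side of that comparison comes down to cosine similarity between two vectors. A small sketch follows; the three-dimensional vectors are invented purely for illustration and do not come from any real embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up embeddings: a document that restates the question sits close
    # to the query, while a document holding the answer may sit further away.
    query_vec = [0.9, 0.1, 0.0]
    restated_question_vec = [0.8, 0.2, 0.0]
    answer_vec = [0.3, 0.2, 0.9]
    print(cosine_similarity(query_vec, restated_question_vec))
    print(cosine_similarity(query_vec, answer_vec))
```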

383toast•2h ago
They should've tested other embedding models; there are better ones than OpenAI's (and cheaper).
prettyblocks•2h ago
Which do you suggest?
roze_sha•2h ago
https://huggingface.co/spaces/mteb/leaderboard
383toast•1h ago
yep
leftnode•52m ago
The Qwen3 600M and 4B embedding models are near state of the art and aren't too computationally intensive.
hatmanstack•2h ago
Not here to schlep for AWS but S3 Vectors is hands down the SOTA here. That combined with a Bedrock Knowledge Base to handle Discovery/Rebalance tasks makes for the simplest implementation on the Market.

Once Bedrock KB backed by S3 Vectors is released from Beta it'll eat everybody's lunch.

arcanemachiner•1h ago
Shill, not schlep.

I'm correcting you less out of pedantry, and more because I find the correct term to be funny.

hatmanstack•1h ago
I feel like I'm schelpin' through these comments, it's all mishigas
esafak•1h ago
You feel like a schlemiel, perhaps?
hatmanstack•1h ago
more a schlimazel, Charles Schultzie, Lucy's everywhere
pietz•1h ago
I find it interesting that so many services and tools were investigated except for embedding models. I would have thought that's one of the biggest levers.
Trias11•1h ago
They just grabbed the better one (3-large) right off the bat. It's 6x the cost of 3-small, but it's still tiny.
bityard•1h ago
I must be missing something: this says it can be self-hosted, but the first page of the self-hosting docs says you need accounts with no fewer than 6 (!) other third-party hosted services.

We have very different ideas about the meaning of self-hosted.

goodev•52m ago
I consider this to be good open source and I'm a happy user of their OSS offering. Want no hosted dependencies? Then go write it all in Rust.
dgfitz•8m ago
I’ve never worked in such a space where the deployed environment had unfettered internet access, no access at all actually.

I’ve probably missed a huge wave of programming technology because of this, and I’ve figured out a way to make it work for a consistent paycheck over these past 20 years.

I’m also not a great example, I think I’ve watched 7 whole hours of YouTube videos ever, and those were all for car repair help.

I shy away from tech that needs to be online/connected/whatever.

dcreater•59m ago
do you still use langchain/llamaindex for other agents/AI use cases?
JudoJJ•17m ago
Really solid write-up — it’s rare to see someone break down the real tradeoffs of scaling RAG beyond the toy examples. The bit about reranking and chunking actually saving more than fancy LLM tricks hits home to me.