Ran 5k queries on 50k documents to understand the file vs. vector RAG debate

2•gdad•1h ago

I was curious about the noise around file-based RAG as opposed to vector RAG, so I benchmarked Tantivy vs. Chroma to quantify the trade-offs in modern RAG pipelines. I used 5 datasets: CodeXGlue, MS MARCO, SQuAD, HotpotQA, and SciQ.

- Indexing/embedding was 76x slower for vectors (on the order of seconds vs. milliseconds), and query latency was 11x slower.

- In SciQ, keyword search outperformed vectors by 32% (MRR). Terms like "mitochondria" act as exact lookup keys rather than semantic concepts. Vectors tended to drift toward semantically similar but factually incorrect answers.

- In HotpotQA, I noticed a trend where vectors find the "answer" document but miss the "bridge" document because it isn't semantically similar to the prompt. Finding the right document is not the same as having enough context to prove the answer.
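For anyone unfamiliar with the metric: MRR (mean reciprocal rank) averages 1/rank of the first relevant document over all queries. The sketch below is a minimal illustration, not the benchmark's actual code. The function names, the document ids, and the `full_support_at_k` check (whether every required document, including a "bridge" document, shows up in the top k) are all assumptions made up for this example.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Sequence[Sequence[Hashable]], gold: Sequence[Set[Hashable]]
) -> float:
    """Average the reciprocal rank over all queries (0.0 for an empty run set)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in zip(runs, gold)) / len(runs)


def full_support_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> bool:
    """True only if every required document appears in the top k results."""
    return relevant.issubset(ranked[:k])


# Hypothetical toy data: two queries, each with a ranked list of document ids.
runs = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
# Query 2 needs both an "answer" document (d4) and a "bridge" document (d8).
gold = [{"d1"}, {"d4", "d8"}]

mrr = mean_reciprocal_rank(runs, gold)          # (1/2 + 1/3) / 2
bridge_found = full_support_at_k(runs[1], gold[1], k=3)
print(f"MRR={mrr:.3f}, query 2 fully supported: {bridge_found}")
```

In the toy data, query 2 still contributes to MRR because one relevant document is retrieved, even though the bridge document is missing. That is the gap between "found the right document" and "has enough context" described above.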

The Data (MRR):

| Dataset | Domain | Keyword | Vector | Winner |
| :--- | :--- | :--- | :--- | :--- |
| CodeXGlue | Code | 0.29 | 0.91 | Vector (+213%) |
| SciQ | Science | 0.81 | 0.61 | Keyword (+32%) |
| HotpotQA | Reasoning | 0.55 | 0.50 | Keyword (+10%) |
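The "Winner" percentages look like the relative MRR gain of the better method over the weaker one. Here is a small sketch that recomputes them from the table values, assuming that definition. The helper name `relative_gain` is made up for illustration.

```python
def relative_gain(winner_score: float, loser_score: float) -> float:
    """Relative improvement of the winner over the loser, in percent."""
    return (winner_score - loser_score) / loser_score * 100.0


# (keyword MRR, vector MRR) copied from the table above.
rows = {
    "CodeXGlue": (0.29, 0.91),
    "SciQ": (0.81, 0.61),
    "HotpotQA": (0.55, 0.50),
}

results = {}
for dataset, (keyword, vector) in rows.items():
    if vector > keyword:
        results[dataset] = ("Vector", relative_gain(vector, keyword))
    else:
        results[dataset] = ("Keyword", relative_gain(keyword, vector))
    label, gain = results[dataset]
    print(f"{dataset}: {label} (+{gain:.1f}%)")

# Prints:
# CodeXGlue: Vector (+213.8%)
# SciQ: Keyword (+32.8%)
# HotpotQA: Keyword (+10.0%)
```

Under this definition, the table's +213% and +32% match the computed values truncated to whole percents rather than rounded.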

Curious to learn if others have similar observations or views.
