
Show HN: RAG-corpus-profiler – A linter for RAG datasets (dedup, PII, quality)

https://github.com/aashirpersonal/rag-corpus-profiler
1•aashirpersonal•2h ago

Comments

aashirpersonal•2h ago
Hi HN,

I’ve been building RAG systems for a while, and in my experience roughly 90% of retrieval failures aren't caused by the LLM; they're caused by the data. I got tired of debugging hallucinations only to find the retriever had pulled "Page 1 of 5" headers or five duplicate versions of an old policy.

I couldn't find a simple "pandas-profiling" equivalent for unstructured text, so I built this.

It runs locally (CLI) and helps you:

Detect semantic duplicates (using all-MiniLM-L6-v2) to save vector storage costs.

Flag PII and secrets (emails, API keys) before they get indexed.

Identify "coverage gaps" by comparing user queries against your docs.
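To make the three checks concrete, here is a minimal sketch of how each one can work. It is not the profiler's actual code: it assumes chunk and query embeddings have already been computed, and the similarity thresholds (0.95, 0.5), the function names, and the two regex patterns are hypothetical choices for illustration.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates(embeddings: Sequence[Vector], threshold: float = 0.95) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every chunk pair at or above `threshold`.

    Plain O(n^2) pairwise scan: fine for a sketch, too slow for large corpora.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine(embeddings[i], embeddings[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Hypothetical patterns; real secret scanners ship much larger rule sets.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def flag_pii(chunks: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Return (chunk index, pattern name, matched text) for every hit."""
    hits = []
    for idx, text in enumerate(chunks):
        for name, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((idx, name, match.group()))
    return hits


def coverage_gaps(
    query_embeddings: Sequence[Vector],
    doc_embeddings: Sequence[Vector],
    min_similarity: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (query index, best similarity) for queries no chunk matches well."""
    gaps = []
    for qi, q in enumerate(query_embeddings):
        best = max((cosine(q, d) for d in doc_embeddings), default=0.0)
        if best < min_similarity:
            gaps.append((qi, best))
    return gaps
```

With real text, the vectors could come from the sentence-transformers package, e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)`; the rows of the returned array can be passed straight to these functions. Duplicate pairs point at chunks to drop before indexing, and each coverage gap is a user query that no chunk answers well.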

It outputs a standalone HTML report you can show to stakeholders.
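A "standalone" report here means a single HTML file with inline CSS and no external assets, so it can be emailed or attached as-is. A minimal sketch of that idea follows. The `render_report` name and the (check, detail) finding format are hypothetical, not the tool's real report schema. Every corpus-derived string is escaped, since document text can contain markup.

```python
import html
from typing import Iterable, Tuple


def render_report(title: str, findings: Iterable[Tuple[str, str]]) -> str:
    """Build one self-contained HTML page (inline CSS, no external files).

    `findings` is a hypothetical list of (check name, detail) pairs.
    """
    rows = "\n".join(
        f"<tr><td>{html.escape(check)}</td><td>{html.escape(detail)}</td></tr>"
        for check, detail in findings
    )
    # Doubled braces keep the CSS braces literal inside the f-string.
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>
<style>body{{font-family:sans-serif}} td{{border:1px solid #ccc;padding:4px}}</style>
</head><body><h1>{html.escape(title)}</h1>
<table><tr><th>Check</th><th>Detail</th></tr>
{rows}
</table></body></html>"""
```

The result can be saved with `pathlib.Path("report.html").write_text(page, encoding="utf-8")` and opened in any browser.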

Written in Python, open source (MIT). Feedback welcome!

https://github.com/aashirpersonal/rag-corpus-profiler

Alzheimer's can be reversed to achieve full neurological recovery in animals

https://case.edu/news/new-study-shows-alzheimers-disease-can-be-reversed-achieve-full-neurologica...
1•thunderbong•51s ago•0 comments

Achieving Lasting Remission for HIV

https://knowablemagazine.org/content/article/health-disease/2025/lasting-remission-hiv-with-broad...
1•PaulHoule•2m ago•0 comments

Japanese pen maker Pilot raises price of bestseller for first time in 40 years

https://www.ft.com/content/94c1e62e-f953-4f48-9572-fedba69ef5e3
2•bookofjoe•3m ago•1 comments

Using .gov Email Addresses for Age and Information Verification

https://blog.certisfy.com/2025/12/using-gov-email-addresses-for-age-and.html
1•Edmond•5m ago•0 comments

The HTML Elements Time Forgot

https://htmhell.dev/adventcalendar/2025/22/
1•todsacerdoti•7m ago•0 comments

MultiLang‑ASM – The first multilingual x86_64 assembler (10 languages, reversible)

https://github.com/cyberenigma-lgtm/MultiLang-ASM
1•neuro-os•10m ago•1 comments

Is Alexa Overloaded?

1•dzdt•22m ago•1 comments

Timeless Games

https://cxong.github.io/2025/12/timeless-games
3•todsacerdoti•23m ago•1 comments

Tesla Robotaxis Are Big on Wall St. but Lagging on Roads

https://www.nytimes.com/2025/12/25/business/tesla-robotaxis-austin-waymo.html
2•edward•24m ago•0 comments

Salesforce regrets firing 4000 experienced staff and replacing them with AI

https://maarthandam.com/2025/12/25/salesforce-regrets-firing-4000-staff-ai/
5•whynotmaybe•24m ago•0 comments

Ask HN: MIT grad, junior dev layoffs – watching my daughter lose faith in merit

3•MITfather•24m ago•2 comments

The Smell of Kerosene [pdf]

https://www.nasa.gov/wp-content/uploads/2021/04/88797main_kerosene.pdf
2•belter•25m ago•0 comments

Show HN: Festive Greetings – Create and share Holiday Cards with your loved ones

https://festivegreeting.vercel.app/
2•mr_o47•30m ago•0 comments

Show HN: Paste Recipe – AI-powered recipe formatter

https://www.pasterecipe.com
1•BuildItBusk•34m ago•1 comments

The Architecture of Open Source Applications

https://aosabook.org/en/index.html
2•bcye•40m ago•0 comments

Waymo is using the Honk app to pay $20-$24 to manually close doors

https://www.washingtonpost.com/technology/2025/12/25/waymo-robots-human-work/
3•sleepingreset•40m ago•0 comments

Artists revolt as X's latest feature lets users AI-edit any photo

https://piunikaweb.com/2025/12/25/x-grok-ai-edit-image-feature-artists-leaving-no-opt-out/
2•doright•41m ago•1 comments

Largest Companies by Marketcap

https://companiesmarketcap.com/
1•ksec•42m ago•1 comments

Legible Hacker News

https://adam.farkas.pro/legible-hackernews/
2•piersj225•47m ago•2 comments

Inferal Workspace Architecture: How We Work at Inferal

https://gist.github.com/yrashk/59b1cd144864bc3320a0ac0c766d4f55
1•yrashk•47m ago•1 comments

Show HN: Q-SSP – Quantum-Entropy Sanitization (7.997 bits/byte)

https://github.com/Alpha-Legents/Q-SSP
1•zenith_vortex•48m ago•1 comments

The worst fire in space history

https://www.sciencefocus.com/space/fire-in-space-jerry-linenger
1•slow_typist•48m ago•1 comments

Older Americans Quit Weight-Loss Drugs in Droves

https://www.nytimes.com/2025/12/21/health/older-people-glp1-weight.html
1•prmph•52m ago•1 comments

I wrote a 2M-character novel with ChatGPT, without an outline

1•hideroze•53m ago•2 comments

I learned to stop worrying and love AI slop

https://www.technologyreview.com/2025/12/23/1130396/how-i-learned-to-stop-worrying-and-love-ai-slop/
1•Brajeshwar•58m ago•0 comments

AI overestimates how smart people are, according to economists

https://techxplore.com/news/2025-12-ai-overestimates-smart-people-economists.html
1•Brajeshwar•58m ago•0 comments

Bee collecting honeydew produced by scale insects [video]

https://www.youtube.com/watch?v=-4lijMoA_3M
1•joebig•59m ago•0 comments

Asahi Linux with Sway on the MacBook Air M2

https://daniel.lawrence.lu/blog/2024-12-01-asahi-linux-with-sway-on-the-macbook-air-m2/
2•andsoitis•1h ago•0 comments

Complaint Tablet to Ea-nāṣir – Oldest Customer Complaint

https://en.wikipedia.org/wiki/Complaint_tablet_to_Ea-nāṣir
1•andsoitis•1h ago•0 comments

Show HN: Crossview – visualize Crossplane resources and compositions

https://corpobit.com/products/crossview
1•moeidheidari•1h ago•0 comments