Ask HN: Who is doing the best Word/PDF RAG tool with deep research?

4•_samjarman•6mo ago
Hi HN, which SaaS providers are you eyeing these days for your RAG needs with thousands of PDFs or Word docs, and with an agent that can take its time and give well-researched, cited answers? TIA!

Comments

randomname4325•6mo ago
Check out www.Airwave.us. They are focused on field services, where techs comb through thousands of pages of manuals and documentation for part numbers or specific instructions that have to be 100% accurate.
Norcim133•6mo ago
I spent the last 2 months trying out RAG/parsing plays. My use-case required high accuracy on complex tables and figures.

Ranking:
1. LlamaCloud/LlamaParse
2. GroundX
3. Unstructured.io
4. Google RAG Engine
5. Docling
... capability gap ...
6. Azure - Document Intelligence
7. AWS - Textract
8. LlamaIndex (DIY)

Imanari•6mo ago
This ranking is just for the parsing, not the RAG portion, correct?
Norcim133•6mo ago
Correct-ish. LlamaCloud and GroundX do everything up to retrieval. Here is an interactive graphic of major players along RAG flow: https://claude.ai/public/artifacts/b872435b-1d9c-461e-a29c-b...
TXTOS•6mo ago
I've been working on something that directly targets this problem: WFGY — a reasoning engine built for RAG on large-scale PDF/Word documents, especially when you're doing deep research, not just shallow QA.

Instead of just chunking text and throwing it into an embedding model, WFGY builds a persistent semantic resonance layer — meaning it tracks context through formatting breaks, footnotes, diagram captions, even corrupted OCR sections.

The engine applies multiple self-correcting pathways (we call them BBMC and BBPF) so even when parsing is incomplete or wrong, reasoning still holds. That’s crucial if your source materials are academic papers, messy reports, or 1000+ page archives.

It’s open source. No tuning. Works with any LLM. No tricks.

Backed by the creator of tesseract.js (36k) — who gets why document mess is the real challenge.

Check it out: https://github.com/onestardao/WFGY

lisa_coicadan•6mo ago
Great thread, we’ve seen the exact same pain points around working with large volumes of complex PDFs/Word docs.

At Retab.com, we focus on the “hard pre-RAG” layer: turning raw documents (including scanned reports, OCR messes, financial statements, and regulatory filings) into clean, structured, model-ready data.

Instead of relying on embeddings over noisy text chunks, we use schema-driven generation, multi-LLM consensus, and an evaluation UI to ensure output is accurate, complete, and explainable. No manual parsing, no hallucinations, just structured JSON (or any format you want), ready for retrieval, agents, or analytics.

We work with teams doing RAG on contracts, audits, earnings reports, etc., anywhere that “close enough” isn’t good enough. Happy to run your hardest docs through Retab if you want to benchmark against WFGY or LlamaParse.
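
For readers unfamiliar with the "multi-LLM consensus" idea mentioned above, here is a minimal sketch of one way it could work: a field-by-field majority vote over several structured extractions of the same document. This is not Retab's actual implementation, which isn't described in this thread. The function name `consensus`, the `min_agreement` threshold, and the sample field names are all hypothetical.

```python
from collections import Counter
from typing import Any


def consensus(
    extractions: list[dict[str, Any]], min_agreement: float = 0.5
) -> tuple[dict[str, Any], list[str]]:
    """Field-wise majority vote over several model outputs (illustrative only).

    Returns the merged record plus the fields where no single value was
    produced by more than `min_agreement` of the models.
    """
    merged: dict[str, Any] = {}
    disputed: list[str] = []
    fields = sorted({key for e in extractions for key in e})
    for field in fields:
        # Count values by repr so unhashable values (lists, dicts) can be tallied.
        votes = Counter(repr(e.get(field)) for e in extractions)
        winner_repr, count = votes.most_common(1)[0]
        if count / len(extractions) > min_agreement:
            merged[field] = next(
                e.get(field) for e in extractions if repr(e.get(field)) == winner_repr
            )
        else:
            disputed.append(field)  # flag for human review instead of guessing
    return merged, disputed


# Three hypothetical model outputs for the same invoice.
outputs = [
    {"total": 100, "currency": "USD"},
    {"total": 100, "currency": "EUR"},
    {"total": 100, "currency": "GBP"},
]
merged, disputed = consensus(outputs)
```

In this example all three outputs agree on `total`, so it lands in `merged`, while `currency` has no majority and is returned in `disputed` for review rather than silently picked.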

_samjarman•6mo ago
What makes a PDF 'hard' in your mind?