We ran Dewey's agentic retrieval endpoint on all 150 questions in FinanceBench, a benchmark of financial Q&A over real SEC filings. To control for model improvements, we also ran Claude Opus 4.6 directly with each PDF loaded into context (no retrieval). Full-context scored 76.0%; agentic retrieval with the same model scored 83.7%. Six PepsiCo 10-Ks exceeded Claude's 1M-token limit, so the questions over those filings couldn't be answered via full-context at all.
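For readers who want to reproduce the comparison, here is a minimal sketch of the harness, under stated assumptions: the Dewey endpoint URL and its response shape are hypothetical, the model ID is a placeholder, and we pass extracted filing text rather than raw PDFs to keep the example self-contained. The Anthropic client call itself is the standard messages API.

```python
# Minimal comparison harness sketch. DEWEY_URL, the "answer" response
# field, and the model ID are assumptions, not Dewey's actual API.
import anthropic
import requests

client = anthropic.Anthropic()
MODEL = "claude-opus-4-6"  # placeholder ID for the Opus model under test
DEWEY_URL = "https://api.example.com/dewey/query"  # hypothetical endpoint


def full_context_answer(question: str, filing_text: str) -> str:
    """Baseline: the entire filing goes into the prompt, no retrieval.

    Filings past the model's context limit fail here outright, which is
    what happened with the six oversized PepsiCo 10-Ks.
    """
    msg = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{filing_text}\n\nQuestion: {question}",
        }],
    )
    return msg.content[0].text


def agentic_answer(question: str, doc_id: str) -> str:
    """Agentic retrieval: the model searches the indexed filing itself."""
    resp = requests.post(DEWEY_URL, json={"question": question, "doc": doc_id})
    resp.raise_for_status()
    return resp.json()["answer"]  # assumed response field
```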
The finding that surprised us most: document enrichment (section summaries, table captions) added 3.8 points for Opus and cost 1.6 points for GPT-5.4. Same features, opposite effects. The explanation is in the tool-call distributions: Opus averaged 21 searches per question, GPT-5.4 averaged 9. Enrichment is a navigation aid, and if you're not navigating, it's noise.
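To make "navigation aid" concrete, here is a sketch of what enrichment could look like at index time, assuming summaries and captions are prepended to each chunk's stored text; the `Chunk` fields and `enriched_text` helper are illustrative, not Dewey's actual schema.

```python
# Hedged sketch of chunk enrichment: each search hit carries a header
# that orients the model within the filing. Field names are illustrative.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    section_summary: str  # model-written summary of the enclosing section
    table_caption: str | None = None  # caption if the chunk is a table


def enriched_text(chunk: Chunk) -> str:
    """Text actually indexed and returned for an enriched chunk."""
    header = f"[Section: {chunk.section_summary}]"
    if chunk.table_caption:
        header += f"\n[Table: {chunk.table_caption}]"
    return f"{header}\n{chunk.text}"
```

Under this framing the split behavior follows: a model issuing ~21 searches per question reads these headers over and over to decide where to look next, while a model issuing ~9 mostly just pays their token cost.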