LLMs ace bar exams, but even the best gets 1 in 12 local queries wrong

https://voygr-tech.github.io/llm-local-search-benchmark-report/
4•yamarkov•1h ago

Comments

yamarkov•1h ago
VOYGR team here. We built this because we kept running into the same problem: LLMs confidently recommending places that turned out to be closed, fabricated, or in the wrong neighborhood. We wanted to measure how bad it actually is.

Setup: 345 prompts across 50+ cities and 5 task types (discovery, place details, navigation, booking, sharing), each run across ChatGPT, Gemini, Claude, and Perplexity with search ON and OFF, for 7 model configurations. 345 prompts × 7 configurations = 2,415 total evaluated responses. Every recommended place was verified against Google Search and Maps.

What surprised us:

1. Search makes booking tasks worse. Enabling web search improved discovery by ~8 points but hurt transactional tasks. Claude and Gemini both lost 5+ points on "help me book a table" prompts. Models switched from giving step-by-step advice to quoting search snippets.

2. Every model confidently books you a table at a closed restaurant. We tested a permanently closed Buenos Aires restaurant: all 7 configs gave booking guidance and seating tips, and even the search-equipped models didn't catch it.

3. The real gap is constraint matching. Models find real places but ignore parts of the prompt: price range, neighborhood, cuisine type. Ask for "affordable rooftop bars in Gangnam" and you get champagne lounges with $30 cocktails. The best and worst providers are 16 points apart on this.
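
To make the constraint-matching point concrete, here is a minimal sketch of how a recommendation could be checked against the constraints in a prompt. It is an illustration only, not the benchmark's scoring code: the `Place` fields, the 1 to 4 price scale, the constraint keys, and the example venue are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    neighborhood: str
    category: str      # cuisine or venue type (hypothetical field)
    price_level: int   # 1 (cheap) to 4 (expensive); hypothetical scale


def constraint_match(place: Place, constraints: dict) -> float:
    """Return the fraction of requested constraints the place satisfies."""
    checks = []
    if "neighborhood" in constraints:
        checks.append(place.neighborhood.lower() == constraints["neighborhood"].lower())
    if "category" in constraints:
        checks.append(place.category.lower() == constraints["category"].lower())
    if "max_price_level" in constraints:
        checks.append(place.price_level <= constraints["max_price_level"])
    # With no constraints there is nothing to violate.
    return sum(checks) / len(checks) if checks else 1.0


# "Affordable rooftop bars in Gangnam" expressed as hypothetical constraints.
wanted = {"neighborhood": "Gangnam", "category": "rooftop bar", "max_price_level": 2}
lounge = Place("Example Champagne Lounge", "Gangnam", "rooftop bar", 4)

score = constraint_match(lounge, wanted)
print(f"{score:.2f}")  # right neighborhood and category, wrong price: 2 of 3 met
```

A place can be real and in the right neighborhood and still fail the request, which is why a per-constraint score like this says more than a plain "does the place exist" check.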

The full methodology is in the report. We're planning to open-source the benchmark repo (all 345 prompts, evaluation pipeline, and raw results) in the coming weeks.

We built a *Business Validation API* for AI developers and agents to catch these failures before they reach production. Pass in a place name and address from any LLM response and get back existence verification and operating status, the exact checks that would have caught the fatal flaws in this benchmark. The link is in the report if you want to try it.
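
As a rough sketch of where such a check could sit in a pipeline, the snippet below filters LLM recommendations through a validator before they reach the user. Everything here is assumed for illustration: the `Validation` shape, the status strings, and `stub_validate` are hypothetical stand-ins, not the actual API's interface, which is documented in the report.

```python
from dataclasses import dataclass


@dataclass
class Validation:
    exists: bool
    operating_status: str  # hypothetical values: "open", "closed_permanently", "unknown"


def filter_recommendations(places, validate):
    """Split (name, address) pairs into kept names and dropped (name, reason) pairs."""
    kept, dropped = [], []
    for name, address in places:
        result = validate(name, address)
        if result.exists and result.operating_status == "open":
            kept.append(name)
        else:
            reason = result.operating_status if result.exists else "not_found"
            dropped.append((name, reason))
    return kept, dropped


def stub_validate(name, address):
    """Stand-in for a real validation call; looks names up in a tiny local table."""
    known = {"Open Bistro": "open", "Shuttered Parrilla": "closed_permanently"}
    status = known.get(name)
    if status is None:
        return Validation(exists=False, operating_status="unknown")
    return Validation(exists=True, operating_status=status)


places = [
    ("Open Bistro", "1 Example St"),
    ("Shuttered Parrilla", "2 Example Ave"),
    ("Made-Up Cafe", "3 Nowhere Rd"),
]
kept, dropped = filter_recommendations(places, stub_validate)
print(kept)     # only the place that exists and is open
print(dropped)  # the closed and the fabricated place, each with a reason
```

Swapping `stub_validate` for a real lookup keeps the filtering logic unchanged, so closed and fabricated places are dropped before any booking guidance is generated.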

Happy to answer questions about methodology or anything else.