frontpage.

PhantomBench: Benchmarking the Non-Existential Threat of Language Models

https://arxiv.org/abs/2606.11105
1•root-parent•1h ago

Comments

root-parent•1h ago
"...Hallucinations, where language models (LMs) generate factually ungrounded responses, pose serious risks, as users tend to blindly rely on them. This is particularly concerning in high-stakes domains, where consequences of such model behavior can lead to significant harms. Despite notable progress in understanding hallucinations, it remains unclear how reliably these models can recognize the limits of their knowledge....

...We introduce PhantomBench, the first large-scale benchmark of its kind, comprising more than 60K non-existent terms and entities derived from real concepts across diverse domains. Using our benchmark, we evaluate a total of 21 models of various types and sizes....

...We show staggering hallucination rates across the board (with average rates as high as 86.7% in some cases), and note that even frontier models surprisingly fail to abstain on non-existent concepts, especially when the input presumes their existence..."

IsItPatched – instant verdict on whether your software version is safe

https://isitpatched.com/
1•NickBotha•1m ago•0 comments

I built a one-click installer for a token compression engine for AI tools

1•rachidalm•1m ago•0 comments

Show HN: BizChecker AI – 6 competing AI models stress-test your business idea

https://bizchecker.ai/
1•CryptoAMLcheck•3m ago•0 comments

Building a Polymarket Trading Bot (System Design Overview)

https://github.com/Benjam1nCup/Polymarket-trading-bot-python-V2
1•Benjam1ncup•3m ago•0 comments

Show HN: Omegacode – an agent agnostic version of Claude Workflows

https://github.com/SawyerHood/omegacode
1•sawyerjhood•4m ago•0 comments

The Web is the Next Platform (1995)

https://benslivka.com/2017/08/15/the-web-is-the-next-platform-5271995/
1•downbad_•5m ago•0 comments

Trip report: June 2026 ISO C++ standards meeting (Brno, Czechia)

https://herbsutter.com/2026/06/13/brno-trip-report/
1•matt_d•7m ago•0 comments

Show HN: Color Lab – Interactive 3D gamut explorer and uniform palette generator

https://colorlab.ferreyrapons.com
1•saabi•7m ago•0 comments

How to build a virtual cell and biology scaling laws

https://letter.nikomc.com/p/virtual-cells
1•ogundipeore•16m ago•0 comments

Now what?

https://blog.danieljanus.pl/now-what/
2•nathell•16m ago•0 comments

Rivian's CEO on Tesla's Cybertruck, Ferrari's Luce, and What Happens If the R2

https://www.wired.com/story/interview-with-rivian-ceo-rj-scaringe/
1•joozio•16m ago•0 comments

A case for AI-pragmatism – leverage cheap compute while it's here

https://quickthoughts.ca/posts/a-case-for-pragmatism-in-ai/
1•quickthoughts•17m ago•0 comments

We recommend Highway over std::simd

https://github.com/google/highway/blob/master/g3doc/std_simd_comparison.md
1•coffeeaddict1•19m ago•0 comments

China's Notorious University-Entrance Exam, Gaokao, Is Changing

https://www.economist.com/china/2026/06/11/chinas-notorious-university-entrance-exam-is-changing
2•karakoram•20m ago•1 comment

Write-heavy sysbench tests, a large server, modern Postgres and MySQL

http://smalldatum.blogspot.com/2026/06/write-heavy-sysbench-tests-large-server.html
1•ksec•20m ago•0 comments

The Army bought 10k IVAS headsets. Soldiers won't use them

https://taskandpurpose.com/news/ivas-headset-never-used/
4•iancmceachern•20m ago•1 comment

TensorSharp: Open Source Local LLM Inference Engine

https://github.com/zhongkaifu/TensorSharp
1•zhongkaifu•21m ago•1 comment

How Can Soccer Players Bend Their Shots in Midair?

https://www.wired.com/story/how-can-soccer-players-bend-their-shots-in-midair/
2•joozio•22m ago•0 comments

Scoping Rules: Global, Project, Path-Glob

https://medium.com/@tacoda/scoping-rules-global-project-path-glob-e2eea5d52f5e
1•tacoda•22m ago•0 comments

Why DuckDB

https://duckdb.org/why_duckdb
1•tosh•23m ago•0 comments

Show HN: Share any server without trusting the recipient

https://nesdzo.com/share-temporary-server-access/
1•nesdzoltd•23m ago•0 comments

Global density and biomass of arbuscular mycorrhizal fungal networks

https://www.science.org/doi/10.1126/science.adu4373
4•zdw•24m ago•0 comments

Amazon CEO's Talks with U.S. Officials Triggered Crackdown on Anthropic Models

https://www.wsj.com/tech/ai/amazon-ceos-talks-with-u-s-officials-triggered-crackdown-on-anthropic...
15•ls612•25m ago•7 comments

Blood pressure tech floods the market after FDA relaxes wearables oversight

https://www.statnews.com/2026/05/28/fda-wellness-guidance-unvetted-blood-pressure-tech-floods-mar...
4•randycupertino•32m ago•1 comment

Google Earth: Flight Simulator

https://twitter.com/googleearth/status/2065449043925381293
5•tosh•32m ago•1 comment

The adder at the heart of Intel's 8087 floating-point chip

https://www.righto.com/2026/06/intel-8087-adder-reverse-engineered.html
7•pwg•34m ago•1 comment

Vercel Drop

https://vercel.com/changelog/vercel-drop
2•taubek•37m ago•1 comment

AI Coding at Home Without Going Broke

https://stephen.bochinski.dev/blog/2026/06/13/ai-coding-at-home-without-going-broke/
29•sbochins•38m ago•21 comments

More Tailscale tricks for your jailbroken Kindle

https://tailscale.com/blog/jailbroken-kindle-proxy-tun-modes
1•Brajeshwar•42m ago•0 comments

Show HN: Lightweight C++23 S3 client with no extra deps (just curl and OpenSSL)

https://github.com/ggcr/s3cpp
2•ggcr•42m ago•0 comments