frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Seedance 2.0 AI video generator for creators and ecommerce

https://seedance-2.net
1•dallen97•3m ago•0 comments

Wally: A fun, reliable voice assistant in the shape of a penguin

https://github.com/JLW-7/Wally
1•PaulHoule•4m ago•0 comments

Rewriting Pycparser with the Help of an LLM

https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/
1•y1n0•6m ago•0 comments

Lobsters Vibecoding Challenge

https://gist.github.com/MostAwesomeDude/bb8cbfd005a33f5dd262d1f20a63a693
1•tolerance•6m ago•0 comments

E-Commerce vs. Social Commerce

https://moondala.one/
1•HamoodBahzar•7m ago•1 comments

Avoiding Modern C++ – Anton Mikhailov [video]

https://www.youtube.com/watch?v=ShSGHb65f3M
1•linkdd•8m ago•0 comments

Show HN: AegisMind–AI system with 12 brain regions modeled on human neuroscience

https://www.aegismind.app
2•aegismind_app•12m ago•1 comments

Zig – Package Management Workflow Enhancements

https://ziglang.org/devlog/2026/#2026-02-06
1•Retro_Dev•14m ago•0 comments

AI-powered text correction for macOS

https://taipo.app/
1•neuling•17m ago•1 comments

AppSecMaster – Learn Application Security with hands on challenges

https://www.appsecmaster.net/en
1•aqeisi•18m ago•1 comments

Fibonacci Number Certificates

https://www.johndcook.com/blog/2026/02/05/fibonacci-certificate/
1•y1n0•20m ago•0 comments

AI Overviews are killing the web search, and there's nothing we can do about it

https://www.neowin.net/editorials/ai-overviews-are-killing-the-web-search-and-theres-nothing-we-c...
3•bundie•25m ago•1 comments

City skylines need an upgrade in the face of climate stress

https://theconversation.com/city-skylines-need-an-upgrade-in-the-face-of-climate-stress-267763
3•gnabgib•26m ago•0 comments

1979: The Model World of Robert Symes [video]

https://www.youtube.com/watch?v=HmDxmxhrGDc
1•xqcgrek2•30m ago•0 comments

Satellites Have a Lot of Room

https://www.johndcook.com/blog/2026/02/02/satellites-have-a-lot-of-room/
2•y1n0•30m ago•0 comments

1980s Farm Crisis

https://en.wikipedia.org/wiki/1980s_farm_crisis
4•calebhwin•31m ago•1 comments

Show HN: FSID - Identifier for files and directories (like ISBN for Books)

https://github.com/skorotkiewicz/fsid
1•modinfo•36m ago•0 comments

Show HN: Holy Grail: Open-Source Autonomous Development Agent

https://github.com/dakotalock/holygrailopensource
1•Moriarty2026•43m ago•1 comments

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•51m ago•1 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•51m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
2•rolph•53m ago•1 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•54m ago•3 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•56m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•58m ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•59m ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•1h ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
3•rolph•1h ago•1 comments

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•1h ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•1h ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
6•cratermoon•1h ago•0 comments
Open in hackernews

Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection

https://huggingface.co/blog/quotientai/hallumix
4•jneagu•9mo ago
We built HalluMix to evaluate how well hallucination detectors perform in the kinds of messy, high-stakes environments where LLMs are actually deployed: long-form outputs, multi-document contexts, and domain-specific tasks like law, medicine, science, and news.

Most existing benchmarks focus on synthetic or short-form QA data. That didn’t reflect what we were seeing in production, so we built our own to test our hallucination detectors, and decided to open source it.

The dataset includes 6,500 examples across QA, summarization, and NLI tasks. We added distractor documents, shuffled the context, and removed assumptions about format (like requiring a question) to better reflect real-world conditions.

We ran 7 detection systems on it, both open-source models and commercial APIs. While some performed well on shorter examples, even the best struggled with long-form content and multi-document grounding -- precisely where hallucinations tend to be most harmful.

Would love feedback, especially from anyone working on evals, hallucination detection, or RAG.

Links: – HF Dataset: https://huggingface.co/datasets/quotient-ai/hallumix - HF Blog: https://huggingface.co/blog/quotientai/hallumix – Internal Blog: https://blog.quotientai.co/introducing-hallumix-a-task-agnos... – Paper: https://arxiv.org/abs/2505.00506