frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Goal: Ship 1M Lines of Code Daily

2•feastingonslop•10m ago•0 comments

Show HN: Codex-mem, 90% fewer tokens for Codex

https://github.com/StartripAI/codex-mem
1•alfredray•12m ago•0 comments

FastLangML: FastLangML:Context‑aware lang detector for short conversational text

https://github.com/pnrajan/fastlangml
1•sachuin23•16m ago•1 comments

LineageOS 23.2

https://lineageos.org/Changelog-31/
1•pentagrama•19m ago•0 comments

Crypto Deposit Frauds

2•wwdesouza•20m ago•0 comments

Substack makes money from hosting Nazi newsletters

https://www.theguardian.com/media/2026/feb/07/revealed-how-substack-makes-money-from-hosting-nazi...
2•lostlogin•20m ago•0 comments

Framing an LLM as a safety researcher changes its language, not its judgement

https://lab.fukami.eu/LLMAAJ
1•dogacel•23m ago•0 comments

Are there anyone interested about a creator economy startup

1•Nejana•24m ago•0 comments

Show HN: Skill Lab – CLI tool for testing and quality scoring agent skills

https://github.com/8ddieHu0314/Skill-Lab
1•qu4rk5314•24m ago•0 comments

2003: What is Google's Ultimate Goal? [video]

https://www.youtube.com/watch?v=xqdi1xjtys4
1•1659447091•24m ago•0 comments

Roger Ebert Reviews "The Shawshank Redemption"

https://www.rogerebert.com/reviews/great-movie-the-shawshank-redemption-1994
1•monero-xmr•27m ago•0 comments

Busy Months in KDE Linux

https://pointieststick.com/2026/02/06/busy-months-in-kde-linux/
1•todsacerdoti•27m ago•0 comments

Zram as Swap

https://wiki.archlinux.org/title/Zram#Usage_as_swap
1•seansh•40m ago•0 comments

Green’s Dictionary of Slang - Five hundred years of the vulgar tongue

https://greensdictofslang.com/
1•mxfh•41m ago•0 comments

Nvidia CEO Says AI Capital Spending Is Appropriate, Sustainable

https://www.bloomberg.com/news/articles/2026-02-06/nvidia-ceo-says-ai-capital-spending-is-appropr...
1•virgildotcodes•44m ago•2 comments

Show HN: StyloShare – privacy-first anonymous file sharing with zero sign-up

https://www.styloshare.com
1•stylofront•45m ago•0 comments

Part 1 the Persistent Vault Issue: Your Encryption Strategy Has a Shelf Life

1•PhantomKey•49m ago•0 comments

Show HN: Teleop_xr – Modular WebXR solution for bimanual robot teleoperation

https://github.com/qrafty-ai/teleop_xr
1•playercc7•52m ago•1 comments

The Highest Exam: How the Gaokao Shapes China

https://www.lrb.co.uk/the-paper/v48/n02/iza-ding/studying-is-harmful
2•mitchbob•56m ago•1 comments

Open-source framework for tracking prediction accuracy

https://github.com/Creneinc/signal-tracker
1•creneinc•58m ago•0 comments

India's Sarvan AI LLM launches Indic-language focused models

https://x.com/SarvamAI
2•Osiris30•59m ago•0 comments

Show HN: CryptoClaw – open-source AI agent with built-in wallet and DeFi skills

https://github.com/TermiX-official/cryptoclaw
1•cryptoclaw•1h ago•0 comments

ShowHN: Make OpenClaw respond in Scarlett Johansson’s AI Voice from the Film Her

https://twitter.com/sathish316/status/2020116849065971815
1•sathish316•1h ago•2 comments

CReact Version 0.3.0 Released

https://github.com/creact-labs/creact
1•_dcoutinho96•1h ago•0 comments

Show HN: CReact – AI Powered AWS Website Generator

https://github.com/creact-labs/ai-powered-aws-website-generator
1•_dcoutinho96•1h ago•0 comments

The rocky 1960s origins of online dating (2025)

https://www.bbc.com/culture/article/20250206-the-rocky-1960s-origins-of-online-dating
1•1659447091•1h ago•0 comments

Show HN: Agent-fetch – Sandboxed HTTP client with SSRF protection for AI agents

https://github.com/Parassharmaa/agent-fetch
1•paraaz•1h ago•0 comments

Why there is no official statement from Substack about the data leak

https://techcrunch.com/2026/02/05/substack-confirms-data-breach-affecting-email-addresses-and-pho...
14•witnessme•1h ago•4 comments

Effects of Zepbound on Stool Quality

https://twitter.com/ScottHickle/status/2020150085296775300
2•aloukissas•1h ago•1 comments

Show HN: Seedance 2.0 – The Most Powerful AI Video Generator

https://seedance.ai/
2•bigbromaker•1h ago•0 comments
Open in hackernews

Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection

https://huggingface.co/blog/quotientai/hallumix
4•jneagu•9mo ago
We built HalluMix to evaluate how well hallucination detectors perform in the kinds of messy, high-stakes environments where LLMs are actually deployed: long-form outputs, multi-document contexts, and domain-specific tasks like law, medicine, science, and news.

Most existing benchmarks focus on synthetic or short-form QA data. That didn’t reflect what we were seeing in production, so we built our own to test our hallucination detectors, and decided to open source it.

The dataset includes 6,500 examples across QA, summarization, and NLI tasks. We added distractor documents, shuffled the context, and removed assumptions about format (like requiring a question) to better reflect real-world conditions.

We ran 7 detection systems on it, both open-source models and commercial APIs. While some performed well on shorter examples, even the best struggled with long-form content and multi-document grounding -- precisely where hallucinations tend to be most harmful.

Would love feedback, especially from anyone working on evals, hallucination detection, or RAG.

Links: – HF Dataset: https://huggingface.co/datasets/quotient-ai/hallumix - HF Blog: https://huggingface.co/blog/quotientai/hallumix – Internal Blog: https://blog.quotientai.co/introducing-hallumix-a-task-agnos... – Paper: https://arxiv.org/abs/2505.00506