SigmaEval – statistical evaluation for GenAI apps

https://github.com/Itura-AI/SigmaEval

1•TarekOraby•4mo ago

Comments

TarekOraby•4mo ago

Hey HN, I released SigmaEval, a Python framework to evaluate GenAI applications.

Non-deterministic outputs of LLM-based apps don’t fit pass/fail tests, so teams often ship without confidence. SigmaEval addresses this with a statistical evaluation approach, similar to that used in clinical trials. It supports statements such as: “We are 95% confident that our AI will resolve at least 90% of user issues with a quality score of 8/10 or higher.”
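To make that kind of statement concrete, here is a minimal sketch of a one-sided exact binomial test in plain Python. It is not SigmaEval's API: the function names, the 0.90 pass-rate threshold, and the idea that a conversation "passes" when the judge scores it 8/10 or higher are all assumptions for illustration.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p0): a one-sided exact test."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def meets_quality_bar(
    successes: int,
    trials: int,
    min_rate: float = 0.90,    # hypothetical target pass rate
    confidence: float = 0.95,  # hypothetical confidence level
) -> bool:
    """True if 'true pass rate <= min_rate' can be rejected at this confidence."""
    return binomial_p_value(successes, trials, min_rate) < 1 - confidence


# Hypothetical run: 30 simulated conversations, all judged as passing.
print(meets_quality_bar(30, 30))
# Hypothetical run: 92 of 100 pass, which is above 90% but not convincingly so.
print(meets_quality_bar(92, 100))
```

The point of the sketch is that an observed pass rate above 90% is not enough on its own; the sample has to be large and clean enough to rule out a true rate at or below the bar.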

It works in three steps:

- Define “good”: You describe the test scenario and desired outcome in plain English (e.g., “when a new user asks about the bot’s capabilities” -> “then the bot lists its main functions”).

- Simulate: An AI user simulator exercises your app repeatedly, switching styles (polite, impatient, verbose) to build a diverse conversation set.

- Judge & analyze: An AI judge scores each conversation against your definition of success. SigmaEval runs binomial and bootstrap tests to decide whether you meet your quality bar at a chosen confidence level.
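The bootstrap half of the analysis step can be sketched in a similar way. This is a generic percentile bootstrap for a lower confidence bound on the mean judge score, written with the standard library only; the function name, the resample count, and the sample scores are assumptions, and SigmaEval's own implementation may differ.

```python
import random
from statistics import mean


def bootstrap_lower_bound(
    scores: list[float],
    confidence: float = 0.95,  # hypothetical confidence level
    resamples: int = 2000,     # hypothetical number of bootstrap resamples
    seed: int = 0,
) -> float:
    """One-sided percentile-bootstrap lower bound on the mean score."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        mean(rng.choices(scores, k=n)) for _ in range(resamples)
    )
    # The (1 - confidence) quantile of the resampled means is the bound.
    return resampled_means[int((1 - confidence) * resamples)]


# Hypothetical judge scores (out of 10) from 12 simulated conversations.
judge_scores = [9, 8, 10, 7, 9, 8, 9, 10, 8, 9, 6, 9]
bound = bootstrap_lower_bound(judge_scores)
print(f"mean = {mean(judge_scores):.2f}, 95% lower bound = {bound:.2f}")
```

A team would then compare the lower bound, rather than the raw mean, against its quality bar.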

SigmaEval is agnostic to both LLM provider and testing framework.

Open source (Apache 2.0).

GitHub: https://github.com/Itura-AI/sigmaeval

PyPI: https://pypi.org/project/sigmaeval-framework/

I’m the creator and happy to answer questions.