frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Env-shelf – Open-source desktop app to manage .env files

https://env-shelf.vercel.app/
1•ivanglpz•2m ago•0 comments

Show HN: Almostnode – Run Node.js, Next.js, and Express in the Browser

https://almostnode.dev/
1•PetrBrzyBrzek•3m ago•0 comments

Dell support (and hardware) is so bad, I almost sued them

https://blog.joshattic.us/posts/2026-02-07-dell-support-lawsuit
1•radeeyate•3m ago•0 comments

Project Pterodactyl: Incremental Architecture

https://www.jonmsterling.com/01K7/
1•matt_d•4m ago•0 comments

Styling: Search-Text and Other Highlight-Y Pseudo-Elements

https://css-tricks.com/how-to-style-the-new-search-text-and-other-highlight-pseudo-elements/
1•blenderob•5m ago•0 comments

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•6m ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•7m ago•0 comments

Fantasy football that celebrates great games

https://www.silvestar.codes/articles/ultigamemate/
1•blenderob•7m ago•0 comments

Show HN: Animalese

https://animalese.barcoloudly.com/
1•noreplica•7m ago•0 comments

StrongDM's AI team build serious software without even looking at the code

https://simonwillison.net/2026/Feb/7/software-factory/
1•simonw•8m ago•0 comments

John Haugeland on the failure of micro-worlds

https://blog.plover.com/tech/gpt/micro-worlds.html
1•blenderob•8m ago•0 comments

Show HN: Velocity - Free/Cheaper Linear Clone but with MCP for agents

https://velocity.quest
2•kevinelliott•9m ago•2 comments

Corning Invented a New Fiber-Optic Cable for AI and Landed a $6B Meta Deal [video]

https://www.youtube.com/watch?v=Y3KLbc5DlRs
1•ksec•10m ago•0 comments

Show HN: XAPIs.dev – Twitter API Alternative at 90% Lower Cost

https://xapis.dev
2•nmfccodes•11m ago•0 comments

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

https://psychotechnology.substack.com/p/near-instantly-aborting-the-worst
2•eatitraw•17m ago•0 comments

Show HN: Nginx-defender – realtime abuse blocking for Nginx

https://github.com/Anipaleja/nginx-defender
2•anipaleja•17m ago•0 comments

The Super Sharp Blade

https://netzhansa.com/the-super-sharp-blade/
1•robin_reala•18m ago•0 comments

Smart Homes Are Terrible

https://www.theatlantic.com/ideas/2026/02/smart-homes-technology/685867/
1•tusslewake•20m ago•0 comments

What I haven't figured out

https://macwright.com/2026/01/29/what-i-havent-figured-out
1•stevekrouse•21m ago•0 comments

KPMG pressed its auditor to pass on AI cost savings

https://www.irishtimes.com/business/2026/02/06/kpmg-pressed-its-auditor-to-pass-on-ai-cost-savings/
1•cainxinth•21m ago•0 comments

Open-source Claude skill that optimizes Hinge profiles. Pretty well.

https://twitter.com/b1rdmania/status/2020155122181869666
3•birdmania•21m ago•1 comments

First Proof

https://arxiv.org/abs/2602.05192
5•samasblack•23m ago•2 comments

I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

https://mohammedeabdelaziz.github.io/articles/trendscope-market-scanner
1•mohammede•24m ago•0 comments

Kagi Translate

https://translate.kagi.com
2•microflash•25m ago•0 comments

Building Interactive C/C++ workflows in Jupyter through Clang-REPL [video]

https://fosdem.org/2026/schedule/event/QX3RPH-building_interactive_cc_workflows_in_jupyter_throug...
1•stabbles•26m ago•0 comments

Tactical tornado is the new default

https://olano.dev/blog/tactical-tornado/
2•facundo_olano•28m ago•0 comments

Full-Circle Test-Driven Firmware Development with OpenClaw

https://blog.adafruit.com/2026/02/07/full-circle-test-driven-firmware-development-with-openclaw/
1•ptorrone•28m ago•0 comments

Automating Myself Out of My Job – Part 2

https://blog.dsa.club/automation-series/automating-myself-out-of-my-job-part-2/
1•funnyfoobar•28m ago•1 comments

Dependency Resolution Methods

https://nesbitt.io/2026/02/06/dependency-resolution-methods.html
1•zdw•29m ago•0 comments

Crypto firm apologises for sending Bitcoin users $40B by mistake

https://www.msn.com/en-ie/money/other/crypto-firm-apologises-for-sending-bitcoin-users-40-billion...
1•Someone•30m ago•0 comments
Open in hackernews

Show HN: EvalView pytest style tests for AI agents (budgets, hallucinations)

https://github.com/hidai25/eval-view
1•hidai25•1mo ago

Comments

hidai25•1mo ago
Hi HN, I built EvalView after an agent that worked fine in dev started inventing numbers in prod. Tracing showed me what happened after the fact, but I wanted CI to fail the deploy the moment the agent drifted. EvalView is basically pytest for agents: you write a YAML test, run it N times (to catch flakiness), and fail the build if behavior regresses. Example:

```yaml name: "Refund policy doesn't hallucinate" runs: 10 pass_rate: 0.8 input: query: "What's our refund policy?" assert: - tool_called: "kb_search" - no_unsupported_claims: true - max_cost_usd: 0.05 ```

Instead of exact text matching, the checks focus on constraints: did it call the right tools, did it make claims not supported by tool results/context, and did it stay within cost/latency budgets. I also added optional local LLM-as-judge via Ollama so evals don’t burn API credits on every run. If you’re shipping agents to prod, what’s been your worst failure mode: tool misuse, budget blowups, or confident nonsense? Happy to answer questions. Hidai