frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
1•Brajeshwar•1m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
1•Brajeshwar•1m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
1•Brajeshwar•1m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•4m ago•0 comments

Kernel Key Retention Service

https://www.kernel.org/doc/html/latest/security/keys/core.html
1•networked•4m ago•0 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
1•righthand•7m ago•0 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•8m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•8m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
2•vinhnx•9m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•14m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•18m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•23m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•24m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•25m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
3•okaywriting•31m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
2•todsacerdoti•34m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•35m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•36m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•37m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•37m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•37m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•38m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•42m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•42m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•43m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•43m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•52m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•52m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
2•surprisetalk•54m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•54m ago•0 comments
Open in hackernews

AI will make formal verification go mainstream

https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html
6•mau•1mo ago

Comments

mvr123456•1mo ago
This is going to happen and is real stuff we could be working towards with the tools that we already have. No need for AGI vaporware, no waiting around for the perfect agentic playground. Not even necessarily a big requirement for excellent reasoning. Just using LLMs for what they are actually good at, i.e. fuzzy translators, stylistic filters, and compilers.

Gradual-typing was practice and hints. Like gradual typing, gradual spec'ing could be an iterative and kind of parallel annotated representation, ignored by the main runtime unless called for, and ignored by developers that aren't interested in it. But when there's enough of it to hit some kind of critical mass, then it's suddenly very powerful and lots of very interesting stuff is possible

mutkach•1mo ago
I certainly hope so.

I wonder, what is the actual blocker right now? I'd assume that LLMs are still not very good with specifications and verifcation languages? Anyone tried Datalog, TLA+, etc. with LLMs? I suppose that Gemini was specifically trained on Lean. Or at least some IMO-finetuned fork of it. Anyhow, there's probably a large Lean dataset collected somewhere in Deepmind servers, but that's not certification applicable necessarily, I think?

> AI also creates a need to formally verify more software: rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct.

At RL stage LLMs could game the training*, proving easier invariants then actually expected (the proof is correct and possibly short - means positive reward). It would take additional care it to set it up right.

* I mean, if you set it to generate a code AND a proof to it.

lukeasrodgers•1mo ago
I would like this to be true, but am a bit skeptical.

I am what the article calls an "industrial software engineer" and I work on "low- to medium-assurance" projects, but have used various formal methods (alloy and TLA+) in my work to prevent and discover bugs.

I've experimented with using LLMs to generate both Alloy and TLA+ a couple times over the past years, and the problems I see are:

- They have gotten better over the last few years, but still can only produce useful results in the hands of someone who is moderately competent. Becoming moderately competent requires many hours of investment in these tools, and you will lose much of this competence if you don't keep it up. For example, I can still read TLA+ and Pluscal but can't write them without lots of referring to the docs because I only write them like once or twice a year.

- They suffer even more from GIGO than other aspects of software development. If you can't really rigorously define your problem you will get a bad model/output that only gives you false confidence. A large part of the value of doing formal methods is building the muscle for thinking rigorously. Hillel Wayne says this in several places, that doing enough TLA+ (e.g.) work gives you a much better innate sense for where there will be race conditions.

- There will still be a cultural and technical problems with integrating formal methods, and their artifacts, into the rest of your codebase and team. For example, how do you prevent drift? Will you have a CI automation that uses an LLM to detect when the spec has diverged from the code?

I'm not saying it is impossible that this will happen, and I would love to be wrong, but the general tendency I see with LLM use is to make software developers less intimately familiar with their tools, and less invested in deeply understanding their code. That bodes ill for formal methods even more than regular programming.

mutkach•1mo ago
What would you suggest as a reference problem (a benchmark of sorts) to try to play with formal methods for someone with just a bit of formal verification background but not in the field of software verification? Can you suggest some helpful materials?

I've come across TLA+ multiple times, but it seems it was more targeted towards distributed systems (Lamport being the creator, that makes sense). Is it correct, that it would be useless in other domains?