Show HN: Semcheck – AI Tool for checking implementation follows spec

https://github.com/rejot-dev/semcheck
13•duckerduck•4d ago
Hi HN! Like many, I've been interested in the direction software engineering is taking now that coding LLMs are becoming prevalent. We're not quite at "natural language programming" yet, but new abstractions already seem to be forming. To explore this further, I've built semcheck (semantic checker), a simple CLI tool that uses LLMs to check that your implementation matches your specification. It can run in CI or as a pre-commit hook.
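The core loop described above can be sketched roughly as: pair spec files with implementation files, ask the model for discrepancies, and fail the build on serious ones. This is a minimal, hypothetical Python illustration. `Rule`, `build_prompt`, `exit_code`, the `ERROR` severity name and the prompt wording are all assumptions. They are not semcheck's actual config schema, API or prompt [4]. The LLM call between building the prompt and computing the exit code is deliberately left out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule: an implementation that must follow a specification."""
    name: str
    spec_files: list[str]
    impl_files: list[str]
    instructions: str = ""  # optional extra guidance for the model


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def build_prompt(rule: Rule, read: Callable[[str], str] = read_file) -> str:
    """Put the spec and the implementation side by side in one prompt."""
    parts = [
        "Compare the IMPLEMENTATION against the SPECIFICATION.",
        "Report only discrepancies between them; do not use outside knowledge.",
    ]
    if rule.instructions:
        parts.append(rule.instructions)
    for path in rule.spec_files:
        parts.append(f"--- SPECIFICATION: {path} ---\n{read(path)}")
    for path in rule.impl_files:
        parts.append(f"--- IMPLEMENTATION: {path} ---\n{read(path)}")
    return "\n\n".join(parts)


def exit_code(severities: list[str]) -> int:
    """Non-zero fails the CI job or pre-commit hook if any issue is an ERROR."""
    return 1 if "ERROR" in severities else 0
```

For the GeoJSON case below, a rule might look like `Rule("geojson", ["rfc7946.txt"], ["geojson.go"])` (hypothetical file names). Passing `read` as a parameter keeps the prompt builder testable without touching the filesystem.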

The inspiration came while I was working on another project where I needed a data structure for a GeoJSON object. I passed Claude the text of RFC 7946 and it gave me an implementation. It took some back and forth before I was happy with it, but by then the RFC had dropped out of the LLM's context. So I asked Claude to check the RFC again to make sure we hadn't strayed too far from the spec. It occurred to me that it would be good to have a formal way of defining these kinds of checks so they can run in a pre-commit or merge-request flow.

Creating this tool was itself an experiment in "spec-driven development" using Claude Code, a middle ground between pure vibe-coding and traditional programming. My workflow was as follows: ask the AI to write a spec and an implementation plan, edit these manually to my liking, then ask the AI to execute one step at a time, taking care that it doesn't drift too far from what I think is required. My very first commit [1] contains the specification of the config file structure and an implementation plan.

As soon as semcheck could check itself, it started finding issues [2]. I found that this workflow not only improves your implementation but also helps you refine your specification at the same time.

Besides specifications, I also started including documentation in my rules, making sure that the configuration examples and CLI flags in my README.md stay in line with the implementation [3].

The best part is that you can feed the issues it finds directly back into your AI editor for a quick iteration cycle.

Some learnings:

- LLMs are very good at finding discrepancies as long as the number of files you pass to a single comparison isn't too large; in other words, the true positives are quite good.

- False positives: the LLM is a know-it-all (literally) and often thinks it knows better. It is eager to use its own world knowledge to find faults, which can be both nice and problematic. It has often complained that my Go version doesn't exist, when that version was simply released after the model's knowledge cutoff. I specifically prompt [4] the model to only find discrepancies, but it often "chooses" to use its knowledge anyway.

- In an effort to reduce false positives, I ask the model for a confidence score (0 to 1) indicating how sure it is that the issue it found actually applies in this scenario. The models are always super confident and almost exclusively output values above 0.7.

- One thing that did reduce false positives significantly was asking the model to give its reasoning before assigning a severity level to an issue.

- In my (rudimentary) experiments, "thinking" models like o3 didn't improve performance much and weren't worth the additional tokens and time, likely because I already ask for the reasoning anyway.

- The models that perform best are Claude 4 and GPT-4.1
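Two of the ideas in the list above are a confidence score and reasoning that comes before the severity. Both can be sketched as a structured-output schema plus a parser. This is a hypothetical Python illustration. The `ISSUE_SCHEMA` fields, the severity names, the `issues` wrapper key and the helper names are assumptions, not semcheck's actual output format. Listing `reasoning` first only helps with providers that generate JSON keys in schema order.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for one reported issue. "reasoning" is listed before
# "severity" so the model writes its justification before choosing a level.
ISSUE_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "severity": {"type": "string", "enum": ["ERROR", "WARNING", "NOTICE"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "message": {"type": "string"},
    },
    "required": ["reasoning", "severity", "confidence", "message"],
}


@dataclass
class Issue:
    reasoning: str
    severity: str
    confidence: float
    message: str


def parse_issues(raw: str) -> list[Issue]:
    """Parse a model response of the form {"issues": [...]} into Issue objects."""
    data = json.loads(raw)
    return [Issue(**item) for item in data["issues"]]


def drop_low_confidence(issues: list[Issue], threshold: float = 0.7) -> list[Issue]:
    """Keep only issues the model rated at or above the threshold."""
    return [issue for issue in issues if issue.confidence >= threshold]
```

Given the observation above that models almost always report confidences above 0.7, a threshold filter like `drop_low_confidence` will rarely remove anything in practice. The reasoning-first ordering is the part that reportedly made a difference.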

Let me know if you can see this being useful in your workflow, and what features you would need to make it work for you.

[1]: https://github.com/rejot-dev/semcheck/commit/ce0af27ca0077fe...

[2]: https://github.com/rejot-dev/semcheck/commit/2f96fc428b551d9...

[3]: https://github.com/rejot-dev/semcheck/blob/47f7aaf98811c54e2...

[4]: https://github.com/rejot-dev/semcheck/blob/fec2df48304d9eff9...
