frontpage.

Astronauts on ISS told to shelter as repairs under way to fix air leaks

https://www.bbc.com/news/live/c4g44ew3g1kt
109•janpot•1h ago•57 comments

Mouseless – keyboard-driven control of macOS/Linux/Windows

https://mouseless.click
227•riddley•2d ago•111 comments

Stop Using Conventional Commits

https://sumnerevans.com/posts/software-engineering/stop-using-conventional-commits/
24•jsve•26m ago•4 comments

Cooldown Support for Ruby Bundler

https://blog.rubygems.org/2026/06/03/cooldown-let-new-gems-be-vetted.html
59•calyhre•2d ago•12 comments

Tracing a powerful GNSS interference source over Europe

https://arxiv.org/abs/2606.03673
258•mimorigasaka•7h ago•113 comments

Redis 8.8: New array data structure, rate limiter, performance improvements

https://redis.io/blog/announcing-redis-8-8/
130•ksec•2d ago•59 comments

I tested every IP KVM in my Homelab

https://www.jeffgeerling.com/blog/2026/i-tested-every-ip-kvm/
34•vquemener•1h ago•6 comments

Entanglement Builds Space-Time. Now "Magic" Gives It Gravity

https://www.quantamagazine.org/entanglement-builds-space-time-now-magic-gives-it-gravity-20260603/
114•rbanffy•7h ago•102 comments

Changing how we develop Ladybird

https://ladybird.org/posts/changing-how-we-develop-ladybird/
650•EdwinHoksberg•8h ago•435 comments

C++: The Documentary

https://herbsutter.com/2026/06/04/c-the-documentary-released-today/
280•ingve•11h ago•194 comments

Dutch gov't will only allow European company to operate DigiD platform

https://nltimes.nl/2026/06/05/dutch-govt-will-allow-european-company-operate-digid-platform
46•TechTechTech•1h ago•8 comments

Fine-tuning an LLM to write docs like it's 1995

https://passo.uno/fine-tuning-docs-llm/
150•taubek•10h ago•52 comments

databow: a Rust CLI to query any database with an ADBC driver

https://columnar.tech/blog/introducing-databow//
96•hckshr•2d ago•19 comments

Nango (YC W23, dev infra) is hiring staff back end engineers

https://nango.dev/careers
1•bastienbeurier•4h ago

ESP32 Bit Pirate, a Hardware Hacking Tool with WebCLI That Speaks Every Protocol

https://github.com/geo-tp/ESP32-Bit-Pirate
112•geotp•8h ago•36 comments

Lee Kuan Yew's Singapore Story (2023)

https://www.historytoday.com/archive/feature/lee-kuan-yews-singapore-story
105•pepys•8h ago•91 comments

Meta enables ADB on deprecated Portal devices [video]

https://fb.watch/HxPu0fSyeH/
275•jenders•15h ago•107 comments

Azure Linux 4.0 is Microsoft's first general-purpose Linux

https://www.boxofcables.dev/azure-linux-4-0-is-microsofts-first-general-purpose-linux/
145•haydenbarnes•12h ago•118 comments

Leap in DNA synthesis slashes time to build new genetic sequences

https://spectrum.ieee.org/faster-dna-synthesis-sidewinder
90•natalcleft•22h ago•18 comments

Anthropic's open-source framework for AI-powered vulnerability discovery

https://github.com/anthropics/defending-code-reference-harness
488•binyu•19h ago•137 comments

Ask HN: What is your (AI) dev tech stack / workflow? (June 2026)

15•dv35z•52m ago•17 comments

At the Autograph Show

https://oldster.substack.com/p/at-the-autograph-show
29•NaOH•2d ago•2 comments

Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens

https://github.com/zdk/lowfat
52•zdkaster•6h ago•36 comments

I'm skeptical about efforts to revolutionize schooling

https://www.scotthyoung.com/blog/2026/05/27/revolutionize-schooling/
263•andrewstuart•2d ago•417 comments

The IsUpMap lets you check the status of over 100 major sites at once

https://isupmap.com/
106•mikelgan•11h ago•37 comments

Open Code Review – An AI-powered code review CLI tool

https://github.com/alibaba/open-code-review
234•geoffbp•16h ago•66 comments

Do transformers need three projections? Systematic study of QKV variants

https://arxiv.org/abs/2606.04032
201•Anon84•16h ago•36 comments

Leak Reveals Microsoft Wants Its AI to Be 'Addictive'

https://kotaku.com/microsoft-ai-scout-addictive-satya-nadella-404-media-copilot-2000702924
11•thm•33m ago•0 comments

Programmers will document for Claude, but not for each other

https://blog.plover.com/2026/03/09/#documentation-wins-2
109•surprisetalk•3h ago•105 comments

Communication on European Tech Sovereignty, and an EU Open-Source Strategy

https://digital-strategy.ec.europa.eu/en/library/communication-european-tech-sovereignty-accompan...
83•jrepinc•5h ago•56 comments

Show HN: I benchmarked LLM agents on fixing real-world security vulnerabilities

https://giovannigatti.github.io/cve-bench/
4•ggattip•8h ago
I built a benchmark with 20 real CVEs across 18 Python projects (Pillow, GitPython, yt-dlp, urllib3, etc.). I ran it over 5 LLM agents (3 OpenAI, 2 poolside) and 3 different prompts (full advisory, locate, diagnose), for a total of 300 runs. The agents are tasked with fixing security vulnerabilities in a sandboxed environment, and they are scored against hidden security tests from the maintainer's own fix.

Best solve rate was 50%. In the other 50%, some fixes are coherent and pass all regression tests, but the vulnerability is still present.
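
To make the scoring distinction concrete, here is a minimal sketch of how a run could be bucketed. It assumes each run yields two booleans (regression tests passed, hidden security tests passed); the names `RunResult`, `Outcome`, `classify` and `solve_rate` are hypothetical and are not taken from the cve-bench code.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SOLVED = "solved"                      # regression and security tests pass
    STILL_VULNERABLE = "still_vulnerable"  # looks fine, but the hole remains
    BROKEN = "broken"                      # the patch breaks existing behaviour


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record for one agent run on one CVE."""
    regression_passed: bool
    security_passed: bool


def classify(run: RunResult) -> Outcome:
    # A patch that breaks the project is a failure regardless of security.
    if not run.regression_passed:
        return Outcome.BROKEN
    # Passing regression tests alone is not enough: the hidden security
    # tests decide whether the vulnerability is actually gone.
    if run.security_passed:
        return Outcome.SOLVED
    return Outcome.STILL_VULNERABLE


def solve_rate(runs: list[RunResult]) -> float:
    """Fraction of runs classified as SOLVED (0.0 for an empty list)."""
    if not runs:
        return 0.0
    solved = sum(classify(r) is Outcome.SOLVED for r in runs)
    return solved / len(runs)
```

The middle bucket is the interesting one: a coherent patch that keeps the project working but does not close the hole.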

The main differentiator I found between models is cost: gpt-5.5 is 12× more expensive than gpt-5.4-mini while producing statistically similar results. Within-family performance gaps are small, which suggests the difference is likely due to model training data. I also did a power analysis: the task count needed to detect a meaningful within-family edge is ~700.
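
For a sense of how a task count like that comes about, here is a rough sketch of a standard sample-size calculation for comparing two solve rates. Everything specific in it is an assumption: the post does not state the effect size, significance level, power, or test used, so the 50% baseline, the candidate edges, alpha = 0.05 and power = 0.80 are illustrative only. It also treats the two models as independent samples, whereas runs on the same tasks are paired, so read it as an approximation and not as the write-up's actual method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def tasks_needed(p_base: float, p_better: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Tasks per model for a two-sided two-proportion z-test.

    Normal-approximation formula for independent samples of equal size.
    alpha and power defaults are conventional choices, not values from
    the original post.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value of the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_base + p_better) / 2      # pooled rate under the null
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_power * sqrt(p_base * (1 - p_base)
                              + p_better * (1 - p_better))
    return ceil((null_term + alt_term) ** 2 / (p_base - p_better) ** 2)


if __name__ == "__main__":
    # Hypothetical edges over an assumed 50% baseline solve rate.
    for edge in (0.05, 0.075, 0.10):
        n = tasks_needed(0.50, 0.50 + edge)
        print(f"edge of {edge:.1%}: about {n} tasks per model")
```

The required count grows roughly with the inverse square of the edge, which is why 20 tasks cannot separate models whose solve rates differ by only a few points.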

Full write-up: https://giovannigatti.github.io/cve-bench

Code: https://github.com/GiovanniGatti/cve-bench

Comments

KyleTheDev•1h ago
"The goal isn’t to rank models, but to understand how they fail."

The goal isn't to write an informative blog post describing what you learned, but to generate slop and expect other folks to read it.

I really wish people would stop doing this. I love reading about your side projects and all of the cool things you're doing. But, it just feels insulting to open up something that's so obviously completely AI generated. If you aren't willing to write it in your own voice, why would it be worth reading?

sdsdffsddfs•52m ago
You know the meme where a concise sentence is translated by an LLM into a loquacious formal email which is then again summarized to a concise statement by another LLM on the receiving end?

I believe that's what we need to do here. People have some interesting information to share, but they don't care about penmanship and that's not just being lazy. It takes a lot of time to produce a nice post. I cannot guarantee the author used an LLM but there sure is a suspicious amount of em-dashes.

Anyway, there are still some interesting data points, so I'd recommend running the website through an LLM to get a nice summary if the prominent TL;DR is too short for you. Times are a-changing.

KyleTheDev•41m ago
I agree somewhat. My issue is primarily that, without the author actually penning the post themselves, we have little to no evidence that they've actually done anything. Maybe the data is all AI generated or hallucinated, maybe the validations weren't thorough. I could determine all of these things myself, via rigorous review of the blog post. But at that point, I'm just doing the research myself, so of what use is the post?

For work communications, I agree with you. There's an inherent accountability there. If you send me AI slop, and something goes terribly wrong, you'll be held accountable for the slop. Here, the slop is just noise that prevents us from finding the truly interesting posts.

sdsdffsddfs•3m ago
It's a very interesting issue you raise here. Notice that even if he typed it all out himself we wouldn't be any wiser. Literally nothing would have changed. Using the "I can verify that he performed a lot of work" as a quality signal always was a questionable - albeit understandable - choice but in the LLM-age it's useless.

<super_weird_rant>

I don't think I like it, but I think we are heading towards a situation where all information is filtered, reviewed and validated before it even becomes available to you. We need to do a lot of work to define what "reviewed" and "validated" mean here, but I don't see many ways around it. This would, however, require a vast attitude shift whereby we have some way of proclaiming "facts" and "arguments" tied to "proofs" that can be automated in some fashion, not just for code, but for all communication in general.

Stuff like "X is true in 50% of cases" needs to be automatically and transparently tied to some part of a system that supports your claim, which itself can be tied to some greater system, etc. If we have UIs that support this cleanly, we can inspect the veracity of claims ourselves insofar as the validation is feasible/practical/economical. Perhaps some sort of "this claim is true under X,Y conditions"-fingerprint made by some trusted VerificationAgency, a chain of trust so to speak, like our certificate systems. Or perhaps a P2P network of open-source "ClaimVerifiers". If everything is by default written with verification in mind, not just code, but literally everything that needs to be correct, I think that would be quite interesting.

OK, this is super weird so I'll let myself out now.

</super_weird_rant>