SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI

https://arxiv.org/abs/2603.03823
42•mpweiher•2h ago

Comments

verdverm•2h ago
Really long-term task benchmark showing significant improvements in very recent models, while also showing really bad regression rates across the board.
challengerVIE•1h ago
As someone who uses agents daily, I think long-term vision with maintainability in mind is what really makes the difference between us humans and agents, so I like the idea. However, evaluating maintainability over an average of just 500 LOC of changes does not sound like long-term maintainability is being measured here.
KronisLV•48m ago
> The benchmark comprises 100 tasks, each corresponding on average to an evolution history spanning 233 days and 71 consecutive commits in a real-world code repository.

This seems like a really cool thing to benchmark! Technically it'd be possible to take GitHub repos that the AI orgs probably already have, cross-reference the code against the issues and regressions, and train/validate on that.

The dataset would need to be way bigger to get close to the likes of SWE-bench: https://www.swebench.com/original.html

"Vibe coded stuff gets hard to maintain and will end up buggy." Yeah, so make models that deal with that better, optimize for maintainability and consistency.

Cool to see Claude doing decently though!

Cloud VM benchmarks 2026

https://devblog.ecuadors.net/cloud-vm-benchmarks-2026-performance-price-1i1m.html
241•dkechag•9h ago•107 comments

Show HN: Curiosity – DIY 6" Newtonian Reflector Telescope

https://curiosity-telescope.vercel.app/
23•big_Brain69•3h ago•4 comments

Warn about PyPy being unmaintained

https://github.com/astral-sh/uv/pull/17643
185•networked•8h ago•67 comments

Rijksmuseum researchers discover new painting by Rembrandt van Rijn

https://www.rijksmuseum.nl/en/press/press-releases/rijksmuseum-researchers-discover-new-painting-...
17•ohjeez•3d ago•0 comments

From RGB to L*a*b* color space (2024)

https://kaizoudou.com/from-rgb-to-lab-color-space/
37•kqr•4d ago•8 comments

How to run Qwen 3.5 locally

https://unsloth.ai/docs/models/qwen3.5
166•Curiositry•10h ago•47 comments

CasNum

https://github.com/0x0mer/CasNum
282•aebtebeten•13h ago•35 comments

Notes on Writing WASM

https://notes.brooklynzelenka.com/Blog/Notes-on-Writing-Wasm
4•vinhnx•57m ago•0 comments

MonoGame: A .NET framework for making cross-platform games

https://github.com/MonoGame/MonoGame
80•azhenley•8h ago•49 comments

A decade of Docker containers

https://cacm.acm.org/research/a-decade-of-docker-containers/
303•zacwest•17h ago•203 comments

Emacs internals: Deconstructing Lisp_Object in C (Part 2)

https://thecloudlet.github.io/blog/project/emacs-02/
72•thecloudlet•2d ago•3 comments

Dumping Lego NXT firmware off of an existing brick (2025)

https://arcanenibble.github.io/dumping-lego-nxt-firmware-off-of-an-existing-brick.html
206•theblazehen•2d ago•11 comments

Yoghurt delivery women combatting loneliness in Japan

https://www.bbc.com/travel/article/20260302-the-yoghurt-delivery-women-combatting-loneliness-in-j...
292•ranit•21h ago•158 comments

I'm Not Consulting an LLM

https://lr0.org/blog/p/gpt/
28•birdculture•1h ago•7 comments

Show HN: A weird thing that detects your pulse from the browser video

https://pulsefeedback.io/
83•kilroy123•3d ago•38 comments

Autoresearch: Agents researching on single-GPU nanochat training automatically

https://github.com/karpathy/autoresearch
120•simonpure•13h ago•33 comments

Ask HN: Why there are no actual studies that show AI is more productive?

20•make_it_sure•1h ago•19 comments

Best performance of a C++ singleton

https://andreasfertig.com/blog/2026/03/best-performance-of-a-cpp-singleton/
34•jandeboevrie•1d ago•21 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
9•surprisetalk•3d ago•1 comment

The surprising whimsy of the Time Zone Database

https://muddy.jprs.me/links/2026-03-06-the-surprising-whimsy-of-the-time-zone-database/
124•jprs•15h ago•36 comments

In 1985 Maxell built a bunch of life-size robots for its bad floppy ad

https://buttondown.com/suchbadtechads/archive/maxell-life-size-robots/
111•rfarley04•3d ago•13 comments

Ten years of deploying to production

https://brandonvin.github.io/2026/03/04/ten-years-of-deploying-to-production.html
31•mooreds•2d ago•4 comments

Sem – Semantic version control. Entity-level diffs on top of Git

https://github.com/ataraxy-labs/sem
7•pabs3•4h ago•0 comments

FLASH radiotherapy's bold approach to cancer treatment

https://spectrum.ieee.org/flash-radiotherapy
212•marc__1•18h ago•65 comments

macOS code injection for fun and no profit (2024)

https://mariozechner.at/posts/2024-07-20-macos-code-injection-fun/
95•jstrieb•3d ago•17 comments

Files are the interface humans and agents interact with

https://madalitso.me/notes/why-everyone-is-talking-about-filesystems/
224•malgamves•23h ago•121 comments

Lisp-style C++ template meta programming

https://github.com/mistivia/lmp
51•mistivia•12h ago•6 comments

SigNoz (YC W21) is hiring for engineering, growth and product roles

https://signoz.io/careers
1•pranay01•17h ago

To the Polypropylene Makers

https://www.lesswrong.com/posts/HQTueNS4mLaGy3BBL/here-s-to-the-polypropylene-makers
31•raldi•4h ago•4 comments