SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI

https://arxiv.org/abs/2603.03823
36•mpweiher•1h ago

Comments

verdverm•1h ago
Really long-term task benchmark showing significant improvements in very recent models, while also showing really bad regression rates across the board.
challengerVIE•56m ago
As someone who uses agents daily, I think a long-term vision with maintainability in mind is what really makes the difference between us humans and agents, so I like the idea. However, evaluating over an average of just 500 LOC of changes does not sound like long-term maintainability is really being measured here.
KronisLV•30m ago
> The benchmark comprises 100 tasks, each corresponding on average to an evolution history spanning 233 days and 71 consecutive commits in a real-world code repository.

This seems like a really cool thing to benchmark! Technically it'd be possible to take GitHub repos that the AI orgs probably already have, cross-reference the code against the issues and regressions, and train/validate on that.

The dataset would need to be way bigger to get close to the likes of SWE-bench: https://www.swebench.com/original.html

"Vibe coded stuff gets hard to maintain and will end up buggy." Yeah, so make models that deal with that better, optimize for maintainability and consistency.

Cool to see Claude doing decently though!

Show HN: Conflicts.app, Iran conflict dashboard better than alternatives

https://www.conflicts.app/dashboard
1•juliusolsson•2m ago•0 comments

Show HN: J2Download – A simple online downloader supporting 40 platforms

https://j2download.com/
1•manhg•3m ago•0 comments

Bippy: React Internals Toolkit

https://www.bippy.dev/
1•handfuloflight•3m ago•0 comments

The Window Chrome of Our Discontent

https://pxlnv.com/blog/window-chrome-of-our-discontent/
1•SoKamil•7m ago•0 comments

How I've learned that certainty is the thing to fear

https://www.bbc.com/news/articles/c1w5z1d447lo
1•cmsefton•7m ago•0 comments

Show HN: Muffle – Blur everything except the active window in macOS

https://www.getmuffle.com/
1•AbjMV•9m ago•1 comment

I was "early" in agentic coding. Here's my story

2•noemit•15m ago•0 comments

Show HN: Drizby – WIP Metabase Alternative

https://www.drizby.com
1•cliftonc•17m ago•0 comments

The First Multi-Behavior Brain Upload

https://twitter.com/alexwg/status/2030217301929132323
1•DarkCow•17m ago•0 comments

Anthropic CEO reveals the reasons he rejected The Pentagon

https://xcancel.com/0xmitsurii/status/2030451168678457766
4•doener•17m ago•0 comments

Show HN: Stardial – a highly customizable terminal clock (Rust)

https://github.com/hisuic/stardial
2•firesushi•19m ago•0 comments

Emporion: A P2P Economy for Agents

https://github.com/garydevenay/emporion
1•garydevenay•19m ago•0 comments

Microsoft/Hve-Core

https://github.com/microsoft/hve-core
2•coderlens•19m ago•0 comments

Solving Compaction with Lobotomy

https://grimridge.net/blog/solving-compaction-with-lobotomy/
1•WadeGrimridge•21m ago•0 comments

Pushing and pulling: three reactivity algorithms

https://jonathan-frere.com/posts/reactivity-algorithms/
1•fanf2•22m ago•0 comments

Reverse engineering a DOS game with no source code using Codex 5.4

https://github.com/ammaarreshi/SkyRoads-Codex
1•smusamashah•23m ago•1 comment

Show HN: OpenClaw – Self-host OpenClaw in one command

1•congzhangzh•29m ago•0 comments

Money and collateral in an AI-first society

https://adlrocha.substack.com/p/adlrocha-money-and-collateral-in
1•adlrocha•32m ago•0 comments

Ask HN: Can I repurpose a Bluetooth voice remote as input device for a PC?

1•albert_e•34m ago•1 comment

Ask HN: How are you handling persistent memory across local Ollama sessions

1•null-phnix•35m ago•0 comments

Show HN: Spadyum – An Open-Source Civilization Backup Protocol

https://github.com/kivancadiguzel-design/Spadyum-Genesis/blob/main/README.md
1•Spadyum_Genesis•35m ago•0 comments

Julia Snail – An Emacs Development Environment for Julia Like Clojure's Cider

https://github.com/gcv/julia-snail
1•TheWiggles•37m ago•0 comments

Notes on Writing WASM

https://notes.brooklynzelenka.com/Blog/Notes-on-Writing-Wasm
3•vinhnx•39m ago•0 comments

Making Firefox's right-click not suck, more, with userChrome.css

https://joshua.hu/firefox-making-right-click-not-suck-even-more-with-userchrome
3•mmsc•41m ago•1 comment

Run prompts on a schedule with Claude Code

https://code.claude.com/docs/en/scheduled-tasks
1•blacktulip•41m ago•0 comments

Show HN: Open-source self-hosted Intercom and CCTV platform

https://github.com/rosteleset/SmartYard-Server
2•sbca68•43m ago•0 comments

Show HN: Self-Evolving Skill – empirical results from a 5-round experiment

https://github.com/191341025/Self-Evolving-Skill
1•tiansenxu•48m ago•0 comments

What Is AI Reading?

https://generativepulse.ai/report/
1•doener•48m ago•0 comments

Rcarmo/piclaw: An all-in one agent environment with a mobile-first web UI

https://github.com/rcarmo/piclaw
1•rcarmo•53m ago•0 comments

Show HN: Termix – One dashboard for all your AI coding agents

https://github.com/rustykuntz/termix
2•rustykuntz•53m ago•1 comment