frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Neomacs: Rewriting the Emacs display engine in Rust with GPU rendering via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•1m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
1•ShinyaKoyano•6m ago•0 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
1•m00dy•7m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•8m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
1•okaywriting•14m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
1•todsacerdoti•17m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•18m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•19m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•20m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•20m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•21m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•21m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•25m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•25m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•26m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•26m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•35m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•35m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
1•surprisetalk•37m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•37m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
2•surprisetalk•37m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
5•pseudolus•38m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•38m ago•0 comments

Bogus Pipeline

https://en.wikipedia.org/wiki/Bogus_pipeline
1•doener•39m ago•0 comments

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

https://nypost.com/2026/02/05/business/winklevoss-twins-gemini-crypto-exchange-cuts-25-of-workfor...
2•1vuio0pswjnm7•40m ago•0 comments

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
3•obscurette•40m ago•0 comments

Cycling in France

https://www.sheldonbrown.com/org/france-sheldon.html
2•jackhalford•41m ago•0 comments

Ask HN: What breaks in cross-border healthcare coordination?

1•abhay1633•42m ago•0 comments

Show HN: Simple – a bytecode VM and language stack I built with AI

https://github.com/JJLDonley/Simple
2•tangjiehao•44m ago•0 comments

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

https://caratria.com/
1•jonrosner•45m ago•1 comments
Open in hackernews

Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of python

https://github.com/SWE-agent/mini-swe-agent
7•lieret•6mo ago

Comments

lieret•6mo ago
In 2024, we developed SWE-bench and SWE-agent at Princeton University and helped kickstart the coding agent revolution.

Back then, LMs were optimized to be great at chatting, but not much else. This meant that agent scaffolds had to get very creative (and complicated) to make LMs perform useful work.

But in 2025, LMs are actively optimized for agentic coding, and we ask:

*What the simplest coding agent that could still score near SotA on the benchmarks?*

*Turns out, it just requires 100 lines of code!*

And this system still *resolves 65% of all GitHub issues in the SWE-bench verified benchmark* with Sonnet 4 (for comparison, when Anthropic launched Sonnet 4, they reported 70% with their own scaffold that was never made public).

Honestly, we're all pretty stunned ourselves—we've now spent more than a year developing SWE-agent, and would not have thought that such a small system could perform nearly as good.

I'll link to the project below (all open-source, of course). The hello world example is incredibly short & simple (and literally what gave us the 65%). But it is also meant as a serious command line tool + research project, so we provide a Claude-code style UI & some utilities on top of that.

We have some team members from Princeton/Stanford here today, let us know if you have any questions/feedback :)

Oras•6mo ago
Is there an option to learn from mistakes? most coding agents I tried, including the Sonnet 4 based one will make same mistake again and again in a new chat.

It would be great to have the agent adding a memory (even locally) to avoid mistakes, checking for new versions of libraries, and write a list of tasks first before the execution (similar to Kiro and Trae SOLO).

lieret•6mo ago
Sorry, I missed that!

That's a little bit out of the scope of this project (because we were aiming for the bare minimum of what is needed to get a performative agent — and unfortunately learning from mistake also isn't measured by most benchmarks as they require tasks to be solved independently).

However, you can always add "memory" to agents by asking them to write and read from a file in your repo (Claude.md, cursorrules etc.) You can also try to automate this process and have a mechanism by which the LM decides itself when to put something in them. Similar to how memories work in chatGPT. I think Cursor also recently started doing that.

> checking for new versions of libraries, and write a list of tasks first before the execution

Just add it to the prompt! That's not always desired behavior for a command line helper, but I think it shouldn't be too hard to get it to do that just by prompting alone.

scottyeager•6mo ago
This is so cool—thanks for publishing it!

I was just starting to study coding agent implementation, specifically with tool use. Seeing the insight on the README that `bash` is all a modern LLM needs to solve coding tasks was very interesting, since the trend seems to be solidly toward tools.

Being able to read the entire agent code nearly on a single screen is very instructive and inspiring to start hacking.

One thing I'm curious about is API calling efficiency. Did you happen to compare request count or token consumption of the mini agent versus full sized? Is that data available generally for the SWE-bench results?