
Open-Source Refact.ai Agent is #1 on SWE-bench Lite With a 59.7% Score

https://refact.ai/blog/2025/sota-on-swe-bench-lite-open-source-refact-ai/
3•kate_at_refact•1y ago

Comments

kate_at_refact•1y ago
Open-source Refact.ai Agent ranks #1 on SWE-bench Lite with a 59.7% score. Our approach: a fully autonomous agent, with no manual intervention needed.
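SWE-bench Lite scores are reported as the percentage of task instances resolved. As a quick sanity check, here is what 59.7% works out to. This assumes the standard 300-instance SWE-bench Lite test split; the post does not state the task count. Under that assumption, exactly one resolved count rounds to the published figure:

```python
# Assumption: score = resolved / total over the 300-instance
# SWE-bench Lite test split (the post itself does not give the count).
TOTAL_TASKS = 300


def resolve_rate(resolved: int, total: int = TOTAL_TASKS) -> float:
    """Percentage of tasks resolved, rounded to one decimal place."""
    return round(100 * resolved / total, 1)


# Find every resolved count that rounds to the published 59.7%.
matches = [n for n in range(TOTAL_TASKS + 1) if resolve_rate(n) == 59.7]
print(matches)  # [179]
```

Neighbouring counts give 59.3% (178) and 60.0% (180), so under this assumption 59.7% corresponds to 179 resolved tasks.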

How we did this:

• Prompt strategy: https://github.com/smallcloudai/refact/blob/swe-boosted-prom...
• Claude 3.7 Sonnet as orchestrator
• deep_analysis() tool (powered by o4-mini) for reasoning
• Tool suite for repository exploration, code modification, and testing, used dynamically based on task needs
• One correct solution through iteration!
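The architecture in the list above can be pictured as a tool-dispatch loop. An orchestrating model chooses which tool to call next, and one of those tools (`deep_analysis`) hands the question to a second model.

This is a hypothetical sketch, not Refact.ai's code. The following are all illustrative assumptions:
- the `Orchestrator` class
- the `search_repo` and `run_tests` tools
- the stubbed return values

No model is actually called. The call sequence is scripted where a real orchestrator LLM would choose it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool shape: one string argument in, one string observation out.
Tool = Callable[[str], str]


def deep_analysis(question: str) -> str:
    # Stub: in the described setup this would query a separate reasoning
    # model (o4-mini); here it only echoes so the sketch runs offline.
    return f"analysis of: {question}"


def search_repo(query: str) -> str:
    # Hypothetical stub for a repository-exploration tool.
    return f"files matching {query!r}"


def run_tests(target: str) -> str:
    # Hypothetical stub for a testing tool.
    return f"tests for {target}: passed"


@dataclass
class Orchestrator:
    tools: Dict[str, Tool]
    transcript: List[str] = field(default_factory=list)

    def call(self, tool_name: str, arg: str) -> str:
        """Dispatch one tool call and record it in the transcript."""
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        result = self.tools[tool_name](arg)
        self.transcript.append(f"{tool_name}({arg!r}) -> {result}")
        return result


agent = Orchestrator(tools={
    "deep_analysis": deep_analysis,
    "search_repo": search_repo,
    "run_tests": run_tests,
})

# A real orchestrator model would pick these calls; here they are scripted.
agent.call("search_repo", "parse_header")
agent.call("deep_analysis", "why does parse_header drop the last field?")
agent.call("run_tests", "tests/test_headers.py")
print(len(agent.transcript))  # 3
```

Keeping tools behind a name-to-function registry is what lets an orchestrator use them "dynamically based on task needs". The model only has to emit a tool name and an argument.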

Autonomy = our core strength.

Refact.ai Agent completes the entire dev workflow independently: it plans, executes, tests, self-corrects, and delivers a production-ready result. For each task, it made a single multi-step run to produce one correct solution, devising its own strategy rather than following a rigid script.
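The plan, execute, test, self-correct cycle described above amounts to a feedback loop. It stops at the first candidate that passes the tests, so each task yields one final answer.

This is a minimal sketch built on stand-ins, not part of Refact.ai's API:
- `propose_patch` is a hypothetical placeholder for the model.
- `run_tests` is a hypothetical placeholder for the test harness.
- The attempt budget `max_attempts` is an arbitrary assumption.

```python
from typing import Callable, Optional


def solve_task(
    propose_patch: Callable[[int, Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """One run: propose a patch, test it, feed failures back, repeat.

    propose_patch(attempt, last_failure) returns a candidate patch;
    run_tests(patch) returns None on success or a failure message.
    Returns the first passing patch, or None if the budget runs out.
    """
    failure: Optional[str] = None
    for attempt in range(max_attempts):
        patch = propose_patch(attempt, failure)  # plan + execute
        failure = run_tests(patch)               # test
        if failure is None:
            return patch                         # deliver a single solution
        # Otherwise loop again: the failure message drives self-correction.
    return None


# Toy stand-ins: the "model" only gets it right after seeing one failure.
def toy_propose(attempt: int, last_failure: Optional[str]) -> str:
    return "fix-v2" if last_failure else "fix-v1"


def toy_tests(patch: str) -> Optional[str]:
    return None if patch == "fix-v2" else "AssertionError in test_parse"


print(solve_task(toy_propose, toy_tests))  # fix-v2
```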

You can read the technical details of our SWE-bench approach here: https://refact.ai/blog/2025/sota-on-swe-bench-lite-open-sour...

Questions are welcome! You're also welcome to try Refact.ai Agent in VS Code and JetBrains: https://linktr.ee/refactai
