Open-Source Refact.ai Agent is #1 on SWE-bench Lite With a 59.7% Score

https://refact.ai/blog/2025/sota-on-swe-bench-lite-open-source-refact-ai/
3•kate_at_refact•1y ago

Comments

kate_at_refact•1y ago
Open-source Refact.ai achieves #1 on SWE-bench Lite with a 59.7% score. Our approach: fully autonomous Agent, no manual intervention needed.

How we did this:

• Prompt strategy: https://github.com/smallcloudai/refact/blob/swe-boosted-prom...
• Claude 3.7 Sonnet as orchestrator
• deep_analysis() tool (powered by o4-mini) for reasoning
• Tool suite for repository exploration, code modification, and testing, used dynamically based on task needs
• One correct solution through iteration!

Autonomy = our core strength.

Refact.ai Agent completes the entire dev workflow independently: plans, executes, tests, self-corrects, and delivers a production-ready result. For each task, it made one multi-step run to generate a single correct solution, creating custom strategies rather than following rigid scripts.
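
To make the plan / execute / test / self-correct loop described above concrete, here is a minimal sketch in Python. It is not Refact.ai's actual implementation: the tool functions, the `scripted_orchestrator` stand-in, and the stopping rule are all hypothetical, and only the name `deep_analysis` is borrowed from the post. In a real agent the orchestrator would be a model call (the post names Claude 3.7 Sonnet) rather than a fixed script.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: names and behaviour are illustrative only.
Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, tool result)


def explore_repo(task: str) -> str:
    return f"found candidate file for: {task}"


def deep_analysis(task: str) -> str:
    # Stand-in for a call to a separate reasoning model.
    return f"analysis of: {task}"


def edit_code(task: str) -> str:
    return f"patched: {task}"


def run_tests(task: str) -> str:
    return "tests passed"


TOOLS: Dict[str, Tool] = {
    "explore_repo": explore_repo,
    "deep_analysis": deep_analysis,
    "edit_code": edit_code,
    "run_tests": run_tests,
}

PLAN: List[str] = ["explore_repo", "deep_analysis", "edit_code", "run_tests"]


def scripted_orchestrator(history: List[Step]) -> str:
    """Stand-in for the orchestrator model: choose the next tool from history."""
    if history and history[-1] == ("run_tests", "tests passed"):
        return "done"
    # If tests had not passed, this cycles back to exploration (self-correction).
    return PLAN[len(history) % len(PLAN)]


def run_agent(
    task: str,
    orchestrator: Callable[[List[Step]], str] = scripted_orchestrator,
    max_steps: int = 10,
) -> List[Step]:
    """Run one multi-step attempt and return the sequence of tool calls made."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool_name = orchestrator(history)
        if tool_name == "done":
            break
        history.append((tool_name, TOOLS[tool_name](task)))
    return history


if __name__ == "__main__":
    for name, result in run_agent("fix failing date parser"):
        print(f"{name}: {result}")
```

The `max_steps` cap is a design choice for the sketch: an autonomous loop needs some bound so that a task whose tests never pass still terminates.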

You can read tech details on our SWE-bench approach: https://refact.ai/blog/2025/sota-on-swe-bench-lite-open-sour...

Your questions are welcome! You are also welcome to try Refact.ai Agent in VS Code and JetBrains: https://linktr.ee/refactai

As NASA eyes lunar base, there's still much to learn about landing on the Moon

https://arstechnica.com/space/2026/05/as-nasa-eyes-lunar-base-theres-still-much-learn-about-landi...
1•rbanffy•43s ago•0 comments

Useful Memories Become Faulty When Continuously Updated by LLMs

https://dylanzsz.github.io/faulty-memory/
1•gmays•1m ago•0 comments

Evolved antennas, LLM-generated code, and a potential antifuture

https://ericwbailey.website/published/evolved-antennas-llm-generated-code-and-a-potential-antifut...
1•mooreds•1m ago•0 comments

It's not just drivers who hate high gas prices. So do gas station owners

https://www.cnn.com/2026/05/11/business/gas-station-owners-pain
1•mooreds•1m ago•0 comments

BBC shelves Verify blog as it learns truth about who reads it

https://www.thetimes.com/uk/media/article/bbc-shelves-verify-live-blog-deborah-turness-5mhvkwl68
1•thinkingemote•1m ago•0 comments

Show HN: QuickAcuity – visual acuity screening from your phone

https://quickacuity.app
1•billtonium•1m ago•1 comments

California cities seek to bless polyamorous unions: it will get messy in court

https://www.latimes.com/california/story/2026-04-25/california-west-hollywood-polyamorous-union-laws
1•PaulHoule•2m ago•0 comments

Strategies to find free or low-cost food when money is tight

https://text.npr.org/nx-s1-5599147
1•mooreds•2m ago•0 comments

Chess puzzle I found in my dad's old book

https://ardoedo.it/kempelen/
2•Eswo•3m ago•0 comments

Running My Agents in a VPS

https://crowdhailer.me/2026-05-11/running-my-agents-in-a-vps/
1•speckx•4m ago•0 comments

Show HN: Glance – Local Git diff review TUI, ported from VS Code

https://github.com/polyphilz/glance
1•polyphilz•6m ago•0 comments

MCP server can modify tool list mid-session; client has no mechanism to detect

https://mcpfw.dev/paper
1•mwaseem_d•7m ago•1 comments

Google TIG reports first example of AI used offensively for zero-day vulns

https://cloud.google.com/blog/topics/threat-intelligence/ai-vulnerability-exploitation-initial-ac...
1•thoughtpeddler•7m ago•0 comments

CUDA-oxide: Nvidia's official Rust to CUDA compiler

https://nvlabs.github.io/cuda-oxide/index.html
1•adamnemecek•8m ago•0 comments

Students Boo Commencement Speaker After She Calls AI Next Industrial Revolution

https://www.404media.co/ucf-ai-commencement-speaker-booed/
4•cdrnsf•10m ago•0 comments

Stx: Interactive Vector Graphics Environment

https://ctx.graphics/
1•volemo•10m ago•0 comments

What Zed IDE shipped in 10 days since 1.0

https://medium.com/@arthurpro/what-zed-shipped-in-the-first-ten-days-after-1-0-7e51f87b3f9f
1•arthurpro•10m ago•0 comments

Using Worktrees

https://www.natemeyvis.com/on-using-worktrees/
2•Brajeshwar•11m ago•0 comments

LockedCode – Security-hardened OpenCode fork

https://lockedcode.ai/
1•aallard•11m ago•0 comments

Show HN: Tokémon – a Pokédex for LLMs that got out of hand

https://tokemonlabs.com
2•isjustintime•13m ago•3 comments

How are we going to get out of Meta? (Or social media in general)

https://hugoib.beehiiv.com/p/how-are-we-going-to-get-out-of-meta-or-social-media-in-general
3•hugoib•13m ago•0 comments

Julia: Achieving C++ Speed in High-Level Code

https://thecodersblog.com/julia-language-performance-benchmarks-2026/
1•t0mpr1c3•14m ago•1 comments

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

https://github.com/uw-syfi/vibe-serve
1•matt_d•14m ago•0 comments

A quick tour of some of the datacenters in the Salt Lake Valley

https://discuss.systems/@ricci/116556373280824179
1•eatonphil•14m ago•0 comments

Show HN: Learn how AI benchmarks cheat

https://agent-benchmarks.com/
1•adamgold7•15m ago•0 comments

Coding might go the way of woodworking

3•mdgrech23•16m ago•0 comments

Dive into Deep Learning

https://d2l.ai/
2•moohaad•17m ago•0 comments

Technologies vs. Commodities

https://www.construction-physics.com/p/on-technologies-vs-commodities
1•surprisetalk•18m ago•0 comments

MOQ is lacking a compelling adoption reason

https://bloggeek.me/moq-adoption-problem/
2•dabinat•18m ago•0 comments

RAF Airdrop Response to Suspected Hantavirus Case on Remote Island

https://theaviationist.com/2026/05/10/raf-airdrop-response-suspected-hantavirus-case-on-remote-is...
1•speckx•19m ago•0 comments