New #1 open-source AI Agent on SWE-bench Verified

https://refact.ai/blog/2025/open-source-sota-on-swe-bench-verified-refact-ai/
28•laxyz•6mo ago

Comments

laxyz•6mo ago
The full pipeline used for SWE-bench Verified is open-source: https://github.com/smallcloudai/refact-bench
amarcheschi•6mo ago
I think the title doesn't make it clear that the results are obtained with closed models
nateburke•6mo ago
Am I correct in understanding that SWE-bench is limited to Python?
babushkaboi•6mo ago
yeah, they're all Python at the moment.
simonw•6mo ago
The core benchmark is only Python, but there is also SWE-bench Multimodal which uses JavaScript: https://arxiv.org/abs/2410.03859

And the new SWE-bench Multilingual (released a couple of weeks ago) which covers 9 programming languages - C, C++, Go, Java, JavaScript, TypeScript, PHP, Ruby and Rust: https://www.swebench.com/multilingual.html

brrrrrm•6mo ago
Open-source use of closed source models?
NicuCalcea•6mo ago
Looks like they support self-hosted models: https://docs.refact.ai/supported-models/#self-hosted-version
MukundMohanK•6mo ago
Between last April and now, SWE-bench scores have gone up from 25% to 70%.

Sure, they're being overfitted to the dataset. But with most models performing similarly across even the hardest third-party benchmarks (think FrontierMath back in November versus now), we're closer than ever to a specialisation shift.

Hard to say at what %, but once code reviews get better, it's likely 2025 is the last year SWE is a sought-after job, on both the demand and the supply side.

candiddevmike•6mo ago
SWE-bench scores, like a lot of other metrics for LLMs, are pretty divorced from reality IMO. It's a lot like only learning to pass tests vs actual understanding.

Once GenAI companies stop hiring SWEs, I'll believe the doomers.

MukundMohanK•6mo ago
Reality is here whether we like it or not - https://fred.stlouisfed.org/graph/?g=1DEP0
hackeman300•6mo ago
Surely there are no other macroeconomic factors that could have played a role in this decline too
harshitaneja•6mo ago
I help hire for a few clients as well as for my own small organization. We are already seeing the impact of these tools on our hiring. For the same responsibilities and tasks, we already require fewer resources. For clients with less complex problems, we are able to manage similar work with 60% of the resources planned. And that's when most of our work is mathematical modelling, heuristics, constraint programming and such. However, I don't foresee us reaching a scenario, at least in the next few years, where we don't hire developers. That said, most hiring has shifted to senior developers only.
dingnuts•6mo ago
being able to do more things with fewer resources (which lowers costs) always increases demand enough to make up for the reduction of labor caused by the automation

Analogy: when the chainsaw was invented, we didn't stop having lumberjacks, they just learned to use chainsaws

grammarxcore•6mo ago
> Many samples have an issue description that is underspecified, leading to ambiguity on what the problem is and how it should be solved.

OpenAI apparently tuned _basic discovery and refinement_ out of the tests so I don’t think this is a benchmark of anything useful. It can’t replace a human but can possibly make a human more productive.

https://openai.com/index/introducing-swe-bench-verified/

predkambrij•6mo ago
I would like to know why this post got flagged. Is it misleading, or dangerous software? If it's truly #1 open-source on SWE-bench that's quite impressive.