
SWE-Bench Pro

https://github.com/scaleapi/SWE-bench_Pro-os

41•tosh•1h ago

Comments

siliconc0w•1h ago

Looks like the associated article is: https://scale.com/research/swe_bench_pro (link in the repo is wrong)

gpt5•1h ago

Slightly tangential question: they said that they have protected the public test set with a strong copyleft license to prevent private models from being trained on it.

Does that actually work? Hasn't AI training so far simply ignored all license and copyright restrictions?

ej88•56m ago

https://scale.com/leaderboard/swe_bench_pro_commercial

I definitely trust the totally private dataset more.

stephendause•52m ago

This is a key question in my opinion. It's one of the things that make benchmarking the SWE capabilities of LLMs difficult. It's usually impossible to know whether the LLM has seen a problem before, and coming up with new, representative problem sets is time-consuming.

CuriouslyC•24m ago

You can just fuzz names and switch to a whitespace-compact representation.

stri8ed•47m ago

Not a chance. Even if American companies did abide by it, there is no reason Chinese companies would. And good luck definitively proving that a model was trained on it.

candiddevmike•43m ago

Sir, we've already ingested 503,377 copyleft-licensed codebases, I don't think the training set can take any more!

WhitneyLand•39m ago

Recently it was pointed out that models were sometimes finding ways to cheat on SWE-bench Verified by scanning parts of the repo that were not meant to be visible.

Hope they’re addressing that at the same time.

hereme888•12m ago

Is it possible to benchmark the GPT-5-Pro model?

Demystifying Agentic Memory

https://alexspyropoulos.com/posts/demystifying-agentic-memory/

1•alexspyr•42s ago•0 comments

How I Vibe Coding? (Sept 2025 Edition)

https://xuanwo.io/2025/06-how-i-vibe-coding-sept-2025-edition/

1•xuanwo•1m ago•0 comments

Model literals, semantic aliases, and preference-aligned routing for LLMs

https://docs.archgw.com/guides/llm_router.html

1•honorable_coder•2m ago•1 comment

Market design can feed the poor

https://worksinprogress.co/issue/how-market-design-can-feed-the-poor/

1•zdw•2m ago•1 comment

Automate User Interviews with AI

https://theproductfeedbackcompany.com/

1•bobcoi•3m ago•1 comment

AMD Ryzen AI Max+ "Strix Halo" Performance with ROCm 7.0

https://www.phoronix.com/review/amd-rocm-7-strix-halo

1•rbanffy•4m ago•0 comments

How Samin Nosrat Learned to Love the Recipe

https://www.newyorker.com/culture/persons-of-interest/how-samin-nosrat-learned-to-love-the-recipe

1•mitchbob•4m ago•1 comment

Canon updates a PowerShot with higher price and fewer features

https://m.dpreview.com/news/9212403257/canon-powershot-360-hs-a-announcement

2•PaulHoule•4m ago•0 comments

Show HN: Technical Interview for an Open Source Team (Grove Engineering)

https://github.com/orgs/buildwithgrove/discussions/456

1•Olshansky•5m ago•0 comments

Europe's cookie law messed up the internet. Brussels wants to fix it

https://www.politico.eu/article/europe-cookie-law-messed-up-the-internet-brussels-sets-out-to-fix...

2•c420•7m ago•2 comments

Ask HN: Which was the first 32bit DOS game?

1•DrNosferatu•7m ago•0 comments

Ready or not, the digital afterlife is here

https://www.nature.com/articles/d41586-025-02940-w

1•XzetaU8•7m ago•0 comments

Open-source security analysis with Gemini CLI

https://github.com/gemini-cli-extensions/security

1•evanotero•7m ago•0 comments

Aldrin Cycler Burials

1•Xorakios•9m ago•0 comments

Legal Lullabies: Narrations of tech giants' terms of service

https://www.zzzuckerberg.com/

1•gaws•9m ago•0 comments

Effect Systems vs. Print Debugging: A Pragmatic Solution

https://blog.flix.dev/blog/effect-systems-vs-print-debugging/

1•degurechaff•11m ago•0 comments

Tri Dao on Unsupervised Learning Podcast

https://youtu.be/xlSaoP0b90A?si=_HsLmZ3Vy1M37tdX

1•mdunnoconnor•11m ago•0 comments

Saga Distributed Transactions Pattern

https://learn.microsoft.com/en-us/azure/architecture/patterns/saga

1•mooreds•11m ago•0 comments

Elizabeth Stone on what's next for Netflix – and streaming itself

https://techcrunch.com/2025/09/22/elizabeth-stone-on-whats-next-for-netflix-and-streaming-itself-...

1•mathattack•11m ago•0 comments

Qwen3-Omni – the first natively AI unifying text, image, audio and video

https://twitter.com/Alibaba_Qwen/status/1970181599133344172

1•amrrs•12m ago•0 comments

OpenAI to launch ChatGPT for teens with parental controls

https://www.cnbc.com/2025/09/16/openai-chatgpt-teens-parent.html

1•gmays•13m ago•0 comments

Optical Chip Beats Counterparts in AI Power Efficiency 100 Fold

https://www.allaboutcircuits.com/news/optical-chip-beats-counterparts-in-ai-power-efficiency-100-...

1•giuliomagnifico•14m ago•0 comments

Ask HN: How much of your code is AI writing?

1•bmau5•14m ago•1 comment

Qwen3-Omni: Native Omni AI Model for Text, Image & Video

https://github.com/QwenLM/Qwen3-Omni

2•meetpateltech•15m ago•0 comments

AI Image Animator: Animate Any Image into Video Online

https://aiimageanimator.net/

2•wukongfine•15m ago•0 comments

Crypto theft booms to a record amid kidnappings, Bybit hack

https://www.cnbc.com/2025/07/17/crypto-theft-hits-record-in-2025.html

1•paulpauper•15m ago•0 comments

U.S. Seniors Lost More Money to Scammers in 2024 Than You Think

https://www.vice.com/en/article/u-s-seniors-lost-more-money-to-scammers-in-2024-than-you-think/

3•paulpauper•16m ago•0 comments

Qwen3-Omni

https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

1•speedyboi•16m ago•0 comments

Qwen-Image-Edit-2509

https://huggingface.co/Qwen/Qwen-Image-Edit-2509

2•speedyboi•17m ago•0 comments

Topological fingerprints for audio identification (2023)

https://arxiv.org/abs/2309.03516

1•wslh•17m ago•0 comments