Show HN: We let GPT OSS 120B write and run Python and ARC AGI 2 jumped 4x

https://github.com/gutfeeling/arc-agi-2-submission
1•steinsgate•1h ago
Hi HN,

We are a team of independent researchers from Germany and have been working on ARC AGI 2 since last summer. The general opinion on open-weight models is that they are too weak for this fairly difficult benchmark and score near noise level. We found that GPT OSS 120B is actually much more capable than previously thought, once the interleaved thinking regime is stabilized. We basically let the model use a stateful IPython-based REPL via function calling and patched vLLM so that the model can reliably do interleaved thinking. The score jumped more than 4x.
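
To make the "stateful REPL via function calling" idea concrete, here is a minimal sketch of what such a tool could look like. It is not the code from the repository: it uses plain `exec` with a shared namespace instead of IPython, has no sandboxing, and the tool-call shape (`{"name": ..., "arguments": {"code": ...}}`) and the names `StatefulRepl` and `handle_tool_call` are assumptions made for illustration.

```python
import contextlib
import io
import traceback


class StatefulRepl:
    """Toy stand-in for a stateful REPL tool: variables persist across calls."""

    def __init__(self):
        # One namespace shared by every execution, so later calls see earlier state.
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute code and return captured stdout, plus the traceback on error."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the error to the model instead of crashing the loop.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()


def handle_tool_call(repl, call):
    """Dispatch one function call of the assumed shape described above."""
    if call["name"] != "python":
        return f"unknown tool: {call['name']}"
    return repl.run(call["arguments"]["code"])


repl = StatefulRepl()
first = handle_tool_call(
    repl, {"name": "python", "arguments": {"code": "grid = [[1, 2], [3, 4]]"}}
)
second = handle_tool_call(
    repl, {"name": "python", "arguments": {"code": "print(sum(map(sum, grid)))"}}
)
print(repr(first), repr(second))
```

The second call can use `grid` only because the namespace survives between calls, which is the property that lets a model build up and inspect intermediate results over several turns.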

Technical write-up: https://pivotools.github.io/posts/agentic_coding_arc_agi/

Code: https://github.com/gutfeeling/arc-agi-2-submission

Data: https://huggingface.co/datasets/arcagi2/arcagi2-agentic-codi...

For safety, we support sandboxed execution using IPyBox (local Docker) and Daytona (cloud), so others can reproduce this without running untrusted code locally.

It gets more interesting: the effect seems to be general and translates seamlessly to other models without even changing prompts. We are not sure why agentic coding is so powerful on ARC AGI 2, which isn't traditionally thought of as an agentic benchmark. Perhaps code execution provides a stronger form of verification than chain of thought (CoT), or perhaps it encourages a qualitatively different form of thinking.
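
One way to read the verification hypothesis is that a candidate rule written as code can be run against every training pair of a task, so a wrong rule is rejected by execution rather than by argument. The sketch below illustrates that reading only. The `verify` helper and the toy task are made up for this example and are not taken from the submission, though the pair layout (`"input"` and `"output"` grids of integers) follows the public ARC task format.

```python
def verify(transform, train_pairs):
    """Return the indices of training pairs the candidate rule gets wrong."""
    failures = []
    for index, pair in enumerate(train_pairs):
        try:
            predicted = transform(pair["input"])
        except Exception:
            # A crashing candidate counts as a failure on that pair.
            failures.append(index)
            continue
        if predicted != pair["output"]:
            failures.append(index)
    return failures


# Made-up toy task (not from ARC AGI 2): the output mirrors each row.
train_pairs = [
    {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
    {"input": [[5, 6, 7]], "output": [[7, 6, 5]]},
]


def mirror_rows(grid):
    """Candidate rule: reverse every row."""
    return [list(reversed(row)) for row in grid]


def transpose(grid):
    """Competing candidate rule: swap rows and columns."""
    return [list(column) for column in zip(*grid)]


print(verify(mirror_rows, train_pairs))  # [] means every pair is reproduced
print(verify(transpose, train_pairs))    # [0, 1] means both pairs fail
```

A check like this gives an exact pass or fail signal per training pair, whereas a purely verbal chain of thought has no comparable ground truth to test itself against.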

We will be around for a while and would be happy to hear ideas / feedback and discuss infra issues / interleaved thinking / GPT OSS / ARC AGI 2.

Show HN: Awesome-Epstein-Files Catalogue

https://github.com/AGIBuilder/awesome-epstein-files
1•castalian•1m ago•0 comments

Show HN: YourSitee – a privacy-first link-in-bio (public beta)

1•czeizel•1m ago•0 comments

A local task state manager for your projects. Designed for humans and LLMs

https://github.com/BorisMolch/cli_todo
1•BorisMolch•2m ago•0 comments

Optimizing Quality vs. Latency in Real-Time Text-to-Speech AI Models

https://gradium.ai/blog/optimizing-quality-vs-latency
1•pain_perdu•4m ago•0 comments

YwinCap: Technical deconstruction of SEO-driven authority fraud

1•ReviewShield•8m ago•0 comments

Game Theory #7: America's Game [video]

https://www.youtube.com/watch?v=ijnkCt1QK6k
1•keepamovin•8m ago•0 comments

Show HN: I built a Chrome extension that finds edges on Polymarket

https://polypredict.ai/
1•Jilong121•10m ago•1 comment

Pax: The Cache Performance You're Looking For

https://mydbanotebook.org/posts/pax-the-cache-performance-youre-looking-for/
1•todsacerdoti•11m ago•0 comments

Why differential privacy is awesome

https://desfontain.es/blog/differential-privacy-awesomeness.html
1•mpcsb•12m ago•0 comments

Redefining GaN power devices for adoption in EVs and data centres

https://iisc.ac.in/events/redefining-gan-power-devices-for-adoption-in-evs-and-data-centres/
1•porridgeraisin•12m ago•0 comments

Do not apologize for replying late to my email

https://ploum.net/2026-02-11-do_not_apologize_for_replying_to_my_email.html
1•validatori•13m ago•0 comments

GLP-1

https://glp-1.com
2•bellamoon544•13m ago•1 comment

What Is Claude? Anthropic Doesn't Know, Either

https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either
1•bichonnages•15m ago•0 comments

Show HN: Hacker News for Songs

https://www.sonusly.com/
2•lorenzosch•17m ago•1 comment

The five stages of losing our craft

https://debuggingleadership.com/blog/the-five-stages-of-losing-our-craft
1•fpereiro•18m ago•0 comments

OpenMOQ Software Consortium – Advancing MOQ Protocol

https://openmoq.org/
1•mondainx•18m ago•0 comments

Alphabet sells rare 100-year bond

https://www.reuters.com/business/alphabet-sells-bonds-worth-20-billion-fund-ai-spending-2026-02-10/
4•kaycebasques•20m ago•0 comments

Flood Fill vs. The Magic Circle

https://www.robinsloan.com/winter-garden/magic-circle/
1•tobr•20m ago•0 comments

Ask HN: What conventions exist for declaring AI content online?

1•lukakopajtic•23m ago•0 comments

Show HN: Seedance.fast – Early Access to ByteDance's Seedance 2.0 via Volcengine

https://seedance.fast/
1•thenextechtrade•27m ago•0 comments

The big AI job swap: why white-collar workers are ditching their careers

https://www.theguardian.com/technology/2026/feb/11/big-ai-job-swap-white-collar-workers-ditching-...
3•n1b0m•29m ago•2 comments

Best of 2024 Data Center Podcast [video]

https://www.youtube.com/watch?v=bgggLTpyFPY
1•walterbell•31m ago•0 comments

Downsides to US-Canadian dual citizenship for US resident?

https://immigration.ca/claiming-canadian-citizenship-by-descent-under-canadas-new-citizenship-act...
1•jakedata•34m ago•1 comment

Why Y Combinator and Aaron Epstein Are Betting on AI-Native Agencies

http://ai-native-agency.com/blog/yc-ai-native-agency
1•victorgk_•36m ago•1 comment

OpenClaw Prompt Injection via Chat History Spoofing (Fixed)

https://twitter.com/marckohlbrugge/status/2021442885942702427
1•hanspagel•37m ago•0 comments

Row Polymorphism without the Jargon (2020)

https://jadon.io/blog/row-polymorphism/
1•bjourne•37m ago•0 comments

OpenClaw creator: "Netlify shares phone numbers"

https://twitter.com/steipete/status/2021495699586904083
2•mellosouls•37m ago•1 comment

Emergent: LLM-Native Python Framework

https://github.com/prostomarkeloff/emergent
1•notmarkeloff•39m ago•0 comments

Show HN: Chroma Master – A premium Flutter color suite with 7 integrated games

1•Krishna_Avatar•41m ago•0 comments

Web Development Improvements

https://jameskilby.co.uk/2026/01/web-development-improvements/
1•taubek•42m ago•0 comments