Show HN: Solving ARC AGI 2 with interleaved thinking and stateful IPython REPL

https://github.com/gutfeeling/arc-agi-2-submission
2•steinsgate•1h ago
My friends and I started this project in the summer of 2025 with the initial goal of participating in the ARC Prize Kaggle competition. Early on, we were exploring agentic coding with frontier reasoning models and found that models like o3 and o4-mini could generate high-quality synthetic ARC-style puzzles. Our plan was to use these synthetic puzzles to train a smaller model via agentic reinforcement learning (RLVR with interleaved thinking).

To bootstrap this process, we needed successful solution traces from an open-weight reasoning model for cold-start supervised fine-tuning. That requirement led us to investigate GPT-OSS-120B. While doing so, we noticed something unexpected: simply placing the model into the interleaved thinking regime produced large and consistent score improvements on ARC AGI 2 tasks. We were seeing scores that we didn’t think were possible for a medium-sized OSS model.

This observation ultimately shifted the focus of our work: we wanted to find out how broadly the effect holds while staying within our resource constraints. We concluded that it applies quite generally, with double-digit gains in frontier models too.

I have previously read debates about whether ARC AGI 2 is primarily a reasoning benchmark or a visual benchmark. I guess we can now add "agentic benchmark" to the mix as well!
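
For readers unfamiliar with the "stateful REPL" part of the title, here is a minimal sketch of the idea: each snippet a model emits runs in the same namespace, so variables defined in one step are still available in the next. This is not the code from the linked repository. It uses Python's built-in `exec` rather than IPython, and the `StatefulRepl` class and its `run` method are hypothetical names chosen for illustration.

```python
import contextlib
import io
import traceback


class StatefulRepl:
    """Hypothetical helper: runs snippets in one shared namespace so state persists."""

    def __init__(self):
        # A single dict reused for every call; this is what makes the REPL stateful.
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute `code` and return whatever it printed, or the traceback text."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            try:
                exec(code, self.namespace)
            except Exception:
                # Return the error as text so the caller can inspect it and retry.
                buffer.write(traceback.format_exc())
        return buffer.getvalue()


repl = StatefulRepl()
repl.run("grid = [[0, 1], [1, 0]]")          # first step: define a puzzle grid
print(repl.run("print(sum(map(sum, grid)))"))  # later step: reuse the same grid
```

In an interleaved-thinking loop, the text returned by `run` would be fed back to the model between reasoning steps, so it can inspect intermediate results before deciding what to execute next.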

I Use Obsidian

https://stephango.com/vault
1•hisamafahri•1m ago•0 comments

Ask HN: Are compiler errors for unused code necessary?

1•qwool•2m ago•0 comments

Memories Family

https://familymemories.video
1•tareq_•3m ago•0 comments

Book a Meeting with a YC Founder

https://y-cal.vercel.app/
1•abrarmurad416•7m ago•0 comments

Ask HN: Can AI replace apps, or will economics keep the app market alive?

1•maccraft•7m ago•0 comments

Show HN: Preference-aware routing for OpenClaw via Plano

https://github.com/katanemo/plano/tree/main/demos/llm_routing/openclaw_routing
1•sparacha•11m ago•0 comments

The Servo project and its impact on the web platform ecosystem

https://servo.org/slides/2026-02-fosdem-servo-web-platform/
1•mmphosis•11m ago•0 comments

Mira: An agent that never forgets anything. Persistent, shared memory

https://www.co-span.com/
2•dvt•13m ago•0 comments

Python HTTP server using Erlang and BEAM

https://hornbeam.dev/
1•polyrand•13m ago•0 comments

Dual nationals face scramble for UK passports as new rules come into force

https://www.bbc.com/news/articles/cx2d9yk2kpjo
2•tartoran•14m ago•0 comments

GraphQLite: SQLite graph extension supporting Cypher

https://colliery-io.github.io/graphqlite/latest/
2•dude01•15m ago•0 comments

Show HN: AccessLint – Static accessibility analysis for iOS/Swift

https://accesslint.app
1•synctek•17m ago•0 comments

The Problem with Left Nationalism

https://jacobin.com/2026/01/left-nationalism-universalism-populism-melenchon/
1•PaulHoule•17m ago•1 comment

We're Measuring Data Center Sustainability Wrong

https://spectrum.ieee.org/data-center-sustainability-metrics
1•defrost•19m ago•0 comments

Ask HN: How can a non-technical founder prove they're more than an "idea guy"?

1•timsein•20m ago•4 comments

I swear the UFO is coming any minute

https://www.experimental-history.com/p/i-swear-the-ufo-is-coming-any-minute
3•Ariarule•21m ago•0 comments

What Neptune.ai Got Right (and How to Keep It)

https://www.trainy.ai/blog/what-neptune-got-right-and-how-to-keep-it
2•roanakb•22m ago•0 comments

Show HN: Turn Claude Code or Codex into proactive, autonomous 24/7 AI agents

https://github.com/suitedaces/dorabot
2•alternateman•24m ago•0 comments

The Case for Duolingo

https://josephblumenfeld.substack.com/p/the-case-for-duolingo
1•AzariaK•24m ago•1 comment

The 24-Day Notice That Was a 7-Month Signal

https://medium.com/@platformpolicy/the-24-day-notice-that-was-actually-a-7-month-signal-55c4b3726fce
1•ppolicyco•24m ago•1 comment

Space Station returns to a full crew complement after a month

https://arstechnica.com/space/2026/02/space-station-returns-to-a-full-crew-complement-after-a-month/
1•rbanffy•25m ago•0 comments

Can Opus 4.6 Do Category Theory in Lean?

https://www.stephendiehl.com/posts/lean-opus-blog/
1•macleginn•26m ago•0 comments

Bankruptsy

https://lightward.com/bankruptsy
2•isaacbowen•26m ago•0 comments

Architecture of Consoles

https://www.copetti.org/writings/consoles/
2•lopespm•29m ago•0 comments

Updated Thoughts on AI Risk

https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk
1•paulpauper•29m ago•0 comments

Show HN: ChessGrammar – API that detects tactical patterns in chess positions

1•stevejvv•29m ago•0 comments

AI Eats the World, and Most of Its Flash Storage

https://www.nextplatform.com/2026/02/17/ai-eats-the-world-and-most-of-its-flash-storage/
3•rbanffy•32m ago•0 comments

Diagnosing a PET Video Fault from One Photograph

http://blog.tynemouthsoftware.co.uk/2026/02/diagnosing-a-pet-video-fault-from-one-photo.html
1•WaluigiBSOD•33m ago•0 comments

Show HN: FolioDoc – I built a tool to stop chasing clients for documents

3•Foliodoc•35m ago•0 comments

Phishing Detection NLP Heuristic: Prototype Achieves 60% Detection Rate

https://horeszko.ca/blog/phishing-detection.html
1•horeszko•36m ago•0 comments