Simplicity always wins: SOTA on SWE-Pro, tb2, -verif on 21 models with simple-agent

https://github.com/strands-labs/benchmark-harnesses

2•gaurav71531•1h ago

Comments

gaurav71531•1h ago

We show that a simple harness that fixes the 'intent-execution gap' achieves SOTA pass@1 on 21 models (across diverse providers: Claude, GPT, Gemini, Grok, Qwen) on agentic benchmarks (SWE-Pro, -verif, tb2). This is the first time a single open-source harness reproduces or improves results on popular benchmarks for modern LLMs! The code is public to try and build on: https://github.com/strands-labs/benchmark-harnesses

More importantly, we also generated 138k high-quality agent trajectories (SOTA pass@1) and present a detailed study of them, "Dissecting model behavior through agent trajectories": https://arxiv.org/abs/2606.17454

Models that achieve similar pass@1 behave very differently internally, and we quantify this using several metrics (such as code state-spaces).

Eater, the Verge and SB Nation Sold to Penske Media

https://www.hollywoodreporter.com/business/business-news/eater-verge-sbnation-sold-penske-media-1...

1•speckx•2m ago•0 comments

Show HN: Jumpjet – a WASM runtime for game developers

https://jumpjet.dev

1•lwansbrough•2m ago•0 comments

A terminal Markdown editor that links like Obsidian – editxr

https://editxr.org/blog/obsidian-in-the-terminal/

1•mromanuk•4m ago•0 comments

Illegal excavation reveals grand Roman villa with elaborate mosaics

https://www.cnn.com/2026/06/19/science/roman-villa-mosaics-illegal-dig-scli-intl

1•1659447091•7m ago•0 comments

The Rise of Single-Node Processing: Challenging the Distributed-First Mindset

https://www.pracdata.io/p/the-rise-of-single-node-processing

1•b-man•7m ago•0 comments

AI Agent on Android

https://github.com/ExTV/rikkahub-agent

1•excp•7m ago•0 comments

My agentic engineering workflow as someone who doesn't write code

https://shreyasprakash.com/my-agentic-engineering-workflow/

1•mondo_daemon•8m ago•1 comment

Building multilingual AI with a new open dataset

https://github.blog/ai-and-ml/llms/accelerating-researchers-and-developers-building-multilingual-...

1•gmays•10m ago•0 comments

Show HN: CWC scans your Claude Code history and auto-builds agent workflows

https://github.com/fayzan123/claude-workflow-composer

1•FayzanMalik•13m ago•0 comments

Show HN: I built a bookmarks manager with a social cut

https://linkraider.com/login

2•Sotty75•18m ago•2 comments

US tells ASML it is concerned China may have top chip tool

https://www.bloomberg.com/news/articles/2026-06-19/us-tells-asml-it-s-concerned-china-may-have-to...

1•dgellow•18m ago•0 comments

Still Human Here

https://halit.alptekin.im/posts/still-human-here/

1•crimedisruptor•19m ago•0 comments

Bobby Prince, composer for Doom, Wolfenstein 3D, and Duke Nukem 3D, has died

https://www.legacy.com/legacy/robert-bobby-prince-lll

2•pgrote•20m ago•0 comments

Running local AI on AMD RX 580 (2017 GPU) using Vulkan – no CUDA, no ROCm

https://setup-ia-local-rx580-vulkan.web.app/

1•aivisionslab•21m ago•0 comments

Moving a 6-node NETLAB+ cluster off VMware to Proxmox

https://solomonneas.dev/blog/vmware-to-proxmox-migration

1•solomonneas•21m ago•0 comments

Show HN: Kivvy – Divvy/Magnet-style window snapping for KDE Plasma

https://github.com/pavel-teramips/kivvy

1•bugpower•23m ago•0 comments

An open-source space MMO

https://claudecitizen.com/

1•echocrest•23m ago•0 comments

Show HN: ClikDeo – browser-based video editor, all processing local, no upload

https://clikdeo.com/

1•Clikdeo•25m ago•2 comments

Where to Find the Colors Your Screen Can't Show You

https://moultano.wordpress.com/2026/06/19/where-to-find-the-colors-your-screen-cant-show-you/

1•Tomte•25m ago•0 comments

Show HN: Chrome extension to grey out unwanted videos

https://youtubeangel.com/

1•ltrinchini•26m ago•0 comments

Using GenAI to translate a game is as bad as using it to make assets

https://www.whateverthewindbrings.com/using-genai-to-translate-a-game-is-as-bad-as-using-it-to-ma...

2•speckx•27m ago•0 comments

400B Parameter Model: Consortium "Europa" Wins AI Competition

https://www.heise.de/en/news/400-Billion-Parameter-Model-Consortium-Europa-Wins-AI-Competition-11...

1•frb•28m ago•0 comments

On the trail of the dotcom queen

https://www.theguardian.com/business/2026/jun/19/julie-meyer-dotcom-queen-unpaid-bills-missing-fu...

1•kawera•32m ago•0 comments

Five Chinese AI Labs Cut Token Prices Up to 99%

https://aiweekly.co/alerts/five-chinese-ai-labs-cut-token-prices-up-to-99

2•fittingopposite•32m ago•1 comment

Crash Blossom

https://www.merriam-webster.com/wordplay/crash-blossom-words-were-watching

1•thunderbong•34m ago•0 comments

Show HN: Fate – a joke horoscope generator utility for Linux pids

https://github.com/cjd8/fate

1•cjd8•35m ago•0 comments

Audacity 4.0 beta lets you test its new (nicer) Qt interface

https://www.omgubuntu.co.uk/2026/06/audacity-4-0-beta

1•birdculture•37m ago•0 comments

StoryLab: Brand positioning – free. Full strategy, less than lunch

https://story-lab.ai/

1•Aftermidn8•38m ago•1 comment

Rent out your Mac for inference

https://console.darkbloom.dev/earn

1•gmays•38m ago•1 comment

Eyeball – a curious eye in your Chrome toolbar

https://chromewebstore.google.com/detail/eyeball-a-curious-eye-in/hepaijoaliamcipeoajfgdiajfaammei

1•kka•38m ago•0 comments