The First Agent Skills Benchmark

https://huggingface.co/papers/2602.12670

1•xdotli•1h ago

Comments

xdotli•1h ago

We collected 86 tasks from 105 domain experts across 11 domains. Every task is verifiable and human-created, and comes with verified Skills. SOTA models score ~30% without Skills.

We found a few interesting things:

1. Skills substitute for model scale: Haiku 4.5 with Skills (27.7%) beats Opus 4.5 without (22.0%). The right procedural knowledge can be worth more than a bigger model.

2. The improvement from Skills does not come from the LLMs' internal knowledge. We ran an ablation where no Skills are provided, but the agent is prompted to generate relevant procedural knowledge before solving the task. This isolates the impact of the LLMs' latent domain knowledge. The results:

Curated Skills: +16.2pp average improvement across all 7 agent configs
Self-generated Skills: -1.3pp. Models can't write their own procedural knowledge before they have any trajectory feedback.
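For a sense of scale, here is a quick sketch of the percentage-point arithmetic behind the numbers quoted above. The function and variable names are made up for illustration; only the four figures come from the comment.

```python
def pp_gap(a: float, b: float) -> float:
    """Difference between two percentage figures, in percentage points."""
    # Round to one decimal to match the precision of the quoted numbers
    # and to hide floating-point noise.
    return round(a - b, 1)


# Figures quoted in the comment (pass rates in %, deltas in pp).
haiku_with_skills = 27.7
opus_without_skills = 22.0
curated_delta = 16.2          # average change with curated Skills
self_generated_delta = -1.3   # average change with self-generated Skills

# Smaller model with Skills vs. larger model without.
print(f"Haiku+Skills over Opus alone: {pp_gap(haiku_with_skills, opus_without_skills):.1f}pp")

# Spread between curated and self-generated Skills.
print(f"Curated over self-generated: {pp_gap(curated_delta, self_generated_delta):.1f}pp")
```

These are absolute differences in percentage points, not relative improvements, so they should not be read as "X% better".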

Book a Meeting with a YC Founder

https://y-cal.vercel.app/

1•abrarmurad416•2m ago•0 comments

Ask HN: Can AI replace apps, or will economics keep the app market alive?

1•maccraft•2m ago•0 comments

Show HN: Preference-aware routing for OpenClaw via Plano

https://github.com/katanemo/plano/tree/main/demos/llm_routing/openclaw_routing

1•sparacha•6m ago•0 comments

The Servo project and its impact on the web platform ecosystem

https://servo.org/slides/2026-02-fosdem-servo-web-platform/

1•mmphosis•6m ago•0 comments

Mira: An agent that never forgets anything. Persistent, shared memory

https://www.co-span.com/

2•dvt•8m ago•0 comments

Python HTTP server using Erlang and BEAM

https://hornbeam.dev/

1•polyrand•8m ago•0 comments

Dual nationals face scramble for UK passports as new rules come into force

https://www.bbc.com/news/articles/cx2d9yk2kpjo

2•tartoran•9m ago•0 comments

GraphQLite: SQLite graph extension supporting Cypher

https://colliery-io.github.io/graphqlite/latest/

2•dude01•10m ago•0 comments

Show HN: AccessLint – Static accessibility analysis for iOS/Swift

https://accesslint.app

1•synctek•12m ago•0 comments

The Problem with Left Nationalism

https://jacobin.com/2026/01/left-nationalism-universalism-populism-melenchon/

1•PaulHoule•12m ago•1 comments

We're Measuring Data Center Sustainability Wrong

https://spectrum.ieee.org/data-center-sustainability-metrics

1•defrost•14m ago•0 comments

Ask HN: How can a non-technical founder prove they're more than an "idea guy"?

1•timsein•15m ago•4 comments

I swear the UFO is coming any minute

https://www.experimental-history.com/p/i-swear-the-ufo-is-coming-any-minute

3•Ariarule•16m ago•0 comments

What Neptune.ai Got Right (and How to Keep It)

https://www.trainy.ai/blog/what-neptune-got-right-and-how-to-keep-it

2•roanakb•17m ago•0 comments

Show HN: Turn Claude Code or Codex into proactive, autonomous 24/7 AI agents

https://github.com/suitedaces/dorabot

2•alternateman•19m ago•0 comments

The Case for Duolingo

https://josephblumenfeld.substack.com/p/the-case-for-duolingo

1•AzariaK•19m ago•1 comments

The 24-Day Notice That Was a 7-Month Signal

https://medium.com/@platformpolicy/the-24-day-notice-that-was-actually-a-7-month-signal-55c4b3726fce

1•ppolicyco•20m ago•1 comments

Space Station returns to a full crew complement after a month

https://arstechnica.com/space/2026/02/space-station-returns-to-a-full-crew-complement-after-a-month/

1•rbanffy•20m ago•0 comments

Can Opus 4.6 Do Category Theory in Lean?

https://www.stephendiehl.com/posts/lean-opus-blog/

1•macleginn•21m ago•0 comments

Bankruptsy

https://lightward.com/bankruptsy

2•isaacbowen•21m ago•0 comments

Architecture of Consoles

https://www.copetti.org/writings/consoles/

2•lopespm•24m ago•0 comments

Updated Thoughts on AI Risk

https://www.noahpinion.blog/p/updated-thoughts-on-ai-risk

1•paulpauper•24m ago•0 comments

Show HN: ChessGrammar – API that detects tactical patterns in chess positions

1•stevejvv•25m ago•0 comments

AI Eats the World, and Most of Its Flash Storage

https://www.nextplatform.com/2026/02/17/ai-eats-the-world-and-most-of-its-flash-storage/

3•rbanffy•27m ago•0 comments

Diagnosing a PET Video Fault from One Photograph

http://blog.tynemouthsoftware.co.uk/2026/02/diagnosing-a-pet-video-fault-from-one-photo.html

1•WaluigiBSOD•28m ago•0 comments

Show HN: FolioDoc – I built a tool to stop chasing clients for documents

3•Foliodoc•30m ago•0 comments

Phishing Detection NLP Heuristic: Prototype Achieves 60% Detection Rate

https://horeszko.ca/blog/phishing-detection.html

1•horeszko•31m ago•0 comments

Lessons from building the best Deep Research and how you can build better agents

https://www.onyx.app/blog/building-the-best-deep-research

1•yuhongsun•33m ago•0 comments

Algorithm-based tool for home support funding is 'cruel' and 'inhumane'

https://www.theguardian.com/australia-news/2026/feb/17/australian-aged-care-algorithm-tool-home-s...

1•novemp•33m ago•0 comments

U.S. releases new details on alleged secret Chinese nuclear test

https://www.npr.org/2026/02/17/nx-s1-5716046/u-s-releases-new-details-on-alleged-secret-chinese-n...

3•ironyman•35m ago•0 comments