Ark – AI agents waste ~30% of context on tool schemas. I built runtime that learn

https://github.com/atripati/ark
1•atripat6•1h ago

Comments

AlexC04•49m ago
This is a pretty important piece, and the research backs you up. Dynamically moving that context out of your system prompt is going to help reduce the lost-in-the-middle effect. Context rots almost immediately. I've got a project being built to address this directly as well, but it's still very early days.

Keep it up! You're on the right track.

Hong, K., & Chroma Research Team. (2025). Context rot: How increasing input tokens impacts LLM performance. Chroma Research. https://research.trychroma.com/context-rot

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638

atripat6•32m ago
Really appreciate the pointer to the Chroma research; the context rot framing matches what I've been seeing. Even with large context windows, the signal-to-noise ratio drops quickly, especially when tool schemas are included upfront but not actually used.

With ARK, I'm trying to treat context more like a constrained working set rather than something static. It starts minimal and only expands when there is a signal (failures, ambiguity, etc.), so the model isn't reasoning over stale context.

Curious about your approach. Are you leaning more toward:

- restructuring how context is stored/retrieved (external memory, RAG, etc.), or
- dynamically controlling what actually enters the prompt at each step?

Feels like a fundamental bottleneck for production agent systems, so would love to compare how you're thinking about the latency vs accuracy tradeoff.
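
As an illustration of the "constrained working set" idea described in this comment, here is a minimal Python sketch. It is not ARK's actual API: `TOOL_SCHEMAS`, `WorkingSet`, `max_tools`, and the oldest-first eviction rule are all assumptions made up for the example. It only shows the general shape: start with no tool schemas exposed, add one when a signal arrives, and evict the least recently added schema once the cap is reached.

```python
from dataclasses import dataclass, field

# Hypothetical tool registry (name -> schema). These names are illustrative
# and are not taken from ARK.
TOOL_SCHEMAS = {
    "read_file": {"description": "Read a file", "parameters": {"path": "string"}},
    "run_tests": {"description": "Run the test suite", "parameters": {}},
    "search_code": {"description": "Search the repo", "parameters": {"query": "string"}},
}


@dataclass
class WorkingSet:
    """Tool schemas currently exposed to the model, capped at max_tools."""

    max_tools: int = 2
    active: list[str] = field(default_factory=list)  # oldest first

    def expand(self, tool: str) -> None:
        """Expose a tool after a signal (failure, ambiguity, explicit request)."""
        if tool not in TOOL_SCHEMAS:
            raise KeyError(f"unknown tool: {tool}")
        if tool in self.active:
            # Already exposed: refresh its position so it is evicted last.
            self.active.remove(tool)
        elif len(self.active) >= self.max_tools:
            # Working set is full: drop the oldest schema to stay within budget.
            self.active.pop(0)
        self.active.append(tool)

    def schemas(self) -> dict:
        """Return only the schemas that should enter the next prompt."""
        return {name: TOOL_SCHEMAS[name] for name in self.active}


working_set = WorkingSet(max_tools=2)
working_set.expand("run_tests")  # e.g. a step failed and the agent needs to verify
prompt_tools = working_set.schemas()
```

A real runtime would need a better signal than an explicit `expand` call, and probably a smarter eviction policy; the sketch only makes the "starts minimal, expands on signal" behaviour concrete.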

AlexC04•18m ago
So, my approach is still being built and I'm still very hand-wavy about how it is going to come together, but effectively I'm building pipelines of prompts. Rather than running our LLM sequences as long-running sessions where the entire context gets loaded on every turn (a recipe for rot), we unlock the ability to introduce a thinking layer between each step of the process.

So before each turn is sent into the LLM we (potentially) run a local process to assemble a bespoke context of only what is required for that specific turn.

If a tool call is not going to be needed on the prompt, we don't include it in the system prompt on that round.
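
The per-turn assembly described above could look roughly like the following Python sketch. Everything here is hypothetical and is not taken from the `ail` spec: the `TOOLS` catalog, the keyword matching used as a stand-in for the "local process", and the shape of the returned context are assumptions chosen to keep the example small.

```python
import re

# Hypothetical tool catalog: each tool carries its schema plus trigger keywords.
TOOLS = {
    "run_tests": {
        "schema": {"description": "Run the test suite"},
        "keywords": {"test", "tests", "failing"},
    },
    "read_file": {
        "schema": {"description": "Read a file from disk"},
        "keywords": {"file", "read", "open"},
    },
    "web_search": {
        "schema": {"description": "Search the web"},
        "keywords": {"search", "lookup", "docs"},
    },
}


def assemble_context(turn_prompt: str, base_system: str) -> dict:
    """Build a bespoke context for one turn, including only the tools it needs."""
    words = set(re.findall(r"[a-z]+", turn_prompt.lower()))
    # Naive relevance check: a tool is included only if the turn mentions one
    # of its keywords. A real pipeline would use something stronger here.
    needed = [name for name, tool in TOOLS.items() if words & tool["keywords"]]
    return {
        "system": base_system,
        "tools": [{"name": name, **TOOLS[name]["schema"]} for name in needed],
        "user": turn_prompt,
    }


context = assemble_context(
    "Read the config file and summarize it.",
    "You are a coding agent.",
)
```

With this turn, only the `read_file` schema is attached; a turn that mentions no tool keyword gets an empty `tools` list, so no schema tokens are spent on it.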

I'm still formalizing the spec at the moment and think I'm about six months to a year out from having a full human-ready UI running.

This is the foundational paper I'm basing the tool on: https://github.com/AlexChesser/ail/blob/main/docs/blog/the-y... while the spec starts here: https://github.com/AlexChesser/ail/blob/main/spec/core/s01-p...

Essentially I'm trying to build an artificial neocortex and frontal lobe to provide a complete layer of Executive Function that operates on top of our agents - like Claude Code (or whatever else).

I'm basing the roadmap on about 100 years of cognitive science. We've legitimately had names for all these failure modes (in humans) since the 1960s. We have observations of what we're witnessing in agents from 1848.

We have the roadmap from Psychology.

AlexC04•11m ago
to directly answer this bit:

> Feels like a fundamental bottleneck for production agent systems, so would love to compare how you're thinking about the latency vs accuracy tradeoff.

I'm really not focusing on latency right now. My short-term goal is to prove the thesis that `ail` can improve same-model performance on SWE-Bench Pro vs. their own published results.

Can I run swebp with GLM-4.6 and get a score better than their published `68.20` (https://www.swebench.com/)?

The argument is that latency just isn't the part we should worry about right now. If we're reducing the time to code something from ~6 weeks to 1 hour... then does it really matter that we add another 30 minutes of tool calls if we get it 100% right vs. 80% right?

Make it work -> make it right -> make it fast.

I'm still on the first one tbh :rofl-emoji:
