The Token Compression Illusion: Why I'm Skeptical of RTK

https://mroczek.dev/articles/the-token-compression-illusion-why-im-skeptical-of-rtk/

16•lackoftactics•1h ago

Comments

breadislove•1h ago

slop complaining about other slop

lackoftactics•1h ago

Thank you, author here. I will stay civil and focus on rtk, as that was the goal of the article.

So do you think the rtk CLI is AI slop? I had some suspicions looking at their repo, the number of issues, and their style. The prettier issue, where it ran successfully while the binary wasn't even installed, was quite entertaining.

grey-area•1h ago

Did you use an LLM for the blog post? It reads like it in places.

lackoftactics•1h ago

I have a Raycast shortcut for fixing grammar; it might have done more damage than just adding a, an, the or changing tenses.

gowld•1h ago

A content-free 2nd "paragraph" like this turned me off immediately.

> But in the current dev tools gold rush, if something sounds too good to be true, it almost always is.

The people who are interested in RTK and in criticism of RTK aren't interested in pablum like this.

lackoftactics•59m ago

OK, this one is all mine. So that's even more hurtful, as this is 100% me.

SubiculumCode•1h ago

I feel like what is needed is not compression, but aggressive context management with subagents.

lackoftactics•1h ago

I am the author of the text.

What do you mean by aggressive context management with subagents? Would you add a loop that would trim the context?

Both of those tasks seem even more difficult.

skinfaxi•1h ago

I believe they mean aggressive delegation to minimize context bloat in the coordinating agent.

lackoftactics•1h ago

That would make more sense; trimming context with subagents sounds like overkill.

SubiculumCode•56m ago

First, I only say this because of what I learned as a PhD in human memory, not as someone who authors agentic workflows or does AI.

Human cognition tends to work by simultaneously using and combining/separating multiple frequency scales of information. A simple way of thinking about it is this: we tend to encode and retrieve both the gist of what is happening and the verbatim details of what happened. The gist can be thought of as low-frequency information, almost like bullet points, that contains the big overview (goals, key points). The verbatim traces are the high-resolution memory that contains all the details. The gist helps encoding and recall by providing encoding and retrieval context cues. There are also levels in between those two, but I am keeping it simple. During human development, verbatim memory capacity increases first, but then hits a wall/plateau. Further performance increases begin to depend on the ability to use and gain from gist-like representations that can guide encoding and retrieval of verbatim details within contexts.

You don't need to keep everything in the context window. My untested, perhaps naive hypothesis is that the context windows of sub-agents dealing with verbatim tasks (actually writing code) should be managed by an agent above them that is tuned to lower-frequency information, and that agent in turn by another above it tuned to even lower-frequency information. The lowest-frequency context windows fill up slowly. High-frequency information fills up fast. Use the low-frequency information to retrieve the needed high-frequency information.
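
A minimal sketch of the layered idea above, under loud assumptions: the class names (`VerbatimStore`, `GistLayer`), the one-line gist strings, and the keyword-overlap lookup are all hypothetical stand-ins, and no real agent framework is involved. It only shows a low-frequency layer holding short gists that act as retrieval cues for high-frequency verbatim records kept outside the coordinator's context.

```python
from dataclasses import dataclass, field


@dataclass
class VerbatimStore:
    """High-frequency layer: full details, kept out of the coordinator's context."""
    records: dict[str, str] = field(default_factory=dict)

    def save(self, key: str, text: str) -> None:
        self.records[key] = text

    def load(self, key: str) -> str:
        return self.records[key]


@dataclass
class GistLayer:
    """Low-frequency layer: one short gist per task, used as a retrieval cue."""
    store: VerbatimStore
    gists: dict[str, str] = field(default_factory=dict)

    def record(self, key: str, gist: str, verbatim: str) -> None:
        # The coordinator keeps only the gist; the details go to the store.
        self.gists[key] = gist
        self.store.save(key, verbatim)

    def context(self) -> str:
        # What the coordinating agent would actually carry around.
        return "\n".join(f"{k}: {g}" for k, g in self.gists.items())

    def recall(self, query: str) -> list[str]:
        # Naive cue matching (an assumption): any shared lowercase word counts.
        words = set(query.lower().split())
        hits = [k for k, g in self.gists.items() if words & set(g.lower().split())]
        return [self.store.load(k) for k in hits]


# Hypothetical task records; the verbatim strings are placeholders for long output.
layer = GistLayer(VerbatimStore())
layer.record("t1", "fixed login timeout bug", "diff --git a/auth.py ... (200 lines)")
layer.record("t2", "added csv export endpoint", "diff --git a/export.py ... (350 lines)")

print(layer.context())
print(layer.recall("what changed for login"))
```

In this toy version the coordinator's context grows by one short line per task, while the long records are only pulled back in when a gist matches the query. A real system would presumably need a much better matching step than shared words.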

iam-TJ•1h ago

Am I the only one that thought RTK was Real-Time Kinematics used for precision with satellite navigation?

dayjaby•1h ago

No. I clicked here for the same reason.

lackoftactics•1h ago

I might have picked a better title, but they are literally called rtk https://github.com/rtk-ai/rtk

and it stands for Rust Token Killer

arcanemachiner•1h ago

I've been trying out RTK and it seems kinda alright. I doubt it's saving much, but the quality of the work feels similar.

But if it's making a dent in token usage (which I have not personally measured), then that's great.

I had to add some system prompt instructions to Pi to help it work (GPT 5.5 initially got confused when `git status` looked different than expected). The Claude Code extension appears to do a proper job of informing the agent about the unexpected shape of the output without any extra work on my part.

lackoftactics•1h ago

So how do you justify its usage if it's not saving much and the work feels similar? They have 664 issues open and some of them are quite funny: the tools are called and return success even though they aren't even installed.

My take is that handling so many versions and so many different tools shouldn't be the work of any single repo. The responsibility should be either on the coding agent to compress or, in the best-case scenario, on the people who are responsible for the CLI tool.

arcanemachiner•1h ago

I'm not justifying its usage, and I don't have to.

I've been trying it out for a couple days and it seems kinda OK or whatever. If that upsets you, then that's your problem.

I might dump it later on if it doesn't provide much of a benefit. I typically try out new things, then cull whatever doesn't work. This tool seems pretty neutral for now, at least.

lackoftactics•1h ago

No, it doesn't upset me. I am open to discussion; there might be things I miss and don't understand. I am just trying to get why it's been pushed so hard lately and whether the benefits are really there. Sorry if I sounded upset to you, but I am trying to be really civil and am just generally curious.

old_sysadmin•1h ago

I feel like the state of the art is baked into the compaction logic, and I've had a lot of problems with compaction (absent other prompting) losing key bits of state.

https://github.com/toon-format/toon is another interesting one, and I feel like it takes on a much more achievable goal - reduce whitespace and verbosity of JSON, not overall context compression.

arcanemachiner•1h ago

Personally, I find compaction to be unreliable, which forces me to rely heavily on session-specific planning documents and inter-agent handoff messages.

compuficial•1h ago

> 1. Gamified Savings vs. Your Actual API Bill

Tool use output represents a large amount of my output. I'll take 3.7M tokens saved on 3.9M tokens of input. Tokens saved are tokens saved.

> 3. Where Are the Accuracy Benchmarks?

As a user of RTK, it would be nice to see accuracy benchmarks. However, I've seen no evidence of the model missing anything critical as a result of the compression. As part of their design philosophy they are very strict about preserving correctness, to the point that if a filter fails they fall back to raw output. For my most frequently used commands I've inspected the source and was happy with what I saw; they've earned my trust thus far.

> The day git, cargo, npm, or grep updates its terminal formatting by a few spaces or changes an error layout, RTK's regex and parsing filters will break. And returning to the silent failure trap, it won't throw an explicit error; it will fail quietly, feeding corrupted or partial text to your agent.

Again, any filter that fails simply falls back to the raw output. One of their core pillars is avoiding this exact scenario you described. RTK should never feed corrupted or partial text to an agent.
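
To make the fallback behavior described above concrete, here is a rough sketch in Python. It is not RTK's code (RTK is written in Rust, and nothing here is taken from its source); `compact_git_status`, `filter_or_raw`, and the sample strings are hypothetical names and inputs invented for illustration.

```python
import re
from typing import Callable


def compact_git_status(raw: str) -> str:
    """Hypothetical filter: keep only 'modified:' paths from `git status` text."""
    paths = re.findall(r"^\s*modified:\s+(\S+)$", raw, flags=re.MULTILINE)
    if not paths:
        # Unknown layout: refuse to guess rather than emit partial text.
        raise ValueError("no recognizable lines")
    return "modified: " + ", ".join(paths)


def filter_or_raw(raw: str, filt: Callable[[str], str]) -> str:
    """Return the filtered text, or the untouched raw text if the filter fails."""
    try:
        out = filt(raw)
    except Exception:
        return raw
    # An empty result is also treated as a failure (an assumption of this sketch).
    return out if out.strip() else raw


# Made-up inputs: one layout the filter knows, one it does not.
known = "Changes not staged for commit:\n\tmodified:   src/main.rs\n\tmodified:   README.md\n"
changed_layout = "M  src/main.rs\nM  README.md\n"

print(filter_or_raw(known, compact_git_status))
print(filter_or_raw(changed_layout, compact_git_status))
```

Note that a fallback like this only fires when the filter notices its own failure. A pattern that still matches after a format change but captures the wrong lines would pass straight through, which is the case the quoted concern is about.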

Your concerns are fair but I'd like to see your criticism backed up with evidence. Have you used RTK? Have you found evidence that they are failing to preserve correctness?

lackoftactics•1h ago

I was looking through the issues as an investigation. Some issues that caught my attention look quite bad: https://github.com/rtk-ai/rtk/issues/2494 https://github.com/rtk-ai/rtk/issues/2462 https://github.com/rtk-ai/rtk/issues/2395

compuficial•1h ago

Fwiw, I just ran the steps to reproduce and got `Error: prettier produced no output` on rtk (0.42.2). Not saying this isn't valid for the user's environment, but I could not reproduce on Linux.

cityofdelusion•1h ago

I am glad articles like this are finally starting to get some momentum around what I call the LLM magic box industry. From caveman mode to RTK to semantic search and everything in between. Developers have become magicians that cast spells instead of engineers. It sucks at work especially with everyone so sure that their magic spell is the one for ultimate token savings.

My criteria are: if it’s not in a harness it’s probably not that good (the best ideas float up to Codex/Claude imo) and any GitHub advertising some percent of token savings is not to be trusted.

It’s hard to avoid the snake oil and I hope people start thinking critically on this stuff.

blubber•1h ago

There is a conflict of interest, though.

arcanemachiner•1h ago

The idea itself is sound: If you can improve the signal-to-noise ratio in the context window, then that's a good thing.

Whether or not RTK actually does this has not been established. I would be glad to see some proper benchmarks done on the actual difference this tool makes (not some meaningless "up to 90%" type of language).

lackoftactics•1h ago

I was wondering if that impacts accuracy. Obviously the rtk output wasn't in the training dataset, but maybe it doesn't matter in the end.

striking•22m ago

I mean it kind of already is in harnesses. Codex and Claude Code both have subagent tools. You could probably get a similar token output cut just by asking Claude Code to run all commands with Haiku as a summarizing subagent.

Catloafdev•1h ago

I don't agree with the conclusion at all. I can see the value of RTK - whether it is buggy or vibe coded is kind of secondary. That basically comes down to how severe and often the bugs are.

There's no gamification of savings here. Tool output can be meaty.

Is the author skeptical of the concept, or the implementation? Because only one of those is worth critiquing.

lackoftactics•46m ago

Hey, author here. I am skeptical of the implementation, starting from the name Rust Token Killer, which looks like an attempt to monetize other developers' love of Rust.

The concept is fine to me and I believe we should optimize, but a repo that will handle all tools sounds like Sisyphus rolling a rock up the hill.

tlarkworthy•1h ago

I tried it and it does not compress messages, which were 90% of my context, so it only compresses a small part of my token usage. If you read it carefully you will realize that is exactly what is stated. If you look at /context you will probably see that tool calls are not where you are spending tokens, so a proxy that compresses tool calls will not make much impact, while it remains true that it compresses tool calls by 8x. It's just not that important for long coding sessions for me.

"native/built-in Read or cat tools, the data is not intercepted by RTK's shell hook"
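
The arithmetic behind this point can be worked through in a few lines. This is only a sketch: `overall_reduction` is a made-up helper, and it assumes everything outside the 90% of messages is tool output, which the comment does not actually state.

```python
def overall_reduction(tool_share: float, ratio: float) -> float:
    """Fraction of total context removed when only tool output shrinks.

    tool_share: fraction of the context that is tool output (0..1)
    ratio: compression factor applied to that part (8.0 means 8x smaller)
    """
    return tool_share * (1.0 - 1.0 / ratio)


# Figures from the comment above: messages ~90%, so tool output ~10%, 8x compression.
print(f"{overall_reduction(0.10, 8.0):.2%}")
# Hypothetical opposite case: tool output dominates the context.
print(f"{overall_reduction(0.90, 8.0):.2%}")
```

With a 10% tool share, an 8x compression removes about 8.75% of the total context. The same ratio on a 90% tool share would remove about 78.75%, so the headline compression figure says little without knowing the mix.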

lackoftactics•1h ago

Author of the text here. I will be honest about why I wrote it: rtk-ai looks very odd to me as a software engineer, given the number of stars, no mention of accuracy, and how management is pushing that stuff to optimize costs. Now people are wrapping every possible command in rtk, trying to handle all the major commands and deciding which output you should get.

blubber•1h ago

"Where Are the Accuracy Benchmarks?"

I wish the author had provided one.

graphememes•9m ago

I don't disagree with the article, but I also don't disagree with RTK. The output of these commands is not optimized for agents (or humans, for that matter).
