The Problem: Currently, if a teacher (or lead dev) wants 50 students (or junior devs) to use an LLM with a specific, deep context (e.g., a 50-page curriculum or a complex repo), each of those 50 users has to upload and tokenize that same context independently. It's redundant, expensive, and forces everyone onto a high-tier subscription.
The Solution: USST allows a "Sponsor" (authenticated, paid account) to run a Deep Research session once and mint a signed Context Token. Downstream users (anonymous/free tier) pass this token in their prompt. The provider loads the pre-computed KV cache/context state without re-processing the original tokens.
Decouples payment from utility: the Sponsor pays for the heavy compute, while users pay only for their own inference. Privacy: users don't need the Sponsor's credentials, just the token. Efficiency: it removes the "Linear Bleed" of re-computing the same context for every session.
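To make the token flow concrete, here is a minimal sketch (in Python) of what minting and verifying a Context Token could look like. This is my own illustration, not the spec: the claim names (sponsor_id, context_hash, kv_cache_ref, exp), the shared HMAC key, and the encoding are all assumptions, and the real protocol may well use asymmetric signatures and a different claim set.

```python
# Hypothetical sketch of a signed Context Token. All field names and the
# symmetric-HMAC scheme are assumptions for illustration, not the USST spec.
import base64, hashlib, hmac, json, time

SIGNING_KEY = b"sponsor-or-provider-secret"  # assumed shared secret for this sketch


def mint_context_token(sponsor_id: str, context_text: str,
                       kv_cache_ref: str, ttl_s: int = 86_400) -> str:
    """Sponsor side: bind a pre-computed context (KV cache) to a signed, shareable token."""
    claims = {
        "sponsor_id": sponsor_id,
        "context_hash": hashlib.sha256(context_text.encode()).hexdigest(),
        "kv_cache_ref": kv_cache_ref,      # provider-side handle to the cached prefill state
        "exp": int(time.time()) + ttl_s,   # expiry limits the blast radius of a leaked token
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"


def verify_context_token(token: str) -> dict:
    """Provider side: check signature and expiry before loading the cached context state."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims  # the provider resolves claims["kv_cache_ref"] instead of re-prefilling
```

The property that matters is that the token carries a reference to provider-held cache state plus a verifiable signature, so downstream users can present it without ever touching the Sponsor's credentials.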
I wrote up the full architecture and the "why" here: https://medium.com/@madhusudan.gopanna/the-8-6-billion-oppor...
The Protocol Spec / Repo is the main link above.
Would love feedback on the abuse vectors and how this fits with current provider caching (like Anthropic’s prompt caching).
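For contrast, this is roughly how provider-side prompt caching is used today via Anthropic's Messages API (cache_control blocks), to the best of my understanding; the model name and prompts are placeholders. The relevant limitation is that the cache is keyed to the calling organization and the exact prefix, not to a portable, signed artifact.

```python
# Rough sketch of today's provider-side prompt caching (Anthropic Messages API,
# as I understand it; model name and file path are illustrative placeholders).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CURRICULUM = open("curriculum.txt").read()  # the large shared context

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": CURRICULUM,
            # Marks this block as cacheable; later calls from the SAME account
            # that reuse this exact prefix can hit the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize unit 3 for a beginner."}],
)
print(response.content[0].text)
```

That account scoping is exactly the gap USST is aiming at: the teacher's cached prefill can't be presented by 50 students on their own free-tier accounts.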
mgopanna•50m ago
When you look at the hidden costs of "Per-Seat" architecture in an education setting, the numbers get large very quickly. I broke down the cost of redundant context re-processing:
The Baseline:
The USST Math: By shifting from "Raw Mode" (everyone tokenizes everything) to "USST Mode" (Sponsor tokenizes once, students reuse):

The Grid Impact: Beyond the money, this is an infrastructure stability issue. A simultaneous classroom start (e.g., 10:05 AM) currently looks like a 1 Megawatt spike on the grid. With shared context tokens, that drops to a 15 Kilowatt blip (just the inference delta). We don't need 100x more chips to solve this; we just need a protocol that stops treating every user session as a blank slate.
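To make the comparison concrete, here is a back-of-the-envelope sketch in Python. Every value (class size, context length, questions per student, per-token price) is a placeholder assumption, not a measurement from the write-up.

```python
# Back-of-the-envelope comparison of "Raw Mode" vs "USST Mode" prefill work.
# Every number below is an assumed placeholder, not a measurement.
STUDENTS = 50                 # class size
CONTEXT_TOKENS = 100_000      # e.g. a 50-page curriculum after tokenization
QUESTIONS_PER_STUDENT = 10    # prompts per student in a session
PREFILL_COST_PER_MTOK = 3.00  # $ per million input tokens (placeholder rate)

# Raw Mode: every student re-processes the full context on every question.
raw_prefill = STUDENTS * QUESTIONS_PER_STUDENT * CONTEXT_TOKENS

# USST Mode: the Sponsor prefills the context once; students only add their
# own question tokens on top of the cached state (ignored here).
usst_prefill = CONTEXT_TOKENS

print(f"Raw Mode prefill tokens:  {raw_prefill:,}")
print(f"USST Mode prefill tokens: {usst_prefill:,}")
print(f"Redundancy factor:        {raw_prefill / usst_prefill:.0f}x")
print(f"Raw Mode prefill cost:    ${raw_prefill / 1e6 * PREFILL_COST_PER_MTOK:,.2f}")
print(f"USST Mode prefill cost:   ${usst_prefill / 1e6 * PREFILL_COST_PER_MTOK:,.2f}")
```

That prefill redundancy factor is what drives both the dollar math and the grid-spike argument: in a simultaneous classroom start, the repeated prefill of the shared context, not the per-question decoding, is what dominates the load.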