Show HN: Headroom (OSS): Cuts LLM costs by 85%

https://github.com/chopratejas/headroom

3•chopratejas•3w ago

Comments

chopratejas•3w ago

What is it?

- Context compression for LLMs, with reversibility (this part is the difference)

- Very different from other compression or summarization tools that promise cost savings and speed

- Claude Code and Cursor costs reduced by 50-60%

- Ideal for startups and enterprises

- Integration with LangChain

- Memory as a first-class citizen

- It's OSS, so it's free!

Give it a try. It's OSS: if you love it, star it. If you don't, let's make it better, together!

chopratejas•2w ago

Some results from real world data so far:

  ┌─────────────────┬─────────────┬──────────────────────────────┐
  │    Data Type    │ Compression │             Why              │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Server logs     │ 90%+        │ Highly repetitive patterns   │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ MCP tool output │ 70%+        │ JSON structure overhead      │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Database rows   │ 50-70%      │ Same schema, many records    │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ File trees      │ 40-50%      │ Repeated metadata            │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Code diffs      │ 0%          │ Every line unique            │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Dense prose     │ -0.3%       │ No patterns, slight overhead │
  ├─────────────────┼─────────────┼──────────────────────────────┤
  │ Encrypted       │ 0%          │ Incompressible               │
  └─────────────────┴─────────────┴──────────────────────────────┘
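To make the "Same schema, many records" row concrete, here is a minimal sketch of one way repeated keys can be factored out of same-schema records and restored later. It is an illustration only, not Headroom's actual implementation: `pack_rows` and `unpack_rows` are hypothetical names, and the saving is measured in characters rather than tokens.

```python
import json


def pack_rows(rows):
    """Factor the shared keys out of a list of same-schema dicts (hypothetical helper)."""
    columns = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != columns:
            raise ValueError("all rows must share the same keys in the same order")
    # Store the key names once, then only the values for each record.
    return {"columns": columns, "rows": [[row[c] for c in columns] for row in rows]}


def unpack_rows(packed):
    """Rebuild the original list of dicts, so the round trip is lossless."""
    return [dict(zip(packed["columns"], values)) for values in packed["rows"]]


# Made-up sample data: 100 records with an identical schema.
rows = [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(100)]

original = json.dumps(rows)
packed = json.dumps(pack_rows(rows))
saving = 1 - len(packed) / len(original)  # fraction of characters saved
print(f"{len(original)} -> {len(packed)} chars ({saving:.0%} smaller)")
```

The same idea does nothing for inputs such as code diffs or dense prose, where there is no repeated structure to factor out.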

niux•2w ago

Not a single example of what it does or how it works

chopratejas•2w ago

Fair enough. Trying to keep it concise here - This is how you install it:

pip install "headroom-ai[proxy]"

headroom proxy --port 8787

It will:

* Check all the data going into the LLM and apply compression based on the content type, so JSON, code, etc. are each handled differently.

* Keep the compression reversible: if the LLM is not getting what it is seeking, the original content can be brought back, so the LLM will not lose accuracy.

* Eliminate the needle-in-a-haystack problems caused by MCP tool output, code function calls, etc. filling up the context window.
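The reversibility point can be sketched roughly as follows. This is a toy illustration under my own assumptions, not Headroom's API: `ReversibleStore`, `compress`, and `retrieve` are hypothetical names, and the preview-plus-reference format is invented for the example. A long tool output is replaced by a short preview that carries a reference key, and the full text can be fetched by that key if the model needs it.

```python
import hashlib


class ReversibleStore:
    """Hypothetical store that keeps originals so shortened text can be restored."""

    def __init__(self, preview_chars=80):
        self.preview_chars = preview_chars
        self._originals = {}

    def compress(self, text):
        """Return a short preview with a reference key; short inputs pass through."""
        if len(text) <= self.preview_chars:
            return text
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        omitted = len(text) - self.preview_chars
        return f"{text[:self.preview_chars]} [... {omitted} chars omitted, ref={key}]"

    def retrieve(self, key):
        """Return the full original text for a reference key."""
        return self._originals[key]


store = ReversibleStore(preview_chars=40)
tool_output = "INFO request handled in 12ms\n" * 200  # made-up repetitive log
short = store.compress(tool_output)

# Pull the reference key back out of the preview and restore the original.
key = short.rsplit("ref=", 1)[1].rstrip("]")
restored = store.retrieve(key)
```

In a real proxy the retrieval step would presumably be exposed to the model as a tool call; that part is left out here.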

There is also an SDK which works like this:

from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use exactly like before
response = llm.invoke("Hello!")

I've personally used it with Claude Code and Cursor and seen the benefits.

goeb1•2w ago

Seems very useful. I tried it with Claude Code and it was saving approximately 50%. Do you know how I can push it to save more? Do you also have plans to make it enterprise-ready?
