Local LLM compresses long prompts before they reach Claude – MCP server

https://github.com/base76-research-lab/token-compressor

1•base76•1h ago

Comments

base76•1h ago

I built a two-stage prompt compressor that runs entirely locally before your prompt hits any frontier model API.

  How it works:
  1. llama3.2:1b (via Ollama) compresses the prompt to its semantic minimum
  2. nomic-embed-text validates that the compressed version preserves the original meaning (cosine ≥ 0.85)
  3. If validation fails → original is returned unchanged. No silent corruption.
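
  The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `compress_with_validation`, the injected `compress` and `embed` callables, and the whitespace-based token count are all hypothetical stand-ins. In practice `compress` would call llama3.2:1b and `embed` would call nomic-embed-text through Ollama.

```python
import math
from typing import Callable, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compress_with_validation(
    prompt: str,
    compress: Callable[[str], str],          # assumed: wraps the local LLM
    embed: Callable[[str], Sequence[float]], # assumed: wraps the embedding model
    threshold: float = 0.85,                 # cosine threshold from the post
    min_tokens: int = 80,                    # skip cutoff from the table
    count_tokens: Optional[Callable[[str], int]] = None,
) -> str:
    # Assumption: whitespace splitting approximates the real tokenizer.
    count = count_tokens or (lambda s: len(s.split()))

    # Short prompts are returned untouched.
    if count(prompt) < min_tokens:
        return prompt

    # Stage 1: compress with the local model.
    candidate = compress(prompt)

    # Stage 2: accept only if the embeddings stay close enough.
    if cosine(embed(prompt), embed(candidate)) >= threshold:
        return candidate

    # Validation failed: fall back to the original prompt.
    return prompt
```

  Passing the model calls in as functions keeps the fallback logic testable without a running Ollama instance.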

  When it actually helps:
  The effect is meaningful only on longer inputs. Short prompts are skipped entirely — no cost, no risk.

  ┌─────────────────────────────────┬────────────┬────────┐
  │              Input              │   Tokens   │ Saving │
  ├─────────────────────────────────┼────────────┼────────┤
  │ < 80 tokens                     │ skipped    │ 0%     │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Academic abstract (207t)        │ 207 → 78   │ 62%    │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Structured research doc (1116t) │ 1116 → 275 │ 75%    │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Short command (4t)              │ skipped    │ 0%     │
  └─────────────────────────────────┴────────────┴────────┘

  If you're sending short one-liners, this won't help. If you're injecting long context, research text, or system prompts — it pays off from the first call.
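
  For reference, the Saving column is consistent with one minus the ratio of compressed to original tokens, rounded to a whole percent. A small check, where `saving_percent` is a hypothetical helper name:

```python
def saving_percent(tokens_before: int, tokens_after: int) -> int:
    """Percentage of tokens removed, rounded to the nearest integer."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return round(100 * (1 - tokens_after / tokens_before))


# Rows from the table above.
for before, after in [(207, 78), (1116, 275)]:
    print(f"{before} -> {after}: {saving_percent(before, after)}%")
# 207 -> 78: 62%
# 1116 -> 275: 75%
```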

  Known limitation:
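  Cosine similarity over embeddings is largely blind to negation and antonyms: "way smaller" vs "way larger" scores 0.985, well above the 0.85 threshold. The LLM stage mitigates this by explicitly preserving negations and conditionals, but the validator cannot catch a miss, so it remains an open research question, tracked in issue #1.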
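
  One possible complementary check, sketched here purely as an illustration and not something the repository is known to do, is a lexical guard that rejects a compression when the original contains negation or conditional cues and the compressed text contains none. The word list and the names `negation_tokens` and `drops_negation` are assumptions. Note that it would not catch the "smaller" vs "larger" antonym case.

```python
import re

# Hypothetical cue list; a real one would need tuning.
NEGATION_WORDS = {"not", "no", "never", "without", "unless", "except", "cannot"}


def negation_tokens(text: str) -> set:
    """Return the negation/conditional cue words found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w in NEGATION_WORDS or w.endswith("n't")}


def drops_negation(original: str, compressed: str) -> bool:
    """True if the original has cue words but the compressed text has none."""
    return bool(negation_tokens(original)) and not negation_tokens(compressed)


print(drops_negation("Do not delete the logs unless asked", "Delete the logs"))  # True
print(drops_negation("Do not delete the logs", "Don't delete logs"))             # False
```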

  Install as MCP (Claude Code):
  {
    "mcpServers": {
      "token-compressor": {
        "command": "python3",
        "args": ["/path/to/token-compressor/mcp_server.py"]
      }
    }
  }

  Requires: Ollama, with the llama3.2:1b and nomic-embed-text models pulled (ollama pull llama3.2:1b, ollama pull nomic-embed-text)

  Repo: https://github.com/base76-research-lab/token-compressor-

base76•1h ago

Would love to hear what you think about it.
