What it does:
- Generates chain-of-thought reasoning traces from any LLM
- Uses counterfactual analysis to measure impact of each reasoning step
- Identifies the critical sentences that make or break task completion
- Exports semantic embeddings for clustering analysis
- Provides systematic failure mode categorization
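The counterfactual analysis above can be sketched as a simple ablation loop: remove one reasoning step at a time and measure how much the trace's success rate drops. This is an illustrative toy, not the pts API — `step_importance` and `toy_evaluate` are hypothetical stand-ins for real resampled LLM completions.

```python
# Toy sketch of counterfactual step importance (NOT the pts API).
# A real run would resample completions from an LLM; here `evaluate`
# is a hypothetical stand-in that scores whether a trace succeeds.

def step_importance(steps, evaluate):
    """Impact of each step = success drop when that step is removed."""
    baseline = evaluate(steps)
    impacts = []
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]  # counterfactual: drop step i
        impacts.append(baseline - evaluate(ablated))
    return impacts

# Hypothetical evaluator: the trace succeeds only if the key step survives.
def toy_evaluate(steps):
    return 1.0 if "17 * 3 = 51" in steps else 0.0

trace = ["Read the problem.", "17 * 3 = 51", "So the answer is 51."]
print(step_importance(trace, toy_evaluate))  # → [0.0, 1.0, 0.0]
```

The middle step is the "anchor" here: dropping it flips the outcome, while the other two steps carry no counterfactual weight.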
Example use case:
I used PTS to compare Qwen3-0.6B vs DeepSeek-R1-Distill-1.5B on math problems and discovered they have fundamentally different reasoning architectures:
- DeepSeek: concentrated reasoning (fewer, high-impact steps)
- Qwen3: distributed reasoning (impact spread across multiple steps)
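One way to make the concentrated-vs-distributed distinction quantitative is a normalized entropy over each trace's step impacts: near 0 means one step dominates, near 1 means impact is spread evenly. The impact profiles below are made-up illustrations, not numbers from the actual comparison.

```python
import math

def impact_entropy(impacts):
    """Normalized entropy of step impacts: ~0 = concentrated, ~1 = distributed."""
    total = sum(impacts)
    probs = [x / total for x in impacts if x > 0]
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(impacts))

# Hypothetical impact profiles (illustrative numbers only):
concentrated = [0.05, 0.90, 0.05]  # DeepSeek-style: one dominant step
distributed  = [0.30, 0.35, 0.35]  # Qwen3-style: impact spread out

print(impact_entropy(concentrated) < impact_entropy(distributed))  # → True
```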
Quick start:
# Generate thought anchors
pts run --model="your-model" --dataset="gsm8k" --generate-thought-anchors
# Export for analysis
pts export --format="thought_anchors" --output-path="analysis.jsonl"
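Once exported, the JSONL file is easy to post-process. A minimal sketch, assuming each record carries at least a sentence and an impact score — the field names "sentence" and "impact" are hypothetical, so check the schema your pts version actually emits.

```python
import json

# Sketch of ranking exported records by impact. The field names
# ("sentence", "impact") are hypothetical stand-ins for the real schema.

def top_anchors(jsonl_text, k=1):
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return sorted(records, key=lambda r: r["impact"], reverse=True)[:k]

# Inline sample standing in for the contents of analysis.jsonl:
sample = "\n".join([
    json.dumps({"sentence": "Let x be the unknown.", "impact": 0.1}),
    json.dumps({"sentence": "Then 2x + 3 = 11, so x = 4.", "impact": 0.8}),
])

print(top_anchors(sample)[0]["sentence"])  # → Then 2x + 3 = 11, so x = 4.
```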
The library implements the thought anchors methodology from Bogdan et al. (2025) with extensions for:
- Comprehensive metadata collection
- 384-dimensional semantic embeddings
- Causal dependency tracking
- Systematic failure analysis
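To illustrate how sentence embeddings enable causal dependency tracking, the sketch below links each step to its most similar earlier step by cosine similarity. The 4-dimensional vectors are toys standing in for real 384-dimensional sentence embeddings, and this is my own illustration, not the library's implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def link_dependencies(embeddings):
    """For each step i > 0, index of the most similar earlier step."""
    parents = [None]
    for i in range(1, len(embeddings)):
        parents.append(max(range(i), key=lambda j: cosine(embeddings[i], embeddings[j])))
    return parents

# Toy 4-dim vectors standing in for 384-dim sentence embeddings.
embs = [
    [1.0, 0.0, 0.0, 0.0],  # step 0: set up the problem
    [0.9, 0.1, 0.0, 0.0],  # step 1: restates step 0
    [0.1, 0.0, 1.0, 0.0],  # step 2: new computation
    [0.0, 0.1, 0.9, 0.0],  # step 3: builds on step 2
]
print(link_dependencies(embs))  # → [None, 0, 0, 2]
```

A real pipeline would use similarity between actual sentence embeddings (or the counterfactual impacts themselves) rather than nearest-neighbor heuristics on toy vectors, but the output shape is the same: a parent pointer per step forming a dependency tree.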
Why this matters: Most interpretability tools focus on individual tokens or attention patterns. Thought anchors operate at the sentence level, revealing which complete reasoning steps actually matter for getting correct answers.
Limitations: PTS is currently focused on mathematical reasoning tasks; I plan to extend it to other domains and larger models.
Links:
- GitHub: https://github.com/codelion/pts
- Research example: https://huggingface.co/blog/codelion/understanding-model-rea...
- Generated datasets: Available on HuggingFace
I'd appreciate feedback on extending this to other reasoning domains, or on how it relates to other interpretability approaches.