What it does:
- Generates chain-of-thought reasoning traces from any LLM
- Uses counterfactual analysis to measure impact of each reasoning step
- Identifies the critical sentences that make or break task completion
- Exports semantic embeddings for clustering analysis
- Provides systematic failure mode categorization
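The counterfactual analysis above can be sketched as a simple ablation loop: remove one reasoning step at a time and measure how much the trace's success rate drops. This is an illustrative toy, not the pts API — `step_importance` and `toy_evaluate` are hypothetical stand-ins for real resampled LLM completions.

```python
# Toy sketch of counterfactual step importance (NOT the pts API).
# A real run would resample completions from an LLM; here `evaluate`
# is a hypothetical stand-in that scores whether a trace succeeds.

def step_importance(steps, evaluate):
    """Impact of each step = success drop when that step is removed."""
    baseline = evaluate(steps)
    impacts = []
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]  # counterfactual: drop step i
        impacts.append(baseline - evaluate(ablated))
    return impacts

# Hypothetical evaluator: the trace succeeds only if the key step survives.
def toy_evaluate(steps):
    return 1.0 if "17 * 3 = 51" in steps else 0.0

trace = ["Read the problem.", "17 * 3 = 51", "So the answer is 51."]
print(step_importance(trace, toy_evaluate))  # → [0.0, 1.0, 0.0]
```

The middle step is the "anchor" here: dropping it flips the outcome, while the other two steps carry no counterfactual weight.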
Example use case:
I used PTS to compare Qwen3-0.6B vs DeepSeek-R1-Distill-1.5B on math problems and discovered they have fundamentally different reasoning architectures:
- DeepSeek: concentrated reasoning (fewer, high-impact steps)
- Qwen3: distributed reasoning (impact spread across multiple steps)
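One way to make the concentrated-vs-distributed distinction quantitative is a normalized entropy over each trace's step impacts: near 0 means one step dominates, near 1 means impact is spread evenly. The impact profiles below are made-up illustrations, not numbers from the actual comparison.

```python
import math

def impact_entropy(impacts):
    """Normalized entropy of step impacts: ~0 = concentrated, ~1 = distributed."""
    total = sum(impacts)
    probs = [x / total for x in impacts if x > 0]
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(impacts))

# Hypothetical impact profiles (illustrative numbers only):
concentrated = [0.05, 0.90, 0.05]  # DeepSeek-style: one dominant step
distributed  = [0.30, 0.35, 0.35]  # Qwen3-style: impact spread out

print(impact_entropy(concentrated) < impact_entropy(distributed))  # → True
```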
Quick start:
# Generate thought anchors
pts run --model="your-model" --dataset="gsm8k" --generate-thought-anchors
# Export for analysis
pts export --format="thought_anchors" --output-path="analysis.jsonl"
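Once exported, the JSONL file is easy to post-process. A minimal sketch, assuming each record carries at least a sentence and an impact score — the field names "sentence" and "impact" are hypothetical, so check the schema your pts version actually emits.

```python
import json

# Sketch of ranking exported records by impact. The field names
# ("sentence", "impact") are hypothetical stand-ins for the real schema.

def top_anchors(jsonl_text, k=1):
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return sorted(records, key=lambda r: r["impact"], reverse=True)[:k]

# Inline sample standing in for the contents of analysis.jsonl:
sample = "\n".join([
    json.dumps({"sentence": "Let x be the unknown.", "impact": 0.1}),
    json.dumps({"sentence": "Then 2x + 3 = 11, so x = 4.", "impact": 0.8}),
])

print(top_anchors(sample)[0]["sentence"])  # → Then 2x + 3 = 11, so x = 4.
```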
The library implements the thought anchors methodology from Bogdan et al. (2025) with extensions for:
- Comprehensive metadata collection
- 384-dimensional semantic embeddings
- Causal dependency tracking
- Systematic failure analysis
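To illustrate how sentence embeddings enable causal dependency tracking, the sketch below links each step to its most similar earlier step by cosine similarity. The 4-dimensional vectors are toys standing in for real 384-dimensional sentence embeddings, and this is my own illustration, not the library's implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def link_dependencies(embeddings):
    """For each step i > 0, index of the most similar earlier step."""
    parents = [None]
    for i in range(1, len(embeddings)):
        parents.append(max(range(i), key=lambda j: cosine(embeddings[i], embeddings[j])))
    return parents

# Toy 4-dim vectors standing in for 384-dim sentence embeddings.
embs = [
    [1.0, 0.0, 0.0, 0.0],  # step 0: set up the problem
    [0.9, 0.1, 0.0, 0.0],  # step 1: restates step 0
    [0.1, 0.0, 1.0, 0.0],  # step 2: new computation
    [0.0, 0.1, 0.9, 0.0],  # step 3: builds on step 2
]
print(link_dependencies(embs))  # → [None, 0, 0, 2]
```

A real pipeline would use similarity between actual sentence embeddings (or the counterfactual impacts themselves) rather than nearest-neighbor heuristics on toy vectors, but the output shape is the same: a parent pointer per step forming a dependency tree.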
Why this matters: Most interpretability tools focus on individual tokens or attention patterns. Thought anchors operate at the sentence level, revealing which complete reasoning steps actually matter for getting correct answers.
Limitations: PTS is currently focused on mathematical reasoning tasks; I plan to extend it to other domains and larger models.
Links:
- GitHub: https://github.com/codelion/pts
- Research example: https://huggingface.co/blog/codelion/understanding-model-rea...
- Generated datasets: Available on HuggingFace
I'd appreciate feedback on extending this to other reasoning domains, or on how it relates to other interpretability approaches.