
Show HN: PTS Library – Analyze LLM reasoning through "thought anchors"

2•codelion•6h ago
I built PTS (Pivotal Token Search), an open-source library for mechanistic interpretability analysis of language models. Its core feature is generating "thought anchors": the specific sentences in a model's reasoning chain that significantly affect task success.

What it does:

- Generates chain-of-thought reasoning traces from any LLM

- Uses counterfactual analysis to measure impact of each reasoning step

- Identifies the critical sentences that make or break task completion

- Exports semantic embeddings for clustering analysis

- Provides systematic failure mode categorization
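To make the counterfactual step concrete, here is a minimal sketch of one way to score each sentence: estimate the success rate of continuations sampled with and without that sentence in the prefix, and take the difference. This is not PTS's actual code. The `Rollout` callable, `sentence_impacts`, and `toy_rollout` are hypothetical names, and the toy rollout replaces a real model call with a fixed success probability.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: takes the sentences kept so far and an RNG, and
# returns True if a sampled continuation reaches the correct answer.
Rollout = Callable[[Sequence[str], random.Random], bool]


def success_rate(prefix: Sequence[str], rollout: Rollout,
                 n_samples: int, rng: random.Random) -> float:
    """Fraction of sampled continuations from `prefix` that succeed."""
    return sum(rollout(prefix, rng) for _ in range(n_samples)) / n_samples


def sentence_impacts(sentences: Sequence[str], rollout: Rollout,
                     n_samples: int = 200, seed: int = 0) -> List[float]:
    """Impact of sentence i = success rate with it minus success rate without it."""
    rng = random.Random(seed)
    impacts = []
    for i in range(len(sentences)):
        without = success_rate(sentences[:i], rollout, n_samples, rng)
        with_step = success_rate(sentences[: i + 1], rollout, n_samples, rng)
        impacts.append(with_step - without)
    return impacts


# Toy stand-in for a model (an assumption for illustration only):
# success becomes likely once the key step is present in the prefix.
KEY_STEP = "Set up the equation 3x = 12."


def toy_rollout(prefix: Sequence[str], rng: random.Random) -> bool:
    p_success = 0.9 if KEY_STEP in prefix else 0.2
    return rng.random() < p_success


trace = ["Read the problem.", KEY_STEP, "Divide both sides by 3.", "So x = 4."]
impacts = sentence_impacts(trace, toy_rollout)
anchor_index = max(range(len(trace)), key=lambda i: impacts[i])
print(trace[anchor_index], round(impacts[anchor_index], 2))
```

With a real model, `rollout` would sample a completion from the given prefix and check the final answer, so the cost grows with both the number of sentences and `n_samples`.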

Example use case:

I used PTS to compare Qwen3-0.6B and DeepSeek-R1-Distill-1.5B on math problems and found that their reasoning is structured very differently:

- DeepSeek: concentrated reasoning (fewer, high-impact steps)

- Qwen3: distributed reasoning (impact spread across multiple steps)
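One way to put a number on "concentrated" versus "distributed" is to look at how evenly the per-sentence impact scores are spread. The sketch below uses one minus the normalized entropy of the absolute impacts. This metric and the name `impact_concentration` are assumptions for illustration, not something PTS is known to compute, and the two score lists are made-up examples rather than measurements from either model.

```python
import math
from typing import Sequence


def impact_concentration(impacts: Sequence[float]) -> float:
    """Return a score in [0, 1]: 0 = impact spread evenly, 1 = all impact in one step."""
    weights = [abs(x) for x in impacts]
    total = sum(weights)
    if total == 0 or len(weights) < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Dividing by log(n) normalizes the entropy to [0, 1].
    return 1.0 - entropy / math.log(len(weights))


# Illustrative values only (not real results).
concentrated = [0.02, 0.70, 0.03, 0.01]
distributed = [0.20, 0.18, 0.22, 0.19]

print(impact_concentration(concentrated) > impact_concentration(distributed))
```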

Quick start:

    # Generate thought anchors
    pts run --model="your-model" --dataset="gsm8k" --generate-thought-anchors

    # Export for analysis
    pts export --format="thought_anchors" --output-path="analysis.jsonl"
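The post does not describe the schema of the exported file, so a reasonable first step is to inspect it rather than assume field names. The sketch below assumes only that `analysis.jsonl` holds one JSON object per line. `summarize_jsonl` is a hypothetical helper, and the sample keys are placeholders, not real PTS fields.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_jsonl(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count records and how often each top-level field appears."""
    records = [json.loads(line) for line in lines if line.strip()]
    field_counts = Counter(key for record in records for key in record)
    return len(records), field_counts


# Placeholder records for demonstration; blank lines are skipped.
sample_lines = ['{"a": 1, "b": 2}', "", '{"a": 3}']
n_records, fields = summarize_jsonl(sample_lines)
print(n_records, dict(fields))

# On a real export, pass the open file instead:
# with open("analysis.jsonl", encoding="utf-8") as f:
#     n_records, fields = summarize_jsonl(f)
```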

The library implements the thought anchors methodology from Bogdan et al. (2025) with extensions for:

- Comprehensive metadata collection

- 384-dimensional semantic embeddings

- Causal dependency tracking

- Systematic failure analysis
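As a rough illustration of what the embeddings could be used for, the sketch below groups vectors by cosine similarity with a simple greedy pass. It is an assumption about downstream analysis, not part of PTS. Short toy vectors stand in for the 384-dimensional embeddings, and the threshold of 0.8 is arbitrary.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_clusters(vectors: Sequence[Sequence[float]],
                    threshold: float = 0.8) -> List[List[int]]:
    """Assign each vector to the first cluster whose first member is similar enough."""
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


# Toy 3-D vectors standing in for real sentence embeddings.
toy_embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.95, 0.05]]
clusters = greedy_clusters(toy_embeddings)
print(clusters)
```

For real data, a library clustering routine such as k-means would likely be a better fit; the greedy pass is used here only to keep the example dependency-free.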

Why this matters: Most interpretability tools focus on individual tokens or attention patterns. Thought anchors operate at the sentence level, revealing which complete reasoning steps actually matter for getting correct answers.

Limitations: PTS currently focuses on mathematical reasoning tasks. I plan to extend it to other domains and larger models.

Links:

- GitHub: https://github.com/codelion/pts

- Research example: https://huggingface.co/blog/codelion/understanding-model-rea...

- Generated datasets: available on Hugging Face

I would appreciate feedback on extending this to other reasoning domains or interpretability approaches.

Funding for program to stop next Stuxnet from hitting US expired Sunday

https://www.theregister.com/2025/07/22/lapsed_cisa_funding_cybersentry/
1•throw0101d•49s ago•0 comments

Big Tech enters the war business

https://english.elpais.com/economy-and-business/2025-07-21/big-tech-enters-the-war-business-how-silicon-valley-is-becoming-militarized.html
1•belter•50s ago•0 comments

Qwen3‑Coder Unleashed – Agentic Coding's New Powerhouse

https://algogist.com/qwen3-coder-unleashed-agentic-codings-new-powerhouse/
1•jainilprajapati•2m ago•0 comments

Show HN: E2EE Messaging with a Decentralized Microfrontend Architecture

https://positive-intentions.com/blog/decentralised-architecture/
1•Screen8774•3m ago•0 comments

Why is it so hard to export Markdown from Gemini?

https://sundaystopwatch.eu/ai-md/
1•dominicq•4m ago•0 comments

Cerebras Launches Qwen3-235B, Achieving 1,500 Tokens per Second

https://www.cerebras.ai/press-release/cerebras-launches-qwen3-235b-world-s-fastest-frontier-ai-model-with-full-131k-context-support
2•mihau•4m ago•0 comments

How the Application and Request Contexts Work in Python Flask

https://blog.appsignal.com/2025/07/23/how-the-application-and-request-contexts-work-in-flask.html
1•amalinovic•6m ago•0 comments

Victim of an NFT Scam or Cryptocurrency Investment Fraud? Take Action Now

1•charityjonathan•8m ago•0 comments

Short Google

https://tompccs.github.io/blog/2025/07/23/short-google.html
1•tompccs•10m ago•0 comments

Show HN: Limit – Android content blocker which can't be bypassed

https://limitphone.com/
1•richardgill88•11m ago•0 comments

We built fast UPDATEs for ClickHouse – Part 1: Purpose-built engines

https://clickhouse.com/blog/updates-in-clickhouse-1-purpose-built-engines
2•sdairs•16m ago•0 comments

A minimal ASCII art editor, place characters like pixels in a grid

https://glypheditor.com
2•snekcaseenjoyer•24m ago•0 comments

Chinese Car Giants Rush into Brazil with Dreams of Dominating a Continent

https://www.nytimes.com/2025/07/21/climate/china-brazil-electric-vehicles.html
3•bookofjoe•29m ago•1 comments

Nearly 50% of the container images misconfigure the main process (PID 1)

https://twitter.com/kqueue_io/status/1947966356172792103
2•kocyigityunus•29m ago•0 comments

Show HN: Made my first iOS app free offline currency converter

https://apps.apple.com/us/app/currency-converter-offline-cal/id6748880741
2•artiomyak•29m ago•0 comments

China Flexes Muscles at U.N. Cultural Agency, Just as Trump Walks Away

https://www.nytimes.com/2025/07/23/world/asia/unesco-china-us.html
1•JumpCrisscross•30m ago•0 comments

Unsloth – Dynamic 4-bit Quantization

https://unsloth.ai/blog/dynamic-4bit
2•gkbrk•33m ago•0 comments

Lumo, the AI where every conversation is confidential

https://proton.me/blog/lumo-ai
4•pentagrama•33m ago•1 comments

How ant queens are made

https://www.rockefeller.edu/news/38067-how-ant-queens-are-made/
1•hhs•33m ago•0 comments

Open Sauce is a confoundingly brilliant Bay Area event

https://www.jeffgeerling.com/blog/2025/open-sauce-confoundingly-brilliant-bay-area-event
2•rbanffy•34m ago•0 comments

What is X-Forwarded-For and when can you trust it? (2024)

https://httptoolkit.com/blog/what-is-x-forwarded-for/
3•ayoisaiah•38m ago•0 comments

Has Brazil Invented the Future of Money?

https://paulkrugman.substack.com/p/has-brazil-invented-the-future-of
20•Qem•40m ago•7 comments

I tried vibe coding for 30 days (YouTube)

https://www.youtube.com/watch?v=PDMxbbejgcA
3•djaychela•44m ago•1 comments

AI Sandboxes: Daytona vs. Microsandbox

https://pixeljets.com/blog/ai-sandboxes-daytona-vs-microsandbox/
2•jetter•45m ago•0 comments

A new TUI for managing app store reviews

https://github.com/parthematics/rustpond
1•parthchopra•46m ago•1 comments

SV AI Startups Are Embracing China's Controversial '996' Work Schedule

https://www.wired.com/story/silicon-valley-china-996-work-schedule/
4•AndrewDucker•50m ago•0 comments

Choosing the right .NET container image for your workload

https://medium.com/@mfundo/all-the-net-core-opsy-things-37b2e21eabb4
1•mfund0•53m ago•0 comments

Show HN: Breakout game from GitHub contributions graph

https://github.com/cyprieng/github-breakout
2•cyprien_g•54m ago•0 comments

Sparse Matrix Library with Compressed Row Storage

https://github.com/uestla/Sparse-Matrix
1•klaussilveira•57m ago•0 comments

Checking Out CPython 3.14's remote debugging protocol

https://rtpg.co/2025/06/28/checking-out-sys-remote-exec/
1•ingve•59m ago•0 comments