
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk stays coherent and only the most relevant pieces need to be passed to the model. That can reduce hallucinations and improve accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It works with any language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions are not cut at arbitrary points in the middle of a block. Instead, we recurse into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!

Show HN: Fast and Quality Code Chunking with Chonkie