Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•7mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break the input into smaller parts that fit the model's context window. Our chunkers preserve structure and meaning when they split, so only the most relevant pieces are passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
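
The recursive grouping described above can be sketched roughly as follows. This is not Chonkie's implementation and it does not use tree-sitter: to stay dependency-free it swaps in Python's standard `ast` module as the parser and a whitespace word count as the tokenizer. The names `count_tokens`, `chunk_code`, and `max_tokens` are hypothetical and chosen only for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def _first_line(node: ast.AST) -> int:
    # A decorated def/class starts at its first decorator, not at `def`.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def _spans(nodes, start, end):
    # Give each sibling a contiguous line range so that blank lines,
    # comments, and the parent's header line are never dropped.
    for i, node in enumerate(nodes):
        first = start if i == 0 else _first_line(node)
        last = _first_line(nodes[i + 1]) - 1 if i + 1 < len(nodes) else end
        yield first, last, node


def _split(lines, nodes, start, end, max_tokens):
    # Yield (first, last) line ranges, descending into any node that is
    # over budget and has a statement body to descend into.
    for first, last, node in _spans(nodes, start, end):
        text = "".join(lines[first - 1:last])
        body = getattr(node, "body", None)
        if count_tokens(text) > max_tokens and isinstance(body, list) and body:
            yield from _split(lines, body, first, last, max_tokens)
        else:
            yield first, last  # fits, or is a leaf we cannot split further


def chunk_code(source: str, max_tokens: int) -> list[str]:
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    chunks, current = [], ""
    # Greedily merge adjacent pieces until the next one would exceed the budget.
    for first, last in _split(lines, tree.body, 1, len(lines), max_tokens):
        piece = "".join(lines[first - 1:last])
        if current and count_tokens(current + piece) > max_tokens:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    example = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
    for chunk in chunk_code(example, max_tokens=6):
        print(repr(chunk))
```

In this sketch, joining the chunks reproduces the input exactly, which is one simple way to check that formatting is preserved. A leaf statement larger than the budget is emitted whole rather than cut mid-statement, so the budget is a target rather than a hard cap.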

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
