Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a structure-aware way: each chunk keeps a coherent unit of meaning, so only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion, and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It works with any language that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions are not cut at arbitrary points in the middle of a block. Instead, we descend recursively into the AST and split at node boundaries, producing clean, coherent chunks that fit your configured token budget.
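To make the recursive merge-and-split idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and `count_tokens`, `chunk_nodes`, and `chunk_code` are hypothetical names, with whitespace-separated words standing in for a real tokenizer.

```python
import ast


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def chunk_nodes(lines, nodes, start, end, limit):
    """Split lines[start:end], covered by the sibling statements `nodes`, into chunks."""
    # Each statement owns the lines from the end of the previous sibling up to
    # its own last line, so comments, decorators and blank lines are never lost.
    ends = [n.end_lineno for n in nodes[:-1]] + [end]
    starts = [start] + ends[:-1]

    chunks, current = [], ""
    for node, lo, hi in zip(nodes, starts, ends):
        text = "".join(lines[lo:hi])
        if count_tokens(current + text) <= limit:
            current += text  # merge small neighbours into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        body = getattr(node, "body", None)
        if count_tokens(text) > limit and isinstance(body, list) and body:
            # Too big on its own: descend and split at child statement boundaries.
            chunks.extend(chunk_nodes(lines, body, lo, hi, limit))
        else:
            # Fits by itself, or cannot be split further (may exceed the budget).
            current = text
    if current:
        chunks.append(current)
    return chunks


def chunk_code(source: str, limit: int) -> list:
    """Chunk Python source so that joining the chunks reproduces it exactly."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    return chunk_nodes(lines, tree.body, 0, len(lines), limit)


if __name__ == "__main__":
    sample = (
        "import os\n\n# helper\ndef small(a, b):\n    return a + b\n\n"
        "class Big:\n    def first(self):\n        x = 1\n        return x\n\n"
        "    def second(self):\n        y = 2\n        return y\n"
    )
    for i, chunk in enumerate(chunk_code(sample, limit=12)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

In this sketch the chunks always concatenate back to the original source, and a statement with no splittable body is emitted whole even if it exceeds the budget. A real tokenizer and a tree-sitter grammar would replace the two stand-ins.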

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!