Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries instead of at arbitrary character offsets, so each chunk stays coherent on its own and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions aren't cut at arbitrary points in the middle of a block; instead, we recurse into the AST and split at child-node boundaries, producing clean, coherent chunks that fit your configured token budget.
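To make the recursive idea concrete, here is a minimal sketch. It is not Chonkie's implementation: to stay dependency-free it swaps tree-sitter for Python's built-in `ast` module (so it only handles Python source) and uses a whitespace word count as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, `child_statements`, and `chunk_size` are hypothetical. The logic: keep a node whole if it fits the budget, otherwise descend into its block and split at child-statement boundaries, then greedily merge adjacent pieces back up to the budget.

```python
import ast
import io


def count_tokens(text: str) -> int:
    # Stand-in token counter (assumption): whitespace-separated words.
    # A real pipeline would count with the target model's tokenizer.
    return len(text.split())


def child_statements(node: ast.AST) -> list:
    # Direct statement-level children (body, handlers, else, finally),
    # sorted into source order.
    kids = [c for c in ast.iter_child_nodes(node)
            if isinstance(c, (ast.stmt, ast.ExceptHandler))]
    return sorted(kids, key=lambda c: c.lineno)


def chunk_code(source: str, chunk_size: int) -> list[str]:
    # Split on "\n" only, so list indices line up with ast's 1-based lineno.
    lines = io.StringIO(source).readlines()
    tree = ast.parse(source)

    def merge(pieces: list[str]) -> list[str]:
        # Greedily glue adjacent pieces together while they fit the budget.
        merged: list[str] = []
        for piece in pieces:
            if not piece:
                continue
            if merged and count_tokens(merged[-1] + piece) <= chunk_size:
                merged[-1] += piece
            else:
                merged.append(piece)
        return merged

    def split_range(nodes: list, lo: int, hi: int) -> list[str]:
        # Cover lines[lo:hi] with one segment per sibling node. Anything before
        # a node (parent header, decorators, comments, blank lines) is attached
        # to that node; the last node also takes any trailing lines.
        pieces: list[str] = []
        start = lo
        for i, node in enumerate(nodes):
            end = hi if i == len(nodes) - 1 else node.end_lineno
            pieces.extend(chunk_node(node, start, end))
            start = end
        return merge(pieces)

    def chunk_node(node: ast.AST, lo: int, hi: int) -> list[str]:
        segment = "".join(lines[lo:hi])
        if count_tokens(segment) <= chunk_size:
            return [segment]                # fits: keep the whole node together
        kids = child_statements(node)
        if not kids:
            return [segment]                # oversized leaf: emitted as-is here
        return split_range(kids, lo, hi)    # too big: descend into its block

    if not tree.body:
        return [source] if source else []
    return split_range(tree.body, 0, len(lines))


SAMPLE = '''import os


def small(x):
    return x + 1


class Big:
    def a(self):
        return 1

    def b(self):
        return 2

    def c(self):
        return 3
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, chunk_size=8)):
        print(f"--- chunk {i}: {count_tokens(chunk)} tokens ---")
        print(chunk, end="")
```

With `chunk_size=8`, the import and `small` fit together in one chunk, while `Big` is too large and gets split between its methods. Because every chunk is a contiguous line range, joining the chunks reproduces the file exactly. An oversized leaf statement (say, one huge literal) is emitted as-is in this sketch; a production chunker would need a fallback for that case.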

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
