Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We're Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model's context window. Our chunkers split along the document's own structure and preserve meaning, so only the most relevant pieces are passed to the model. This helps reduce hallucinations and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
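To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module so it runs without extra dependencies, it only handles Python source, and it approximates tokens by whitespace-separated words. The names `count_tokens`, `first_line`, `cut_points`, and `chunk_code` are hypothetical, chosen for illustration only.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def first_line(stmt: ast.stmt) -> int:
    """0-based index of a statement's first source line, decorators included."""
    decorators = getattr(stmt, "decorator_list", [])
    return min([stmt.lineno] + [d.lineno for d in decorators]) - 1


def cut_points(stmts, lines, budget):
    """Collect line indices where a chunk is allowed to start.

    Every statement start is a valid cut. A statement that exceeds the
    budget is opened up, and the statements in its body become cuts too.
    """
    cuts = []
    for stmt in stmts:
        start = first_line(stmt)
        cuts.append(start)
        text = "".join(lines[start:stmt.end_lineno])
        body = getattr(stmt, "body", None)
        if count_tokens(text) > budget and isinstance(body, list):
            cuts.extend(cut_points(body, lines, budget))
    return cuts


def chunk_code(source: str, budget: int) -> list[str]:
    """Split source at statement boundaries, then greedily merge neighbours."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    cuts = sorted(set([0] + cut_points(tree.body, lines, budget))) + [len(lines)]
    segments = ["".join(lines[a:b]) for a, b in zip(cuts, cuts[1:])]

    chunks, current = [], ""
    for segment in segments:
        # Start a new chunk only when adding the segment would exceed the budget.
        if current and count_tokens(current + segment) > budget:
            chunks.append(current)
            current = segment
        else:
            current += segment
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = (
        "import os\n\n"
        "def small():\n    return 1\n\n"
        "class Big:\n"
        "    def a(self):\n        return os.getcwd()\n\n"
        "    def b(self):\n        return os.sep\n"
    )
    for i, chunk in enumerate(chunk_code(sample, budget=8)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because the sketch only chooses where to cut and never rewrites text, joining the chunks reproduces the original source exactly. One limitation to keep in mind: a single statement with no body that is larger than the budget is kept whole, so a chunk can still exceed the budget in that case.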

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
