Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they split along structural and semantic boundaries, so each chunk stays coherent and only the most relevant pieces need to be passed into the model. This helps reduce hallucinations and confusion, and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
