Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries, so only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
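To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation. It assumes Python's standard-library `ast` module as a stand-in for tree-sitter (so it only handles Python source) and a whitespace word count as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, and `first_line` are hypothetical.

```python
import ast
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def first_line(node: ast.stmt) -> int:
    """First source line of a statement, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def chunk_code(source: str, max_tokens: int) -> List[str]:
    """Split Python source into contiguous chunks along statement boundaries."""
    lines = source.splitlines(keepends=True)

    def text(start: int, end: int) -> str:
        # Lines are 1-indexed and the range is inclusive.
        return "".join(lines[start - 1:end])

    def split(nodes, start: int, end: int) -> List[Tuple[int, int]]:
        spans = []
        group_start, group_end = start, start - 1  # empty group
        for i, node in enumerate(nodes):
            # A node's segment runs up to the line before its next sibling,
            # so blank lines and comments in between are preserved.
            seg_end = first_line(nodes[i + 1]) - 1 if i + 1 < len(nodes) else end
            seg_start = group_end + 1
            if count_tokens(text(group_start, seg_end)) <= max_tokens:
                group_end = seg_end  # merge into the current group
                continue
            if group_end >= group_start:
                spans.append((group_start, group_end))  # flush the group
            group_start, group_end = seg_start, seg_start - 1
            if count_tokens(text(seg_start, seg_end)) <= max_tokens:
                group_end = seg_end  # start a new group with this node
                continue
            body = [c for c in getattr(node, "body", []) if isinstance(c, ast.stmt)]
            if body:
                # Too large on its own: recurse into its child statements.
                spans.extend(split(body, seg_start, seg_end))
            else:
                # Nothing left to split: emit an oversized chunk.
                spans.append((seg_start, seg_end))
            group_start, group_end = seg_end + 1, seg_end
        if group_end >= group_start:
            spans.append((group_start, group_end))
        return spans

    spans = split(ast.parse(source).body, 1, len(lines))
    return [text(s, e) for s, e in spans]


if __name__ == "__main__":
    demo = "import os\n\ndef f():\n    return 1\n\nclass C:\n    def g(self):\n        return 2\n"
    for chunk in chunk_code(demo, max_tokens=6):
        print(repr(chunk))
```

Because every chunk is a contiguous line range, joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. A statement that has no child statements and still exceeds the budget is emitted as a single oversized chunk rather than being cut mid-statement.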

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
