Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along natural boundaries, preserving structure and meaning, so that only the most relevant pieces need to be passed into the model. This reduces hallucinations and confusion from irrelevant context, and improves performance and accuracy.
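To make "only the most relevant pieces" concrete, here is a minimal Python sketch of the retrieval step that typically follows chunking: score every chunk against a query and keep the top k. It assumes a toy bag-of-words vector in place of a real embedding model; `embed`, `cosine`, and `top_k_chunks` are hypothetical helpers for illustration, not part of Chonkie.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding" (assumption): a bag of lowercase alphanumeric words,
    # so identifiers like parse_config contribute "parse" and "config".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep only the best k,
    # which is what would be placed into the model's context.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

With a real embedding model the ranking logic stays the same; only `embed` changes. Smaller, self-contained chunks make these scores more meaningful, because each vector describes one coherent unit of code.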

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.
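A rough sketch of the grouping step, using Python's standard-library `ast` module as a stand-in for tree-sitter (so it only handles Python source) and a whitespace word count as a stand-in tokenizer. `count_tokens` and `chunk_top_level` are hypothetical names, and this is not Chonkie's implementation. Adjacent top-level nodes are packed greedily until the next one would exceed the budget, and each piece is sliced from the original lines so formatting and comments survive.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): count whitespace-separated words.
    return len(text.split())


def chunk_top_level(source: str, max_tokens: int) -> list[str]:
    """Greedily pack adjacent top-level AST nodes into chunks of at most
    max_tokens tokens. A single node larger than the budget becomes its own
    oversized chunk; splitting it further needs a recursive step."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)

    # Slice each top-level node out of the original lines. Blank lines,
    # comments and decorators before a node are kept with that node.
    pieces: list[str] = []
    cursor = 0
    for node in tree.body:
        pieces.append("".join(lines[cursor:node.end_lineno]))
        cursor = node.end_lineno
    if pieces:
        pieces[-1] += "".join(lines[cursor:])  # trailing comments/blank lines
    else:
        pieces.append(source)

    # Greedy packing: start a new chunk when the next node would overflow.
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for piece in pieces:
        n = count_tokens(piece)
        if current and used + n > max_tokens:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(piece)
        used += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because every piece is a contiguous run of original lines, joining the chunks reproduces the file exactly, which is one simple way to check that nothing was lost or reformatted.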

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
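The recursive part can be sketched the same way, again with stdlib `ast` standing in for tree-sitter and whitespace-separated words standing in for tokens; `count_tokens`, `stmt_children`, `merge`, `split_range`, and `chunk_code` are all hypothetical names. When a node's text exceeds the budget, the sketch descends into its child statements and cuts only at their boundaries, keeping the parent's header (e.g. `class Greeter:`) attached to the first piece. A leaf that is still too large is returned whole rather than cut mid-block.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def stmt_children(node: ast.AST) -> list[ast.AST]:
    # Child statements (and except-handlers) in source order; these are the
    # only places this sketch is allowed to cut.
    kids = [c for c in ast.iter_child_nodes(node)
            if isinstance(c, (ast.stmt, ast.excepthandler))]
    return sorted(kids, key=lambda c: c.lineno)


def merge(pieces: list[str], max_tokens: int) -> list[str]:
    # Greedily pack adjacent pieces while they fit in the budget.
    out: list[str] = []
    buf, used = "", 0
    for piece in pieces:
        n = count_tokens(piece)
        if buf and used + n > max_tokens:
            out.append(buf)
            buf, used = "", 0
        buf += piece
        used += n
    if buf:
        out.append(buf)
    return out


def split_range(lines: list[str], lo: int, hi: int,
                children: list[ast.AST], max_tokens: int) -> list[str]:
    """Split lines[lo:hi] into chunks that concatenate back to exactly that text."""
    text = "".join(lines[lo:hi])
    if count_tokens(text) <= max_tokens or not children:
        # Fits the budget, or is a leaf we refuse to cut mid-block.
        return [text]
    # Partition the range at child boundaries. Text before the first child
    # (the parent's header) stays with the first child; text between
    # children (blank lines, comments, "else:") goes with the next one.
    segments = []
    cursor = lo
    for child in children:
        segments.append([cursor, child.end_lineno, stmt_children(child)])
        cursor = child.end_lineno
    segments[-1][1] = hi  # trailing lines stay with the last child
    pieces: list[str] = []
    for seg_lo, seg_hi, grandchildren in segments:
        pieces.extend(split_range(lines, seg_lo, seg_hi, grandchildren, max_tokens))
    return merge(pieces, max_tokens)


def chunk_code(source: str, max_tokens: int) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    return split_range(lines, 0, len(lines), stmt_children(tree), max_tokens)
```

For example, a class whose two methods together exceed the budget comes back as two chunks, the first beginning with the `class` line and the second holding the other method, instead of being cut at an arbitrary token offset.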

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
