Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document's own structure, so each chunk keeps its meaning and only the most relevant pieces are passed into the model. This helps reduce hallucinations and confusion, and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
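
To make the recursive idea concrete, here is a minimal sketch of the same general technique. It is not Chonkie's implementation: it swaps tree-sitter for Python's standard-library `ast` module (so it handles Python source only), and it counts whitespace-separated words as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, `_split`, `_merge` and `_first_line` are made up for this illustration.

```python
import ast


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words."""
    return len(text.split())


def _first_line(node: ast.AST) -> int:
    """1-based first line of a statement, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def _split(nodes, start, end, lines, chunk_size):
    """Cover lines[start:end] with contiguous (start, stop) line spans.

    A span that is over budget is replaced by spans for its child
    statements; a span with no child statements is kept as-is, even
    if it is oversized.
    """
    if not nodes:
        return [(start, end)] if end > start else []
    spans = []
    cursor = start
    for i, node in enumerate(nodes):
        # Run up to the next sibling so blank lines and comments are kept.
        stop = _first_line(nodes[i + 1]) - 1 if i + 1 < len(nodes) else end
        if stop <= cursor:  # statement shares a line with its neighbour
            continue
        children = [
            child for child in ast.iter_child_nodes(node)
            if isinstance(child, (ast.stmt, ast.excepthandler))
        ]
        text = "".join(lines[cursor:stop])
        if count_tokens(text) > chunk_size and children:
            # Too big: descend one level. The header line (e.g. `class X:`)
            # travels with the first child's span.
            spans.extend(_split(children, cursor, stop, lines, chunk_size))
        else:
            spans.append((cursor, stop))
        cursor = stop
    return spans


def _merge(spans, lines, chunk_size):
    """Greedily join adjacent spans while the result fits the budget."""
    chunks, current = [], ""
    for start, stop in spans:
        text = "".join(lines[start:stop])
        if current and count_tokens(current + text) > chunk_size:
            chunks.append(current)
            current = text
        else:
            current += text
    if current:
        chunks.append(current)
    return chunks


def chunk_code(source: str, chunk_size: int) -> list[str]:
    """Split Python source into chunks that concatenate back to `source`."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    spans = _split(tree.body, 0, len(lines), lines, chunk_size)
    return _merge(spans, lines, chunk_size)
```

Calling `chunk_code(source, chunk_size=10)` returns a list of strings whose concatenation is the original source, so formatting survives. A class that is over budget is split between its methods rather than at an arbitrary offset, and small neighbouring statements are merged into one chunk.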

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
