Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries, so only the most relevant pieces are passed into the model. This reduces hallucinations and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
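
To make the recursive merge-and-descend idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it uses Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), counts whitespace-separated words as a stand-in for a real tokenizer, and works on whole lines rather than byte offsets. The names `count_tokens`, `child_statements`, `chunk_lines`, and `chunk_code` are hypothetical.

```python
import ast


def count_tokens(text: str) -> int:
    """Stand-in token counter: whitespace-separated words (an assumption)."""
    return len(text.split())


def child_statements(node: ast.AST) -> list:
    """Statements nested directly inside a compound statement, in source order."""
    found = []
    for field in ("body", "orelse", "finalbody"):
        value = getattr(node, field, None)
        if isinstance(value, list):
            found.extend(n for n in value if isinstance(n, ast.stmt))
    for handler in getattr(node, "handlers", []):
        found.extend(handler.body)
    return sorted(found, key=lambda n: n.lineno)


def chunk_lines(nodes, start, end, lines, limit):
    """Partition lines start..end (1-indexed, inclusive) into (first, last) spans.

    Sibling nodes are merged greedily while they fit the budget. A node that is
    too large on its own is split by recursing into its child statements.
    """
    spans = []
    group = None  # [first_line, last_line, tokens] of the group being built
    boundary = start
    for i, node in enumerate(nodes):
        # Blank lines, comments and decorators before a node travel with it;
        # the last sibling also absorbs any trailing lines of the region.
        first = boundary
        last = end if i == len(nodes) - 1 else node.end_lineno
        if last < first:  # statement shares a line with the previous one
            continue
        boundary = last + 1
        size = count_tokens("".join(lines[first - 1:last]))
        children = child_statements(node)
        if size > limit and children:
            # Too big for one chunk: flush the current group and descend.
            if group:
                spans.append((group[0], group[1]))
                group = None
            spans.extend(chunk_lines(children, first, last, lines, limit))
        elif group and group[2] + size <= limit:
            group[1], group[2] = last, group[2] + size
        else:
            # Start a new group. An oversized leaf statement with no children
            # becomes its own chunk and may exceed the limit.
            if group:
                spans.append((group[0], group[1]))
            group = [first, last, size]
    if group:
        spans.append((group[0], group[1]))
    return spans


def chunk_code(source: str, limit: int) -> list:
    """Split Python source into chunks whose concatenation equals the input."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    spans = chunk_lines(tree.body, 1, len(lines), lines, limit)
    return ["".join(lines[a - 1:b]) for a, b in spans]
```

Because every chunk is a contiguous slice of the original lines, joining the chunks reproduces the input exactly, which is one simple way to keep formatting intact. In this sketch, when a class is too large, its header line travels with the first method's chunk rather than being emitted on its own.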

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
