Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk keeps its meaning and only the most relevant pieces are passed to the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions are not split in the middle of a block; instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
