Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•7mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers preserve structure and meaning when they split, so only the most relevant pieces are passed into the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
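To make the recursive idea concrete, here is a minimal sketch, not Chonkie's actual implementation. It swaps in Python's standard-library `ast` module for tree-sitter so that it runs without extra dependencies, and it counts whitespace-separated words as a stand-in for a real tokenizer. The names `count_tokens`, `leaf_ends`, and `chunk_code` are hypothetical and exist only for this illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in token counter (assumption): whitespace-separated words.
    return len(text.split())


def leaf_ends(node, lines, budget):
    """Yield the last line number of each statement that fits the budget.

    A statement that is over budget and contains nested statements is not
    yielded itself; we descend into it instead.
    """
    for child in ast.iter_child_nodes(node):
        if not isinstance(child, ast.stmt):
            continue
        text = "".join(lines[child.lineno - 1 : child.end_lineno])
        nested = any(isinstance(g, ast.stmt) for g in ast.iter_child_nodes(child))
        if count_tokens(text) > budget and nested:
            yield from leaf_ends(child, lines, budget)
        else:
            yield child.end_lineno


def chunk_code(source: str, budget: int) -> list[str]:
    """Split Python source into contiguous chunks that respect statement ends."""
    lines = source.splitlines(keepends=True)
    ends = sorted(set(leaf_ends(ast.parse(source), lines, budget)))
    if not ends or ends[-1] < len(lines):
        ends.append(len(lines))  # keep any trailing comments or blank lines

    # Cut the source at every allowed boundary; nothing is dropped or reordered.
    segments, start = [], 0
    for end in ends:
        segments.append("".join(lines[start:end]))
        start = end

    # Greedily merge neighbouring segments while they fit the budget.
    chunks, current = [], ""
    for seg in segments:
        if current and count_tokens(current + seg) > budget:
            chunks.append(current)
            current = seg
        else:
            current += seg
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, budget=12)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

In this sketch the class `Big` exceeds the budget, so the splitter descends into it and cuts between its two methods rather than in the middle of one. Joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. A single statement that is over budget and has no nested statements is still emitted whole.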

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
