Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•7mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers preserve structure and meaning when they split, so each chunk stays coherent and only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
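To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation and does not use Chonkie's API: it swaps tree-sitter for Python's standard-library `ast` module so it runs without dependencies, and it counts whitespace-separated words instead of real tokens. The names `count_tokens`, `split_points`, and `chunk_code` are hypothetical and exist only for this illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def split_points(stmts, lines, limit):
    """Collect 0-based line indices where a chunk is allowed to start."""
    points = set()
    for node in stmts:
        points.add(node.lineno - 1)
        text = "".join(lines[node.lineno - 1 : node.end_lineno])
        body = getattr(node, "body", None)
        # Descend only into nodes that are too large to keep whole.
        if count_tokens(text) > limit and isinstance(body, list):
            points |= split_points(body, lines, limit)
    return points


def chunk_code(source: str, limit: int) -> list[str]:
    """Split Python source into contiguous chunks of at most `limit` tokens."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    points = sorted(split_points(tree.body, lines, limit) | {0, len(lines)})
    # Segments are the line ranges between consecutive allowed split points.
    segments = ["".join(lines[a:b]) for a, b in zip(points, points[1:])]

    chunks, current = [], ""
    for seg in segments:
        # Greedily merge neighbouring segments while the budget allows it.
        if current and count_tokens(current + seg) > limit:
            chunks.append(current)
            current = ""
        current += seg
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    example = "import os\n\n\ndef a():\n    x = 1\n    return x\n"
    for i, chunk in enumerate(chunk_code(example, limit=8)):
        print(f"--- chunk {i} ---")
        print(chunk, end="")
```

Because chunks are cut only at statement boundaries and every line lands in exactly one chunk, joining the chunks reproduces the input exactly. This sketch ignores decorators and `else`/`except` branches when choosing split points, and a single statement larger than the budget still ends up in its own oversized chunk.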

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
