Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•7mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk keeps its meaning and only the most relevant pieces are passed into the model. This helps reduce hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
