Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•11mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so each chunk stands on its own and retrieval can pass only the most relevant pieces into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
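To make the recursive idea concrete, here is a minimal sketch of the general technique, not Chonkie's actual implementation. It makes two loud assumptions: it swaps tree-sitter for Python's built-in `ast` module so it runs with no dependencies (and therefore only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `count_tokens`, `spans`, and `chunk_code` are hypothetical.

```python
import ast

def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())

def spans(nodes, lines, budget):
    """Yield (start, end) line spans (1-indexed, inclusive) in source order.

    A function or class that exceeds the budget is not cut at an arbitrary
    line: its header becomes one span and its body statements are visited
    recursively.
    """
    for node in nodes:
        start, end = node.lineno, node.end_lineno
        text = "\n".join(lines[start - 1:end])
        splittable = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if splittable and count_tokens(text) > budget:
            header_end = node.body[0].lineno - 1
            if header_end >= start:
                yield (start, header_end)
            yield from spans(node.body, lines, budget)
        else:
            # Leaf span; it may still exceed the budget if it cannot be split.
            yield (start, end)

def chunk_code(source: str, budget: int) -> list[str]:
    """Greedily merge adjacent spans while the merged text fits the budget."""
    lines = source.splitlines()
    chunks = []
    cur_start = cur_end = None
    for start, end in spans(ast.parse(source).body, lines, budget):
        if cur_start is not None:
            merged = "\n".join(lines[cur_start - 1:end])
            if count_tokens(merged) <= budget:
                cur_end = end  # extend the current chunk
                continue
            chunks.append("\n".join(lines[cur_start - 1:cur_end]))
        cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append("\n".join(lines[cur_start - 1:cur_end]))
    return chunks

SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        return os.getcwd()

    def b(self):
        return os.sep
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, budget=6)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk)
```

With a budget of 6, the small top-level statements are merged into one chunk, while the oversized class is split at method boundaries rather than mid-block. This sketch drops blank lines and comments that fall between chunks and ignores decorators; a production chunker would need to handle those.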

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
