Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the structure of the input, so each chunk stays coherent on its own and only the most relevant pieces need to be passed to the model. That helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
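
To make the recursive idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps in Python's built-in `ast` module for tree-sitter so it runs without dependencies, and it counts tokens by splitting on whitespace. `count_tokens`, `leaf_spans`, and `chunk_code` are hypothetical names used only for illustration. Top-level statements that fit the budget are kept whole, oversized ones are split along their nested statements, and neighbouring pieces are then merged greedily up to the budget.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def chunk_code(source: str, budget: int) -> list[str]:
    """Split Python source into chunks of at most `budget` tokens where possible."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)

    def leaf_spans(body, start, end):
        # Return contiguous (first_line, last_line) spans covering start..end.
        # Blank lines, comments, and block headers attach to the next statement.
        spans = []
        cursor = start
        for i, stmt in enumerate(body):
            stop = end if i == len(body) - 1 else stmt.end_lineno
            if stop < cursor:
                continue  # statement shares a line with the previous one
            piece = "".join(lines[cursor - 1:stop])
            nested = getattr(stmt, "body", None)
            if count_tokens(piece) > budget and isinstance(nested, list):
                # Too large: descend into the nested block instead of cutting blindly.
                spans.extend(leaf_spans(nested, cursor, stop))
            else:
                # Fits the budget, or cannot be split further (kept whole).
                spans.append((cursor, stop))
            cursor = stop + 1
        return spans

    # Greedily merge neighbouring spans while the merged text fits the budget.
    chunks, current = [], ""
    for first, last in leaf_spans(tree.body, 1, len(lines)):
        piece = "".join(lines[first - 1:last])
        if current and count_tokens(current + piece) > budget:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

chunks = chunk_code(SAMPLE, budget=10)
for number, chunk in enumerate(chunks, 1):
    print(f"--- chunk {number} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

In this sketch the chunks concatenate back to the original source, so formatting is preserved. The `class Big:` header travels with its first method, and a single statement with no nested block that exceeds the budget is emitted whole rather than cut mid-statement.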

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
