Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•12mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk stays coherent and only the most relevant pieces are passed into the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to split source code into high-quality chunks that respect a token budget.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. A large function or class definition won’t be cut in the middle of a block; instead, we recurse into its AST node and split along child-node boundaries, producing clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
