Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural boundaries rather than at arbitrary character offsets, so each chunk keeps its structure and meaning and only the most relevant pieces are passed into the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
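To make the recursive split-and-merge idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's standard-library `ast` module (so it only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `chunk_code`, `count_tokens`, `_split`, and `_text` are hypothetical, chosen only for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def _text(lines, start, stop):
    """Source text for 1-based inclusive line range start..stop."""
    return "".join(lines[start - 1:stop])


def _split(nodes, start, end, lines, budget):
    """Tile lines start..end into spans, one per node, descending into
    the body of any node whose span exceeds the budget."""
    spans = []
    cursor = start
    for i, node in enumerate(nodes):
        # Each span runs from the end of the previous one, so comments,
        # blank lines, and decorators stay attached to the node after them.
        stop = end if i == len(nodes) - 1 else node.end_lineno
        if stop < cursor:  # several statements on one line: already covered
            continue
        body = getattr(node, "body", None)
        too_big = count_tokens(_text(lines, cursor, stop)) > budget
        if too_big and isinstance(body, list) and body:
            # The header line (e.g. "class Foo:") travels with the first child.
            spans.extend(_split(body, cursor, stop, lines, budget))
        else:
            # A leaf statement over budget is kept whole in this sketch.
            spans.append((cursor, stop))
        cursor = stop + 1
    return spans


def chunk_code(source: str, budget: int) -> list[str]:
    """Split Python source into chunks that follow AST boundaries."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    chunks = []
    for start, stop in _split(tree.body, 1, len(lines), lines, budget):
        piece = _text(lines, start, stop)
        # Greedily merge neighbouring pieces while they fit the budget.
        if chunks and count_tokens(chunks[-1] + piece) <= budget:
            chunks[-1] += piece
        else:
            chunks.append(piece)
    return chunks
```

Because the spans tile the file line by line, joining the chunks returns the original source unchanged, which is one simple way to preserve formatting. A real implementation would use a model tokenizer for the budget and a tree-sitter parser to cover other languages.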

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
