Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural boundaries instead of at arbitrary character offsets, so each chunk keeps its structure and meaning and only the most relevant pieces need to be passed to the model. In our experience this reduces hallucinations and confusion and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
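To make the idea concrete, here is a minimal sketch of the recursive split-and-merge approach described above. It is not Chonkie's implementation: it swaps tree-sitter for Python's standard-library `ast` module (so it only handles Python source), counts tokens as whitespace-separated words, and the names `count_tokens`, `split_span`, `merge_pieces` and `chunk_code` are made up for illustration. A single statement larger than the budget is emitted as an oversized chunk rather than split further.

```python
import ast


def count_tokens(text):
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def child_statements(node):
    """Direct child statements of a node, in source order."""
    kids = [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]
    return sorted(kids, key=lambda c: c.lineno)


def split_span(nodes, start, end, lines, budget):
    """Cover lines[start:end] with (start, stop) pieces, one per node.

    Lines between siblings (blank lines, comments, decorators, block
    headers) are attached to the node that follows them.
    """
    if not nodes:
        return [(start, end)] if start < end else []
    pieces, cursor = [], start
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        if stop <= cursor:  # several statements share one line
            continue
        kids = child_statements(node)
        too_big = count_tokens("".join(lines[cursor:stop])) > budget
        if too_big and kids:
            # Over budget: descend into the node's own statements.
            pieces.extend(split_span(kids, cursor, stop, lines, budget))
        else:
            pieces.append((cursor, stop))
        cursor = stop
    return pieces


def merge_pieces(pieces, lines, budget):
    """Greedily merge adjacent pieces while the result fits the budget."""
    merged = []
    for start, stop in pieces:
        if merged and count_tokens("".join(lines[merged[-1][0]:stop])) <= budget:
            merged[-1] = (merged[-1][0], stop)
        else:
            merged.append((start, stop))
    return merged


def chunk_code(source, budget):
    """Split Python source into chunks of whole lines, formatting intact."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    pieces = split_span(tree.body, 0, len(lines), lines, budget)
    return ["".join(lines[s:e]) for s, e in merge_pieces(pieces, lines, budget)]


SAMPLE = '''import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        return "done"
'''

chunks = chunk_code(SAMPLE, budget=12)
for number, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {number} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

Because every chunk is a run of whole source lines, joining the chunks gives back the original text exactly. The greedy merge is the simplest possible grouping policy; in this sketch it can put the tail of one function in the same chunk as the start of the next, which a production chunker would likely handle more carefully.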

What it’s useful for:

  - Embedding-based code search
  - RAG (retrieval-augmented generation) over codebases
  - Long-context analysis of code
  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker
  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
