Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers preserve structure and meaning while doing this, so only the most relevant pieces are passed to the model. This helps reduce hallucinations and confusion, and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
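To make the recursive merge idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and a whitespace word count as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, and `SAMPLE` are made up for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace words); swap in a real tokenizer."""
    return len(text.split())


def chunk_code(source: str, max_tokens: int) -> list[str]:
    """Split Python source into chunks that follow statement boundaries."""
    lines = source.splitlines(keepends=True)

    def text(start: int, end: int) -> str:
        # Line numbers are 1-based and inclusive, as in the ast module.
        return "".join(lines[start - 1:end])

    def spans(nodes, first, last):
        # Give each sibling a contiguous line range so comments, blank lines,
        # and decorators between nodes stay attached to the node that follows.
        start = first
        for i, node in enumerate(nodes):
            end = last if i == len(nodes) - 1 else node.end_lineno
            if end >= start:
                yield start, end, node
                start = end + 1

    def split(nodes, first, last):
        chunks, buf = [], None  # buf is the (start, end) range being grown
        for start, end, node in spans(nodes, first, last):
            if buf and count_tokens(text(buf[0], end)) <= max_tokens:
                buf = (buf[0], end)  # merge this sibling into the open chunk
                continue
            if buf:
                chunks.append(text(*buf))
                buf = None
            if count_tokens(text(start, end)) <= max_tokens:
                buf = (start, end)
                continue
            # Too big for one chunk: descend into nested statements, if any.
            children = [c for c in ast.iter_child_nodes(node)
                        if isinstance(c, ast.stmt)]
            if children:
                chunks.extend(split(children, start, end))
            else:
                chunks.append(text(start, end))  # indivisible, emitted oversized
        if buf:
            chunks.append(text(*buf))
        return chunks

    body = ast.parse(source).body
    if not body:
        return [source] if source else []
    return split(body, 1, len(lines))


SAMPLE = '''import os

# helper
def add(a, b):
    return a + b

class Greeter:
    def hello(self, name):
        print("hello", name)

    def bye(self, name):
        print("bye", name)
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, max_tokens=10)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because every chunk is a slice of the original lines, joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. A statement with no nested statements that still exceeds the budget is emitted as a single oversized chunk in this sketch; a real chunker would need a policy for that case.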

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
