Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk keeps its meaning and only the most relevant pieces are passed to the model. This reduces hallucinations and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
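To make the recursive idea concrete, here is a minimal sketch of the same kind of approach. It is not Chonkie's implementation: it uses Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `count_tokens`, `first_line`, `leaf_spans`, and `chunk_code` are made up for this example.

```python
import ast


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def first_line(node):
    # 0-based first line of a statement, including any decorators.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def leaf_spans(nodes, start, end, lines, budget):
    # Cover lines[start:end] with (lo, hi) line spans. A span that is over
    # budget is split further by descending into the node's body.
    bounds = [start] + [first_line(n) for n in nodes[1:]] + [end]
    spans = []
    for node, lo, hi in zip(nodes, bounds, bounds[1:]):
        if lo >= hi:
            continue  # several statements on one line: nothing new to cover
        body = getattr(node, "body", None)
        fits = count_tokens("".join(lines[lo:hi])) <= budget
        if fits or not isinstance(body, list) or not body:
            spans.append((lo, hi))  # fits, or cannot be split any further
        else:
            spans.extend(leaf_spans(body, lo, hi, lines, budget))
    return spans


def chunk_code(source, budget):
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    chunks, current = [], ""
    # Greedily merge adjacent spans while the merged text stays within budget.
    for lo, hi in leaf_spans(tree.body, 0, len(lines), lines, budget):
        piece = "".join(lines[lo:hi])
        if current and count_tokens(current + piece) > budget:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        y = 2
        return x + y

    def b(self):
        return os.getcwd()
'''

chunks = chunk_code(SAMPLE, budget=14)
```

Because the chunks are cut on whole lines, joining them gives back the original source unchanged, which is one simple way to keep formatting intact. A statement with no body to descend into is emitted as a single over-budget chunk instead of being cut mid-statement.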

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
