Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries rather than at arbitrary offsets, so each chunk stays coherent on its own and only the most relevant pieces need to be passed to the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It works with any language that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, when a node is too big for the token budget, we recurse into its children in the AST and group them into clean, coherent chunks that fit your configured budget.
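To make the idea concrete, here is a minimal sketch of recursive, budget-aware chunking. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module so it runs without extra dependencies, it only handles Python source, and it counts whitespace-separated words as a stand-in for real tokens. The names `count_tokens`, `chunk_code`, and `budget` are made up for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def _first_line(node: ast.AST) -> int:
    """First source line of a node, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def _pieces(nodes, lo, hi, lines, budget):
    """Return (start, end) line spans that together cover lines lo..hi.

    Each sibling owns the lines from the end of the previous sibling to its
    own end, so comments and blank lines between nodes are never dropped.
    """
    out = []
    start = lo
    for i, node in enumerate(nodes):
        end = hi if i == len(nodes) - 1 else node.end_lineno
        if end < start:  # node shares a line with the previous sibling
            continue
        text = "".join(lines[start - 1:end])
        body = getattr(node, "body", None)
        splittable = (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            and _first_line(body[0]) > node.lineno
        )
        if count_tokens(text) > budget and splittable:
            # Too big: emit the header, then recurse into the body.
            header_end = _first_line(body[0]) - 1
            out.append((start, header_end))
            out.extend(_pieces(body, header_end + 1, end, lines, budget))
        else:
            # Fits, or cannot be split further in this sketch.
            out.append((start, end))
        start = end + 1
    return out


def chunk_code(source: str, budget: int) -> list[str]:
    """Split Python source into chunks of at most `budget` tokens where possible."""
    lines = source.splitlines(keepends=True)
    if not lines:
        return []
    tree = ast.parse(source)
    spans = _pieces(tree.body, 1, len(lines), lines, budget) or [(1, len(lines))]
    chunks, current, used = [], [], 0
    for start, end in spans:
        text = "".join(lines[start - 1:end])
        cost = count_tokens(text)
        # Greedily merge neighbouring pieces until the budget would overflow.
        if current and used + cost > budget:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


SAMPLE = '''import os

# Greets people.
class Greeter:
    def hello(self, name):
        message = "hello " + name
        return message

    def bye(self, name):
        message = "bye " + name
        return message
'''

chunks = chunk_code(SAMPLE, budget=12)
for number, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {number} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

Joining the chunks back together reproduces the original source exactly, which is the "preserve formatting" property in miniature. A piece that cannot be split any further (for example a single very long statement) is emitted as one oversized chunk in this sketch.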

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
