Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries, so each chunk is a coherent unit and only the most relevant pieces need to be passed into the model. This helps reduce hallucinations and improves the accuracy of the model’s output.
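For contrast, here is a naive baseline: a minimal Python sketch, not Chonkie code, that cuts text every `max_tokens` whitespace-separated words. Word count is an assumed stand-in for a real tokenizer, and `fixed_size_chunks` is a hypothetical name. On a two-function file the cuts ignore structure entirely: the first chunk ends inside `add`, the second mixes the tail of `add` with the signature of `mul`, and indentation is lost.

```python
def fixed_size_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into windows of at most max_tokens whitespace-separated words.

    Word count stands in for a real tokenizer (an assumption of this sketch);
    the cut points ignore code structure entirely.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


source = """def add(a, b):
    return a + b

def mul(a, b):
    return a * b
"""

for chunk in fixed_size_chunks(source, 5):
    print(repr(chunk))
```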

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions aren’t cut off in the middle of a block; instead, when a node exceeds your configured token budget, we recurse into its children in the AST to produce clean, coherent chunks that fit.
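The approach described above can be sketched in Python. This is not Chonkie’s implementation: it swaps tree-sitter for Python’s built-in `ast` module (so it only handles Python source) and approximates tokens by whitespace-separated words. The helper names (`count_tokens`, `node_start`, `split_nodes`, `chunk_code`) and the choice to split only `def`/`class` nodes are assumptions of the sketch. Each node owns a slice of the source lines; an oversized function or class is split into its header and its child statements, recursively; adjacent pieces are then merged greedily up to the budget.

```python
import ast

# Assumption of this sketch: only these node types are split into header + body.
SPLITTABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def node_start(node: ast.stmt) -> int:
    """0-based index of a node's first line, counting any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def split_nodes(nodes: list[ast.stmt], lines: list[str], start: int, max_tokens: int) -> list[str]:
    """Turn sibling nodes into source pieces, recursing into oversized defs/classes.

    Each node owns the lines from the end of its previous sibling through its
    own last line, so comments and blank lines travel with the next node.
    """
    pieces: list[str] = []
    prev = start
    for node in nodes:
        text = "".join(lines[prev:node.end_lineno])
        if count_tokens(text) <= max_tokens or not isinstance(node, SPLITTABLE):
            pieces.append(text)  # fits, or cannot be split further in this sketch
        else:
            body_start = node_start(node.body[0])
            pieces.append("".join(lines[prev:body_start]))  # decorators + signature
            pieces.extend(split_nodes(node.body, lines, body_start, max_tokens))
        prev = node.end_lineno
    return pieces


def chunk_code(source: str, max_tokens: int) -> list[str]:
    """Greedily merge adjacent pieces into chunks of at most max_tokens."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    pieces = split_nodes(tree.body, lines, 0, max_tokens)
    last_end = tree.body[-1].end_lineno if tree.body else 0
    pieces.append("".join(lines[last_end:]))  # trailing comments or blank lines
    chunks: list[str] = []
    current, current_tokens = "", 0
    for piece in pieces:
        tokens = count_tokens(piece)
        # Start a new chunk only when a non-empty piece would overflow this one.
        if tokens and current_tokens and current_tokens + tokens > max_tokens:
            chunks.append(current)
            current, current_tokens = "", 0
        current += piece
        current_tokens += tokens
    if current:
        chunks.append(current)
    return chunks


source = '''import math


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r * self.r

    def circumference(self):
        return 2 * math.pi * self.r
'''

for chunk in chunk_code(source, max_tokens=10):
    print("---")
    print(chunk, end="")
```

With `max_tokens=10`, the example yields three chunks: the import, class header, and `__init__`; then `area`; then `circumference`. Because every piece is a slice of the original lines, joining the chunks reproduces the source exactly, which is one simple way to keep formatting intact. A node the sketch cannot split (say, one very long statement) is kept whole even if it exceeds the budget.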

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining
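To illustrate the first use case, here is a toy retrieval step over code chunks. It is a sketch under a loud assumption: a bag-of-words vector of lowercase letter runs stands in for a real embedding model, and `bag_of_words`, `cosine`, and `search` are hypothetical helpers, not part of Chonkie. The query is scored against each chunk by cosine similarity and the top `k` chunks are returned; in a real pipeline those chunks would be what gets passed to the LLM.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Stand-in for a real embedding model: counts of lowercase letter runs,
    # so "read_config" contributes "read" and "config".
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(chunks: list[str], query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


chunks = [
    "def area(r):\n    return math.pi * r * r\n",
    "def read_config(path):\n    with open(path) as f:\n        return f.read()\n",
]
print(search(chunks, "read the config file"))
```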

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
