Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural boundaries, so each chunk keeps its meaning on its own and only the most relevant pieces need to be passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
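
To make the recursive merge-and-split idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module (so it only handles Python source), and it assumes a whitespace word count as the token counter. All function names (`count_tokens`, `statement_children`, `child_spans`, `chunk_code`) are hypothetical and chosen for illustration.

```python
import ast


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def statement_children(node):
    # Statement-level children in source order (bodies, else branches, handlers).
    return [c for c in ast.iter_child_nodes(node)
            if isinstance(c, (ast.stmt, ast.ExceptHandler))]


def child_spans(children, start, end):
    # Tile lines start..end with one span per child. Gaps before a child
    # (blank lines, comments, decorators, the parent's header line) attach
    # to that child, so no source line is ever dropped.
    spans, cursor = [], start
    for i, child in enumerate(children):
        stop = end if i == len(children) - 1 else child.end_lineno
        if stop >= cursor:
            spans.append((cursor, stop, child))
            cursor = stop + 1
    return spans


def _chunk(lines, children, start, end, budget):
    chunks = []   # finished (first_line, last_line) spans, 1-based inclusive
    group = None  # span currently being grown by merging adjacent siblings
    for s, e, child in child_spans(children, start, end):
        if count_tokens("".join(lines[s - 1:e])) > budget:
            # Too large on its own: flush the group, then descend into the node.
            if group:
                chunks.append(group)
                group = None
            inner = statement_children(child)
            if inner:
                chunks.extend(_chunk(lines, inner, s, e, budget))
            else:
                chunks.append((s, e))  # indivisible statement: emit oversized
        elif group and count_tokens("".join(lines[group[0] - 1:e])) <= budget:
            group = (group[0], e)      # merge with the previous siblings
        else:
            if group:
                chunks.append(group)
            group = (s, e)
    if group:
        chunks.append(group)
    return chunks


def chunk_code(source, budget):
    # Split Python source into chunks of at most `budget` tokens where possible.
    lines = source.splitlines(keepends=True)
    children = statement_children(ast.parse(source))
    if not children:
        return [source] if source else []
    spans = _chunk(lines, children, 1, len(lines), budget)
    return ["".join(lines[s - 1:e]) for s, e in spans]


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

chunks = chunk_code(SAMPLE, budget=12)
for number, chunk in enumerate(chunks, 1):
    print(f"--- chunk {number} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

With this sample and a budget of 12 words, the import and the small function are merged into one chunk, while the oversized class is split between its methods (the class header travels with the first method). Joining the chunks reproduces the input exactly, which is the formatting-preservation property described above. A real tokenizer and a tree-sitter grammar would replace the two assumptions noted in the lead-in.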

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
