Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along structural and semantic boundaries, so each chunk stays coherent and retrieval can pass only the most relevant pieces to the model. The aim is fewer hallucinations and better accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality chunks that respect a token budget.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, the chunker descends recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
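
To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's standard `ast` module (so it only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `count_tokens`, `first_line`, `leaf_spans`, `chunk_code`, and `SAMPLE` are all hypothetical and exist only for this illustration.

```python
import ast

# Assumption: only definitions are opened up when they exceed the budget.
SPLITTABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def first_line(node: ast.AST) -> int:
    # 0-indexed first line of a statement, including any decorators.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def leaf_spans(lines, nodes, start, end, budget):
    # Cover lines[start:end] with contiguous (start, end) spans, one per
    # statement, descending into oversized definitions instead of cutting them.
    spans = []
    starts = [start] + [first_line(n) for n in nodes[1:]]
    for i, node in enumerate(nodes):
        s = starts[i]
        e = starts[i + 1] if i + 1 < len(nodes) else end
        if e <= s:
            continue  # several statements on one line share a single span
        body = node.body if isinstance(node, SPLITTABLE) else []
        too_big = count_tokens("".join(lines[s:e])) > budget
        if too_big and body and first_line(body[0]) > s:
            b = first_line(body[0])
            spans.append((s, b))  # the "def ..." / "class ..." header
            spans.extend(leaf_spans(lines, body, b, e, budget))
        else:
            spans.append((s, e))  # fits, or cannot be split further here
    return spans


def chunk_code(source: str, budget: int) -> list[str]:
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    chunks, current, used = [], [], 0
    # Greedily merge neighbouring spans while they still fit the budget.
    for s, e in leaf_spans(lines, tree.body, 0, len(lines), budget):
        text = "".join(lines[s:e])
        cost = count_tokens(text)
        if current and used + cost > budget:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    """Docstring."""

    def a(self):
        return os.getcwd()

    def b(self):
        x = 1
        y = 2
        return x + y
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, budget=12)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because the spans are contiguous, joining the chunks gives back the original source unchanged, which is one simple way to preserve formatting. A single statement that exceeds the budget on its own is emitted as an oversized chunk in this sketch rather than being cut mid-block.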

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
