Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along the document’s structure, so each chunk keeps its meaning and only the most relevant pieces are passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
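To make the recursive idea concrete, here is a minimal sketch, not Chonkie's actual implementation. It assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source) and a whitespace word count as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import ast
from typing import List, Tuple

Span = Tuple[int, int]  # half-open range of 0-based line indices


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def chunk_code(source: str, max_tokens: int) -> List[str]:
    """Split Python source into chunks within max_tokens where the tree allows it."""
    lines = source.splitlines(keepends=True)

    def text(start: int, end: int) -> str:
        return "".join(lines[start:end])

    def split(nodes: list, start: int, end: int) -> List[Span]:
        # Each node gets the lines from the previous node's end up to its own
        # end, so comments, blank lines and decorators stay attached to it.
        spans: List[Span] = []
        cursor = start
        for i, node in enumerate(nodes):
            stop = end if i == len(nodes) - 1 else node.end_lineno
            if stop <= cursor:  # several statements share one line
                continue
            body = getattr(node, "body", None)
            too_big = count_tokens(text(cursor, stop)) > max_tokens
            if too_big and isinstance(body, list) and body:
                spans.extend(split(body, cursor, stop))  # descend into the block
            else:
                spans.append((cursor, stop))  # fits, or cannot be split further
            cursor = stop
        return spans

    tree = ast.parse(source)
    spans = split(tree.body, 0, len(lines)) if tree.body else [(0, len(lines))]

    # Greedily merge neighbouring spans while the merged text stays in budget.
    merged: List[Span] = []
    for start, end in spans:
        if merged and count_tokens(text(merged[-1][0], end)) <= max_tokens:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [text(start, end) for start, end in merged]


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        y = 2
        return x + y

    def b(self):
        return "b"
'''

if __name__ == "__main__":
    for chunk in chunk_code(SAMPLE, max_tokens=14):
        print(repr(chunk))
```

With a budget of 14 words, the sample splits into three chunks: the import together with `small`, the `class Big:` header together with method `a`, and method `b` on its own. Joining the chunks reproduces the input exactly, which is the formatting-preservation property described above. A statement with no nested block that exceeds the budget is kept whole in this sketch rather than cut mid-statement.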

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
