Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•11mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure and keep related content together, so only the most relevant pieces are passed to the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
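
To make the recursive idea concrete, here is a minimal sketch, not Chonkie's actual implementation. It makes two loud assumptions: it uses Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of real tokenizer tokens. The names `count_tokens`, `first_line`, and `chunk_code` are hypothetical and are not part of the Chonkie API.

```python
import ast


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def first_line(node):
    # 0-based index of the first source line owned by a node,
    # including any decorators that sit above a def or class.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def chunk_code(source, budget):
    """Split Python source into chunks of at most `budget` tokens where possible."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)

    def size(start, end):
        return count_tokens("".join(lines[start:end]))

    def split(nodes, start, end):
        # Cover lines[start:end] with consecutive (start, end) line ranges.
        if size(start, end) <= budget or not nodes:
            # Fits the budget, or is a leaf that cannot be divided further.
            return [(start, end)]
        # Each child owns the lines from its first line up to the next sibling.
        # The first child also takes the parent's header line(s).
        bounds = [start] + [first_line(n) for n in nodes[1:]] + [end]
        pieces = []
        for node, lo, hi in zip(nodes, bounds, bounds[1:]):
            children = [c for c in getattr(node, "body", []) if isinstance(c, ast.stmt)]
            pieces.extend(split(children, lo, hi))
        # Greedily merge neighbouring pieces while the result stays within budget.
        merged = [pieces[0]]
        for lo, hi in pieces[1:]:
            prev_lo, _ = merged[-1]
            if size(prev_lo, hi) <= budget:
                merged[-1] = (prev_lo, hi)
            else:
                merged.append((lo, hi))
        return merged

    ranges = split(tree.body, 0, len(lines))
    return ["".join(lines[lo:hi]) for lo, hi in ranges if hi > lo]


if __name__ == "__main__":
    example = (
        "import os\n\n"
        "def a():\n    return 1\n\n"
        "def b():\n    x = 1\n    y = 2\n    return x + y\n"
    )
    for i, chunk in enumerate(chunk_code(example, budget=8)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because the sketch cuts only at line boundaries between sibling nodes, concatenating the chunks reproduces the input exactly, including blank lines and comments. A single statement that exceeds the budget on its own is left as one oversized chunk; a real chunker would need a fallback for that case.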

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
