
Show HN: Fast, High-Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries instead of at arbitrary character or token counts, so each chunk is a coherent unit and only the most relevant pieces are passed into the model. This reduces hallucinations and irrelevant context, and improves performance and accuracy.
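
For contrast, here is a naive fixed-size splitter of the kind that structure-aware chunking is meant to improve on. It is a hypothetical baseline, not part of Chonkie, and it counts whitespace-separated words rather than real model tokens. Note how the cuts land mid-statement and the indentation disappears.

```python
from typing import List


def fixed_window_chunks(text: str, window: int) -> List[str]:
    """Naive baseline: cut text every `window` whitespace-separated words,
    with no regard for syntax. Original spacing and newlines are lost."""
    if window <= 0:
        raise ValueError("window must be positive")
    tokens = text.split()
    return [" ".join(tokens[i:i + window]) for i in range(0, len(tokens), window)]


code = "def f(x):\n    y = x + 1\n    return y\n"
print(fixed_window_chunks(code, 4))
# ['def f(x): y =', 'x + 1 return', 'y']
```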

Today we’re launching our Code Chunker: a fast, structure-aware way to split source code into high-quality chunks that respect a token budget.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively groups and merges adjacent nodes so that chunk boundaries follow the code’s syntactic structure while each chunk stays within the token limit.

It supports every language that tree-sitter has a grammar for, and is designed to preserve formatting and semantics. Large functions or class definitions are not cut in the middle of a block: when a node is too large for the budget, we recurse into its children in the AST and build clean, coherent chunks from them that fit your configured token budget.
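
The recursive idea described above can be sketched as follows. This is not Chonkie’s implementation: Python’s standard-library `ast` module stands in for tree-sitter (so it only handles Python), `count_tokens` is a hypothetical whitespace-based counter in place of a real tokenizer, and the greedy merge rule and all function names are assumptions for illustration. A node whose text fits the budget stays whole; an oversized node is split at its child statements; adjacent pieces are then merged while they fit. Chunks are contiguous line ranges, so joining them reproduces the original source exactly.

```python
import ast
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open range of 0-based line indices


def count_tokens(text: str) -> int:
    # Hypothetical token counter: whitespace-separated words.
    # A real chunker would use the target model's tokenizer instead.
    return len(text.split())


def _child_stmts(node: ast.AST) -> List[ast.stmt]:
    # Statement-level children: function/class bodies, if/else branches, etc.
    return [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]


def _split(lines: List[str], node: ast.AST, lo: int, hi: int,
           budget: int, count: Callable[[str], int]) -> List[Span]:
    """Cover lines[lo:hi] with contiguous spans, descending into `node`'s
    children only when the whole range is over the token budget."""
    kids = _child_stmts(node)
    if count("".join(lines[lo:hi])) <= budget or not kids:
        return [(lo, hi)]  # fits, or is a leaf we cannot split: keep whole
    spans: List[Span] = []
    start = lo
    for i, kid in enumerate(kids):
        # Each child owns the lines before it (the parent's header, decorators,
        # comments, blank lines) through its own last line; the last child
        # also takes any trailing lines of the parent range.
        end = hi if i == len(kids) - 1 else kid.end_lineno
        spans.extend(_split(lines, kid, start, end, budget, count))
        start = end
    return spans


def _merge(lines: List[str], spans: List[Span], budget: int,
           count: Callable[[str], int]) -> List[Span]:
    """Greedily merge adjacent spans while the merged text fits the budget."""
    merged: List[Span] = []
    for s, e in spans:
        if s == e:
            continue  # skip empty ranges (e.g. several statements on one line)
        if merged and count("".join(lines[merged[-1][0]:e])) <= budget:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def chunk_python(source: str, budget: int,
                 count: Callable[[str], int] = count_tokens) -> List[str]:
    """Split Python source into chunks that follow statement boundaries."""
    lines = source.splitlines(keepends=True)
    spans = _split(lines, ast.parse(source), 0, len(lines), budget, count)
    return ["".join(lines[s:e]) for s, e in _merge(lines, spans, budget, count)]
```

For example, on a file containing an import, a two-line function, and a five-line function, a budget of 8 whitespace tokens keeps the small function whole (merged with the import) and splits the larger one between statements rather than mid-line. `"".join(chunks)` always returns the input unchanged, which is one way to check the formatting-preservation property.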

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining
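
To make the first two use cases concrete, here is a rough sketch of how chunks feed into code search and RAG. A real pipeline would embed each chunk with an embedding model and store the vectors in an index; here a bag-of-words vector and cosine similarity stand in for both. `embed`, `top_k`, and `build_prompt` are hypothetical names for illustration, not Chonkie APIs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag of lowercase
    # alphanumeric words, so "parse_config" contributes "parse" and "config".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the k best matches."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """RAG step: paste only the best-matching chunks into the model's context."""
    context = "\n\n".join(c for _, c in top_k(question, chunks, k))
    return f"Answer using this code:\n\n{context}\n\nQuestion: {question}"


chunks = [
    "def parse_config(path):\n    return load(path)\n",
    "def send_email(to, body):\n    smtp.send(to, body)\n",
]
print(top_k("where do we parse the config file?", chunks, k=1))
```

Because each chunk is a whole function or a coherent group of statements, the retrieved context is self-contained code rather than a fragment cut mid-statement.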

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
