Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the structure of the document, so each chunk stays coherent on its own and only the most relevant pieces need to be passed to the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
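As a rough illustration of the recursive merge-and-group idea described above, here is a minimal sketch in Python. It is not Chonkie's actual implementation: the `Node` class is a hypothetical stand-in for a tree-sitter node (just a source span plus children), `count_tokens` assumes whitespace-separated words approximate tokens, and `split_spans` and `chunk_code` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node: a source span plus children."""
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_spans(node: Node, source: str, budget: int) -> List[Tuple[int, int]]:
    """Keep a node whole if it fits the budget, otherwise recurse into its children."""
    fits = count_tokens(source[node.start:node.end]) <= budget
    if fits or not node.children:
        # A leaf that is still too large is kept whole in this sketch.
        return [(node.start, node.end)]
    spans: List[Tuple[int, int]] = []
    for child in node.children:
        spans.extend(split_spans(child, source, budget))
    return spans


def chunk_code(source: str, root: Node, budget: int) -> List[str]:
    """Greedily merge adjacent spans into chunks that stay within the budget."""
    spans = split_spans(root, source, budget)
    # Each piece runs from one span's start to the next span's start, so the
    # whitespace and comments between nodes are never dropped.
    cuts = [0] + [start for start, _ in spans[1:]] + [len(source)]
    pieces = [source[a:b] for a, b in zip(cuts, cuts[1:])]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if current and count_tokens(current + piece) > budget:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


# Usage: three small functions, hand-built into a tree for illustration.
source = (
    "def a():\n    return 1\n\n"
    "def b():\n    return 2\n\n"
    "def c():\n    return 3\n"
)
starts = [source.index(f"def {name}") for name in "abc"]
ends = [source.index(f"return {n}") + len("return 1") for n in (1, 2, 3)]
root = Node(0, len(source), [Node(s, e) for s, e in zip(starts, ends)])
chunks = chunk_code(source, root, budget=8)
```

With a budget of 8 in this toy example, the first two functions are merged into one chunk and the third becomes its own chunk. Concatenating the chunks gives back the original source, which is the sense in which formatting is preserved here.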

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
