Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break the input into smaller parts that fit the model’s context window. Our chunkers do this while preserving structure and meaning, so each chunk stays coherent on its own and only the most relevant pieces are passed to the model. This reduces hallucinations and confusion and improves both performance and accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to split source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block; instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with an account): https://cloud.chonkie.ai

Happy Chonking!
