Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•12mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document's structure so that each chunk keeps its meaning, and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions are not cut at arbitrary points: when one exceeds the token budget, we recurse into its AST children and split at node boundaries, producing clean, coherent chunks that fit your configured token budget.
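
To make the recurse-and-merge idea concrete, here is a minimal sketch. It is not Chonkie's implementation. It assumes Python's standard-library `ast` module in place of tree-sitter, so it only handles Python source, and a whitespace word count in place of a real tokenizer. The names `count_tokens`, `chunk_span`, `chunk_code` and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in token counter (assumption): whitespace-separated words.
    return len(text.split())


def chunk_span(lines, nodes, start, end, limit):
    """Chunk lines[start:end], cutting only at boundaries between sibling nodes."""
    # Give every sibling a contiguous line span. A span begins where the
    # previous sibling ended, so blank lines, comments and decorators stay
    # attached to the node that follows them and no text is dropped.
    spans, cursor = [], start
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        spans.append((node, cursor, stop))
        cursor = stop

    chunks, current = [], ""
    for node, lo, hi in spans:
        text = "".join(lines[lo:hi])
        if count_tokens(current + text) <= limit:
            current += text  # merge small siblings into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(text) <= limit:
            current = text
            continue
        # The node alone is over budget: recurse into its body, if it has one.
        children = getattr(node, "body", None)
        if isinstance(children, list) and children:
            chunks.extend(chunk_span(lines, children, lo, hi, limit))
        else:
            chunks.append(text)  # oversized leaf: kept whole rather than cut
    if current:
        chunks.append(current)
    return chunks


def chunk_code(source: str, limit: int):
    lines = source.splitlines(keepends=True)
    body = ast.parse(source).body
    if not body:
        return [source] if source else []
    return chunk_span(lines, body, 0, len(lines), limit)


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        return "a a a a a a"

    def b(self):
        return "b b b b b b"
'''

if __name__ == "__main__":
    for chunk in chunk_code(SAMPLE, limit=12):
        print(repr(chunk))
```

Because every span starts where the previous one ended, joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. A statement that is over budget and has no body is emitted whole in this sketch. A real chunker would need a fallback for that case.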

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
