Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•11mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open source tools that split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers preserve structure and meaning when they split, so each chunk stays coherent and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion, and improves accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality chunks that respect a token budget.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
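
To make the recursive merge-and-split idea concrete, here is a minimal sketch. It is not Chonkie's implementation. It makes two loud assumptions: Python's built-in `ast` module stands in for tree-sitter, so it only handles Python source, and a whitespace word count stands in for a real tokenizer. The names `count_tokens`, `first_line`, `chunk_block`, and `chunk_code` are hypothetical.

```python
import ast


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer stand-in: counts whitespace-separated words."""
    return len(text.split())


def first_line(node: ast.stmt) -> int:
    """1-based line where a statement starts, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def chunk_block(nodes, start, end, lines, max_tokens, out):
    """Chunk lines[start:end], which hold the sibling statements `nodes`."""
    nodes = sorted(nodes, key=first_line)
    # Each sibling owns the lines from its start up to the next sibling's
    # start, so comments and blank lines between statements are kept.
    bounds = [start] + [first_line(n) - 1 for n in nodes[1:]] + [end]
    pending = start  # first line of the chunk being accumulated
    for i, node in enumerate(nodes):
        lo, hi = bounds[i], bounds[i + 1]
        if count_tokens("".join(lines[pending:hi])) <= max_tokens:
            continue  # node fits in the pending chunk; keep merging
        if lo > pending:
            out.append("".join(lines[pending:lo]))  # emit what came before
        pending = lo
        if count_tokens("".join(lines[lo:hi])) > max_tokens:
            # Too big on its own: descend into its child statements.
            children = [c for c in ast.iter_child_nodes(node)
                        if isinstance(c, ast.stmt)]
            if children:
                chunk_block(children, lo, hi, lines, max_tokens, out)
            else:
                out.append("".join(lines[lo:hi]))  # indivisible, oversized
            pending = hi
    if end > pending:
        out.append("".join(lines[pending:end]))


def chunk_code(source: str, max_tokens: int) -> list[str]:
    """Split Python source into chunks aligned with statement boundaries."""
    lines = source.splitlines(keepends=True)
    out: list[str] = []
    chunk_block(ast.parse(source).body, 0, len(lines), lines, max_tokens, out)
    return out


if __name__ == "__main__":
    demo = "import os\n\ndef a():\n    return 1\n\ndef b():\n    x = 1\n    y = 2\n    return x + y\n"
    for chunk in chunk_code(demo, max_tokens=6):
        print(repr(chunk))
```

Because every sibling's span runs up to the start of the next one, concatenating the chunks reproduces the input exactly. A single statement that exceeds the budget and has no child statements is emitted as one oversized chunk. A production chunker would need a policy for that case.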

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
