Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
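To make the merge-and-recurse idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module (so it only handles Python source), and it counts whitespace-separated words instead of real model tokens. The names `chunk_code`, `count_tokens`, and `SAMPLE` are made up for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def first_line(node: ast.AST) -> int:
    # A decorated def/class starts at its first decorator, not at `def`.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def chunk_code(source: str, max_tokens: int) -> list[str]:
    lines = source.splitlines(keepends=True)

    def text(start: int, end: int) -> str:
        # 1-based, inclusive line range.
        return "".join(lines[start - 1:end])

    def split(nodes: list, start: int, end: int) -> list[str]:
        whole = text(start, end)
        # Small enough, or nothing left to descend into: emit as one chunk.
        if count_tokens(whole) <= max_tokens or not nodes:
            return [whole]
        # Tile [start, end] so each node also owns the blank lines and
        # comments that follow it; no source text is dropped.
        starts = [start] + [first_line(n) for n in nodes[1:]]
        ends = [s - 1 for s in starts[1:]] + [end]
        chunks, buf = [], ""
        for node, s, e in zip(nodes, starts, ends):
            piece = text(s, e)
            if count_tokens(buf + piece) <= max_tokens:
                buf += piece  # merge neighbouring nodes while they fit
                continue
            if buf:
                chunks.append(buf)
                buf = ""
            if count_tokens(piece) <= max_tokens:
                buf = piece
            else:
                # Node is too large on its own: recurse into its body.
                chunks.extend(split(getattr(node, "body", []), s, e))
        if buf:
            chunks.append(buf)
        return chunks

    return split(ast.parse(source).body, 1, len(lines))


SAMPLE = '''import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        z = 3
        return z * 2
'''

chunks = chunk_code(SAMPLE, max_tokens=16)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

With a budget of 16 "tokens", the import and the small function are merged into one chunk, while the oversized class is split at method boundaries rather than mid-block. A statement that has no body and still exceeds the budget is emitted whole in this sketch.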

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
