Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•8mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
