Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•8mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk keeps its meaning and only the most relevant pieces are passed to the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split at an arbitrary point inside a block. Instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
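To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module (so it only handles Python source), counts tokens by splitting on whitespace instead of using a real tokenizer, and every name in it (`chunk_code`, `chunk_region`, `count_tokens`, `first_line`, `SAMPLE`) is hypothetical. Sibling nodes are merged greedily until the budget would be exceeded, and a node that is too large on its own is split by recursing into its child statements.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def first_line(node: ast.AST) -> int:
    # 0-based index of the node's first source line, including decorators.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def chunk_region(lines, nodes, start, end, budget):
    """Chunk lines[start:end], using the sibling AST `nodes` as split points."""
    chunks, current = [], ""
    for i, node in enumerate(nodes):
        # Each node owns every line up to the next sibling, so blank lines
        # and comments between nodes are never dropped.
        s = start if i == 0 else first_line(node)
        e = first_line(nodes[i + 1]) if i + 1 < len(nodes) else end
        text = "".join(lines[s:e])
        children = [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]
        if count_tokens(text) > budget and children:
            # Too big on its own: flush what we have, then recurse into it.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(chunk_region(lines, children, s, e, budget))
        elif current and count_tokens(current + text) > budget:
            # Adding this node would overflow the budget: start a new chunk.
            chunks.append(current)
            current = text
        else:
            current += text
    if current:
        chunks.append(current)
    return chunks


def chunk_code(source: str, budget: int):
    lines = source.splitlines(keepends=True)
    body = ast.parse(source).body
    if not body:  # nothing but comments or whitespace
        return [source] if source else []
    return chunk_region(lines, body, 0, len(lines), budget)


SAMPLE = '''import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        return "hello world from second"
'''

if __name__ == "__main__":
    for n, chunk in enumerate(chunk_code(SAMPLE, budget=12), 1):
        print(f"--- chunk {n} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Concatenating the returned chunks reproduces the input exactly, which is one simple way to check that formatting is preserved. A leaf statement that exceeds the budget by itself is emitted as an oversized chunk, since this sketch has nothing smaller to split on.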

What it’s useful for:

  - Embedding-based code search
  - RAG (retrieval-augmented generation) over codebases
  - Long-context analysis of code
  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker
  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
