Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•7mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so that only the most relevant pieces need to be passed to the model. This can reduce hallucinations, avoid confusion, and improve performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
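To make the recursive idea concrete, here is a minimal sketch of structure-aware chunking. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module so it runs with no dependencies, and it counts whitespace-separated words instead of real tokens. The names `count_tokens`, `chunk_nodes`, `chunk_code`, and `SAMPLE` are made up for this example.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: a stand-in for a real tokenizer (whitespace-separated words).
    return len(text.split())


def chunk_nodes(nodes, lines, start, end, budget):
    """Split lines[start:end] into (s, e) line spans that follow node boundaries."""
    # Give each node a contiguous span. Blank lines and comments before a node
    # travel with it, and the last node absorbs any trailing lines.
    spans, cursor = [], start
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        if stop > cursor:
            spans.append((node, cursor, stop))
            cursor = stop
    if not spans:
        return [(start, end)] if end > start else []

    def size(s, e):
        return count_tokens("".join(lines[s:e]))

    chunks, group = [], None  # group is the (s, e) span being accumulated
    for node, s, e in spans:
        if size(s, e) > budget:
            # Too big on its own: emit what we have, then descend into the node.
            if group is not None:
                chunks.append(group)
                group = None
            body = getattr(node, "body", None)
            if isinstance(body, list) and body:
                chunks.extend(chunk_nodes(body, lines, s, e, budget))
            else:
                chunks.append((s, e))  # leaf node: cannot be split further
        elif group is not None and size(group[0], e) <= budget:
            group = (group[0], e)  # merge with the previous sibling(s)
        else:
            if group is not None:
                chunks.append(group)
            group = (s, e)
    if group is not None:
        chunks.append(group)
    return chunks


def chunk_code(source: str, budget: int) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    spans = chunk_nodes(tree.body, lines, 0, len(lines), budget)
    return ["".join(lines[s:e]) for s, e in spans]


SAMPLE = (
    "import os\n"
    "\n"
    "def small():\n"
    "    return 1\n"
    "\n"
    "class Big:\n"
    "    def a(self):\n"
    "        x = 1\n"
    "        return x\n"
    "\n"
    "    def b(self):\n"
    "        y = 2\n"
    "        return y\n"
)

for chunk in chunk_code(SAMPLE, budget=10):
    print(repr(chunk))
```

In this sketch, small sibling nodes are merged greedily, and a node that exceeds the budget is split along its children rather than at an arbitrary line. Joining the chunks back together reproduces the input exactly, which is one simple way to check that formatting is preserved.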

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
