
Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•11mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries instead of at arbitrary character or token offsets. Each chunk keeps its structure and meaning, so only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion and can improve both performance and accuracy.
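
To make the context-window constraint concrete, here is a minimal sketch of the structure-blind alternative: a fixed-size splitter that cuts every `max_tokens` tokens. It assumes one token per whitespace-separated word (real LLM tokenizers such as BPE count differently), and `naive_chunks` is a hypothetical helper, not part of Chonkie.

```python
def naive_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into pieces of at most max_tokens tokens, ignoring structure.

    Assumption: one token == one whitespace-delimited word.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


source = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
for chunk in naive_chunks(source, 5):
    print(repr(chunk))
```

With a budget of 5, the middle chunk is `"+ b def sub(a, b):"`: it starts inside `add`, ends inside `sub`, and the original indentation is gone. Structure-aware chunking is meant to avoid exactly this kind of split.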

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.
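
As a rough illustration of the parsing step, the sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter (an assumption made only to keep the example dependency-free; unlike tree-sitter, it handles only Python source). It lists each top-level node with its line span and a token count. This is the raw material a merge step can group under a budget. `count_tokens` and `top_level_nodes` are hypothetical helpers, and whitespace splitting stands in for a real tokenizer.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-delimited words stand in for real tokenizer tokens.
    return len(text.split())


def top_level_nodes(source: str) -> list[tuple[str, int, int, int]]:
    """Parse source and return (node type, first line, last line, token count)
    for each top-level statement. Lines are 1-based and inclusive (Python 3.8+)."""
    tree = ast.parse(source)
    out = []
    for node in tree.body:
        text = ast.get_source_segment(source, node)
        out.append((type(node).__name__, node.lineno, node.end_lineno, count_tokens(text)))
    return out


source = '''import os

def add(a, b):
    return a + b

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
'''

for row in top_level_nodes(source):
    print(row)
```

Under a 10-token budget, a chunker could group `import os` and `add` (2 + 7 tokens) into one chunk. `Point` (11 tokens) would not fit even on its own and would have to be split further along its own subtree.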

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions and class definitions aren’t split in the middle of a block. When one exceeds the budget, we recurse into its children in the AST to produce clean, coherent chunks that fit your configured token budget.
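
The recursive strategy can be sketched as: keep any node that fits, descend into an oversized function or class instead of cutting it, then greedily merge adjacent pieces while they fit. This is a hypothetical illustration, not Chonkie's implementation. Python's `ast` module stands in for tree-sitter, whitespace-separated words stand in for tokens, and an oversized leaf statement is simply left whole.

```python
import ast

# Hypothetical sketch; Python's ast module stands in for tree-sitter.
Span = tuple[int, int]  # 0-based line range, end-exclusive
DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def count_tokens(text: str) -> int:
    # Assumption: whitespace-delimited words stand in for tokenizer tokens.
    return len(text.split())


def first_line(node: ast.stmt) -> int:
    """0-based first line of a statement, including any decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def tile(nodes: list[ast.stmt], start: int, end: int) -> list[tuple[Span, ast.stmt]]:
    """Give each statement a line span. Gaps (blank lines, comments, decorators)
    go to the following node and trailing lines go to the last node, so the
    spans cover [start, end) exactly and no text is lost."""
    out = []
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        out.append(((start, stop), node))
        start = stop
    return out


def split(node_spans: list[tuple[Span, ast.stmt]], lines: list[str], budget: int) -> list[Span]:
    """Keep nodes that fit; recurse into oversized functions and classes
    instead of cutting them mid-block."""
    pieces: list[Span] = []
    for (start, end), node in node_spans:
        fits = count_tokens("".join(lines[start:end])) <= budget
        if fits or not isinstance(node, DEFS):
            pieces.append((start, end))  # fits, or a leaf we leave whole
            continue
        body_first = first_line(node.body[0])
        if body_first > start:
            pieces.append((start, body_first))  # decorators + signature line(s)
        pieces.extend(split(tile(node.body, body_first, end), lines, budget))
    return pieces


def merge(pieces: list[Span], lines: list[str], budget: int) -> list[Span]:
    """Greedily join adjacent spans while the combined text fits the budget."""
    merged: list[Span] = []
    for start, end in pieces:
        if merged and count_tokens("".join(lines[merged[-1][0]:end])) <= budget:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def chunk_code(source: str, budget: int) -> list[str]:
    lines = source.splitlines(keepends=True)
    body = ast.parse(source).body
    if not body:
        return ["".join(lines)] if lines else []
    spans = merge(split(tile(body, 0, len(lines)), lines, budget), lines, budget)
    return ["".join(lines[s:e]) for s, e in spans]


source = '''import os


class Calculator:
    """Toy class with several methods."""

    def add(self, a, b):
        return a + b

    def sub(self, a, b):
        return a - b

    def mul(self, a, b):
        return a * b
'''

for chunk in chunk_code(source, 12):
    print("----")
    print(chunk, end="")
```

With a 12-token budget, the 31-token class does not fit, so the sketch descends into its body. The class header and docstring merge with `import os`, and each method becomes its own chunk. No method is cut mid-body, and joining the chunks reproduces the source exactly. A production chunker might also carry the enclosing class signature into each child chunk for context; this sketch does not.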

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
