Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s structure so each chunk stays coherent and meaningful, which makes it easier to retrieve and pass only the most relevant pieces to the model. This helps reduce hallucinations and confusion and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions aren’t cut at arbitrary points in the middle of a block. Instead, we recurse into the AST and split at node boundaries, producing clean, coherent chunks that fit your configured token budget.
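
To give a feel for the recursive merge-and-group idea, here is a minimal sketch. It is not the Chonkie implementation: it assumes Python input only and swaps in Python's standard-library `ast` module for tree-sitter, and it counts tokens by whitespace splitting instead of using a real tokenizer. The names `chunk_source` and `count_tokens` are hypothetical. Sibling statements are merged greedily while they fit the budget, and a statement that is too large on its own is split by recursing into its body.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def chunk_source(source: str, budget: int) -> list[str]:
    """Cut Python source at statement boundaries into chunks of at most
    `budget` tokens where possible. A statement with no nested body that
    exceeds the budget is emitted as one oversized chunk."""
    lines = source.splitlines(keepends=True)
    body = ast.parse(source).body
    if not body:
        return [source] if source else []

    def tokens(start: int, end: int) -> int:
        return count_tokens("".join(lines[start:end]))

    def split(nodes, start, end):
        # Give each sibling a contiguous line span [s, e). Gaps such as
        # comments, blank lines, and block headers attach to the next sibling.
        spans, cursor = [], start
        for i, node in enumerate(nodes):
            stop = end if i == len(nodes) - 1 else node.end_lineno
            if stop > cursor:
                spans.append((node, cursor, stop))
                cursor = stop

        chunks, open_start = [], None  # open_start: start of the growing chunk
        for node, s, e in spans:
            if open_start is not None:
                if tokens(open_start, e) <= budget:
                    continue  # sibling fits, merge it into the open chunk
                chunks.append((open_start, s))
                open_start = None
            if tokens(s, e) <= budget:
                open_start = s
            else:
                children = getattr(node, "body", None)
                if isinstance(children, list) and children:
                    chunks.extend(split(children, s, e))  # recurse into block
                else:
                    chunks.append((s, e))  # leaf statement, cannot split
        if open_start is not None:
            chunks.append((open_start, end))
        return chunks

    return ["".join(lines[s:e]) for s, e in split(body, 0, len(lines))]


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        return "a a a a a a"

    def b(self):
        return "b b b b b b"
'''

chunks = chunk_source(SAMPLE, budget=12)
```

Because every chunk is a contiguous range of the original lines, joining the chunks reproduces the input exactly, which is one simple way to keep formatting intact. A tree-sitter version would follow the same shape, using each node's byte range and children in place of `end_lineno` and `body`.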

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
