Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along the document’s structure so that each chunk stays meaningful on its own, and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions are not cut at arbitrary points in the middle of a block. Instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
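
To make the recursive idea concrete, here is a minimal sketch. It is not Chonkie's implementation. It assumes Python's built-in ast module as a stand-in for tree-sitter (so it only handles Python source) and counts whitespace-separated words instead of real tokens. The names count_tokens, split_nodes and chunk_code are hypothetical and exist only for this illustration.

```python
import ast


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def split_nodes(nodes, start, end, lines, max_tokens):
    """Cover lines[start:end] with (start, stop) spans, descending into oversized nodes."""
    if not nodes:
        return [(start, end)] if end > start else []
    spans = []
    cursor = start
    for i, node in enumerate(nodes):
        # A span runs from the end of the previous span to this node's last line,
        # so blank lines and comments between nodes are preserved.
        stop = end if i == len(nodes) - 1 else node.end_lineno
        if stop <= cursor:
            continue  # several statements on one line: already covered
        body = getattr(node, "body", None)
        too_big = count_tokens("".join(lines[cursor:stop])) > max_tokens
        if too_big and isinstance(body, list) and body:
            # Oversized node: recurse into its child statements instead of
            # cutting at an arbitrary line.
            spans.extend(split_nodes(body, cursor, stop, lines, max_tokens))
        else:
            # Fits the budget, or cannot be divided further in this sketch.
            spans.append((cursor, stop))
        cursor = stop
    return spans


def chunk_code(source: str, max_tokens: int):
    """Greedily merge adjacent spans into chunks that stay within max_tokens."""
    lines = source.splitlines(keepends=True)
    spans = split_nodes(ast.parse(source).body, 0, len(lines), lines, max_tokens)
    chunks, current = [], ""
    for s, e in spans:
        piece = "".join(lines[s:e])
        if current and count_tokens(current + piece) > max_tokens:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

def big(x):
    a = x + 1
    b = a * 2
    c = b - 3
    return c
'''

if __name__ == "__main__":
    for n, chunk in enumerate(chunk_code(SAMPLE, max_tokens=10), 1):
        print(f"--- chunk {n} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because the chunks are contiguous line ranges, joining them reproduces the original source exactly. A single statement that exceeds the budget and has no child statements is left as one oversized chunk in this sketch; a production chunker would need a fallback for that case.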

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!