Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document’s own structure, so each chunk stays meaningful on its own and only the most relevant pieces need to be passed to the model. That helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker, a fast, structure-aware way to break source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions aren’t split in the middle of a block. Instead, we recurse into the AST to produce clean, coherent chunks that fit your configured token budget.
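To make the merge-and-recurse idea concrete, here is a rough sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module so it runs with no dependencies, it counts whitespace-separated words instead of real tokens, and the names `chunk_code`, `count_tokens`, and `first_line` are made up for illustration. Adjacent statements are merged greedily while they fit the budget, and a statement that is too large is split at the boundaries of its child statements.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def first_line(node: ast.stmt) -> int:
    # 0-based first line of a statement, including any decorators above it.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def chunk_code(source: str, budget: int) -> list[str]:
    """Split Python source into chunks, cutting only where a statement begins."""
    lines = source.splitlines(keepends=True)

    def tokens(lo: int, hi: int) -> int:
        return count_tokens("".join(lines[lo:hi]))

    def split(nodes, start, end):
        # Cover lines[start:end] with (lo, hi) spans.
        if not nodes:
            return [(start, end)] if end > start else []
        # Each piece runs from one statement's first line to the next one's,
        # so comments and blank lines between statements are never dropped.
        starts, owners = [start], [nodes[0]]
        for node in nodes[1:]:
            line = first_line(node)
            if line > starts[-1]:  # statements sharing a line stay together
                starts.append(line)
                owners.append(node)
        ends = starts[1:] + [end]

        spans, current = [], None
        for lo, hi, node in zip(starts, ends, owners):
            if tokens(lo, hi) > budget:
                # Too big on its own: flush what we have, then recurse into it.
                if current is not None:
                    spans.append(current)
                    current = None
                children = sorted(
                    (c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)),
                    key=first_line,
                )
                # A statement with no child statements is emitted whole,
                # even though it exceeds the budget.
                spans.extend(split(children, lo, hi) if children else [(lo, hi)])
            elif current is not None and tokens(current[0], hi) <= budget:
                current = (current[0], hi)  # merge with the running chunk
            else:
                if current is not None:
                    spans.append(current)
                current = (lo, hi)
        if current is not None:
            spans.append(current)
        return spans

    tree = ast.parse(source)
    return ["".join(lines[lo:hi]) for lo, hi in split(tree.body, 0, len(lines))]


SAMPLE = """\
import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        return 'second'
"""

chunks = chunk_code(SAMPLE, budget=10)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

Because the sketch works on line spans rather than re-rendering nodes, concatenating the chunks gives back the original source, which is one simple way to keep formatting intact. A real chunker would also need a proper tokenizer and a fallback for single statements that exceed the budget.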

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
