Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along the document’s structure so that each chunk keeps its meaning, and a retrieval step can pass only the most relevant pieces to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
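
To make the recursive merge-and-split idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps in Python's standard-library `ast` module for tree-sitter (so it only handles Python source), counts whitespace-separated words instead of real tokens, and the names `chunk_code`, `chunk_lines`, `child_statements`, and `count_tokens` are hypothetical. Sibling nodes are merged greedily until the budget is reached, and a node that is too large on its own is split by descending into its child statements.

```python
import ast


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def child_statements(node):
    # Statements nested directly inside a compound node (def, class, if, try, ...).
    found = []
    for field in ("body", "handlers", "orelse", "finalbody"):
        value = getattr(node, field, None)
        if isinstance(value, list):
            found.extend(v for v in value if isinstance(v, (ast.stmt, ast.excepthandler)))
    return sorted(found, key=lambda n: n.lineno)


def chunk_lines(lines, nodes, start, end, budget):
    # Give every node a contiguous line span. Lines between nodes (comments,
    # blank lines, a def/class header) attach to the node that follows them,
    # and trailing lines attach to the last node, so no text is dropped.
    spans, cursor = [], start
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        if stop > cursor:
            spans.append((cursor, stop, node))
            cursor = stop
    if not spans:
        return ["".join(lines[start:end])]

    chunks, current = [], ""
    for lo, hi, node in spans:
        text = "".join(lines[lo:hi])
        if count_tokens(current + text) <= budget:
            current += text  # merge with the previous siblings
            continue
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(text) <= budget:
            current = text
            continue
        kids = child_statements(node)
        if kids:
            # Too large on its own: recurse into the node's children.
            chunks.extend(chunk_lines(lines, kids, lo, hi, budget))
        else:
            # A single statement over budget is emitted whole in this sketch.
            chunks.append(text)
    if current:
        chunks.append(current)
    return chunks


def chunk_code(source, budget):
    lines = source.splitlines(keepends=True)
    if not lines:
        return []
    return chunk_lines(lines, ast.parse(source).body, 0, len(lines), budget)


SAMPLE = '''import os

# Helper used by Greeter.
def shout(text):
    return text.upper() + "!"

class Greeter:
    def hello(self, name):
        greeting = "hello " + name
        return shout(greeting)

    def bye(self, name):
        farewell = "bye " + name
        return shout(farewell)
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, budget=16)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because every chunk is a contiguous slice of the original lines, joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. A real tokenizer and a tree-sitter parser would replace `count_tokens` and `ast.parse` in practice.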

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
