Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries, so each chunk stays coherent and only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
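
To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it uses Python's built-in ast module as a stand-in for tree-sitter (so it only handles Python source), and count_tokens, chunk_code, and _chunk_siblings are hypothetical names with a whitespace word count standing in for a real tokenizer. Sibling nodes are merged greedily until the budget would be exceeded, and a node that is too large on its own is split by descending into its children.

```python
import ast


def count_tokens(text: str) -> int:
    # Hypothetical token counter (assumption): whitespace-separated words.
    return len(text.split())


def chunk_code(source: str, chunk_size: int) -> list[str]:
    """Split Python source into chunks of at most chunk_size tokens where possible."""
    lines = source.splitlines(keepends=True)
    body = ast.parse(source).body
    if not body:
        return [source] if source else []
    return _chunk_siblings(body, lines, 0, len(lines), chunk_size)


def _chunk_siblings(nodes, lines, start, end, chunk_size):
    """Chunk lines[start:end], which are covered by the sibling statements in nodes."""
    chunks, current, cursor = [], "", start
    for i, node in enumerate(nodes):
        # Each sibling owns the lines from the end of the previous sibling to
        # its own last line, so comments, blank lines, and the parent's header
        # line stay attached to a neighbouring node and no text is dropped.
        stop = end if i == len(nodes) - 1 else node.end_lineno
        text = "".join(lines[cursor:stop])
        if count_tokens(current + text) <= chunk_size:
            current += text  # merge with the preceding siblings
        else:
            if current:
                chunks.append(current)
                current = ""
            children = getattr(node, "body", [])
            if count_tokens(text) > chunk_size and children:
                # Too large on its own: descend into the node's children.
                chunks.extend(
                    _chunk_siblings(children, lines, cursor, stop, chunk_size)
                )
            else:
                # Fits on its own, or is a leaf that cannot be split further
                # (a leaf larger than the budget is emitted as-is).
                current = text
        cursor = stop
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

chunks = chunk_code(SAMPLE, chunk_size=10)
for number, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {number}: {count_tokens(chunk)} tokens ---")
    print(chunk, end="")
```

With this sample the import and the small function are merged into one chunk, while the class, which is over budget as a whole, is split between its two methods rather than at an arbitrary line. Concatenating the chunks gives back the original source, which is one simple way to check that formatting is preserved.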

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
