Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries instead of at arbitrary offsets. Each chunk is therefore a coherent unit, and retrieval can pass only the most relevant pieces into the model. This reduces hallucinations and irrelevant context, and improves the accuracy of the model’s answers.
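
For contrast, this is roughly what structure-blind chunking looks like. The sketch below cuts every `max_tokens` words and is not part of Chonkie. It assumes whitespace-separated words as a stand-in for real model tokens, and `naive_chunks` is a hypothetical helper.

```python
def naive_chunks(text: str, max_tokens: int) -> list[str]:
    """Fixed-size chunking: every max_tokens whitespace-separated words become one chunk.

    Whitespace splitting stands in for a real tokenizer (assumption). Line breaks
    and indentation are discarded, and cuts ignore code structure entirely.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
```

Given `"def add(a, b):\n    return a + b\n"` and `max_tokens=4`, it returns `["def add(a, b): return", "a + b"]`. The first chunk stops partway through the `return` statement, and the newline and indentation are lost.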

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.
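
Below is a minimal sketch of the merge-and-recurse idea. It is not Chonkie's code and rests on several assumptions. It uses Python's standard-library `ast` module in place of tree-sitter, so it only handles Python. It also counts whitespace-separated words as tokens and cuts only at whole lines. `chunk_source` and its helpers are hypothetical names. Adjacent sibling statements are grouped greedily while they fit the budget. A statement too large to fit on its own is split by recursing into its child statements, and its header (for example a `def` line) stays with the first inner group.

```python
import ast
from typing import Optional


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): counts whitespace-separated words.
    return len(text.split())


def chunk_source(source: str, max_tokens: int) -> list[str]:
    """Split Python source at statement boundaries, recursing into oversized statements."""
    lines = source.splitlines(keepends=True)

    def span(node: ast.stmt) -> tuple[int, int]:
        # 0-based, half-open line range; decorators stay with their definition.
        first = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
        return first - 1, node.end_lineno

    def size(start: int, end: int) -> int:
        return count_tokens("".join(lines[start:end]))

    def stmt_children(node: ast.AST) -> list[ast.stmt]:
        return [c for c in ast.iter_child_nodes(node) if isinstance(c, ast.stmt)]

    def walk(node: ast.AST, group_start: Optional[int]) -> list[int]:
        """Return the line indices at which new chunks begin inside node."""
        cuts: list[int] = []
        for child in stmt_children(node):
            start, end = span(child)
            if group_start is not None and size(group_start, end) <= max_tokens:
                continue  # child fits: merge it into the current group
            if group_start is None or start > group_start:
                cuts.append(start)  # child opens a new chunk
            if size(start, end) > max_tokens and stmt_children(child):
                # Too big on its own: recurse; its header joins its first inner group.
                cuts.extend(walk(child, start))
                group_start = None  # the next sibling starts a fresh chunk
            else:
                group_start = start
        return cuts

    cuts = sorted({c for c in walk(ast.parse(source), 0) if 0 < c < len(lines)})
    bounds = [0, *cuts, len(lines)]
    # Each chunk is a contiguous run of original lines, so formatting is preserved.
    return ["".join(lines[a:b]) for a, b in zip(bounds, bounds[1:]) if a < b]
```

Take `max_tokens=12` and a file containing an import, a two-line function and a six-line function. The result is three chunks: the import plus the small function, the large function's `def` line with its first three statements, and its last two statements. Joining the chunks reproduces the file exactly. A statement with no child statements (say, one very long expression) is kept whole even if it exceeds the budget, so a fuller version would need a fallback for that case.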

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, we recurse into their child nodes in the AST to produce clean, coherent chunks that fit your configured token budget.
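
A small checker can test the two properties described here: formatting is preserved, and no chunk boundary falls inside a statement. The sketch below assumes Python source parsed with the standard-library `ast` module rather than tree-sitter, and `chunks_respect_structure` is a hypothetical helper. It is deliberately strict. It accepts a cut only at a line where some statement (or its decorators) begins, so it also rejects a cut placed before a blank or comment line.

```python
import ast


def statement_starts(source: str) -> set[int]:
    """0-based line indices where a statement (including its decorators) begins."""
    starts = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            first = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
            starts.add(first - 1)
    return starts


def chunks_respect_structure(source: str, chunks: list[str]) -> bool:
    """True if the chunks rebuild the source exactly and every cut falls at a statement start."""
    if "".join(chunks) != source:
        return False  # content or formatting was lost or altered
    allowed = statement_starts(source)
    line = 0
    for chunk in chunks[:-1]:
        if not chunk.endswith("\n"):
            return False  # cut in the middle of a line
        line += chunk.count("\n")
        if line not in allowed:
            return False  # cut inside a multi-line statement, or before a blank/comment line
    return True
```

It returns `True` for a single chunk holding the whole file. It returns `False` as soon as a chunk ends partway through a line or a multi-line statement.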

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining
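
Embedding-based search and RAG share the same retrieval step. You represent each chunk and the query as vectors, rank chunks by similarity, and pass the top few to the model. The sketch below uses bag-of-words cosine similarity as a stand-in for a real embedding model, since the post does not name one. `vectorize` and `retrieve` are hypothetical helpers.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Stand-in for an embedding model (assumption): bag of lowercase alphanumeric
    # runs, so an identifier like parse_config contributes "parse" and "config".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]
```

For the query "parse the config file", a chunk defining `parse_config` outranks one defining `send_email`. This is because only the first shares the words `parse` and `config` once identifiers are split on non-alphanumeric characters.
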

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
