Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open source tools that split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers preserve structure and meaning when they split, so each chunk is coherent on its own and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions are not split in the middle of a block. Instead, we recurse into the AST to produce clean, coherent chunks that fit your configured token budget.
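
To make the idea concrete, here is a minimal sketch of recursive, budget-aware chunking over a syntax tree. It is not Chonkie's implementation: it swaps in Python's built-in `ast` module for tree-sitter (so it only handles Python source), and it uses a whitespace word count as a stand-in for a real tokenizer. The names `chunk_code`, `count_tokens`, `stmt_children`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
import ast

def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())

def stmt_children(node: ast.AST) -> list:
    """Direct child nodes that are statements or except handlers."""
    return [c for c in ast.iter_child_nodes(node)
            if isinstance(c, (ast.stmt, ast.excepthandler))]

def _size(lines, start, end) -> int:
    return count_tokens("".join(lines[start:end]))

def _split(nodes, start, end, lines, budget):
    """Cover lines[start:end] with (start, end) spans aligned to node boundaries."""
    # One span per child. Each span reaches back to the end of the previous
    # sibling, so blank lines, comments, and block headers stay attached.
    pieces, cursor = [], start
    for node in nodes:
        if node.end_lineno <= cursor:  # e.g. two statements on one line
            continue
        pieces.append([cursor, node.end_lineno, node])
        cursor = node.end_lineno
    if not pieces:
        return [(start, end)]
    pieces[-1][1] = end  # trailing lines join the last span

    chunks, current = [], None
    for s, e, node in pieces:
        if _size(lines, s, e) > budget:
            # Too big on its own: flush the open group, then recurse into the
            # node's children instead of cutting at an arbitrary line.
            if current:
                chunks.append(current)
                current = None
            kids = stmt_children(node)
            chunks.extend(_split(kids, s, e, lines, budget) if kids else [(s, e)])
        elif current and _size(lines, current[0], e) <= budget:
            current = (current[0], e)  # merge with the open group
        else:
            if current:
                chunks.append(current)
            current = (s, e)
    if current:
        chunks.append(current)
    return chunks

def chunk_code(source: str, budget: int) -> list:
    """Split Python source into chunks of at most `budget` stand-in tokens.

    A single statement with no nested statements is emitted whole even if it
    exceeds the budget.
    """
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    spans = _split(stmt_children(tree), 0, len(lines), lines, budget)
    return ["".join(lines[s:e]) for s, e in spans]

SAMPLE = '''import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        return "second"
'''

if __name__ == "__main__":
    for i, chunk in enumerate(chunk_code(SAMPLE, budget=15)):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk, end="")
```

Because every span is a contiguous range of source lines, joining the chunks gives back the original text unchanged, which is one simple way to preserve formatting. A real tokenizer and a multi-language parser would replace the two stand-ins here.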

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
