Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along the document’s own structure so each chunk keeps its meaning, and only the most relevant pieces need to be passed to the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
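
To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps in Python's built-in `ast` module for tree-sitter so it runs with the standard library only (and therefore only handles Python source), and `count_tokens`, `statements`, `split_region`, and `chunk_code` are hypothetical names invented for this illustration. The token counter is a crude whitespace split standing in for a real tokenizer.

```python
import ast


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: whitespace-separated words."""
    return len(text.split())


def statements(node: ast.AST) -> list[ast.stmt]:
    """Direct child statements of a node, in source order."""
    kids = [n for n in ast.iter_child_nodes(node) if isinstance(n, ast.stmt)]
    return sorted(kids, key=lambda n: (n.lineno, n.col_offset))


def split_region(lines, nodes, start, end, limit):
    """Split lines[start:end] into (start, end) ranges, cutting only between nodes."""
    if count_tokens("".join(lines[start:end])) <= limit or not nodes:
        return [(start, end)]  # fits the budget, or nothing left to descend into

    # One segment per node: from the previous cut to the node's last line, so
    # blank lines, comments, and enclosing headers stay attached to a node.
    segments, cut = [], start
    for i, node in enumerate(nodes):
        seg_end = end if i == len(nodes) - 1 else node.end_lineno
        if seg_end > cut:
            segments.append((cut, seg_end, node))
            cut = seg_end

    chunks, group_start, group_tokens = [], None, 0
    for seg_start, seg_end, node in segments:
        tokens = count_tokens("".join(lines[seg_start:seg_end]))
        if group_start is not None and group_tokens + tokens > limit:
            chunks.append((group_start, seg_start))  # close the current group
            group_start, group_tokens = None, 0
        if tokens > limit:
            # Too large on its own: descend into the node's child statements.
            chunks.extend(
                split_region(lines, statements(node), seg_start, seg_end, limit)
            )
        else:
            if group_start is None:
                group_start = seg_start
            group_tokens += tokens
    if group_start is not None:
        chunks.append((group_start, end))
    return chunks


def chunk_code(source: str, limit: int) -> list[str]:
    """Chunk Python source into contiguous pieces that together cover every line."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    ranges = split_region(lines, statements(tree), 0, len(lines), limit)
    return ["".join(lines[a:b]) for a, b in ranges]


SAMPLE = '''import os

def small(a, b):
    return a + b

class Big:
    def first(self):
        x = 1
        y = 2
        return x + y

    def second(self):
        return "done"
'''

chunks = chunk_code(SAMPLE, limit=14)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

With this sample and a budget of 14, the import and `small` are merged into one chunk, while `Big` is too large and gets split between its two methods rather than in the middle of one. Because chunks are contiguous line ranges, joining them reproduces the original source exactly. A node with no child statements that still exceeds the budget is emitted as a single oversized chunk in this sketch.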

What it’s useful for:

  - Embedding-based code search
  - RAG (retrieval-augmented generation) over codebases
  - Long-context analysis of code
  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker
  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
