Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural boundaries so that each chunk keeps its meaning, which lets retrieval pass only the most relevant pieces into the model. This reduces hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
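To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps in Python's built-in `ast` module for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `count_tokens`, `chunk_source`, and `_chunk_block` are hypothetical, chosen only for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def chunk_source(source: str, max_tokens: int) -> list[str]:
    """Split Python source into chunks that follow AST node boundaries."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    return _chunk_block(tree.body, 0, len(lines), lines, max_tokens)


def _chunk_block(nodes, start, end, lines, max_tokens):
    """Chunk lines[start:end], which are covered by the sibling `nodes`."""
    if not nodes:
        text = "".join(lines[start:end])
        return [text] if text else []

    # Give every node a contiguous line span. Lines between nodes (blank
    # lines, comments, decorators, a parent's header line) attach to the
    # node that follows; trailing lines attach to the last node.
    spans, cursor = [], start
    for i, node in enumerate(nodes):
        stop = end if i == len(nodes) - 1 else node.end_lineno
        spans.append((node, cursor, stop))
        cursor = stop

    chunks, current, current_tokens = [], "", 0
    for node, lo, hi in spans:
        text = "".join(lines[lo:hi])
        n = count_tokens(text)
        # Close the running chunk if this node would overflow the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(current)
            current, current_tokens = "", 0
        children = getattr(node, "body", None)
        if n > max_tokens and isinstance(children, list) and children:
            # The node alone is too large: recurse into its child statements.
            chunks.extend(_chunk_block(children, lo, hi, lines, max_tokens))
        else:
            # Small enough to merge, or an oversized leaf kept whole.
            current += text
            current_tokens += n
    if current:
        chunks.append(current)
    return chunks
```

Calling `chunk_source(code, max_tokens=512)` returns a list of strings that concatenate back to the original source, so formatting is preserved. Small sibling nodes are merged greedily, and only a node that exceeds the budget on its own is opened up and chunked by its children. A node with no child statements that is still over budget is emitted whole in this sketch.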

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
