frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•9mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining
Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai
Happy Chonking!

Reverse Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
1•pacod•1m ago•0 comments

The AI4Agile Practitioners Report 2026

https://age-of-product.com/ai4agile-practitioners-report-2026/
1•swolpers•2m ago•0 comments

Digital Independence Day

https://di.day/
1•pabs3•6m ago•0 comments

What a bot hacking attempt looks like: SQL injections galore

https://old.reddit.com/r/vibecoding/comments/1qz3a7y/what_a_bot_hacking_attempt_looks_like_i_set_up/
1•cryptoz•7m ago•0 comments

Show HN: FlashMesh – An encrypted file mesh across Google Drive and Dropbox

https://flashmesh.netlify.app
1•Elevanix•8m ago•0 comments

Show HN: AgentLens – Open-source observability and audit trail for AI agents

https://github.com/amitpaz1/agentlens
1•amit_paz•9m ago•0 comments

Show HN: ShipClaw – Deploy OpenClaw to the Cloud in One Click

https://shipclaw.app
1•sunpy•11m ago•0 comments

Unlock the Power of Real-Time Google Trends Visit: Www.daily-Trending.org

https://daily-trending.org
1•azamsayeedit•13m ago•1 comments

Explanation of British Class System

https://www.youtube.com/watch?v=Ob1zWfnXI70
1•lifeisstillgood•14m ago•0 comments

Show HN: Jwtpeek – minimal, user-friendly JWT inspector in Go

https://github.com/alesr/jwtpeek
1•alesrdev•17m ago•0 comments

Willow – Protocols for an uncertain future [video]

https://fosdem.org/2026/schedule/event/CVGZAV-willow/
1•todsacerdoti•18m ago•0 comments

Feedback on a client-side, privacy-first PDF editor I built

https://pdffreeeditor.com/
1•Maaz-Sohail•22m ago•0 comments

Clay Christensen's Milkshake Marketing (2011)

https://www.library.hbs.edu/working-knowledge/clay-christensens-milkshake-marketing
2•vismit2000•29m ago•0 comments

Show HN: WeaveMind – AI Workflows with human-in-the-loop

https://weavemind.ai
7•quentin101010•35m ago•1 comments

Show HN: Seedream 5.0: free AI image generator that claims strong text rendering

https://seedream5ai.org
1•dallen97•36m ago•0 comments

A contributor trust management system based on explicit vouches

https://github.com/mitchellh/vouch
2•admp•38m ago•1 comments

Show HN: Analyzing 9 years of HN side projects that reached $500/month

2•haileyzhou•39m ago•0 comments

The Floating Dock for Developers

https://snap-dock.co
2•OsamaJaber•40m ago•0 comments

Arcan Explained – A browser for different webs

https://arcan-fe.com/2026/01/26/arcan-explained-a-browser-for-different-webs/
2•walterbell•41m ago•0 comments

We are not scared of AI, we are scared of irrelevance

https://adlrocha.substack.com/p/adlrocha-we-are-not-scared-of-ai
1•adlrocha•42m ago•0 comments

Quartz Crystals

https://www.pa3fwm.nl/technotes/tn13a.html
2•gtsnexp•45m ago•0 comments

Show HN: I built a free dictionary API to avoid API keys

https://github.com/suvankar-mitra/free-dictionary-rest-api
2•suvankar_m•47m ago•0 comments

Show HN: Kybera – Agentic Smart Wallet with AI Osint and Reputation Tracking

https://kybera.xyz
2•xipz•48m ago•0 comments

Show HN: brew changelog – find upstream changelogs for Homebrew packages

https://github.com/pavel-voronin/homebrew-changelog
1•kolpaque•52m ago•0 comments

Any chess position with 8 pieces on board and one pair of pawns has been solved

https://mastodon.online/@lichess/116029914921844500
2•baruchel•54m ago•1 comments

LLMs as Language Compilers: Lessons from Fortran for the Future of Coding

https://cyber-omelette.com/posts/the-abstraction-rises.html
2•birdculture•56m ago•0 comments

Projecting high-dimensional tensor/matrix/vect GPT–>ML

https://github.com/tambetvali/LaegnaAIHDvisualization
1•tvali•56m ago•1 comments

Show HN: Free Bank Statement Analyzer to Find Spending Leaks and Save Money

https://www.whereismymoneygo.com/
2•raleobob•1h ago•1 comments

Our Stolen Light

https://ayushgundawar.me/posts/html/our_stolen_light.html
2•gundawar•1h ago•0 comments

Matchlock: Linux-based sandboxing for AI agents

https://github.com/jingkaihe/matchlock
2•jingkai_he•1h ago•0 comments