Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries rather than at arbitrary offsets, so each chunk stays coherent and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
