Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural and semantic boundaries rather than at arbitrary fixed offsets, so each chunk is self-contained and only the most relevant pieces need to be passed into the model. Feeding the model focused, coherent context reduces hallucinations and confusion and improves both performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.
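
The grouping half of this can be sketched as a greedy packer over adjacent sibling nodes. This is a minimal illustration, not Chonkie's implementation: it assumes the siblings have already been extracted from the AST as source snippets, and it counts whitespace-separated words as a stand-in for a real tokenizer. `count_tokens` and `merge_siblings` are hypothetical names.

```python
def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def merge_siblings(snippets: list[str], budget: int) -> list[str]:
    """Greedily pack adjacent sibling snippets into chunks of at most `budget`
    tokens, keeping source order and never cutting a snippet in half.

    A snippet that exceeds the budget on its own becomes its own chunk; in a
    full chunker that node would be split recursively instead.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for snippet in snippets:
        n = count_tokens(snippet)
        # Close the current chunk if adding this snippet would overflow it.
        if current and current_tokens + n > budget:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(snippet)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks


# Hypothetical top-level siblings of a small Python module.
siblings = [
    "import os",
    "X = 1",
    "def f():\n    return X + 1",
    "def g():\n    return f() * 2",
]
for chunk in merge_siblings(siblings, budget=6):
    print(repr(chunk))
```

With a budget of 6, the two short top-level statements share one chunk and each function gets its own; no snippet is ever cut in half. A snippet that is too large on its own is passed through unchanged here, which is exactly the case the recursive step has to handle.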

It supports all languages that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block: when a node is too large for the token budget, the chunker recurses into its children in the AST and produces clean, coherent chunks that fit your configured budget.
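
The recursive descent can be illustrated with Python's built-in `ast` module standing in for tree-sitter, so this sketch only handles Python source; as before, whitespace-separated words approximate tokens. It is an assumption-laden illustration, not Chonkie's code, and `span`, `child_blocks`, `split` and `chunk_source` are hypothetical names. A node that fits the budget is kept whole. An oversized node is partitioned into its statement-level children (which are recursed into) and the lines between them (its header, `else:`/`finally:` lines, comments), so every line is emitted once and no block is cut mid-way. Nodes with no statement children, or that sit on a single line, are kept whole even if oversized.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def span(node: ast.AST) -> tuple[int, int]:
    """1-based, inclusive line range; decorators belong to their def/class."""
    decorators = getattr(node, "decorator_list", [])
    start = min([node.lineno] + [d.lineno for d in decorators])
    return start, node.end_lineno


def child_blocks(node: ast.AST) -> list[ast.AST]:
    """Direct statement-level children: bodies, else/finally branches, except handlers."""
    kids = [c for c in ast.iter_child_nodes(node)
            if isinstance(c, (ast.stmt, ast.excepthandler))]
    return sorted(kids, key=lambda c: span(c)[0])


def split(node: ast.AST, start: int, end: int, lines: list[str], budget: int) -> list[str]:
    text = "\n".join(lines[start - 1 : end])
    kids = child_blocks(node)
    # Keep the node whole if it fits, has no statement children to descend
    # into, or sits on one line (there is no line boundary to split on).
    if count_tokens(text) <= budget or not kids or start == end:
        return [text]
    pieces: list[str] = []
    cursor = start  # first line not yet emitted
    for kid in kids:
        k_start, k_end = span(kid)
        if k_end < cursor:
            continue  # shares a line that was already emitted
        if k_start < cursor:
            # Starts on an already-emitted line: emit its remaining lines verbatim.
            pieces.append("\n".join(lines[cursor - 1 : k_end]))
        else:
            if k_start > cursor:
                # Lines between children: the header, `else:`, comments, blanks.
                pieces.append("\n".join(lines[cursor - 1 : k_start - 1]))
            pieces.extend(split(kid, k_start, k_end, lines, budget))
        cursor = k_end + 1
    if cursor <= end:
        pieces.append("\n".join(lines[cursor - 1 : end]))  # trailing lines
    # Drop gap pieces that contain only blank lines.
    return [p for p in pieces if p.strip()]


def chunk_source(source: str, budget: int) -> list[str]:
    lines = source.splitlines()
    return split(ast.parse(source), 1, len(lines), lines, budget)


SOURCE = '''\
import os


class Greeter:
    """Says hello."""

    def hello(self, name):
        return "Hello, " + name

    def bye(self, name):
        return "Bye, " + name
'''

for piece in chunk_source(SOURCE, budget=10):
    print("---")
    print(piece)
```

With a budget of 10 the class is too large to keep whole, so the sketch emits `import os`, the `class Greeter:` header, the docstring, and each method as separate pieces, with indentation intact. Blank-only gaps are dropped for readability; grouping adjacent pieces back up to the budget is the merge step described above.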

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining
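
As a toy illustration of the first use case, the sketch below ranks chunks against a query by cosine similarity. The `embed` function is a bag-of-words stand-in for a real embedding model (an assumption made so the example runs without dependencies), and `embed`, `cosine` and `search` are hypothetical names, not part of Chonkie.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: bag of lowercase alphanumeric words,
    # so snake_case identifiers like read_config become "read" and "config".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks, each a whole function as a structure-aware chunker would emit.
chunks = [
    "def read_config(path):\n    with open(path) as f:\n        return json.load(f)",
    "def send_email(to, body):\n    smtp.sendmail(SENDER, to, body)",
]
print(search("load config file", chunks))
```

Because each chunk is a whole function, the top hit is a complete, self-contained unit that can be shown to a user or passed to a model as-is. Swapping `embed` for a real embedding model changes the scores but not the retrieval loop.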

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
