Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so a retrieval step can pass only the most relevant pieces to the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that has a tree-sitter grammar, and is designed to preserve formatting and semantics. Large functions or class definitions are not cut at arbitrary points inside a block; instead, we recurse into the AST so that splits fall on node boundaries, producing clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
