Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•12mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along structural boundaries, so each chunk keeps its meaning and only the most relevant pieces are passed to the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
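To make the recursive idea concrete, here is a minimal sketch, not Chonkie's actual implementation. It assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source) and a whitespace word count as a stand-in for a real tokenizer. The names `count_tokens`, `_atomic_spans`, and `chunk_source` are hypothetical.

```python
import ast


def count_tokens(text: str) -> int:
    # Stand-in token counter (assumption): whitespace-separated words.
    return len(text.split())


def _atomic_spans(nodes, lo, hi, lines, budget):
    """Cover lines[lo:hi] with contiguous spans, one per sibling node.

    A span that is over budget is split by descending into the node's
    child statements. Gaps (blank lines, comments, headers) attach to
    the node that follows them, so no text is dropped.
    """
    if not nodes:
        return [(lo, hi)] if hi > lo else []
    spans, start = [], lo
    for i, node in enumerate(nodes):
        end = hi if i == len(nodes) - 1 else node.end_lineno
        if end <= start:  # e.g. two statements on the same line
            continue
        text = "".join(lines[start:end])
        children = [c for c in ast.iter_child_nodes(node)
                    if isinstance(c, ast.stmt)]
        if count_tokens(text) > budget and children:
            spans.extend(_atomic_spans(children, start, end, lines, budget))
        else:
            # Leaf nodes that exceed the budget are kept whole.
            spans.append((start, end))
        start = end
    return spans


def chunk_source(source: str, max_tokens: int) -> list[str]:
    """Greedily merge adjacent spans into chunks within max_tokens."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    spans = _atomic_spans(tree.body, 0, len(lines), lines, max_tokens)
    chunks, current = [], ""
    for start, end in spans:
        piece = "".join(lines[start:end])
        if current and count_tokens(current + piece) > max_tokens:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

for chunk in chunk_source(SAMPLE, max_tokens=12):
    print(repr(chunk))
```

Because the spans are contiguous, joining the chunks reproduces the input exactly, which is one simple way to preserve formatting. In this sample the class `Big` is over budget, so the sketch descends into it and splits between its two methods instead of cutting a method body in half.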

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
