Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along structure and meaning instead of at arbitrary offsets, so each piece stays coherent and only the most relevant pieces are passed into the model. This helps reduce hallucinations and improves accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
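To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of real model tokens. The names `count_tokens`, `segments`, and `chunk_code` are made up for this illustration.

```python
import ast

def count_tokens(text):
    # Stand-in token counter (assumption): whitespace-separated words.
    return len(text.split())

def first_line(node):
    # 1-indexed line where a statement starts, including any decorators.
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])

def segments(node, start, end, lines, budget):
    """Split lines[start:end] into (start, end) spans along AST boundaries.

    A span that already fits the budget is kept whole. Otherwise we descend
    into the node's body, so cuts only fall between statements. A node with
    no statement body is returned whole even if it is over budget.
    """
    text = "\n".join(lines[start:end])
    body = getattr(node, "body", None)
    if count_tokens(text) <= budget or not isinstance(body, list) or not body:
        return [(start, end)]
    starts = [first_line(child) - 1 for child in body]
    out = []
    if starts[0] > start:
        # Header lines before the first child (e.g. "class Foo:").
        out.append((start, starts[0]))
    bounds = starts[1:] + [end]
    for child, s, e in zip(body, starts, bounds):
        if s < e:
            out.extend(segments(child, s, e, lines, budget))
    return out

def chunk_code(source, budget):
    """Greedily merge adjacent spans into chunks of at most `budget` tokens."""
    lines = source.splitlines()
    tree = ast.parse(source)
    chunks, current = [], []
    for s, e in segments(tree, 0, len(lines), lines, budget):
        piece = lines[s:e]
        if current and count_tokens("\n".join(current + piece)) > budget:
            chunks.append("\n".join(current))
            current = []
        current.extend(piece)
    if current:
        chunks.append("\n".join(current))
    return chunks

SAMPLE = '''import math

def area(r):
    return math.pi * r * r

class Shape:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "shape " + self.name
'''

chunks = chunk_code(SAMPLE, budget=10)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
    print(chunk)
```

Because the spans are contiguous line ranges, joining the chunks with newlines gives back the original text, which is one simple way to preserve formatting. A real tokenizer and a tree-sitter grammar would replace the two stand-ins.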

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
