Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•10mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers preserve structure and meaning while splitting, so each chunk stays coherent and only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
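
To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's actual implementation. It swaps in Python's standard-library `ast` module for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of real model tokens. The names `count_tokens`, `split_points`, `chunk_code`, and `SAMPLE` are made up for illustration.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def split_points(body, lines, budget):
    """Collect 1-based line numbers where a chunk is allowed to start."""
    points = []
    for node in body:
        points.append(node.lineno)
        # end_lineno is available on Python 3.8+.
        text = "".join(lines[node.lineno - 1:node.end_lineno])
        children = getattr(node, "body", None)
        # Only descend into a node when it is too big to keep whole.
        if count_tokens(text) > budget and isinstance(children, list) and children:
            points.extend(split_points(children, lines, budget))
    return points


def chunk_code(source: str, budget: int):
    """Split Python source into chunks of at most `budget` tokens where possible."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)

    # Cut the source into pieces at every allowed start line.
    starts = sorted(set([1] + split_points(tree.body, lines, budget)))
    bounds = starts + [len(lines) + 1]
    pieces = ["".join(lines[a - 1:b - 1]) for a, b in zip(bounds, bounds[1:])]

    # Greedily merge adjacent pieces while they still fit the budget.
    chunks, current = [], ""
    for piece in pieces:
        if current and count_tokens(current + piece) > budget:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        return x

    def b(self):
        y = 2
        return y
'''

chunks = chunk_code(SAMPLE, budget=10)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
    print(chunk, end="")
```

Chunks are slices of the original lines, so joining them reproduces the source exactly. A single statement that is larger than the budget and has no nested body is left as one oversized chunk, and decorators and `else` branches are not treated specially in this sketch.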

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
