Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•9mo ago
Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model's context window. Our chunkers split along the document's own structure, so each chunk stays coherent and only the most relevant pieces need to be passed to the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
