Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•6mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open source tools that split documents into meaningful chunks for use with AI models.

When you run LLMs over large documents or codebases, you often need to break them into smaller parts that fit the model’s context window. Our chunkers split along structural and semantic boundaries, so each chunk stays coherent and only the most relevant pieces need to be passed to the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker: a fast, structure-aware way to break source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports every language that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, we recurse into the AST to produce clean, coherent chunks that fit your configured token budget.
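To make the idea concrete, here is a minimal sketch of recursive, token-budgeted chunking. It is not Chonkie's implementation. It assumes Python's built-in `ast` module as a stand-in for tree-sitter (so it only handles Python source), and it counts whitespace-separated words instead of using a real tokenizer. The names `count_tokens`, `chunk_statements`, and `chunk_code` are hypothetical.

```python
import ast


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def chunk_statements(stmts, start, end, lines, limit):
    """Cover lines[start:end] with (start, end) spans that follow statement boundaries."""

    def size(a, b):
        return count_tokens("".join(lines[a:b]))

    spans = []
    group_start = cursor = start
    for stmt in stmts:
        stop = stmt.end_lineno  # 1-based inclusive end == 0-based exclusive end
        if stop <= cursor:  # statement shares a line with the previous one
            continue
        if size(cursor, stop) > limit:
            # The statement alone is over budget: close the open group, then descend.
            if cursor > group_start:
                spans.append((group_start, cursor))
            body = getattr(stmt, "body", None)
            if isinstance(body, list) and body and isinstance(body[0], ast.stmt):
                spans.extend(chunk_statements(body, cursor, stop, lines, limit))
            else:
                spans.append((cursor, stop))  # cannot split further; may exceed limit
            group_start = stop
        elif size(group_start, stop) > limit:
            # Adding this statement would overflow the open group: start a new one.
            spans.append((group_start, cursor))
            group_start = cursor
        cursor = stop
    # Attach trailing lines (comments, else-branches) without overflowing the group.
    if group_start < cursor < end and size(group_start, end) > limit:
        spans.append((group_start, cursor))
        group_start = cursor
    if group_start < end:
        spans.append((group_start, end))
    return spans


def chunk_code(source: str, limit: int) -> list[str]:
    """Split Python source into chunks of at most `limit` pseudo-tokens where possible."""
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    spans = chunk_statements(tree.body, 0, len(lines), lines, limit)
    return ["".join(lines[a:b]) for a, b in spans]


SAMPLE = '''import os

def small():
    return 1

class Big:
    def a(self):
        x = 1
        y = 2
        return x + y

    def b(self):
        return os.getcwd()
'''

for i, chunk in enumerate(chunk_code(SAMPLE, limit=14)):
    print(f"--- chunk {i} ({count_tokens(chunk)} pseudo-tokens) ---")
    print(chunk, end="")
```

Two choices in this sketch are worth noting. Blank lines, comments, and decorators that precede a statement are attached to that statement, so joining the chunks reproduces the input exactly. Siblings are merged greedily at each nesting level, and only a node that is over budget on its own gets opened up, which is one simple way to keep small definitions whole.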

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
