Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•1y ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers split along the document's own structure, so each chunk keeps its meaning and only the most relevant pieces are passed into the model. This helps reduce hallucinations and confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block — instead, we dive recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
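To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps tree-sitter for Python's built-in `ast` module so it runs with no dependencies, it only handles Python source with ordinary newlines, and it counts whitespace-separated words as a stand-in for real tokens. The names `chunk_code`, `chunk_nodes`, and `count_tokens` are hypothetical.

```python
import ast
from typing import List, Tuple

Span = Tuple[int, int]  # half-open range of 0-indexed source lines


def count_tokens(lines: List[str]) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return sum(len(line.split()) for line in lines)


def chunk_nodes(nodes, start: int, end: int, lines: List[str], budget: int) -> List[Span]:
    # Give each sibling a contiguous segment. Blank lines, comments, and
    # decorators before a node attach to it; the tail goes to the last node.
    segments = []
    cursor = start
    for i, node in enumerate(nodes):
        seg_end = end if i == len(nodes) - 1 else node.end_lineno
        if seg_end > cursor:
            segments.append((node, cursor, seg_end))
            cursor = seg_end

    spans: List[Span] = []
    group = None  # (start, end, tokens) of the chunk being built
    for node, s, e in segments:
        size = count_tokens(lines[s:e])
        # Close the current chunk if this segment would overflow the budget.
        if group is not None and group[2] + size > budget:
            spans.append((group[0], group[1]))
            group = None
        if size <= budget:
            # Merge the segment into the current chunk.
            group = (s, e, size) if group is None else (group[0], e, group[2] + size)
            continue
        # Oversized node: descend into its statement body if it has one.
        # Simplification: only `body` is visited, so else/except branches
        # stay attached to the last statement of the body.
        body = getattr(node, "body", None)
        if isinstance(body, list) and body:
            spans.extend(chunk_nodes(body, s, e, lines, budget))
        else:
            spans.append((s, e))  # indivisible statement, emitted over budget
    if group is not None:
        spans.append((group[0], group[1]))
    return spans


def chunk_code(source: str, budget: int) -> List[str]:
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)
    if not tree.body:
        return [source] if source else []
    spans = chunk_nodes(tree.body, 0, len(lines), lines, budget)
    return ["".join(lines[s:e]) for s, e in spans]
```

Because every chunk is a contiguous line range, joining the chunks gives back the original source unchanged, which is one simple way to preserve formatting. A single statement larger than the budget is emitted whole rather than cut mid-statement.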

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
