Show HN: Fast and Quality Code Chunking with Chonkie

1•snyy•11mo ago

Hi HN,

We’re Chonkie (https://github.com/chonkie-inc/chonkie) — we build open source tools that help split documents into meaningful chunks for use with AI models.

When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a structure-aware way: each chunk preserves structure and meaning, so retrieval can pass only the most relevant pieces into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.

Today we’re launching our Code Chunker — a fast, structure-aware way to break down source code into high-quality, token-aware chunks.

How it works:

(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)

Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.

It supports all languages that tree-sitter supports, and is designed to preserve formatting and semantics. Functions or class definitions that exceed the token budget are not cut at arbitrary points. Instead, we recurse into the AST so that splits fall on node boundaries, producing clean, coherent chunks that fit your configured token budget.
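To make the recursive merge-and-group idea concrete, here is a minimal sketch. It is not Chonkie's implementation: it swaps in Python's built-in `ast` module for tree-sitter so it runs without extra dependencies, and it counts whitespace-separated words instead of using a real tokenizer. The names `chunk_code` and `count_tokens` are hypothetical, chosen only for this illustration.

```python
import ast


def count_tokens(text):
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def chunk_code(source, max_tokens):
    """Split Python source into chunks of whole statements within max_tokens."""
    lines = source.splitlines(keepends=True)

    def text(start, end):
        return "".join(lines[start:end])

    def first_line(node):
        # 0-based line where the node begins, counting decorators above a def/class.
        decorators = getattr(node, "decorator_list", [])
        return min([node.lineno] + [d.lineno for d in decorators]) - 1

    def split(node, start, end):
        # Map each line where a direct child statement begins to that child.
        starts = {}
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.stmt, ast.excepthandler)):
                line = first_line(child)
                if start <= line < end:
                    starts.setdefault(line, child)
        if not starts:
            return [(start, end)]  # nothing left to descend into; emit as-is
        cuts = sorted(set(starts) | {start}) + [end]
        chunks, current = [], None  # current = [start, end, tokens]
        for seg_start, seg_end in zip(cuts, cuts[1:]):
            tokens = count_tokens(text(seg_start, seg_end))
            child = starts.get(seg_start)
            if tokens > max_tokens and child is not None:
                # Too big for one chunk: flush, then descend into the node.
                if current is not None:
                    chunks.append((current[0], current[1]))
                    current = None
                chunks.extend(split(child, seg_start, seg_end))
            elif current is not None and current[2] + tokens <= max_tokens:
                # Still within budget: merge this sibling into the open chunk.
                current[1], current[2] = seg_end, current[2] + tokens
            else:
                if current is not None:
                    chunks.append((current[0], current[1]))
                current = [seg_start, seg_end, tokens]
        if current is not None:
            chunks.append((current[0], current[1]))
        return chunks

    ranges = split(ast.parse(source), 0, len(lines))
    return [text(s, e) for s, e in ranges if s < e]


source = '''import os

def small():
    return 1

def big(a, b):
    total = a + b
    for i in range(3):
        total += i
        print(total)
    return total
'''

chunks = chunk_code(source, max_tokens=10)
for chunk in chunks:
    print(count_tokens(chunk), repr(chunk))
```

Chunks are contiguous line ranges, so joining them reproduces the original source exactly, including blank lines and comments. A statement that cannot be subdivided any further is emitted whole even if it exceeds the budget. A production chunker would need a fallback for that case.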

What it’s useful for:

  - Embedding-based code search

  - RAG (retrieval-augmented generation) over codebases

  - Long-context analysis of code

  - Preparing repos for fine-tuning or pretraining

Try it out:

  - Open source package: https://docs.chonkie.ai/chunkers/code-chunker

  - Hosted playground (free with account): https://cloud.chonkie.ai

Happy Chonking!
