Show HN: Optimizing LiteLLM with Rust – When Expectations Meet Reality

https://github.com/neul-labs/fast-litellm
14•ticktockten•1h ago
I've been working on Fast LiteLLM - a Rust acceleration layer for the popular LiteLLM library - and learned some lessons along the way that might resonate with other developers trying to squeeze performance out of existing systems.

My assumption was that LiteLLM, being a Python library, would have plenty of low-hanging fruit for optimization. I set out to create a Rust layer using PyO3 to accelerate the performance-critical parts: token counting, routing, rate limiting, and connection pooling.
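For readers who haven't used PyO3, here is a minimal sketch of the general pattern - a Rust function compiled into an importable Python extension module, with tiktoken-rs doing the counting. The crate versions, module name, and function name are assumptions for illustration, not the actual Fast LiteLLM API:

```rust
// Minimal PyO3 + tiktoken-rs sketch (assumed deps: pyo3 0.22 with the
// "extension-module" feature, tiktoken-rs 0.6). Names are hypothetical.
use pyo3::exceptions::PyRuntimeError;
use pyo3::prelude::*;
use tiktoken_rs::cl100k_base;

/// Count tokens in `text` using the cl100k_base encoding.
#[pyfunction]
fn count_tokens(text: &str) -> PyResult<usize> {
    // A real implementation would build the BPE table once and cache it;
    // rebuilding it on every call would dominate the runtime.
    let bpe = cl100k_base().map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
    Ok(bpe.encode_with_special_tokens(text).len())
}

/// Exposed to Python as `import fast_tokens; fast_tokens.count_tokens("hello")`.
#[pymodule]
fn fast_tokens(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(count_tokens, m)?)?;
    Ok(())
}
```

Built with a tool like maturin, this produces an ordinary importable Python module, which is what makes it possible to swap it in behind LiteLLM's existing functions via monkeypatching.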

The Approach

- Built Rust implementations for token counting using tiktoken-rs

- Added lock-free data structures with DashMap for concurrent operations (see the rate-limiter sketch after this list)

- Implemented async-friendly rate limiting

- Created monkeypatch shims to replace Python functions transparently

- Added comprehensive feature flags for safe, gradual rollouts

- Developed performance monitoring to track improvements in real-time
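
To make the DashMap idea concrete, here is a rough sketch of a fixed-window rate limiter keyed per model or API key. It is illustrative only - the struct, field names, and window policy are assumptions, not the project's actual implementation - but it shows why a sharded concurrent map avoids a single global lock on the hot path:

```rust
// Illustrative fixed-window rate limiter over DashMap (dashmap = "6").
// Not the Fast LiteLLM implementation; names and policy are assumptions.
use dashmap::DashMap;
use std::time::{Duration, Instant};

struct FixedWindowLimiter {
    max_per_window: u64,
    window: Duration,
    // Per-key counters (e.g. per model or per API key), updated concurrently
    // without a single global mutex.
    counters: DashMap<String, (Instant, u64)>,
}

impl FixedWindowLimiter {
    fn new(max_per_window: u64, window: Duration) -> Self {
        Self { max_per_window, window, counters: DashMap::new() }
    }

    /// Returns true if another call is allowed for `key` in the current window.
    fn try_acquire(&self, key: &str) -> bool {
        let mut entry = self
            .counters
            .entry(key.to_string())
            .or_insert_with(|| (Instant::now(), 0));
        let (window_start, count) = &mut *entry;
        if window_start.elapsed() >= self.window {
            // The window has rolled over: reset the counter.
            *window_start = Instant::now();
            *count = 0;
        }
        if *count < self.max_per_window {
            *count += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let limiter = FixedWindowLimiter::new(2, Duration::from_secs(1));
    assert!(limiter.try_acquire("gpt-4o"));
    assert!(limiter.try_acquire("gpt-4o"));
    assert!(!limiter.try_acquire("gpt-4o")); // third call in the same window is rejected
}
```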

After building out all the Rust acceleration, I ran my comprehensive benchmark comparing baseline LiteLLM vs. the shimmed version:

Function             Baseline Time   Shimmed Time   Speedup   Improvement
token_counter        0.000035s       0.000036s      0.99x     -0.6%
count_tokens_batch   0.000001s       0.000001s      1.10x     +9.1%
router               0.001309s       0.001299s      1.01x     +0.7%
rate_limiter         0.000000s       0.000000s      1.85x     +45.9%
connection_pool      0.000000s       0.000000s      1.63x     +38.7%

Turns out LiteLLM is already quite well-optimized! The core token counting was essentially unchanged (0.6% slower, likely within measurement noise), and the most significant gains came from the more complex operations - rate limiting and connection pooling - where Rust's concurrent primitives made a real difference.

Key Takeaways

1. Don't assume existing libraries are under-optimized - the maintainers likely know their domain well.

2. Focus on algorithmic improvements over reimplementation - sometimes a better approach beats a faster language.

3. Micro-benchmarks can be misleading - real-world impact varies significantly, and sub-microsecond timings are dominated by measurement noise (see the sketch after this list).

4. The biggest gains often come from the complex parts, not the simple operations.

5. Even "modest" improvements can matter at scale - a 45% improvement in rate limiting is meaningful for high-throughput applications.
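
On the measurement-noise point: when a single call takes tens of microseconds, run-to-run jitter easily exceeds 1%, so a 0.6% difference between two runs says very little. A standalone sketch (not the benchmark used above; the workload is a stand-in) that makes the spread visible:

```rust
// Standalone timing-jitter check; the workload is a stand-in, not LiteLLM.
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let text = "the quick brown fox jumps over the lazy dog ".repeat(200);
    let mut samples_ns: Vec<f64> = Vec::with_capacity(1000);

    for _ in 0..1000 {
        let start = Instant::now();
        // A cheap operation on the order of the timings in the table above.
        let tokens = black_box(&text).split_whitespace().count();
        black_box(tokens);
        samples_ns.push(start.elapsed().as_nanos() as f64);
    }

    samples_ns.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = samples_ns[samples_ns.len() / 2];
    let p95 = samples_ns[samples_ns.len() * 95 / 100];
    // If (p95 - median) / median is a few percent or more, a 0.6% delta
    // between two single measurements is indistinguishable from noise.
    println!("median: {median:.0} ns, p95: {p95:.0} ns, spread: {:.1}%",
             (p95 - median) / median * 100.0);
}
```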

While the core token counting saw minimal improvement, the rate limiting and connection pooling gains still provide value for high-volume use cases. The infrastructure I built (feature flags, performance monitoring, safe fallbacks) creates a solid foundation for future optimizations.

The project continues as Fast LiteLLM on GitHub for anyone interested in the Rust-Python integration patterns, even if the performance gains were humbling.

Edit: To clarify - the slight regression for token_counter is likely within measurement noise, which suggests LiteLLM's token counting is already well-optimized. The 45%+ gains in rate limiting and connection pooling still provide value for high-throughput applications.

Comments

solidsnack9000•51m ago
Interesting write-up.
aaronblohowiak•29m ago
Measure before implementing "improvements"; you'll develop a sense over time of what is taking too long.
jmalicki•28m ago
The benchmarks in your README.md state that it is several times faster for those operations - are they a lie?

Front page

Cloudflare Global Network experiencing issues

https://www.cloudflarestatus.com/?t=1
2058•imdsm•6h ago•1340 comments

Gemini 3 for developers: New reasoning, agentic capabilities

https://blog.google/technology/developers/gemini-3-developers/
281•janpio•1h ago•86 comments

Gemini 3 Pro Preview Live in AI Studio

https://aistudio.google.com/prompts/new_chat?model=gemini-3-pro-preview
382•preek•2h ago•156 comments

Pebble, Rebble, and a Path Forward

https://ericmigi.com/blog/pebble-rebble-and-a-path-forward/
45•phoronixrly•33m ago•4 comments

A Day at Hetzner Online in the Falkenstein Data Center

https://www.igorslab.de/en/a-day-at-hetzner-online-in-the-falkenstein-data-center-insights-into-s...
63•speckx•1h ago•16 comments

Gemini 3

https://blog.google/products/gemini/gemini-3/
292•meetpateltech•1h ago•85 comments

5 Things to Try with Gemini 3 Pro in Gemini CLI

https://developers.googleblog.com/en/5-things-to-try-with-gemini-3-pro-in-gemini-cli/
58•keithba•1h ago•17 comments

Solving a Million-Step LLM Task with Zero Errors

https://arxiv.org/abs/2511.09030
26•Anon84•1h ago•3 comments

Strix Halo's Memory Subsystem: Tackling iGPU Challenges

https://chipsandcheese.com/p/strix-halos-memory-subsystem-tackling
20•PaulHoule•1h ago•7 comments

Nearly all UK drivers say headlights are too bright

https://www.bbc.com/news/articles/c1j8ewy1p86o
441•YeGoblynQueenne•3h ago•428 comments

How Quake.exe got its TCP/IP stack

https://fabiensanglard.net/quake_chunnel/index.html
356•billiob•9h ago•74 comments

Google Brings Gemini 3 AI Model to Search and AI Mode

https://blog.google/products/search/gemini-3-search-ai-mode/
54•CrypticShift•1h ago•4 comments

Short Little Difficult Books

https://countercraft.substack.com/p/short-little-difficult-books
86•crescit_eundo•3h ago•39 comments

Do Not Put Your Site Behind Cloudflare If You Don't Need To

https://huijzer.xyz/posts/123/do-not-put-your-site-behind-cloudflare-if-you-dont
314•huijzer•5h ago•235 comments

Google Antigravity

https://antigravity.google/
158•Fysi•2h ago•115 comments

Google Antigravity, a New Era in AI-Assisted Software Development

https://antigravity.google/blog/introducing-google-antigravity
170•meetpateltech•1h ago•124 comments

Show HN: Optimizing LiteLLM with Rust – When Expectations Meet Reality

https://github.com/neul-labs/fast-litellm
14•ticktockten•1h ago•3 comments

The Miracle of Wörgl

https://scf.green/story-of-worgl-and-others/
97•simonebrunozzi•6h ago•51 comments

Mathematics and Computation (2019) [pdf]

https://www.math.ias.edu/files/Book-online-Aug0619.pdf
44•nill0•5h ago•9 comments

Gemini 3 Pro Model Card

https://pixeldrain.com/u/hwgaNKeH
387•Topfi•6h ago•257 comments

Ruby 4.0.0 Preview2 Released

https://www.ruby-lang.org/en/news/2025/11/17/ruby-4-0-0-preview2-released/
146•pansa2•4h ago•50 comments

Looking for Hidden Gems in Scientific Literature

https://elicit.com/blog/literature-based-discovery
7•ravenical•5d ago•1 comments

Beauty in/of mathematics: tessellations and their formulas

https://www.tandfonline.com/doi/full/10.1080/00036811.2025.2510472
12•QueensGambit•5d ago•0 comments

GoSign Desktop RCE flaws affecting users in Italy

https://www.ush.it/2025/11/14/multiple-vulnerabilities-gosign-desktop-remote-code-execution/
43•ascii•5h ago•18 comments

How many video games include a marriage proposal? At least one

https://32bits.substack.com/p/under-the-microscope-ncaa-basketball
305•bbayles•5d ago•73 comments

I've Wanted to Play That 'Killer Shark' Arcade Game Briefly Seen in 'Jaws'

https://www.remindmagazine.com/article/15694/jaws-arcade-video-game-killer-shark-atari-sega-elect...
21•speckx•4d ago•7 comments

Langfuse (YC W23) Hiring OSS Support Engineers in Berlin and SF

https://jobs.ashbyhq.com/langfuse/5ff18d4d-9066-4c67-8ecc-ffc0e295fee6
1•clemo_ra•10h ago

The Uselessness of "Fast" and "Slow" in Programming

https://jerf.org/iri/post/2025/the_uselessness_of_fast/
98•zdw•6d ago•50 comments

Azure hit by 15 Tbps DDoS attack using 500k IP addresses

https://www.bleepingcomputer.com/news/microsoft/microsoft-aisuru-botnet-used-500-000-ips-in-15-tb...
454•speckx•1d ago•287 comments

The surprising benefits of giving up

https://nautil.us/the-surprising-benefits-of-giving-up-1248362/
173•jnord•13h ago•138 comments