Show HN: Optimizing LiteLLM with Rust – When Expectations Meet Reality

https://github.com/neul-labs/fast-litellm
12•ticktockten•1h ago
I've been working on Fast LiteLLM - a Rust acceleration layer for the popular LiteLLM library - and I learned a few things that might resonate with other developers trying to squeeze performance out of existing systems.

My assumption was that LiteLLM, being a Python library, would have plenty of low-hanging fruit for optimization. I set out to create a Rust layer using PyO3 to accelerate the performance-critical parts: token counting, routing, rate limiting, and connection pooling.

The Approach

- Built Rust implementations for token counting using tiktoken-rs

- Added lock-free data structures with DashMap for concurrent operations

- Implemented async-friendly rate limiting

- Created monkeypatch shims to replace Python functions transparently

- Added comprehensive feature flags for safe, gradual rollouts

- Developed performance monitoring to track improvements in real-time
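To illustrate the shim, feature-flag, and fallback ideas in the list above, here is a minimal sketch in Python. It is not the project's actual code: `install_shim`, the `FAST_SHIM_ENABLED` environment variable, the `demo` namespace, and `fast_count` are all hypothetical names, and the "fast" function is a pure-Python stand-in for what would be a PyO3-exported Rust function.

```python
import functools
import os
import types


def install_shim(module, name, fast_impl, flag_env="FAST_SHIM_ENABLED"):
    """Replace module.<name> with a wrapper that prefers fast_impl.

    Hypothetical helper: the flag name and the fallback policy are assumptions.
    """
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # The flag is read on every call, so the shim can be disabled at runtime.
        if os.environ.get(flag_env, "1") != "1":
            return original(*args, **kwargs)
        try:
            return fast_impl(*args, **kwargs)
        except Exception:
            # Safe fallback: any failure in the accelerated path uses the original.
            return original(*args, **kwargs)

    setattr(module, name, wrapper)
    return original  # returned so the caller can restore it later


# Stand-in for a library module exposing a word-count function.
demo = types.SimpleNamespace(count=lambda text: len(text.split()))


def fast_count(text):
    # Stand-in for an accelerated implementation; rejects empty input on purpose.
    if not text:
        raise ValueError("empty input")
    return text.count(" ") + 1


install_shim(demo, "count", fast_count)
```

Callers keep using `demo.count(...)` unchanged. The wrapper decides per call whether the accelerated path is used, which is what makes a gradual rollout and a quick rollback possible.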

After building out all the Rust acceleration, I ran my comprehensive benchmark comparing baseline LiteLLM vs. the shimmed version:

Function             Baseline Time   Shimmed Time   Speedup   Improvement
token_counter        0.000035s       0.000036s      0.99x     -0.6%
count_tokens_batch   0.000001s       0.000001s      1.10x     +9.1%
router               0.001309s       0.001299s      1.01x     +0.7%
rate_limiter         0.000000s       0.000000s      1.85x     +45.9%
connection_pool      0.000000s       0.000000s      1.63x     +38.7%
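For readers wondering how the Speedup and Improvement columns relate: the printed values look consistent with Improvement = 1 - 1/Speedup, i.e. the fraction of baseline time saved. That relation is inferred from the numbers above, not taken from the benchmark code, and the function names below are illustrative. A small sketch:

```python
def speedup(baseline_s, shimmed_s):
    """How many times faster the shimmed version is than the baseline."""
    return baseline_s / shimmed_s


def improvement_pct(baseline_s, shimmed_s):
    """Percentage of baseline time saved by the shimmed version."""
    return (1.0 - shimmed_s / baseline_s) * 100.0


def improvement_from_speedup(s):
    """Same quantity as improvement_pct, expressed via the speedup ratio."""
    return (1.0 - 1.0 / s) * 100.0


# Speedups as printed in the table for the three rows with larger gains.
for name, s in [("count_tokens_batch", 1.10),
                ("rate_limiter", 1.85),
                ("connection_pool", 1.63)]:
    print(f"{name}: {improvement_from_speedup(s):+.1f}%")
```

For the rows with very small differences, the times and speedups shown are rounded, so recomputing from the printed values will not reproduce the Improvement column exactly.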

Turns out LiteLLM is already quite well-optimized! The core token counting was essentially unchanged (0.6% slower, likely within measurement noise), and the most significant gains came from the more complex operations like rate limiting and connection pooling, where Rust's concurrency primitives made a real difference.

Key Takeaways

1. Don't assume existing libraries are under-optimized - The maintainers likely know their domain well

2. Focus on algorithmic improvements over reimplementation - Sometimes a better approach beats a faster language

3. Micro-benchmarks can be misleading - Real-world performance impact varies significantly

4. The biggest gains often come from the complex parts, not the simple operations

5. Even "modest" improvements can matter at scale - 45% improvements in rate limiting are meaningful for high-throughput applications

While the core token counting saw minimal improvement, the rate limiting and connection pooling gains still provide value for high-volume use cases. The infrastructure I built (feature flags, performance monitoring, safe fallbacks) creates a solid foundation for future optimizations.

The project continues as Fast LiteLLM on GitHub for anyone interested in the Rust-Python integration patterns, even if the performance gains were humbling.

Edit: To clarify - the negative result for token_counter is likely within measurement noise, suggesting that LiteLLM's token counting is already well-optimized. The 45%+ gains in rate limiting and connection pooling still provide value for high-throughput applications.
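One way to check whether a difference like that is within noise is to repeat the timing several times and compare the gap between medians against the run-to-run spread. The sketch below uses only the standard library; `bench_ns`, `is_within_noise`, and the two stand-in functions are hypothetical and are not the benchmark harness used for the table above. Reporting in nanoseconds also avoids rows that print as 0.000000s.

```python
import statistics
import timeit


def bench_ns(func, number=1_000, repeat=5):
    """Return (median, spread) of per-call time in nanoseconds over several runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [t / number * 1e9 for t in totals]
    return statistics.median(per_call), max(per_call) - min(per_call)


def is_within_noise(base, cand):
    """True when the median gap is no larger than the bigger run-to-run spread."""
    (b_med, b_spread), (c_med, c_spread) = base, cand
    return abs(b_med - c_med) <= max(b_spread, c_spread)


# Stand-ins for a baseline and a shimmed implementation of the same operation.
def baseline():
    return len("some text to count tokens in".split())


def shimmed():
    return "some text to count tokens in".count(" ") + 1


base = bench_ns(baseline)
cand = bench_ns(shimmed)
print(f"baseline: {base[0]:.0f} ns (spread {base[1]:.0f} ns)")
print(f"shimmed:  {cand[0]:.0f} ns (spread {cand[1]:.0f} ns)")
print("within noise:", is_within_noise(base, cand))
```

The spread-based check is a deliberately crude heuristic; a proper comparison would use more repeats and a statistical test, but it is enough to flag a 0.6% difference as inconclusive.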

Comments

solidsnack9000•41m ago
Interesting write-up.
aaronblohowiak•18m ago
Measure before implementing "improvements"; over time you'll develop a sense of what is taking too long.
jmalicki•18m ago
The benchmarks in your README.md state that it is several times faster for those operations. Are they a lie?
