Show HN: Optimizing LiteLLM with Rust – When Expectations Meet Reality

https://github.com/neul-labs/fast-litellm
16•ticktockten•1h ago
I've been working on Fast LiteLLM - a Rust acceleration layer for the popular LiteLLM library - and I learned a few things that might resonate with other developers trying to squeeze performance out of existing systems.

My assumption was that LiteLLM, being a Python library, would have plenty of low-hanging fruit for optimization. I set out to create a Rust layer using PyO3 to accelerate the performance-critical parts: token counting, routing, rate limiting, and connection pooling.

The Approach

- Built Rust implementations for token counting using tiktoken-rs

- Added lock-free data structures with DashMap for concurrent operations

- Implemented async-friendly rate limiting

- Created monkeypatch shims to replace Python functions transparently

- Added comprehensive feature flags for safe, gradual rollouts

- Developed performance monitoring to track improvements in real-time
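To make the shim idea concrete, here is a minimal sketch of a monkeypatch wrapper gated by a feature flag, with a fallback to the original function. It is not the project's actual code: `fake_lib`, `accelerated_token_counter`, `install_shim`, and the `FAST_SHIM_TOKEN_COUNTER` environment variable are hypothetical names, and a plain Python function stands in for the PyO3-backed implementation.

```python
import os
import types

# Stand-in for the library being patched (hypothetical; in practice this
# would be the imported Python module whose function gets replaced).
fake_lib = types.SimpleNamespace()


def python_token_counter(text):
    """Baseline pure-Python implementation (a crude whitespace split)."""
    return len(text.split())


fake_lib.token_counter = python_token_counter


def accelerated_token_counter(text):
    """Placeholder for a native (e.g. PyO3-backed) function. It raises on
    empty input purely to simulate a failure in the fast path."""
    if not text:
        raise RuntimeError("native layer failed")
    return len(text.split())


def install_shim(module, name, fast_impl, flag_env):
    """Replace module.<name> with a wrapper that uses fast_impl only when
    the feature flag is set to "1", and otherwise calls the original."""
    original = getattr(module, name)

    def wrapper(*args, **kwargs):
        if os.environ.get(flag_env, "0") != "1":
            return original(*args, **kwargs)
        try:
            return fast_impl(*args, **kwargs)
        except Exception:
            # Safe fallback: any failure in the fast path uses the original.
            return original(*args, **kwargs)

    wrapper.__wrapped__ = original  # keep a handle for un-patching
    setattr(module, name, wrapper)
    return original


install_shim(fake_lib, "token_counter", accelerated_token_counter,
             "FAST_SHIM_TOKEN_COUNTER")
```

Because the flag is read on every call, the fast path can be switched off at runtime without restarting the process. The price is one environment lookup per call, which matters when the wrapped function itself runs in microseconds.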

After building out all the Rust acceleration, I ran my comprehensive benchmark comparing baseline LiteLLM vs. the shimmed version:

| Function | Baseline Time | Shimmed Time | Speedup | Improvement |
|---|---|---|---|---|
| token_counter | 0.000035s | 0.000036s | 0.99x | -0.6% |
| count_tokens_batch | 0.000001s | 0.000001s | 1.10x | +9.1% |
| router | 0.001309s | 0.001299s | 1.01x | +0.7% |
| rate_limiter | 0.000000s | 0.000000s | 1.85x | +45.9% |
| connection_pool | 0.000000s | 0.000000s | 1.63x | +38.7% |
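The Speedup and Improvement columns appear to be related by improvement = 1 - 1/speedup, i.e. the fraction of time saved. That is an inference from the numbers, not something the post states. The sketch below checks that reading against three rows; the function names are illustrative.

```python
def speedup(baseline_s, shimmed_s):
    """How many times faster the shimmed version is than the baseline."""
    return baseline_s / shimmed_s


def improvement_from_speedup(s):
    """Percentage of time saved, assuming improvement = 1 - 1/speedup."""
    return (1.0 - 1.0 / s) * 100.0


# Speedups copied from the table above.
for name, s in [("count_tokens_batch", 1.10),
                ("rate_limiter", 1.85),
                ("connection_pool", 1.63)]:
    print(f"{name}: {s:.2f}x -> {improvement_from_speedup(s):+.1f}%")
```

For these three rows the formula gives +9.1%, +45.9%, and +38.7%, matching the table. The token_counter and router rows do not reproduce exactly from the two-decimal speedups, presumably because the table was computed from unrounded timings.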

Turns out LiteLLM is already quite well-optimized! The core token counting was essentially unchanged (0.6% slower, likely within measurement noise). The most significant gains came from the more complex operations, rate limiting and connection pooling, where Rust's concurrent primitives made a real difference.
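One way to check whether a sub-1% difference is "within measurement noise" is to repeat each measurement several times and compare the difference against the run-to-run spread. The sketch below uses the standard-library `timeit` module with two identical placeholder functions; it is not the benchmark harness used for the table above, and `measure` and `summarize` are illustrative names.

```python
import statistics
import timeit


def measure(fn, repeats=7, number=10_000):
    """Return per-call times in seconds, one value per repeat."""
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    return [t / number for t in totals]


def summarize(samples):
    """Return (median, relative spread), spread being (max - min) / median."""
    median = statistics.median(samples)
    spread = (max(samples) - min(samples)) / median if median else float("inf")
    return median, spread


def baseline():
    return len("the quick brown fox".split())


def candidate():
    # Deliberately identical to baseline: any measured difference is noise.
    return len("the quick brown fox".split())


base_median, base_spread = summarize(measure(baseline))
cand_median, cand_spread = summarize(measure(candidate))
delta = (base_median - cand_median) / base_median

print(f"baseline : {base_median * 1e9:.1f} ns/call, spread {base_spread:.1%}")
print(f"candidate: {cand_median * 1e9:.1f} ns/call, spread {cand_spread:.1%}")
print(f"difference: {delta:+.1%}")
```

If the reported difference is smaller than the spread of either measurement, it is hard to tell apart from noise. Reporting in nanoseconds per call also avoids rows that round to 0.000000s.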

Key Takeaways

1. Don't assume existing libraries are under-optimized - The maintainers likely know their domain well

2. Focus on algorithmic improvements over reimplementation - Sometimes a better approach beats a faster language

3. Micro-benchmarks can be misleading - Real-world performance impact varies significantly

4. The largest gains often come from the complex parts, not the simple operations

5. Even "modest" improvements can matter at scale - 45% improvements in rate limiting are meaningful for high-throughput applications

While the core token counting saw minimal improvement, the rate limiting and connection pooling gains still provide value for high-volume use cases. The infrastructure I built (feature flags, performance monitoring, safe fallbacks) creates a solid foundation for future optimizations.
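As an illustration of the performance-monitoring piece, here is a minimal sketch of a decorator that records call durations per function. It is an assumption about what such monitoring could look like, not the project's implementation; `monitored`, `report`, and `count_words` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

# Maps a label to the list of observed call durations, in seconds.
_timings = defaultdict(list)


def monitored(name):
    """Decorator factory: record the wall-clock duration of every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even when fn raises, so failures are not hidden.
                _timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def report():
    """Summarize call counts and mean duration per monitored label."""
    return {
        name: {"calls": len(durations),
               "mean_s": sum(durations) / len(durations)}
        for name, durations in _timings.items()
    }


@monitored("count_words")
def count_words(text):
    return len(text.split())
```

Calling `count_words(...)` a few times and then `report()` yields per-label call counts and mean durations, which can be compared with and without a shim enabled. The decorator adds its own overhead, so the numbers are more useful for comparing runs than as absolute timings.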

The project continues as Fast LiteLLM on GitHub for anyone interested in the Rust-Python integration patterns, even if the performance gains were humbling.

Edit: To clarify - the negative performance for token_counter is likely in the noise range of measurement, suggesting that LiteLLM's token counting is already well-optimized. The 45%+ gains in rate limiting and connection pooling still provide value for high-throughput applications.

Comments

solidsnack9000•1h ago
Interesting write-up.
aaronblohowiak•43m ago
Measure before implementing "improvements". You'll develop a sense over time of what is taking too long.
jmalicki•43m ago
The benchmarks in your README.md state that it is several times faster for those operations, are they a lie?