Fast LiteLLM
High-performance Rust acceleration for LiteLLM, providing 2-20x performance improvements for token counting, routing, rate limiting, and connection management.
Why Fast LiteLLM?
Fast LiteLLM is a drop-in Rust acceleration layer for LiteLLM that provides significant performance improvements:
5-20x faster token counting with batch processing
3-8x faster request routing with lock-free data structures
4-12x faster rate limiting with async support
2-5x faster connection management
Built with PyO3 and Rust, it integrates with existing LiteLLM code and requires no configuration.
Installation