Built for production: Rust backend, NVIDIA Triton integration, metrics for monitoring.
Flexible policies: Use built-in classifiers or plug in your own PyTorch models.
Easy integration: No major code changes needed; just point your OpenAI-compatible client at the router.
Example: Route complex coding questions to a powerful model, and simple rewrites to a smaller, cheaper one.
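To make the routing idea concrete, here is a toy sketch of that example policy in Python. This is not the blueprint's actual classifier (which can be a PyTorch model served via Triton); it is a trivial keyword heuristic, and the model names are hypothetical placeholders.

```python
# Toy illustration of task-based routing: pick a model per request
# based on a crude complexity check. Real deployments would swap this
# heuristic for a trained classifier.

def route(prompt: str) -> str:
    """Return a (hypothetical) model name for the given prompt."""
    complex_markers = ("implement", "debug", "refactor", "algorithm")
    if any(marker in prompt.lower() for marker in complex_markers):
        return "large-code-model"   # hypothetical powerful model
    return "small-fast-model"       # hypothetical cheaper model

print(route("Implement a lock-free queue in Rust"))   # large-code-model
print(route("Rewrite this sentence more politely"))   # small-fast-model
```

The router then forwards the request to whichever backend serves the selected model, so the client never has to know which LLM answered.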
Repo: github.com/NVIDIA-AI-Blueprints/llm-router
Would love feedback, ideas, and to hear how others are handling multi-LLM workflows!