Using the Lamborghini of inference engines for serverless Llama 3
https://modal.com/docs/examples/trtllm_latency
1 point • birdculture • 4h ago
Comments
gnabgib • 4h ago
Title: Serve an interactive language model app with latency-optimized TensorRT-LLM