[0] font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
Only one suggestion: On page "OpenAI-compatible API" it would be great to have also a simple example for the pure REST call instead of the need to import the OpenAI package.
Running them well is very important too. As we get to grips with everything models can do and look to deploy them widely knowledge of how to best run them becomes ever more important.
sherlockxu•7mo ago
We created this handbook to make LLM inference concepts more accessible, especially for developers building real-world LLM applications. The goal is to pull together scattered knowledge into something clear, practical, and easy to build on.
We’re continuing to improve it, so feedback is very welcome!
GitHub repo: https://github.com/bentoml/llm-inference-in-production
armcat•7mo ago
DiabloD3•7mo ago
leopoldj•7mo ago
DiabloD3•7mo ago
nl•7mo ago
DiabloD3•7mo ago
When you get a model offered by Ollama's service, you have no clue what you're getting, and normal people who have no experience aren't even aware of this.
Ollama is an unrestricted footgun because of this.
nl•6mo ago
DiabloD3•6mo ago
When R1 first came out, for example, their official copy of it was one of the distills labeled as "R1" instead of something like "R1-qwen-distill". They've done this more than once.
ChromaticPanic•6mo ago
criemen•7mo ago
I have a question. In https://github.com/bentoml/llm-inference-in-production/blob/..., you have a single picture that defines TTFT and ITL. That does not match my understanding (but you guys know probably more than me): In the graphic, it looks like that the model is generating 4 tokens T0 to T3, before outputting a single output token.
I'd have expected that picture for ITL (except that then the labeling of the last box is off), but for TTFT, I'd have expected that there's only a single token T0 from the decode step, that then immediately is handed to detokenization and arrives as first output token (if we assume a streaming setup, otherwise measuring TTFT makes little sense).
sherlockxu•6mo ago
sethherr•7mo ago
At the very least, the sections should be a single page each.