We built a consumer app that does deep ingredient and health analysis (food, supplements, skincare, cat treats, etc.) using llama-3.3-70b in production.
Some numbers from the last month:
- 3.0M+ tokens processed
- ~$2.07 total inference cost
- ~0.5–0.6 cents per scan
- Median latency ~3s, typical range 3–5s
- Long prompts, structured outputs, ingredient-level caching (simplified sketch below)
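For the curious: $2.07 over ~3.0M tokens works out to roughly $0.69 per million tokens, blended across input and output. The sub-cent per-scan cost comes mostly from the ingredient-level cache. Here's a stripped-down sketch of the pattern, not our production code: the endpoint URL and model id are placeholders, and an in-process dict stands in for a real cache store.

    # Per-ingredient cache in front of a structured-output model call.
    # Assumes an OpenAI-compatible endpoint serving llama-3.3-70b; swap in
    # Redis/sqlite/etc. for the dict in anything real.
    import hashlib
    import json
    from openai import OpenAI

    client = OpenAI(base_url="https://your-provider/v1", api_key="...")  # placeholder
    MODEL = "llama-3.3-70b"   # exact model id is provider-specific
    PROMPT_VERSION = "v1"     # bump to invalidate cached entries when the prompt changes

    _cache: dict[str, dict] = {}

    def analyze_ingredient(name: str) -> dict:
        # Normalize so "Sodium Benzoate" and "sodium benzoate " share one entry.
        key = hashlib.sha256(
            f"{PROMPT_VERSION}:{name.strip().lower()}".encode()
        ).hexdigest()
        if key in _cache:
            return _cache[key]
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Return a JSON object with the "
                 "ingredient's function, safety notes, and concerns."},
                {"role": "user", "content": name},
            ],
            response_format={"type": "json_object"},  # structured output; most OpenAI-compatible providers support this
            temperature=0,  # deterministic output keeps cached entries stable
        )
        result = json.loads(resp.choices[0].message.content)
        _cache[key] = result
        return result

    def analyze_scan(ingredients: list[str]) -> list[dict]:
        # After the first few thousand products, a scan is mostly cache hits,
        # which is what keeps marginal inference cost in the sub-cent range.
        return [analyze_ingredient(i) for i in ingredients]

The prompt version in the cache key matters more than it looks: change the prompt without bumping it and you serve stale analyses forever.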
This isn’t a demo or batch job — it’s a real latency-constrained mobile workload with thousands of active scanning users.
The main takeaway for us was that deep, high-quality inference can be surprisingly cheap and predictable if you design for it intentionally.
Happy to answer questions or share more details if useful.