Today we are launching Exosphere Flex Inference APIs. The premise: inference APIs should adapt to your constraints, not the other way around.
Usually, when you need to run inference at scale, you are forced into rigid boxes:
1. "Real-time" APIs (Expensive, optimized for <1s latency, prone to 429s).
2. "Batch" APIs (Cheaper, but often force 24-hour windows and rigid file formats).
3. "Self-hosted" (Total control, but high ops overhead).
We built a flexible inference engine that sits in the middle. You define the constraints (SLA/time, cost, and quality), and the system handles the execution.
Here is how it works under the hood:
1. Flexible SLAs (The "Time" Constraint): Instead of just "now" or "tomorrow," you pass an `sla` parameter (e.g., 60 minutes, 4 hours). Our scheduler bins these requests to optimize GPU saturation across our provider mesh. You trade strict immediacy for up to ~70% lower cost.
2. Reliability Layer (The "Ops" Constraint): We abstract away the error handling. If a provider throws a 429 or 503, you shouldn't have to write a retry loop with exponential backoff and jitter. Our infrastructure absorbs these failures and retries internally. We guarantee the request eventually succeeds (within your SLA), or we don't charge you.
3. Built-in Quality Gates (The "Accuracy" Constraint): This is the feature I’m most excited about. You can define an "eval" config in the request (using LLM-as-a-Judge or Python scripts). If the output doesn't meet your criteria, our system automatically feeds the failure back into the model and retries it. This moves the "validation loop" from your client code into the infrastructure. (The sketches right after this list show what a request could look like, and the retry boilerplate it replaces.)
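To make the three constraints concrete, here is a minimal sketch of a Flex request. Treat the endpoint URL, the model id, and every field name other than `sla` as illustrative placeholders rather than the exact wire format; the docs linked below have the real interface.

```python
import requests

# Illustrative only: the endpoint and every field name except `sla` are
# placeholders, not the exact wire format. See the docs linked below.
FLEX_ENDPOINT = "https://api.exosphere.example/v1/flex-inference"

payload = {
    "model": "llama-3.1-70b",  # placeholder model id
    "input": "Summarize this incident report in exactly 5 bullet points: ...",
    "sla": "4h",  # time constraint: complete any time within the next 4 hours
    "eval": {  # quality gate, checked on our side before you ever see the output
        "type": "llm_judge",  # or a Python script
        "criteria": "Exactly 5 bullets, each under 25 words, no invented dates.",
        "max_retries": 3,  # failed evals are fed back to the model and retried
    },
}

# Note what is missing: no retry loop, no backoff/jitter handling, no 429 branch.
# Upstream 429s/503s are absorbed by the reliability layer; the request either
# completes within the SLA or you are not charged.
response = requests.post(FLEX_ENDPOINT, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. a job id to poll or a webhook acknowledgement
```

The shift is that time, reliability, and quality become declarative fields on the request instead of imperative loops in a worker.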
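For contrast, this is the kind of per-provider reliability boilerplate that typically lives in every caller today: generic exponential backoff with jitter, nothing Exosphere-specific. With Flex, this loop runs inside the platform instead of in your codebase.

```python
import random
import time

import requests


def call_provider_with_retries(url: str, payload: dict, max_attempts: int = 6) -> dict:
    """Plain client-side retry loop: exponential backoff with full jitter."""
    for attempt in range(1, max_attempts + 1):
        resp = requests.post(url, json=payload, timeout=60)
        if resp.status_code not in (429, 500, 502, 503, 504):
            resp.raise_for_status()  # surface non-retryable errors (e.g. 400/401)
            return resp.json()
        if attempt == max_attempts:
            break
        # Full jitter: sleep a random amount up to 2^attempt seconds, capped at 60s.
        time.sleep(random.uniform(0, min(60, 2 ** attempt)))
    raise RuntimeError(f"provider still failing after {max_attempts} attempts")
```

Multiply that by every provider and model you route to, and it adds up to a meaningful amount of ops code to own.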
I’d love to hear your thoughts on this approach—specifically, does moving the "retry/eval" loop into the API layer simplify your backend, or do you prefer keeping that logic client-side?
Playground: https://models.exosphere.host/
More Details: https://exosphere.host/flex-inference