The problem
Most SLO tools treat a user journey as a simple chain: if any service fails, the whole thing fails. But real traffic doesn't work that way. In a checkout flow, 90% of users might skip the coupon service entirely. If the coupon service has 99.5% availability, does that really pull your checkout SLO down to 99.5%? No — because most of your users never touch it.
The model
WEIGHTED_ROUTES lets you describe what percentage of traffic flows through each service chain. Each chain is an implicit AND: every service in the chain must succeed. The composed error rate is:
e_total = 1 - Σ_i ( weight_i × Π_j (1 - e_j) )

For the checkout example (90% skip coupon, 10% use it):
e_total = 1 - ( 0.9 × (1 - e_base) × (1 - e_payments)
              + 0.1 × (1 - e_base) × (1 - e_coupon) × (1 - e_payments) )

SLOK translates this formula directly into Prometheus recording rules wired into the standard multi-window burn-rate pipeline.
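The arithmetic above is easy to sanity-check with a few lines of Python. This is a standalone sketch of the formula, not SLOK code; the per-service error rates are made-up example numbers:

```python
def composed_error_rate(routes):
    """routes: list of (weight, [per-service error rates in the chain]).

    A chain succeeds only if every service in it succeeds (implicit AND),
    so its success probability is the product of (1 - e_j)."""
    success = 0.0
    for weight, error_rates in routes:
        chain_success = 1.0
        for e in error_rates:
            chain_success *= 1.0 - e
        success += weight * chain_success
    return 1.0 - success

# Hypothetical per-service error rates for illustration.
e_base, e_payments, e_coupon = 0.0005, 0.001, 0.005

# Naive model: every request traverses all three services.
naive = composed_error_rate([(1.0, [e_base, e_payments, e_coupon])])

# Weighted routes: 90% of users skip the coupon service.
weighted = composed_error_rate([
    (0.9, [e_base, e_payments]),
    (0.1, [e_base, e_coupon, e_payments]),
])

print(f"naive chain: {naive:.4%}, weighted routes: {weighted:.4%}")
```

With these numbers the naive all-services chain reports roughly three times the error rate of the weighted composition, which is exactly the distortion the model is meant to remove.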
The YAML
```yaml
kind: SLOComposition
spec:
  target: 99.9
  window: 30d
  objectives:
    - name: base
      ref: { name: checkout-base-slo }
    - name: payments
      ref: { name: payments-slo }
    - name: coupon
      ref: { name: coupon-slo }
  composition:
    type: WEIGHTED_ROUTES
    params:
      routes:
        - name: no-coupon
          weight: 0.9
          chain: [base, payments]
        - name: with-coupon
          weight: 0.1
          chain: [base, coupon, payments]
  alerting:
    burnRateAlerts:
      enabled: true
```

The operator generates the PrometheusRules automatically. You get burn rate alerts on the composed SLO, not just on individual services.
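For intuition, the generated recording rule roughly encodes the weighted sum from the formula. The sketch below is illustrative only: the rule name, metric names, and label scheme are assumptions, not SLOK's actual output:

```yaml
# Illustrative sketch — actual names/labels emitted by the operator may differ.
groups:
  - name: slo-composition-checkout
    rules:
      - record: slo:composed_error_rate
        expr: |
          1 - (
              0.9 * scalar(1 - slo:error_rate{slo="checkout-base-slo"})
                  * scalar(1 - slo:error_rate{slo="payments-slo"})
            + 0.1 * scalar(1 - slo:error_rate{slo="checkout-base-slo"})
                  * scalar(1 - slo:error_rate{slo="coupon-slo"})
                  * scalar(1 - slo:error_rate{slo="payments-slo"})
          )
        labels:
          slo: checkout-composed
```

The `scalar()` wrappers sidestep PromQL vector matching between series that carry different `slo` labels; a real implementation might instead use `ignoring()`/`on()` matching to keep the result a labeled vector.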
Other things SLOK does
- AND_MIN composition (worst-case across services)
- Built-in SLI templates for http-availability, http-latency, kubernetes-apiserver
- Automatic error budget tracking exposed in .status
- Event correlation: when a burn rate spike is detected, SLOK creates an SLOCorrelation resource listing recent Deployments, ConfigMap changes, and cluster events that may have caused it, with an optional LLM-enhanced summary (Llama 3.3 70B via Groq)

WEIGHTED_ROUTES is alpha. Feedback on the API shape is welcome.