The 60-second answer

Azure Spot Instances let you run on Microsoft's unused compute capacity at 60–90% off pay-as-you-go, with the catch that Azure can evict your VM with 30 seconds' notice when it needs the capacity back. For batch processing, CI/CD runners, dev/test, training jobs, and fault-tolerant containerised workloads, Spot is a structural cost lever most enterprises under-use. For production databases, identity services, anything stateful, or anything time-critical, Spot is the wrong instrument. The question that matters is not "should we use Spot?" but "which 8–15% of our compute is genuinely Spot-eligible, and what is the engineering cost to shift it there?"

How Azure Spot Instances actually work

Azure Spot VMs run on the same hardware as standard pay-as-you-go VMs: identical SKUs and identical performance. The differences are commercial: a discounted price, no availability SLA, and the risk of eviction. You set a max price (or accept the prevailing Spot price, which is capped at the pay-as-you-go rate) and choose an eviction policy, Deallocate or Delete. Azure runs your VM at the discounted Spot rate as long as two conditions hold: the Spot price stays under your max, and Microsoft has unused capacity in that region for that VM size. When either condition breaks (the price rises above your max, or Azure reclaims the capacity), Azure evicts your VM with 30 seconds of advance notice, delivered as a Preempt event through the Scheduled Events endpoint of the Instance Metadata Service.
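As a minimal sketch of how a workload might watch for that notice, the Python below reads the Scheduled Events document and filters for Preempt events. The `api-version` value and the function names are assumptions for illustration; the fetch only works from inside an Azure VM, so the parsing is kept separate and can be exercised against a sample document.

```python
import json
import urllib.request

# Scheduled Events endpoint on the Azure Instance Metadata Service.
# Assumption: the api-version below is still accepted; check current docs.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout_seconds: float = 2.0) -> dict:
    """Fetch the Scheduled Events document (only reachable from inside a VM)."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


def find_preempt_events(document: dict) -> list:
    """Return the events whose EventType is 'Preempt', i.e. a Spot eviction."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
    ]
```

A worker would typically poll `fetch_scheduled_events()` every few seconds and start its shutdown path as soon as `find_preempt_events` returns a non-empty list.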

The eviction is the whole story. If your workload can checkpoint state, save progress, and resume on a new VM, Spot is essentially free money. If it cannot, Spot is a footgun. The engineering question is whether you can decouple compute from state cleanly enough to absorb an eviction.
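The checkpoint-and-resume idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pattern: the checkpoint is a local JSON file (in practice it would live on durable storage outside the VM), and `run_job`, `should_stop`, and the file layout are hypothetical names.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as handle:
        return int(json.load(handle)["next_index"])


def save_checkpoint(path: str, next_index: int) -> None:
    """Write to a temp file, then rename, so a mid-write eviction cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"next_index": next_index}, handle)
    os.replace(temp_path, path)


def run_job(items, process, checkpoint_path, should_stop=lambda: False):
    """Process items in order, checkpointing after each one.

    Returns the index of the next unprocessed item. `should_stop` is a
    hypothetical hook that returns True once an eviction notice is seen.
    """
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(items)):
        if should_stop():
            return index
        process(items[index])
        save_checkpoint(checkpoint_path, index + 1)
    return len(items)
```

Calling `run_job` again on a replacement VM with the same checkpoint path picks up at the first unprocessed item, so the work lost to an eviction is bounded by one item.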

Spot discount structure in 2026

Spot prices float continuously by region, VM family, and size. Typical observed discounts in 2026:

  • General-purpose D-series, E-series: 65–82% off pay-as-you-go in most regions.
  • Compute-optimised F-series: 70–88% off, with higher volatility.
  • GPU instances (NC, ND, NV): 60–75% off, but eviction rates are materially higher because GPU capacity is constrained.
  • Memory-optimised M-series: variable; in some regions Spot capacity is rare and discounts shrink to 40–50%.

The discount is not the only number that matters. Eviction rate matters more. Azure publishes an eviction rate band per VM size per region; it is shown in the portal when you configure a Spot VM and can be queried through Azure Resource Graph. A 90% discount with a 25% monthly eviction rate may be worse than a 65% discount with a 3% eviction rate, once you account for restart overhead, partial work loss, and orchestration complexity.
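One rough way to put discount and eviction on the same scale is cost per useful hour. The model below is a simplified sketch: every parameter is an assumption you supply, including how you translate a published eviction band into expected evictions per VM-month and how many hours each eviction costs you.

```python
def effective_cost_per_useful_hour(
    payg_rate: float,
    discount: float,
    evictions_per_month: float,
    hours_lost_per_eviction: float,
    overhead_per_month: float = 0.0,
    hours_per_month: float = 730.0,
) -> float:
    """Cost per hour of work that actually survives evictions.

    payg_rate: pay-as-you-go hourly price.
    discount: Spot discount as a fraction (0.65 means 65% off).
    evictions_per_month: expected evictions per VM-month (assumed input).
    hours_lost_per_eviction: redone work plus restart time per eviction.
    overhead_per_month: extra orchestration or engineering cost per VM-month.
    """
    spot_rate = payg_rate * (1.0 - discount)
    useful_hours = hours_per_month - evictions_per_month * hours_lost_per_eviction
    if useful_hours <= 0:
        # Every billed hour is lost to evictions; the workload never progresses.
        return float("inf")
    return (spot_rate * hours_per_month + overhead_per_month) / useful_hours


# Compare two hypothetical options for the same workload.
deep_discount = effective_cost_per_useful_hour(1.0, 0.90, 6.0, 40.0, 50.0)
modest_discount = effective_cost_per_useful_hour(1.0, 0.65, 0.5, 40.0, 50.0)
```

The inputs in the comparison are placeholders, not measurements. The point of the model is that hours lost per eviction and fixed overhead decide whether the deeper discount survives contact with the eviction rate.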

The Microsoft commercial bias

Microsoft sales does not push Spot. Spot consumption counts toward MACC, but the deeper Spot discount means a given workload burns down less of the commitment than it would on committed compute. From Microsoft's revenue perspective, the ideal customer runs all compute on Savings Plans. From your perspective, the ideal is to run Spot wherever it is structurally viable. Treat the absence of a Microsoft Spot recommendation as the signal it is.

Workloads where Spot wins

Five workload archetypes are structurally Spot-eligible. If any of these describe meaningful compute in your tenant and you are not running them on Spot, you are paying 3–5x more than necessary.

| Workload | Why it fits Spot | Typical savings |
| --- | --- | --- |
| Batch processing | Checkpointing native; restart cheap; throughput > latency | 75–85% |
| CI/CD build agents | Builds are idempotent; retries are normal; latency tolerant | 70–82% |
| ML training jobs | Checkpoint to storage; resume from checkpoint; cost-dominant | 60–75% |
| Dev/test environments | Disposable; lower SLA expectations; off-hours acceptable | 70–80% |
| Stateless container fleets on AKS | Kubernetes handles eviction natively via node pools | 65–80% |

The pattern across all five: state lives somewhere other than the VM, and the workload tolerates a restart. Anything stateful on the VM disk, or anything with hard latency SLAs, falls outside the pattern.

AKS Spot node pools — the highest-leverage entry point

For tenants running AKS at meaningful scale, Spot node pools are the highest-leverage Spot adoption point. AKS lets you provision a Spot node pool alongside your on-demand pool. Spot nodes carry a taint, so Kubernetes only schedules pods there if they declare the matching toleration; if those pods express a preferred (rather than required) node affinity for Spot, they fall back to the on-demand pool when Spot capacity is unavailable. The integration is operationally low-cost because Kubernetes already handles pod rescheduling on node loss: a Spot eviction looks identical to a node failure from the orchestrator's perspective.
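The scheduling side can be sketched as a pod-spec fragment. The Python below builds the toleration and a preferred node affinity as a plain dictionary that can be merged into a manifest; the taint and label key `kubernetes.azure.com/scalesetpriority` is the one AKS applies to Spot node pools, while the function name and default weight are illustrative assumptions.

```python
import json

# Taint/label key that AKS applies to Spot node pools.
SPOT_KEY = "kubernetes.azure.com/scalesetpriority"


def spot_scheduling_fragment(weight: int = 100) -> dict:
    """Pod-spec fields that allow Spot nodes and prefer them without requiring them."""
    return {
        # Without this toleration the pod can never land on a Spot node.
        "tolerations": [
            {
                "key": SPOT_KEY,
                "operator": "Equal",
                "value": "spot",
                "effect": "NoSchedule",
            }
        ],
        # Preferred (not required) affinity lets the scheduler fall back
        # to on-demand nodes when no Spot node has room.
        "affinity": {
            "nodeAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": weight,
                        "preference": {
                            "matchExpressions": [
                                {
                                    "key": SPOT_KEY,
                                    "operator": "In",
                                    "values": ["spot"],
                                }
                            ]
                        },
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(spot_scheduling_fragment(), indent=2))
```

Merging this fragment into a Deployment's `spec.template.spec` is enough for stateless pods. Workloads that must never run on Spot simply omit the toleration.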

A typical AKS-heavy tenant captures 22–38% reduction in cluster compute cost by shifting stateless workloads (web fronts, batch consumers, ML inference at low criticality) onto Spot node pools. The ratio of Spot-to-OnDemand capacity should be tuned to the eviction rate — we typically start at 60% Spot / 40% OnDemand and adjust based on observed eviction patterns over the first 90 days.

Three Spot pitfalls that look like savings and are not

  1. Using Spot for capacity that is structurally short. Some VM sizes in some regions have Spot eviction rates above 20% per month. The headline discount looks great; the actual cost (including engineering overhead, work loss, retries) is higher than running pay-as-you-go. Check the eviction rate band before committing engineering effort.
  2. Building Spot orchestration in-house when AKS or Batch already does it. Custom Spot orchestration on raw VMs is rarely worth the engineering investment unless you have very specific scheduling requirements. Use AKS, Azure Batch, or Spot VM Scale Sets first.
  3. Ignoring data transfer and storage transaction costs. If your Spot workload rewrites a lot of data to durable storage after each eviction-recovery cycle, transaction charges and any cross-region or outbound transfer can eat the Spot savings. Architect Spot workloads to minimise post-eviction state rebuilding.

Anonymised case study: 2,400-node ML training fleet

A genomics research client running a 2,400-node ML training fleet on Azure ND-series GPU instances was paying $11.4M per year at pay-as-you-go. The training jobs already checkpointed to durable storage every 600 seconds, so the workload was structurally Spot-eligible. We migrated 1,800 nodes (75% of the fleet) to Spot and kept 600 on pay-as-you-go for time-critical experiments. Net result: $6.2M annual savings against $410K of added engineering investment in the first year to harden the orchestration around evictions, an ROI of roughly 15x in year one. The observed eviction rate stabilised at 4.1% per month against a published band of 5–10%, comfortably within tolerance.
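The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes spend is spread evenly across nodes, which the case study does not state; under that assumption the implied Spot discount comes out at roughly 72.5%, inside the 60–75% GPU range quoted earlier, and the first-year ROI at about 15.1x.

```python
def case_study_figures() -> dict:
    """Recompute the case-study ratios from the numbers given in the text."""
    payg_annual = 11_400_000      # annual pay-as-you-go spend, USD
    total_nodes = 2_400
    spot_nodes = 1_800
    annual_savings = 6_200_000    # USD
    engineering_cost = 410_000    # first-year hardening cost, USD

    spot_share = spot_nodes / total_nodes
    # Assumption: every node costs the same, so migrated spend scales with node share.
    migrated_spend = payg_annual * spot_share
    return {
        "spot_share": spot_share,
        "implied_discount": annual_savings / migrated_spend,
        "first_year_roi": annual_savings / engineering_cost,
    }
```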

$6.2M
Annual savings on a single Spot migration in a workload Microsoft would never have recommended for Spot — because the recommendation works against their revenue model.

Spot adoption checklist

  1. Inventory eviction-tolerant workloads. Batch, CI/CD, dev/test, ML training, stateless containerised services.
  2. Profile the eviction rate band for your target VM size in your target region. Anything above 15% monthly is borderline.
  3. Pick the right orchestration layer. AKS Spot node pools, Azure Batch, or Spot VM Scale Sets. Avoid custom orchestration unless your workload demands it.
  4. Engineer for eviction. 30-second graceful shutdown handlers, frequent checkpoints, idempotent retries, state externalised.
  5. Set a max-price ceiling rather than accepting any Spot price. This caps your exposure to price spikes, at the cost of an eviction whenever the Spot price rises above the ceiling.
  6. Monitor eviction rate as a first-class metric. If your observed rate drifts above the published band, reduce Spot ratio.
  7. Track MACC inclusion. Spot consumption counts toward your MACC but at the discounted rate — size your commitment accordingly.
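Step 6 of the checklist can be expressed as a simple rule. This sketch assumes the observed rate is measured as evictions per VM-month and that the Spot share is cut by a fixed step when the rate exceeds the top of the published band; both the metric definition and the step size are hypothetical policy choices, not Azure behaviour.

```python
def observed_monthly_eviction_rate(evictions: int, vm_months: float) -> float:
    """Evictions per VM-month over the observation window (assumed metric)."""
    if vm_months <= 0:
        raise ValueError("vm_months must be positive")
    return evictions / vm_months


def recommend_spot_ratio(
    current_ratio: float,
    observed_rate: float,
    band_upper: float,
    step: float = 0.10,
    floor: float = 0.0,
) -> float:
    """Lower the Spot share by `step` when the observed rate exceeds the band."""
    if observed_rate > band_upper:
        return max(floor, round(current_ratio - step, 2))
    return current_ratio
```

For example, a fleet at 60% Spot that observes a 12% rate against a band topping out at 10% would be moved to 50% under these defaults, while a fleet observing 4.1% would be left alone.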

Done right, Spot is the highest-leverage cost lever in the Azure compute stack for the right workloads. Done wrong — on the wrong workloads, in capacity-constrained regions, or without eviction handling — it can be more expensive than the alternative. For tenants doing serious Azure cost work, mapping the Spot-eligible footprint should be near the top of the list. See the complete Azure cost optimisation guide for where Spot fits alongside RIs, Savings Plans, AHB, and rightsizing. For renewal context, the EA tier collapse playbook shows how a strong Spot posture changes your commitment posture and your negotiation position.