The 60-second answer

Azure Cosmos DB pricing is unlike most other Azure services: you pay for throughput (Request Units per second, RU/s) and stored data, not for compute. Three throughput models compete (manual provisioned, autoscale, and serverless), and the wrong choice routinely doubles the bill. Four levers cut Cosmos DB spend by 30–50%: right-size RU/s against actual workload percentiles (most enterprises over-provision by 2–3x), switch low-volume containers to serverless, use autoscale for spiky workloads, and audit multi-region write configuration (with multi-region writes enabled, each write region bills at 1.25x the single-write rate). Reserved Capacity for predictable workloads takes a further 20–65% off the on-demand rate.

How Cosmos DB actually bills

Azure Cosmos DB pricing separates throughput, storage, and data movement. You pay for three line items:

  • Throughput (RU/s): the primary cost. One Request Unit is the cost of a point read of a 1 KB document. Writes cost ~5 RU; queries cost 5–50 RU depending on selectivity. You provision RU/s per container (or per database for shared throughput).
  • Storage: per-GB-month on transactional storage and analytical storage (analytical store is cheaper per GB but billed separately).
  • Cross-region replication and backup: per-GB egress when data replicates between regions, plus periodic-backup storage cost for retention beyond the free tier.
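As a rough illustration of the throughput line item, the sketch below turns an operation mix into an RU/s estimate. The read and write costs are the approximate figures quoted above; the 20 RU query cost is a hypothetical midpoint of the 5–50 RU range, and the workload numbers are invented for the example. Real RU charges vary with document size, indexing, and query shape, so treat this as a starting estimate rather than a sizing tool.

```python
# Assumed per-operation RU costs, based on the rough figures in the text.
RU_PER_POINT_READ = 1.0   # point read of a 1 KB document
RU_PER_WRITE = 5.0        # "~5 RU" per write
RU_PER_QUERY = 20.0       # hypothetical midpoint of the 5-50 RU query range


def estimate_ru_per_second(reads_per_s, writes_per_s, queries_per_s):
    """Return the approximate RU/s consumed by a given operation mix."""
    return (
        reads_per_s * RU_PER_POINT_READ
        + writes_per_s * RU_PER_WRITE
        + queries_per_s * RU_PER_QUERY
    )


# Hypothetical workload: 500 reads, 100 writes and 20 queries per second.
print(estimate_ru_per_second(500, 100, 20))  # 500 + 500 + 400 = 1400.0
```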

The model rewards predictable throughput and punishes burst-without-design. That single property explains most Cosmos DB overspend.

Provisioned vs autoscale vs serverless

The three throughput models differ in cost ceiling and floor:

| Model | Bills on | Best for | Trap |
| --- | --- | --- | --- |
| Manual provisioned | Provisioned RU/s, 24/7 | Steady high-throughput workloads where consumption is predictable | Over-provisioning. Most teams set headroom against P99 spikes and pay for it 100% of the time. |
| Autoscale | Max of (provisioned, 10% of max, actual) per hour | Spiky workloads with predictable peaks | 50% per-RU premium vs manual; only worthwhile if your low-usage hours are below 10% of peak. |
| Serverless | Per million RUs consumed | Dev/test, low-volume production, intermittent workloads under 5K RU/s sustained | Per-RU cost is the highest of the three; runaway queries cause bill shock; container-size cap of 1 TB. |

The general rule: workloads averaging >60% of peak run cheapest on manual provisioned; spiky workloads with deep troughs run cheapest on autoscale; intermittent or unpredictable workloads under ~5 million RUs/day run cheapest on serverless. The crossover math is workload-specific; benchmark with a 30-day production sample before locking in.
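The crossover can be explored with a small cost model like the sketch below. All rates are assumptions: the manual rate of $0.008 per 100 RU/s per hour is back-derived from a figure of roughly $58/month per 1,000 RU/s, the 1.5 factor is the 50% autoscale premium, and the serverless price per million RUs is a placeholder to replace with your own region's price. The function names and the two usage profiles are hypothetical.

```python
HOURS_PER_MONTH = 730

# Assumed rates; replace with the prices on your own invoice or price sheet.
MANUAL_PER_100_RU_HOUR = 0.008     # roughly $58/month per 1,000 RU/s
AUTOSCALE_PREMIUM = 1.5            # the 50% per-RU premium
SERVERLESS_PER_MILLION_RU = 0.25   # placeholder value


def manual_cost(hourly_peaks):
    """Manual mode bills the provisioned RU/s (here: the overall peak) every hour."""
    provisioned = max(hourly_peaks)
    return provisioned / 100 * MANUAL_PER_100_RU_HOUR * len(hourly_peaks)


def autoscale_cost(hourly_peaks):
    """Autoscale bills each hour at the higher of that hour's peak and 10% of max."""
    max_ru = max(hourly_peaks)
    rate = MANUAL_PER_100_RU_HOUR * AUTOSCALE_PREMIUM
    return sum(max(peak, 0.1 * max_ru) / 100 * rate for peak in hourly_peaks)


def serverless_cost(total_ru_consumed):
    """Serverless bills on total RUs consumed, not on a provisioned rate."""
    return total_ru_consumed / 1_000_000 * SERVERLESS_PER_MILLION_RU


# Hypothetical spiky month: 73 hours at 10,000 RU/s, the rest at 500 RU/s.
spiky = [10_000] * 73 + [500] * (HOURS_PER_MONTH - 73)
# Hypothetical steady month: 8,000 RU/s in every hour.
steady = [8_000] * HOURS_PER_MONTH

for name, profile in (("spiky", spiky), ("steady", steady)):
    print(name, round(manual_cost(profile), 2), round(autoscale_cost(profile), 2))

# Hypothetical intermittent workload: 5 million RUs a day for 30 days.
print("serverless", round(serverless_cost(5_000_000 * 30), 2))
```

Under these assumed rates the spiky profile is far cheaper on autoscale and the steady profile is cheaper on manual, which matches the rule of thumb. Feeding in real hourly peaks from a production sample is what makes the comparison meaningful.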

RU/s right-sizing: the biggest single lever

Most Cosmos DB containers are over-provisioned by 2–3x. The pattern: a workload was sized at launch against a generous estimate, the application scaled differently than projected, and the RU/s setting was never revisited because "it's working." Each unused 1,000 RU/s costs roughly $58/month per region in manual mode — multiply by container count and region count and the leak compounds.
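To make the compounding concrete, here is a minimal sketch of the leak arithmetic. It uses the approximate $58/month per 1,000 RU/s per region figure; the function name and the example container are hypothetical.

```python
COST_PER_1000_RU_MONTH = 58.0  # approximate manual-mode figure, per region


def monthly_leak(provisioned_ru_s, needed_ru_s, regions=1):
    """Monthly cost of the RU/s provisioned above what the workload needs."""
    unused = max(provisioned_ru_s - needed_ru_s, 0)
    return unused / 1000 * COST_PER_1000_RU_MONTH * regions


# Hypothetical container: 30,000 RU/s provisioned, 10,000 needed (3x), two regions.
print(monthly_leak(30_000, 10_000, regions=2))  # 20 * 58 * 2 = 2320.0
```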

The audit procedure: pull 60 days of normalised RU consumption metrics from Azure Monitor for each container; compute the P95 and P99; set provisioned RU/s to P99 + 10% headroom (manual) or P95 (autoscale, where the burst capacity covers the gap). For workloads with P50 well below P99, evaluate switching to autoscale even at the 50% per-RU premium — the trough savings often exceed the peak premium.
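The sizing rule can be sketched as follows. The sketch assumes the Azure Monitor export has already been converted into a plain list of consumed RU/s samples for one container (normalised consumption is a percentage, so it would first need multiplying by the provisioned RU/s). The nearest-rank percentile method and the rounding up to the next 100 RU/s are choices made for this example, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_ru_s(samples, mode):
    """Apply the sizing rule: P99 + 10% for manual, P95 for autoscale."""
    if mode == "manual":
        target = percentile(samples, 99) * 1.10
    elif mode == "autoscale":
        target = percentile(samples, 95)
    else:
        raise ValueError("mode must be 'manual' or 'autoscale'")
    # Round up to the next 100 RU/s (an assumed granularity for this sketch).
    return math.ceil(target / 100) * 100


# Hypothetical samples: 100 observations spread evenly from 100 to 10,000 RU/s.
samples = [100 * i for i in range(1, 101)]
print(recommend_ru_s(samples, "manual"))     # P99 = 9,900, plus 10% -> 10900
print(recommend_ru_s(samples, "autoscale"))  # P95 = 9,500 -> 9500
```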

The Microsoft commercial bias

The Cosmos DB sales pitch defaults to manual provisioned at high RU/s for "guaranteed performance." This is convenient for predictable Azure invoice line items but punitive for workloads that don't need it. Microsoft account teams rarely volunteer the autoscale or serverless option because both reduce committed consumption. The buyer's posture: instrument first, choose the throughput model from data, and revisit at every MACC review.

Multi-region writes: the silent multiplier

With multi-region writes enabled, every write region bills at 1.25x the single-write rate. A workload running at 50,000 RU/s in one region costs 1x; the same workload with three write regions costs 1x × 3 × 1.25 = 3.75x. Most enterprise Cosmos DB deployments don't need multi-region writes: they were configured during platform setup "for resilience" and never reviewed against the actual write SLA requirement.
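The multiplier arithmetic is simple enough to check in a few lines. This sketch uses the 1.25 per-region factor from the text and a hypothetical function name; it expresses cost relative to the same RU/s in a single write region.

```python
MULTI_WRITE_FACTOR = 1.25  # per-region multi-write factor quoted in the text


def relative_throughput_cost(write_regions):
    """Cost relative to the same RU/s provisioned in a single write region (1.0)."""
    if write_regions < 1:
        raise ValueError("need at least one write region")
    if write_regions == 1:
        return 1.0
    return write_regions * MULTI_WRITE_FACTOR


print(relative_throughput_cost(1))  # 1.0
print(relative_throughput_cost(3))  # 3 * 1.25 = 3.75
```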

The audit question: which containers genuinely require <10 ms write latency from multiple geographies versus which would tolerate <100 ms cross-region replication of writes from a single primary? In most enterprises 80%+ of containers can move to single-write-region with read replicas. The annual saving on a moderate Cosmos DB estate runs into six figures.

Reserved Capacity for the predictable base

Once a workload's steady-state RU/s is established and stable, Cosmos DB Reserved Capacity (1-year or 3-year) delivers 20% (1-year) or 65% (3-year) off the on-demand rate. Reservations apply globally across regions and across containers in the subscription — you reserve a quantity of RU/s, not a specific resource. That portability makes Reserved Capacity attractive even for shifting workloads, provided the aggregate floor stays above the reservation level.

The decision framework: reserve only what you are confident the aggregate Cosmos DB estate will consume continuously for the reservation term. Stack reservations conservatively (60–70% of P50 consumption) and leave the variable headroom on pay-as-you-go. The downside of over-committing to reservations is lost negotiating leverage on the wider Microsoft commitment, so coordinate reservation purchases with the broader MACC commitment structure.
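A minimal sketch of that split, assuming the 20% and 65% discounts quoted above and a hypothetical estate with a P50 of 40,000 RU/s. The function name and the 0.65 default coverage (the middle of the 60–70% range) are illustrative choices, and the result is expressed relative to paying on-demand rates for the whole P50.

```python
DISCOUNT = {1: 0.20, 3: 0.65}  # term in years -> discount quoted in the text


def reservation_plan(p50_ru_s, term_years, coverage=0.65):
    """Split P50 consumption into a reserved block and a pay-as-you-go remainder.

    coverage is the share of P50 to reserve (the text suggests 0.60-0.70).
    Returns (reserved RU/s, cost relative to paying on-demand for all of P50).
    """
    reserved = p50_ru_s * coverage
    reserved_share = coverage * (1 - DISCOUNT[term_years])
    on_demand_share = 1 - coverage
    return reserved, reserved_share + on_demand_share


for term in (1, 3):
    reserved, relative = reservation_plan(40_000, term)
    print(term, round(reserved), round(relative, 4))
```

Consumption above P50 stays on pay-as-you-go in this model and is not included in the relative cost.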

Anonymised case study: $620K Cosmos DB reduction

A financial-services platform ran 47 Cosmos DB containers across four Azure regions on manual provisioned throughput, total $1.8M/year. The audit found: aggregate RU/s provisioning at 3.1x P99 consumption (sized at launch, never revisited); seven containers configured for multi-region writes that only needed multi-region reads; one container with a runaway analytical query consuming 18% of total RU; zero Reserved Capacity despite 14 months of stable consumption history. Remediation: RU/s right-sized to P99 + 10%; six containers moved from multi-write to single-write with read replicas; analytical query rewritten to use the analytical store rather than the transactional store; 3-year Reserved Capacity purchased against the steady-state floor. Annual saving: $620K (34% of prior spend). The client now monitors RU consumption as a standing line item in monthly FinOps review.

Where to take this from here

Cosmos DB is one of the most over-provisioned services on the average enterprise Azure invoice. Sequence the work: RU/s right-sizing first (largest single lever); multi-region write audit second; throughput-model selection per container third; Reserved Capacity layer last. Pair with Azure Savings Plans vs Reserved Instances for the broader commit posture, Azure Monitor pricing for the observability that makes RU sizing possible, and Azure SQL DTU vs vCore if you are evaluating a relational alternative for analytical-heavy workloads. For commitment design, the MACC explainer. For renewal leverage, the EA tier collapse 2026 playbook — Azure consumption growth is the lever that offsets EA tier-collapse exposure. For end-to-end support, our Azure & MACC Advisory covers database services as part of total Azure cost discipline. Request a discovery call to benchmark.