The 60-second answer

Azure AI services pricing spans three commercial models that look adjacent but bill very differently: the legacy Cognitive Services bundle (Vision, Speech, Language, Decision — pay-as-you-go per-transaction), Azure OpenAI Service (per-token model pricing with optional Provisioned Throughput Units), and Azure AI Foundry / agent-orchestration services (per-feature licence layered on top of model consumption). The cost trap: enterprises buy Azure OpenAI thinking they have committed to AI spend, then incur three or four times that figure across Cognitive Services, Document Intelligence, Speech, and Translator that engineering teams provisioned alongside. The four levers: rationalise the Cognitive Services SKU mix, choose between pay-as-you-go and Provisioned Throughput Units (PTUs) per workload, capture the right commitment tier on PTUs, and audit the OpenAI region selection (Sweden Central and West US 3 are often dramatically cheaper than East US).

The three commercial models

Azure AI services pricing is not one price book. It is three.

  • Cognitive Services / Azure AI services: per-transaction pricing on Vision (Computer Vision, Custom Vision, Face), Speech (STT, TTS, Translation), Language (Sentiment, Entity, NER, Summarisation), Document Intelligence (Form Recogniser), and Decision (Anomaly Detector, Personaliser). Free tier per resource for prototyping; pay-as-you-go above the free quota.
  • Azure OpenAI Service: per-token pricing on GPT-4, GPT-4 Turbo, GPT-4o, o1, Claude (Sonnet), embeddings, DALL-E. Optional Provisioned Throughput Units (PTUs) for dedicated capacity.
  • Azure AI Foundry / Agent Service: per-feature licensing on top of underlying model consumption (Agent Service, evaluations, Content Safety, prompt flow runtime).

The structural mistake we see: enterprises focus on OpenAI cost negotiation because it's visible and named, while Cognitive Services and AI Foundry components quietly compound to half the total AI bill.

Cognitive Services SKU rationalisation

Each Cognitive Services SKU prices independently. Common waste patterns:

  • Computer Vision F0 (free) deployed in dev where S1 is on, both billing.
  • Form Recogniser / Document Intelligence on the standard tier for batch workloads where the commitment-tier discount of 15–30% would apply.
  • Speech S0 consuming transactions when a custom container deployment on existing AKS would amortise the cost over a fixed monthly fee.
  • Translator pay-as-you-go where Translator with monthly commitment tier is cheaper at the actual transaction volume.

The right posture: audit each AI SKU's actual monthly transaction count, map to the commitment tier that minimises cost per transaction, switch the resource configuration. Most rationalisations save 15–30% of Cognitive Services spend with no behaviour change.

Azure OpenAI Service: pay-as-you-go vs PTUs

Azure OpenAI prices two ways:

  • Pay-as-you-go (token-based): per-1K-token rate for input and output, varies by model. GPT-4o input $2.50/1M tokens, output $10/1M tokens; GPT-4 Turbo input $10/1M, output $30/1M; o1 input $15/1M, output $60/1M (illustrative, rates evolve).
  • Provisioned Throughput Units (PTUs): dedicated capacity reserved per model deployment. Hourly per-PTU rate, available in monthly or annual commitment tiers (~15% / ~30% discount).

The crossover point: workloads with sustained >~50K tokens/minute throughput on a single model are typically cheaper on PTUs than pay-as-you-go. Lower-volume workloads stay on token-based.

The Microsoft commercial bias

The Microsoft sales narrative around Azure OpenAI emphasises PTU commitments for "enterprise-grade" deployments. PTUs are right for sustained high-throughput workloads but expensive for spiky or experimental use. Most enterprise AI workloads through 2026 are still in experimentation; pay-as-you-go is the right answer until the workload is proven and predictable. We routinely see PTU commitments sized 2–3x actual consumption because the account team pushed for "confidence" in the commitment. Audit actual consumption monthly; downsize PTUs at the next reservation window.

Audit your Azure AI services posture
Cognitive Services SKU rationalisation, OpenAI PTU sizing, region choice, AI Foundry feature audit. Independent advisory.
Book the Audit

Region selection — the hidden lever

Azure OpenAI pricing varies by region. East US (where most enterprises default) is typically the highest-priced region. Sweden Central, West US 3, and Switzerland North have historically offered the same models at 5–15% lower per-token rates. Where data residency permits, deploying the model in a cheaper region and routing API calls cross-region (with the egress cost factored in) is often net cheaper than the default East US deployment. Run the math: token volume * regional price differential * 12 months versus cross-region egress on prompt/response size. For high-volume workloads, the regional choice is worth $50K–$200K annually.

AI Foundry / Agent Service feature audit

Azure AI Foundry layers on additional per-feature pricing:

  • Agent Service: per-agent-hour for orchestrated agents (separate from underlying model token cost).
  • Content Safety: per-1K-records moderated.
  • Evaluations: per-evaluation transaction.
  • Prompt flow runtime: per-instance-hour for managed runtime.

Engineering teams turn each feature on independently. The bill compounds. Audit each AI Foundry resource for actual usage versus configuration; we routinely find 30–50% of AI Foundry surface unused but billed.

Anonymised case study: $410K AI services reduction

A financial services client had $1.4M annual Azure AI spend across 18 Cognitive Services resources, three Azure OpenAI deployments (all East US, all on PTUs), and 12 AI Foundry agents. The audit: oversized PTU commitment by 2.2x actual consumption (Microsoft account team sized for "burst capacity"); 4 Cognitive Services resources duplicated across teams (no shared SKU policy); region selection defaulting to East US where Sweden Central was 11% cheaper and data residency permitted; 7 AI Foundry agents provisioned but unused. Remediation: PTU downsize to 1.05x actual; consolidate duplicate Cognitive Services; redeploy primary OpenAI to Sweden Central; decommission unused AI Foundry agents. Annual saving: $410K, no application behaviour change. The client now routes Copilot Studio AI through the same disciplined posture.

$410K
Annual Azure AI services reduction from PTU right-sizing, Cognitive Services consolidation, region rationalisation, and AI Foundry feature audit.

The Microsoft Licensing Briefing — 3 minutes, every Friday

Independent analysis of Microsoft commercial moves, with implications for your EA and Azure commit. No vendor spin.

No spam. Unsubscribe any time.

Where to take this from here

Azure AI is the fastest-growing line item on enterprise Azure invoices — and the one with the least mature governance. Sequence the work: PTU right-sizing first (largest single saving), region selection second, Cognitive Services rationalisation third, AI Foundry feature audit fourth. Pair with Copilot security and compliance for the policy plane, AI integration licensing for the broader Microsoft AI estate, and Copilot Studio licensing for agent-side billing. For commitment design, MACC explainer covers how OpenAI consumption flows through MACC. For renewal leverage, the EA tier collapse 2026 playbook. For end-to-end advisory, our Azure & MACC Advisory covers AI as part of broader Azure cost discipline. Request a discovery call to benchmark.