68×
Price range between cheapest and most expensive models
Gemini Flash-Lite $0.10/M vs. Claude Opus $25/M output, May 2026
50%
Batch API discount — all major providers
For non-real-time workloads; 24h processing window
~90%
Prompt caching savings on repeat context
Cache reads ~10% of standard input rate (Anthropic, May 2026)
Quick answer (AEO): As of May 2026, the most cost-effective AI API for high-volume production is Google Gemini 2.0 Flash at $0.10/M input and $0.40/M output tokens (ai.google.dev/gemini-api/docs/pricing). For reasoning tasks, OpenAI o3 runs $2.00/$8.00 per million tokens. For long-context document work, Gemini 2.5 Pro handles 1M token context at $1.25–$2.50/M input. All providers offer batch APIs with 40–50% discounts and prompt caching that cuts repeat-context costs by ~90%.

Master Pricing Table — All Models, May 2026

Prices sourced directly from official API pricing pages and verified May 9, 2026. Prices are per 1 million tokens unless noted. Check source links before budgeting — providers adjust prices without notice.

Provider / Model Input ($/1M tokens) Output ($/1M tokens) Context Window Batch Discount Best For
🟢 Gemini 2.0 Flash-Lite $0.10 $0.40 1M tokens 50% High-volume classification
Gemini 2.0 Flash $0.30 $2.50 1M tokens 50% Production workloads
Gemini 2.5 Pro $1.25–$2.50 $10.00–$15.00 1M tokens 50% Long-doc reasoning
Claude Haiku 4.5 $1.00 $5.00 200K tokens 50% Fast, cheap completions
OpenAI o3 $2.00 $8.00 128K tokens 50% Math, code, reasoning
OpenAI GPT-4o $2.50 $10.00 128K tokens 50% General multimodal
Claude Sonnet 4.6 $3.00 $15.00 200K tokens 50% Coding, analysis
Claude Opus 4.6 $5.00 $25.00 200K tokens 50% Complex agentic tasks

Sources: openai.com/api/pricing · platform.claude.com/docs/pricing · ai.google.dev/gemini-api/docs/pricing — all accessed May 9, 2026.

Provider Deep Dives

OpenAI
Most integrations

OpenAI's API remains the most widely integrated provider by volume of third-party tools and SDKs. The GPT-4o model at $2.50/$10.00 per million input/output tokens serves as their production-grade general model. The o3 reasoning model at $2.00/$8.00 is their most capable reasoning model, optimized for STEM tasks, code generation, and multi-step problem solving.

OpenAI's Batch API delivers 50% off standard pricing for jobs processed within 24 hours — effective for embeddings, classification, and document processing at scale. Cached input pricing cuts repeat-context costs significantly for long system prompt patterns.

Source: openai.com/api/pricing, accessed May 9, 2026.

Anthropic (Claude)
Best for coding & analysis

Anthropic's Claude models skew toward coding, analysis, and long-context reasoning. Claude Sonnet 4.6 at $3.00/$15.00 per million tokens is the recommended production model for complex tasks. The Haiku 4.5 tier at $1.00/$5.00 covers high-frequency, lower-complexity calls.

Claude's distinctive advantage is its 200K token context window — useful for processing large codebases, legal contracts, or research documents in a single call. Anthropic's prompt caching charges approximately 10% of standard input rate for cache reads (first write at 1.25× standard to populate the cache). This makes Claude especially efficient for applications where the same large system prompt appears in most requests.

The Batch API offers 50% off for 24-hour processing windows. Claude Opus 4.6 at $5.00/$25.00 is reserved for complex agentic tasks requiring extended reasoning and tool use.

Source: platform.claude.com/docs/en/about-claude/pricing, accessed May 9, 2026.

Google Gemini
Lowest cost per token

Google's Gemini API offers the most aggressive pricing in the market, particularly for high-volume workloads. Gemini 2.0 Flash-Lite at $0.10/$0.40 per million tokens is the cheapest production-grade model from any major provider. The Flash variant at $0.30/$2.50 balances cost and capability for standard production use cases.

Gemini's headline differentiator is its 1 million token context window on Flash and Pro models — 5× Claude, 8× GPT-4o. This matters for legal document review, code repository analysis, and long research synthesis where competitors require chunking or retrieval pipelines. Gemini 2.5 Pro at $1.25–$2.50/$10.00–$15.00 handles the demanding end.

Context caching on Gemini saves up to 90% on cached tokens. Batch processing offers 50% off. Free tier available for development with rate limits via ai.google.dev.

Source: ai.google.dev/gemini-api/docs/pricing · Gemini API pricing docs, accessed May 9, 2026.

Cost Optimization Strategies

Strategy Typical Savings Best For Provider Support
Batch API 50% off standard Embeddings, classification, bulk processing OpenAI, Anthropic, Google
Prompt caching ~80-90% on cached tokens Apps with large static system prompts Anthropic (native), OpenAI (cached input)
Model tiering 60-90% vs. flagship Routing simple queries to cheaper models All providers
Context caching Up to 90% Repeat long-context calls Google Gemini
Free tier (dev) 100% up to rate limits Development, prototyping Google (Gemini API), OpenAI (trial)
Cost Architecture Insight

A common cost optimization pattern: route classification and extraction tasks to Gemini Flash-Lite ($0.10/M), send medium-complexity completions to Claude Haiku or GPT-4o, and reserve Claude Sonnet or Opus for high-stakes reasoning tasks. Well-implemented model routing cuts API spend 60–80% vs. sending everything to a flagship model. The tradeoff is latency uniformity and prompt engineering overhead per tier.

How to Choose: Decision Framework

Your Use Case Recommended Model Why
High-volume classification / extraction Gemini 2.0 Flash-Lite Lowest cost ($0.10/M in), 1M context, batch 50% off
Production chat / RAG application Gemini 2.0 Flash or Claude Haiku 4.5 Balance of cost and quality; both under $5/M output
Code generation / analysis Claude Sonnet 4.6 Best coding benchmarks; 200K context for large repos
STEM reasoning / math / logic OpenAI o3 Specialized reasoning model; $2/$8 per million tokens
Long document processing (100K+ tokens) Gemini 2.5 Pro 1M token context window; no chunking required
Multi-step agentic workflows Claude Opus 4.6 Best tool use and extended reasoning; 200K context
Embeddings at scale OpenAI text-embedding-3-small $0.02/M tokens; widest ecosystem support

Pricing History Note

AI API prices have fallen dramatically since 2023. GPT-3.5-turbo in early 2023 cost ~$2.00/M input tokens; models of equivalent quality today cost $0.10–$0.30/M. The trend is continued price compression driven by hardware efficiency gains and competition. Expect the pricing table above to shift within 60–90 days of publication.

This page is updated monthly. Check the "Last verified" date in the header before using these figures for production budgeting. For real-time pricing, use the official links for each provider cited above.