Master Pricing Table — All Models, May 2026
Prices sourced directly from official API pricing pages and verified May 9, 2026. Prices are per 1 million tokens unless noted. Check source links before budgeting — providers adjust prices without notice.
| Provider / Model | Input ($/1M tokens) | Output ($/1M tokens) | Context Window | Batch Discount | Best For |
|---|---|---|---|---|---|
| 🟢 Gemini 2.0 Flash-Lite | $0.10 | $0.40 | 1M tokens | 50% | High-volume classification |
| Gemini 2.0 Flash | $0.30 | $2.50 | 1M tokens | 50% | Production workloads |
| Gemini 2.5 Pro | $1.25–$2.50 | $10.00–$15.00 | 1M tokens | 50% | Long-doc reasoning |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens | 50% | Fast, cheap completions |
| OpenAI o3 | $2.00 | $8.00 | 128K tokens | 50% | Math, code, reasoning |
| OpenAI GPT-4o | $2.50 | $10.00 | 128K tokens | 50% | General multimodal |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K tokens | 50% | Coding, analysis |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K tokens | 50% | Complex agentic tasks |
Sources: openai.com/api/pricing · platform.claude.com/docs/pricing · ai.google.dev/gemini-api/docs/pricing — all accessed May 9, 2026.
Provider Deep Dives
OpenAI's API remains the most widely integrated provider by volume of third-party tools and SDKs. The GPT-4o model at $2.50/$10.00 per million input/output tokens serves as their production-grade general model. The o3 reasoning model at $2.00/$8.00 is their most capable reasoning model, optimized for STEM tasks, code generation, and multi-step problem solving.
OpenAI's Batch API delivers 50% off standard pricing for jobs processed within 24 hours — effective for embeddings, classification, and document processing at scale. Cached input pricing cuts repeat-context costs significantly for long system prompt patterns.
Source: openai.com/api/pricing, accessed May 9, 2026.
Anthropic's Claude models skew toward coding, analysis, and long-context reasoning. Claude Sonnet 4.6 at $3.00/$15.00 per million tokens is the recommended production model for complex tasks. The Haiku 4.5 tier at $1.00/$5.00 covers high-frequency, lower-complexity calls.
Claude's distinctive advantage is its 200K token context window — useful for processing large codebases, legal contracts, or research documents in a single call. Anthropic's prompt caching charges approximately 10% of standard input rate for cache reads (first write at 1.25× standard to populate the cache). This makes Claude especially efficient for applications where the same large system prompt appears in most requests.
The Batch API offers 50% off for 24-hour processing windows. Claude Opus 4.6 at $5.00/$25.00 is reserved for complex agentic tasks requiring extended reasoning and tool use.
Source: platform.claude.com/docs/en/about-claude/pricing, accessed May 9, 2026.
Google's Gemini API offers the most aggressive pricing in the market, particularly for high-volume workloads. Gemini 2.0 Flash-Lite at $0.10/$0.40 per million tokens is the cheapest production-grade model from any major provider. The Flash variant at $0.30/$2.50 balances cost and capability for standard production use cases.
Gemini's headline differentiator is its 1 million token context window on Flash and Pro models — 5× Claude, 8× GPT-4o. This matters for legal document review, code repository analysis, and long research synthesis where competitors require chunking or retrieval pipelines. Gemini 2.5 Pro at $1.25–$2.50/$10.00–$15.00 handles the demanding end.
Context caching on Gemini saves up to 90% on cached tokens. Batch processing offers 50% off. Free tier available for development with rate limits via ai.google.dev.
Source: ai.google.dev/gemini-api/docs/pricing · Gemini API pricing docs, accessed May 9, 2026.
Cost Optimization Strategies
| Strategy | Typical Savings | Best For | Provider Support |
|---|---|---|---|
| Batch API | 50% off standard | Embeddings, classification, bulk processing | OpenAI, Anthropic, Google |
| Prompt caching | ~80-90% on cached tokens | Apps with large static system prompts | Anthropic (native), OpenAI (cached input) |
| Model tiering | 60-90% vs. flagship | Routing simple queries to cheaper models | All providers |
| Context caching | Up to 90% | Repeat long-context calls | Google Gemini |
| Free tier (dev) | 100% up to rate limits | Development, prototyping | Google (Gemini API), OpenAI (trial) |
A common cost optimization pattern: route classification and extraction tasks to Gemini Flash-Lite ($0.10/M), send medium-complexity completions to Claude Haiku or GPT-4o, and reserve Claude Sonnet or Opus for high-stakes reasoning tasks. Well-implemented model routing cuts API spend 60–80% vs. sending everything to a flagship model. The tradeoff is latency uniformity and prompt engineering overhead per tier.
How to Choose: Decision Framework
| Your Use Case | Recommended Model | Why |
|---|---|---|
| High-volume classification / extraction | Gemini 2.0 Flash-Lite | Lowest cost ($0.10/M in), 1M context, batch 50% off |
| Production chat / RAG application | Gemini 2.0 Flash or Claude Haiku 4.5 | Balance of cost and quality; both under $5/M output |
| Code generation / analysis | Claude Sonnet 4.6 | Best coding benchmarks; 200K context for large repos |
| STEM reasoning / math / logic | OpenAI o3 | Specialized reasoning model; $2/$8 per million tokens |
| Long document processing (100K+ tokens) | Gemini 2.5 Pro | 1M token context window; no chunking required |
| Multi-step agentic workflows | Claude Opus 4.6 | Best tool use and extended reasoning; 200K context |
| Embeddings at scale | OpenAI text-embedding-3-small | $0.02/M tokens; widest ecosystem support |
Pricing History Note
AI API prices have fallen dramatically since 2023. GPT-3.5-turbo in early 2023 cost ~$2.00/M input tokens; models of equivalent quality today cost $0.10–$0.30/M. The trend is continued price compression driven by hardware efficiency gains and competition. Expect the pricing table above to shift within 60–90 days of publication.
This page is updated monthly. Check the "Last verified" date in the header before using these figures for production budgeting. For real-time pricing, use the official links for each provider cited above.