March 2026 Pricing

AI API Pricing Calculator

Compare token costs across OpenAI GPT-5.4, Anthropic Claude, Google Gemini 3.1, and DeepSeek. Estimate your monthly API spend with batch pricing, prompt caching, and budget planning.

19+ Models · Real-Time Estimates · 100% Free
Step 1

Set Your Usage

Enter your average input/output tokens per request and daily request volume.

Step 2

Compare Models

See real-time cost comparisons across all major AI providers with visual charts.

Step 3

Optimize Costs

Toggle batch pricing and cache hit rate to find the cheapest option for your workload.
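The three steps above boil down to simple arithmetic. Here is a minimal sketch of that calculation (the example rates are GPT-5.4's from the pricing table further down; the 50% batch discount and the cache-rate handling mirror the toggles described here):

```python
def monthly_cost(input_tokens, output_tokens, requests_per_day,
                 input_price, output_price,
                 batch=False, cache_hit=0.0, cache_discount=0.0):
    """Estimate monthly API spend. Prices are USD per 1M tokens."""
    # Cached input tokens are billed at a reduced rate.
    effective_input = input_price * (1 - cache_hit * cache_discount)
    per_request = (input_tokens * effective_input +
                   output_tokens * output_price) / 1_000_000
    daily = per_request * requests_per_day
    if batch:
        daily *= 0.5  # Batch API: 50% off both input and output
    return daily * 30

# 100 input + 50 output tokens, 1,000 requests/day on GPT-5.4 ($2.50/$15.00)
print(round(monthly_cost(100, 50, 1000, 2.50, 15.00), 2))  # 30.0
```

Toggling `batch=True` halves the result, and raising `cache_hit` with a provider's `cache_discount` cuts only the input side, which is why caching matters most for prompt-heavy workloads.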

Cache Hit: 0%

Monthly Cost Comparison

(Bar chart: the 18 models ranked by estimated monthly cost, from GPT-4.1 Nano at $0.90 up to o3 at $90.00, grouped by provider: OpenAI, Anthropic, Google, DeepSeek. The full figures are in the table below.)
These figures correspond to 100 input + 50 output tokens per request at 1,000 requests/day over 30 days:

| Model | Daily | Monthly |
| --- | --- | --- |
| GPT-4.1 Nano | $0.0300 | $0.90 |
| GPT-4o Mini | $0.0450 | $1.35 |
| DeepSeek V3 | $0.0820 | $2.46 |
| Gemini 3.1 Flash Lite | $0.1000 | $3.00 |
| GPT-4.1 Mini | $0.1200 | $3.60 |
| Gemini 2.5 Flash | $0.1550 | $4.65 |
| DeepSeek R1 | $0.1645 | $4.94 |
| Gemini 3 Flash | $0.2000 | $6.00 |
| o4 Mini | $0.3300 | $9.90 |
| Claude Haiku 4.5 | $0.3500 | $10.50 |
| GPT-4.1 | $0.6000 | $18.00 |
| Gemini 2.5 Pro | $0.6250 | $18.75 |
| GPT-4o | $0.7500 | $22.50 |
| Gemini 3.1 Pro | $0.8000 | $24.00 |
| GPT-5.4 | $1.00 | $30.00 |
| Claude Sonnet 4.6 | $1.05 | $31.50 |
| Claude Opus 4.6 | $1.75 | $52.50 |
| o3 | $3.00 | $90.00 |

Prices as of March 2026. Actual costs may vary. Check each provider's pricing page for the latest rates.


AI API Pricing Comparison (March 2026)

Per 1 million token pricing for major AI model providers. Prices updated March 2026.

| Model | Provider | Input / 1M | Output / 1M |
| --- | --- | --- | --- |
| GPT-5.4 | OpenAI | $2.50 | $15.00 |
| GPT-4.1 | OpenAI | $2.00 | $8.00 |
| GPT-4.1 Mini | OpenAI | $0.40 | $1.60 |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 |
| o3 | OpenAI | $10.00 | $40.00 |
| o4 Mini | OpenAI | $1.10 | $4.40 |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 |
| Gemini 3 Flash | Google | $0.50 | $3.00 |
| Gemini 3.1 Flash Lite | Google | $0.25 | $1.50 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 |

Prices shown are standard API rates per 1 million tokens as of March 2026. Prompt caching and batch processing can reduce costs by 50-90%. Check each provider's pricing page for the most current rates.


AI API Pricing FAQ

What are tokens, and how does token-based pricing work?

Tokens are the fundamental units that AI language models use to process text. As a rule of thumb, 1 token is about 4 characters or 0.75 words in English. AI providers charge per token, with separate rates for input tokens (your prompt) and output tokens (the model's response). Prices are quoted per 1 million tokens. For example, GPT-5.4 charges $2.50 per 1M input tokens and $15.00 per 1M output tokens.
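As a quick worked example of that arithmetic (the prompt and response lengths here are made up):

```python
# One request at GPT-5.4 rates ($2.50 input / $15.00 output per 1M tokens).
words = 300                       # hypothetical ~300-word prompt
input_tokens = int(words / 0.75)  # ~0.75 words per token -> 400 tokens
output_tokens = 200               # hypothetical response length
cost = (input_tokens * 2.50 + output_tokens * 15.00) / 1_000_000
print(f"${cost:.4f} per request")  # $0.0040 per request
```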
Which AI APIs are the cheapest?

As of March 2026, the cheapest AI APIs are GPT-4.1 Nano at $0.10/$0.40 per 1M tokens (input/output), Google's Gemini 3.1 Flash Lite at $0.25/$1.50, and Gemini 2.5 Flash at $0.30/$2.50. DeepSeek V3 is also very competitive at $0.27/$1.10. For the best value, use prompt caching (up to 90% savings) and batch processing (50% off).
What is prompt caching and how much does it save?

Prompt caching stores frequently repeated parts of your prompts (like system instructions) so the API doesn't reprocess them each time. Cached input tokens cost significantly less: Anthropic offers 90% off, OpenAI's GPT-5.4 family offers 50% off (GPT-4.1 family 75% off), and Google offers 75% off. If your application sends similar prompts repeatedly, caching can dramatically reduce your API bill.
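The blended input rate is easy to compute. A sketch, ignoring the cache-write surcharge some providers add on the first write (the 80% hit rate is a hypothetical):

```python
def effective_input_price(base_price, cache_hit_rate, cache_discount):
    """Blended input price per 1M tokens, given a cache hit rate."""
    return base_price * (1 - cache_hit_rate * cache_discount)

# Claude Sonnet 4.6 at $3.00/1M input, 90% off cached reads,
# with 80% of input tokens served from cache:
print(round(effective_input_price(3.00, 0.80, 0.90), 2))  # 0.84
```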
What is the Batch API?

Batch API lets you submit large numbers of requests to be processed asynchronously (typically within 24 hours). In exchange for the longer wait time, providers offer 50% off both input and output token costs. Batch pricing is ideal for non-time-sensitive tasks like data processing, content generation, and bulk analysis.
How do I estimate my monthly API costs?

To estimate monthly costs, multiply your average input tokens per request by the input price, add average output tokens times the output price, then multiply by daily request volume and 30 days. Our calculator does this automatically — just enter your tokens per request and daily volume to see costs across all models. Don't forget to factor in prompt caching and batch pricing if applicable.
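A worked example of that formula, using Claude Sonnet 4.6's rates from the table above and a hypothetical workload:

```python
# 1,200 input + 400 output tokens per request, 2,000 requests/day on
# Claude Sonnet 4.6 ($3.00 in / $15.00 out per 1M tokens).
input_cost  = 1200 / 1_000_000 * 3.00    # $0.0036 per request
output_cost = 400  / 1_000_000 * 15.00   # $0.0060 per request
monthly = (input_cost + output_cost) * 2000 * 30
print(f"${monthly:.2f}")  # $576.00
```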
GPT-5.4 or Claude Sonnet 4.6?

GPT-5.4 ($2.50/$15.00 per 1M tokens) and Claude Sonnet 4.6 ($3.00/$15.00) sit at similar price points. GPT-5.4 is OpenAI's latest frontier model and has a larger context window (1.05M tokens vs. Sonnet's 200K); Sonnet 4.6 is known for strong coding and reasoning. For cost-sensitive applications, GPT-4.1 ($2.00/$8.00) offers better value; for tasks requiring nuanced reasoning, Sonnet 4.6 may justify the premium.
How do reasoning tokens affect o-series costs?

OpenAI's o-series models (o3, o4-mini) use internal 'reasoning tokens' — the model thinks step-by-step before generating its final response. These reasoning tokens are billed as output tokens but aren't visible in the API response. This means the actual cost of o-series requests can be higher than expected, especially for complex reasoning tasks. Our calculator uses the standard output rates; your actual costs may be higher due to hidden reasoning tokens.
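A sketch of how hidden reasoning tokens inflate a naive estimate (the reasoning-token count here is a made-up figure; real counts vary widely by task):

```python
# o3: $10.00 in / $40.00 out per 1M tokens; reasoning tokens bill as output.
# Hypothetical request: 500 input tokens, 300 visible output tokens,
# plus ~2,000 hidden reasoning tokens.
visible = (500 * 10.00 + 300 * 40.00) / 1_000_000           # $0.017
actual  = (500 * 10.00 + (300 + 2000) * 40.00) / 1_000_000  # $0.097
print(f"{actual / visible:.1f}x the naive estimate")  # 5.7x the naive estimate
```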
How accurate is the token counter?

Our token counter uses an approximation of ~1 token per 4 characters, which is accurate within 10-15% for English text. Different models use different tokenizers (BPE, SentencePiece, etc.), so exact counts vary slightly. For precise token counts, use the official tokenizer tools from each provider. Our estimates are reliable enough for cost planning and budgeting purposes.
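That heuristic is a one-liner. This mirrors the approximation described above, not any provider's official tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```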

5 Ways to Reduce Your AI API Costs

Practical strategies to optimize your AI spending without sacrificing quality.

1. Use Prompt Caching

Cache your system prompts and repeated context. Anthropic offers 90% off cached tokens, OpenAI's GPT-4.1 family offers 75% off, and Google offers 75% off. If your system prompt is 2,000 tokens and you make 10,000 requests/day, that's 600M input tokens a month; at a 75% discount, caching saves about $45/month on GPT-4.1 Nano and roughly $900/month on GPT-4.1 in input tokens alone.
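The savings math, spelled out for a few models at the discount rates quoted on this page (prompt size and request volume are the example workload, not defaults):

```python
def cache_savings(prompt_tokens, requests_per_day, price_per_1m, discount):
    """Monthly input-token savings from caching one system prompt."""
    monthly_tokens = prompt_tokens * requests_per_day * 30
    return monthly_tokens / 1_000_000 * price_per_1m * discount

for model, price, disc in [("GPT-4.1 Nano", 0.10, 0.75),
                           ("GPT-4.1", 2.00, 0.75),
                           ("Claude Haiku 4.5", 1.00, 0.90)]:
    print(f"{model}: ${cache_savings(2_000, 10_000, price, disc):,.2f}/month")
```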

2. Batch Non-Urgent Requests

Use the Batch API for tasks that don't need real-time responses (data analysis, content generation, classification). You get 50% off both input and output tokens. Submit requests and get results within 24 hours.

3. Choose the Right Model

Don't use GPT-5.4 for tasks that GPT-4.1 Nano can handle. Start with the cheapest model and only upgrade if quality is insufficient. For simple classification or extraction, budget models are often 95%+ as accurate at 1/20th the cost.
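To make the gap concrete, here is the per-request cost ratio between GPT-5.4 and GPT-4.1 Nano at the rates listed on this page (the token counts per request are illustrative):

```python
# Same workload on a flagship vs. a budget model
# (100 input + 50 output tokens per request; prices per 1M tokens).
def per_request(input_price, output_price):
    return (100 * input_price + 50 * output_price) / 1_000_000

flagship = per_request(2.50, 15.00)  # GPT-5.4
budget   = per_request(0.10, 0.40)   # GPT-4.1 Nano
print(f"budget model is {flagship / budget:.0f}x cheaper")  # 33x
```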

4. Optimize Your Prompts

Shorter prompts = fewer input tokens = lower costs. Remove filler words, use concise instructions, and avoid repeating context. A well-crafted 500-token prompt often outperforms a verbose 2,000-token one.

5. Set Max Output Tokens

Always set a max_tokens limit on your API calls. Without it, the model might generate unnecessarily long responses. For structured outputs (JSON, short answers), setting a tight limit prevents wasted output tokens.
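A rough sense of what an uncapped model can cost, using Claude Sonnet 4.6's output rate from the table above (the average output lengths are assumptions for illustration):

```python
# Output-token spend with and without a max_tokens cap
# ($15.00 per 1M output tokens).
requests = 10_000
uncapped = 600 * requests / 1_000_000 * 15.00  # avg 600 tokens when uncapped
capped   = 150 * requests / 1_000_000 * 15.00  # hard limit for short JSON answers
print(f"${uncapped - capped:.2f} saved per 10k requests")  # $67.50
```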