
LLM Comparison Table

Side-by-side reference for current Claude, GPT, Gemini, and Llama models — context windows, knowledge cutoffs, pricing, capabilities. Filter and sort to pick the right model for a task.


Why a Static Reference, Updated Manually

Model specs change. Pricing changes. New tiers and deprecations land every few weeks. A live API-scraping comparison would still lag, because model providers do not all publish machine-readable change logs, and it would require API credentials for the queries. A manually curated table updated on a known cadence, with a footer date showing exactly when it was last refreshed, is more honest. This tool lists current public-facing specs across the four major model families, with a filter row to narrow by capability (vision, code, long context, fast inference) and sortable columns. The last update date is shown at the bottom. Pair with the AI Token Counter for the cost-per-paste version of the same question.

How to Read the Table

Every row is a current production model from one of four families: Anthropic Claude, OpenAI GPT, Google Gemini, and Meta Llama. Columns cover the specs that most often drive a model choice: input context window in tokens, output token cap, knowledge cutoff date, input price per million tokens, output price per million tokens, and capability tags (vision, code, agentic, fast). Click any column header to sort. The filter row at the top hides rows that lack a tagged capability — checking 'vision' removes text-only models, checking 'long context' removes anything under 100K. Pricing is in US dollars at list rates as of the last update date shown at the bottom of the table; enterprise contracts and batch discounts can be lower.
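The filter-and-sort behavior described above can be sketched in a few lines. This is an illustrative model only: the field names, model names, and prices below are hypothetical placeholders, not the tool's actual data or implementation.

```python
# Hypothetical rows mirroring the table's columns; values are illustrative only.
models = [
    {"name": "Model A", "context": 200_000, "input_price": 3.00, "tags": {"vision", "code"}},
    {"name": "Model B", "context": 32_000,  "input_price": 0.50, "tags": {"fast"}},
    {"name": "Model C", "context": 128_000, "input_price": 1.25, "tags": {"code", "fast"}},
]

def filter_models(rows, require_tags=(), long_context=False):
    """Hide rows missing any checked capability; 'long context' means >= 100K tokens."""
    out = [r for r in rows if set(require_tags) <= r["tags"]]
    if long_context:
        out = [r for r in out if r["context"] >= 100_000]
    return out

# Sorting by a column header is just a key function, e.g. cheapest input price first:
cheapest = sorted(filter_models(models, require_tags={"code"}),
                  key=lambda r: r["input_price"])
```

Checking a capability box intersects the row's tag set with the required tags; sorting never removes rows, it only reorders them.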

See also: in-browser alternatives that bypass per-token costs entirely — the AI Summarizer covers summarization, the AI Paraphraser covers rewriting, and the AI Translator covers translation.

Frequently Asked Questions

How often is the table updated?
The footer date shows the last refresh. Refreshes happen on a roughly monthly cadence and immediately after any major model launch. If you spot a stale entry, the feedback link at the bottom of the page goes straight to the maintainer.
Why is pricing listed at list rates only?
Enterprise and committed-use rates vary too widely to summarize. The list rate is the public floor; if you have negotiated rates, they will be lower. The list rate is the right number for first-pass cost modeling.
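First-pass cost modeling with list rates is simple arithmetic, since both input and output prices are quoted per million tokens. A minimal sketch, using hypothetical rates of $3/M input and $15/M output:

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimated list-rate cost in dollars for one request."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# 10K tokens in, 1K tokens out at the hypothetical rates above:
cost = request_cost(input_tokens=10_000, output_tokens=1_000,
                    input_price_per_m=3.00, output_price_per_m=15.00)
# → $0.045
```

Negotiated or batch rates only lower this number, so the list-rate estimate is a safe upper bound.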
What if a model has multiple context window options?
Each tier is its own row. A 32K and a 128K version of the same base model show up separately because pricing and behavior differ.
How is the knowledge cutoff defined?
It is the training-data freshness date publicly disclosed by the model provider. Some providers state a specific day; others state only a month. The table uses whatever the provider has published.
Are capability tags self-assessed?
Vision means the model accepts image inputs in its standard API. Code means the provider markets it for coding workflows. Agentic means it has been released with tool-use or computer-use features. Fast means the provider sells it as a low-latency or high-throughput variant. The tags are descriptive, not benchmarked.
Is there a benchmark column?
No — public benchmarks are inconsistent across providers and become stale within weeks. For task-specific choice, run your own evaluation. The table covers the structural facts that do not depend on benchmark methodology.
Why is Llama in here if it is not a managed API?
Llama models are widely used via Bedrock, Together, Fireworks, and self-hosting. Listed pricing reflects the cheapest mainstream managed-API rate; self-hosted costs vary by hardware. Llama is included because it is part of the practical model-choice landscape.
What about open models from Mistral, Qwen, Cohere, or others?
Future updates may expand the table. The current four families cover the bulk of production traffic, and adding more rows is weighed against keeping the table scannable. The table is open to expansion if specific gaps recur in feedback.

Built by Derek Giordano · Part of Ultimate Design Tools
