Every model from every major AI lab — grouped by provider, with each lab's current flagship highlighted.
296 models · 14 providers
OpenAI
34 active models · 28 archived
GPT-5.5 is OpenAI's most advanced flagship language model, offering state-of-the-art reasoning, creativity, and instruction following. It builds on GPT-5 with enhanced efficiency, lower latency, and improved performance across coding, analysis, and multimodal tasks. As the primary model in OpenAI's lineup, it powers ChatGPT and the API with superior accuracy and speed.
Released
May 2025
Input
$2.50 per 1M tokens
Context
256K tokens
GPT-5 is the large language model that GPT-5.5 builds on, succeeding GPT-4 with significant improvements in reasoning, creativity, and accuracy. It offers enhanced multimodal capabilities, longer context windows, and better alignment with human intent, and is designed to handle complex tasks across a range of domains.
Input
$10.00 per 1M tokens
Context
256K tokens
GPT-4.5 was OpenAI's largest language model at release, designed for complex reasoning, creative tasks, and nuanced conversation. It builds on GPT-4 with improved knowledge, emotional intelligence, and reduced hallucinations, while maintaining broad general-purpose performance.
Released
February 2025
Input
$75.00 per 1M tokens
Context
128K tokens
GPT-4.1 is a flagship model in OpenAI's GPT-4 series, offering improved performance across coding, instruction following, and long-context tasks. It features a 1M token context window, which suits it to processing large documents and codebases. GPT-4.1 replaces GPT-4 Turbo as the most capable model in the GPT-4 series.
Released
April 2025
Input
$2.00 per 1M tokens
Context
1M tokens
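The per-1M-token rates listed on these cards convert to a request cost by simple proportion. The following is a minimal sketch using the GPT-4.1 figures above. It assumes input-only billing (output tokens are priced separately and are not listed in this catalog), and the function name `input_cost_usd` is illustrative rather than part of any API.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Return the input-side cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * price_per_million


# Figures from the GPT-4.1 card: $2.00 per 1M input tokens, 1M-token context.
full_context = input_cost_usd(1_000_000, 2.00)   # filling the whole context window
typical_prompt = input_cost_usd(12_000, 2.00)    # a hypothetical 12K-token prompt

print(f"${full_context:.2f}")    # $2.00
print(f"${typical_prompt:.3f}")  # $0.024
```

The same function applies to any card priced per 1M tokens; cards priced per image, per minute, or per character need their own unit.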
GPT-4.1 Mini is a cost-efficient, low-latency model in the GPT-4.1 family, designed for fast and affordable inference while maintaining strong reasoning capabilities. It offers a 1M token context window and supports multimodal inputs (text and images), making it suitable for high-volume, real-time applications.
Released
April 2025
Input
$0.40 per 1M tokens
Context
1M tokens
o4 Mini is a cost-efficient reasoning model from OpenAI, designed for complex multi-step tasks that require strong reasoning and coding abilities. It offers a balance of performance and speed, making it suitable for applications where deep reasoning is needed but cost is a concern. As a smaller model in the o4 family, it provides faster responses than its larger counterpart while maintaining high accuracy on benchmarks.
Released
April 2025
Input
$1.10 per 1M tokens
Context
200K tokens
GPT-4o-audio-preview is a multimodal model that extends GPT-4o with native audio input and output capabilities, enabling real-time voice interactions and audio processing. It is designed for applications requiring low-latency speech-to-speech or audio understanding, such as voice assistants and audio transcription with reasoning.
Released
October 2024
Input
$5.00 per 1M tokens
Context
128K tokens
GPT-4o-mini-realtime-preview is a cost-efficient, low-latency multimodal model optimized for real-time voice and text interactions. It supports audio streaming and function calling, making it ideal for conversational AI applications. As a smaller variant of GPT-4o, it balances performance and affordability for production use.
Released
October 2024
Input
$0.60 per 1M tokens
Context
128K tokens
GPT-4o-realtime-preview is a multimodal model from OpenAI designed for low-latency, real-time interactions, supporting text, audio, and vision inputs. It is optimized for voice conversations and live applications, offering near-instantaneous responses. This model is part of the GPT-4o family, combining advanced reasoning with real-time capabilities.
Released
October 2024
Input
$5.00 per 1M tokens
Context
128K tokens
GPT-4o-search-preview is a variant of GPT-4o that integrates web search capabilities, allowing the model to retrieve and incorporate real-time information from the internet. It is designed for tasks requiring up-to-date knowledge, such as answering questions about recent events or accessing current data. This model combines the conversational and reasoning strengths of GPT-4o with live search functionality.
Released
October 2024
Input
$5.00 per 1M tokens
Context
128K tokens
GPT-4o-mini is a cost-efficient small model in the GPT-4o family, designed for fast and affordable inference while maintaining strong reasoning and multimodal capabilities. It supports text and vision inputs, making it suitable for a wide range of applications that require high throughput and low latency.
Released
July 2024
Input
$0.15 per 1M tokens
Context
128K tokens
GPT-4o-mini-search-preview is a cost-efficient, fast model from OpenAI that integrates web search capabilities directly into the model's responses. It is designed for applications requiring up-to-date information retrieval without external search APIs, offering a balance of performance and affordability.
Released
July 2024
Input
$0.15 per 1M tokens
Context
128K tokens
GPT-4o ('omni') is a multimodal model from OpenAI that accepts text, image, and audio inputs and produces text, image, and audio outputs. It matches GPT-4 Turbo performance on English text and code while being significantly faster and 50% cheaper in the API. At release, GPT-4o set state-of-the-art results on vision and multilingual benchmarks, with improved performance on text in non-English languages.
Released
May 2024
Input
$5.00 per 1M tokens
Context
128K tokens
GPT-4 Turbo is a large language model from OpenAI that offers improved performance and a larger context window than GPT-4. It supports vision, function calling, and JSON mode, and is priced lower than its predecessor.
Released
April 2024
Input
$10.00 per 1M tokens
Context
128K tokens
GPT-4 Turbo Preview is a preview version of OpenAI's GPT-4 Turbo model, offering improved performance and a larger context window of 128K tokens compared to GPT-4. It is designed to be more cost-effective and capable, with knowledge up to April 2023. This model is part of OpenAI's frontier tier, providing advanced reasoning and language understanding.
Released
November 2023
Input
$10.00 per 1M tokens
Context
128K tokens
GPT-4 is a large language model from OpenAI, capable of solving difficult problems with greater accuracy and creativity than earlier models. It introduced multimodal input (text and images) and exhibits human-level performance on various professional and academic benchmarks. GPT-4 was OpenAI's flagship at launch, offering enhanced reasoning, safety, and steerability.
Released
March 2023
Input
$30.00 per 1M tokens
Context
8192 tokens
GPT-3.5-turbo is a fast and cost-effective language model optimized for conversational tasks and chat completions. It offers a balance of performance and affordability, making it suitable for a wide range of applications. As part of OpenAI's GPT-3.5 series, it provides improved instruction following and lower latency compared to earlier models.
Released
March 2023
Input
$0.50 per 1M tokens
Context
16,385 tokens
GPT-3.5 is a fast and cost-effective language model from OpenAI, optimized for conversational tasks and general-purpose text generation. It powers ChatGPT and offers a balance of performance and affordability, making it suitable for a wide range of applications. As a predecessor to GPT-4, it remains widely used for its speed and lower cost.
Released
March 2022
Input
$0.50 per 1M tokens
Context
16,385 tokens
o3-mini is a cost-efficient reasoning model by OpenAI that excels in STEM tasks, particularly coding and mathematics. It offers adjustable reasoning effort (low, medium, high) to balance speed and accuracy, making it a faster and cheaper alternative to o1-mini while maintaining strong performance. It supports function calling, structured outputs, and developer messages.
Released
January 2025
Input
$1.10 per 1M tokens
Context
200K tokens
o3 is OpenAI's most advanced reasoning model, designed to spend more time thinking before responding. It excels at complex problem-solving, coding, and scientific reasoning, outperforming its predecessor o1 on challenging benchmarks. o3 is part of OpenAI's reasoning model series, offering enhanced capabilities for tasks requiring deep logical deduction.
Released
April 2025 (announced December 2024)
Input
$10.00 per 1M tokens
Context
200K tokens
text-embedding-3-large is OpenAI's most capable embedding model, producing 3072-dimensional vectors for high-quality text embeddings. It is designed for tasks requiring nuanced semantic understanding, such as retrieval-augmented generation (RAG) and clustering. Part of the v3 embedding family, it balances performance and cost for large-scale embedding needs.
Released
January 2024
Input
$0.13 per 1M tokens
Context
8191 tokens
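Embedding vectors such as the ones this model produces are commonly compared with cosine similarity in retrieval and clustering. The following is a minimal sketch using toy 4-dimensional vectors in place of real 3072-dimensional embeddings. The vectors and the helper name `cosine_similarity` are illustrative, and no API call is made.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for document and query embeddings (hypothetical values).
query = [1.0, 0.0, 1.0, 0.0]
doc_close = [2.0, 0.0, 2.0, 0.0]   # same direction as the query
doc_far = [0.0, 1.0, 0.0, 1.0]     # orthogonal to the query

# Rank documents by similarity to the query, highest first.
ranked = sorted(
    [("doc_close", doc_close), ("doc_far", doc_far)],
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Cosine similarity ignores vector length, so `doc_close` scores as a near-perfect match even though its magnitude differs from the query's.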
text-embedding-3-small is OpenAI's most efficient embedding model, optimized for speed and cost while maintaining strong performance. It is part of the third-generation embedding family, offering a smaller size for lower latency and reduced computational requirements.
Released
January 2024
Input
$0.02 per 1M tokens
Context
8191 tokens
DALL-E 3 is OpenAI's latest text-to-image generation model, capable of creating highly detailed and accurate images from natural language descriptions. It integrates natively with ChatGPT for iterative prompt refinement and offers improved composition and detail over its predecessor. The model is designed for creative and professional use, with built-in safety features to prevent harmful content.
Released
October 2023
Input
$0.040 per image (standard), $0.080 per image (HD)
Context
—
DALL-E 2 is an AI system that can create realistic images and art from a description in natural language. It offers higher resolution and lower latency than its predecessor, DALL-E, and is capable of generating images with greater fidelity and coherence. DALL-E 2 is part of OpenAI's image generation lineup, now superseded by DALL-E 3.
Released
April 2022
Input
$0.020 per image (1024x1024)
Context
—
o1-pro is a premium reasoning model from OpenAI that extends the o1 series with enhanced inference compute for more reliable and higher-quality responses. It is designed for complex tasks requiring deep reasoning, such as advanced coding, scientific analysis, and multi-step problem solving. As a paid tier of o1, it offers improved performance over the standard o1 model at a higher cost.
Released
December 2024
Input
$150.00 per 1M tokens
Context
200K tokens
o1 is a large language model by OpenAI designed for complex reasoning tasks, using chain-of-thought reasoning to solve difficult problems in science, coding, and math. It is the first model in the o-series, emphasizing deep thinking and accuracy over speed. The series also includes o1-preview, an early preview of the model, and o1-mini, a faster, cheaper option for coding and STEM.
Released
September 2024
Input
$15.00 per 1M tokens
Context
128K tokens
o1-mini is a smaller, faster, and more cost-efficient reasoning model from OpenAI, designed to excel at coding and STEM tasks. It uses chain-of-thought reasoning to solve complex problems while being significantly cheaper than the full o1 model. It is part of the o1 series, which emphasizes advanced reasoning capabilities.
Released
September 2024
Input
$3.00 per 1M tokens
Context
128K tokens
o1-preview is a large language model from OpenAI designed for complex reasoning tasks. It uses chain-of-thought reasoning to solve difficult problems in science, coding, and math. As a preview model, it offers advanced reasoning capabilities but lacks some features of GPT-4o like vision and function calling.
Released
September 2024
Input
$15.00 per 1M tokens
Context
128K tokens
TTS-1 is OpenAI's text-to-speech model that converts text into natural-sounding spoken audio. It supports multiple voices and languages, offering low-latency streaming for real-time applications. As part of OpenAI's audio API, it provides a cost-effective solution for generating high-quality speech.
Released
November 2023
Input
$15.00 per 1M characters
Context
—
TTS-1-hd is OpenAI's high-definition text-to-speech model, offering superior audio quality with more natural intonation and clarity compared to the standard TTS-1 model. It supports multiple voices and languages, making it ideal for applications requiring lifelike speech synthesis.
Released
November 2023
Input
$30.00 per 1M characters
Context
—
computer-use-preview is a specialized model from OpenAI designed to control computer interfaces by interpreting screenshots and performing actions like clicking, typing, and scrolling. It is part of the GPT-4o family and enables AI agents to interact with software applications directly, bridging the gap between language models and GUI automation.
Released
March 2025
Input
$3.00 per 1M tokens
Context
128K tokens
OpenAI's most capable moderation model, designed to detect harmful content across text and images. It succeeds the earlier text-only moderation models, offering improved accuracy and broader coverage of policy-violating content. This model is optimized for real-time content moderation use cases.
Released
August 2024
Input
Free
Context
8192 tokens
OpenAI's Moderation stable model is designed to detect harmful or unsafe content in text. It is optimized for accuracy and reliability in content moderation tasks, helping developers filter inappropriate material. This model is part of OpenAI's moderation endpoint and is the stable version recommended for production use.
Released
August 2023
Input
Free
Context
8192 tokens
Whisper is a general-purpose speech recognition model by OpenAI, capable of transcribing speech in multiple languages and translating it into English. It is trained on a large dataset of diverse audio and is robust to accents, background noise, and technical jargon. Whisper is available as an open-source model and also via OpenAI's API.
Released
September 2022
Input
$0.006 per minute (audio input)
Context
—
GPT-4-1106-preview
GPT-4-vision-preview
GPT-4-0613
GPT-4-0314
GPT-4-32k
GPT-3.5-turbo-1106
GPT-3.5-turbo-instruct
GPT-3.5-turbo-0613
GPT-3.5-turbo-16k
GPT-3.5-turbo-0301
text-davinci-003
GPT-3
Codex davinci-002
Embeddings v2 ada
text-davinci-002
text-curie-001
text-babbage-001
Codex cushman-001
text-ada-001
text-davinci-001
babbage
curie
ada
InstructGPT ada
InstructGPT babbage
InstructGPT curie
InstructGPT davinci
davinci
Anthropic
16 active models · 14 archived
Claude Opus 4.7 is Anthropic's most powerful flagship model, offering state-of-the-art performance across reasoning, coding, and creative tasks. It introduces extended thinking capabilities and is designed for complex enterprise workloads requiring deep analysis and high accuracy.
Released
May 2025
Input
$15.00 per 1M tokens
Context
200K tokens
Claude Opus 4.8 is Anthropic's most advanced and capable model, designed for complex reasoning, coding, and creative tasks. It represents the pinnacle of the Claude lineup with superior performance across benchmarks. This model excels in nuanced understanding, long-context analysis, and generating high-quality outputs.
Input
$15.00 per 1M tokens
Context
200K tokens
Claude Opus 4.6 is Anthropic's most advanced and capable model, designed for complex reasoning, creative tasks, and nuanced conversation. It represents the pinnacle of the Claude lineup, offering superior performance across a wide range of benchmarks. This model excels in deep analysis, coding, and handling intricate instructions with high accuracy.
Input
$15.00 per 1M tokens
Context
200K tokens
Claude Sonnet 4.6 is listed here as a Sonnet-tier model from Anthropic. The source data for this entry could not confirm an official release, so no release date is given.
Input
$3.00 per 1M tokens
Context
200K tokens
Claude Haiku 4.5 is Anthropic's fastest and most cost-effective model, designed for high-throughput, low-latency applications. It offers a balance of speed and intelligence, making it ideal for real-time tasks like customer support and content moderation. As the latest iteration in the Haiku family, it provides improved performance over its predecessor while maintaining competitive pricing.
Released
October 2025
Input
$1.00 per 1M tokens
Context
200K tokens
Claude Sonnet 4.5 is Anthropic's most intelligent model, offering state-of-the-art performance across a wide range of tasks including coding, analysis, and creative writing. It balances high capability with improved speed and efficiency, making it suitable for complex enterprise applications.
Released
September 2025
Input
$3.00 per 1M tokens
Context
200K tokens
Claude Haiku 4.0 is Anthropic's fastest and most cost-effective model in the Claude 4 family, designed for high-throughput, low-latency applications. It offers a strong balance of speed and intelligence, making it ideal for simple queries, classification, and real-time chat. As the successor to Claude 3 Haiku, it delivers improved performance across benchmarks while maintaining competitive pricing.
Released
May 2025
Input
$0.80 per 1M tokens
Context
200K tokens
Claude Opus 4.0 is the most powerful model in the Claude 4.0 family, designed for complex reasoning, coding, and analysis tasks. It excels at nuanced understanding, creative problem-solving, and maintaining coherence over long contexts. It sits above Sonnet 4.0 and Haiku 4.0 in that family.
Released
May 2025
Input
$15.00 per 1M tokens
Context
200K tokens
Claude Sonnet 4.0 is Anthropic's mid-tier model balancing performance and cost, offering strong reasoning, coding, and multilingual capabilities. It is designed for high-throughput enterprise applications requiring reliable, safe AI interactions. Sonnet 4.0 sits between the faster Haiku and the more powerful Opus in the Claude 4 family.
Released
May 2025
Input
$3.00 per 1M tokens
Context
200K tokens
Claude 3.7 Sonnet is a hybrid reasoning model from Anthropic that combines rapid responses with extended thinking for complex tasks. It can produce near-instant answers or engage in deep, step-by-step reasoning, which suits it to coding, math, and nuanced analysis.
Released
February 2025
Input
$3.00 per 1M tokens
Context
200K tokens
Claude 3.5 Haiku is Anthropic's fastest model in the Claude 3.5 family, offering low latency and high throughput for everyday tasks. It combines speed with strong performance on coding, reasoning, and language tasks, making it ideal for real-time applications and cost-sensitive deployments.
Released
October 2024
Input
$0.80 per 1M tokens
Context
200K tokens
Claude 3.5 Sonnet is a mid-tier Anthropic model that balances speed and cost while outperforming Claude 3 Opus on key benchmarks. It excels at nuanced tasks like coding, writing, and analysis with improved instruction following and safety.
Released
June 2024
Input
$3.00 per 1M tokens
Context
200K tokens
Claude 3 Haiku is Anthropic's fastest and most compact model in the Claude 3 family, designed for near-instant responsiveness and high throughput. It excels at lightweight tasks like customer service, content moderation, and real-time chat, offering a balance of speed and intelligence at a lower cost.
Released
March 2024
Input
$0.25 per 1M tokens
Context
200K tokens
Claude 3 Opus is the most powerful model in Anthropic's Claude 3 family, offering strong performance on complex tasks that call for deep reasoning, analysis, and coding. It is the family's flagship, designed for maximum intelligence and reliability.
Released
March 2024
Input
$15.00 per 1M tokens
Context
200K tokens
Claude 3 Sonnet is a balanced model in Anthropic's Claude 3 family, offering a strong combination of speed and intelligence for most enterprise workloads. It is ideal for tasks requiring nuanced understanding, rapid responses, and reliable performance at a lower cost than Opus.
Released
March 2024
Input
$3.00 per 1M tokens
Context
200K tokens
Claude Mythos Preview is Anthropic's most advanced and capable model, designed for complex reasoning, creative tasks, and nuanced dialogue. It represents a significant leap in performance across coding, mathematics, and multilingual understanding, while maintaining Anthropic's focus on safety and alignment. This preview offers early access to the next generation of Claude's capabilities.
Released
May 2025
Input
$15.00 per 1M tokens
Context
200K tokens
Claude Fable 5
Claude Mythos 5
Claude 2.1
Claude 2 (Legacy)
Claude 2.0
Claude 1.3
Claude Instant 1.2
Claude 1.2
Claude Instant 1.1
Claude 1.1
Claude Instant 1 (Legacy)
Claude Instant 1.0
Claude 1 (Legacy)
Claude 1.0
Stability AI
14 active models · 5 archived
Stable Diffusion 3.5 Large is Stability AI's most advanced text-to-image model, offering 8.1 billion parameters for high-quality, photorealistic image generation. It excels in prompt adherence, typography, and generating diverse outputs, making it the flagship model in the Stable Diffusion 3.5 family.
Released
October 2024
Input
Free (open source)
Context
—
Stable LM 7B is a compact, efficient language model designed for fast inference and accessibility. It balances performance with low computational requirements, making it suitable for a wide range of text generation tasks. As part of Stability AI's open model lineup, it offers a lightweight alternative to larger models.
Released
May 2024
Input
Free (open source)
Context
4096 tokens
Stable Diffusion 3.5 Large Turbo is a distilled version of SD3.5 Large, offering faster inference with 4 steps instead of the standard 28, while maintaining high image quality. It is part of Stability AI's latest generation of text-to-image models, optimized for speed and efficiency.
Released
October 2024
Input
—
Context
—
Stable Diffusion 3.5 Medium is a text-to-image model with 2.5 billion parameters, offering high-quality image generation with improved prompt adherence and typography. It is designed to run efficiently on consumer hardware, making advanced AI image generation accessible to a wider audience.
Released
October 2024
Input
Free (open source)
Context
—
Stable Diffusion 3.5 Turbo is a distilled version of Stable Diffusion 3.5 Large, optimized for faster inference while maintaining high image quality. It uses a Multimodal Diffusion Transformer (MMDiT) architecture and supports text-to-image generation with improved prompt adherence and typography.
Released
October 2024
Input
—
Context
—
Stable Diffusion 3 is a state-of-the-art text-to-image model that excels in generating high-quality, photorealistic images with improved typography and complex prompt understanding. It builds on the success of its predecessors with a new diffusion transformer architecture and is available in multiple sizes (800M to 8B parameters) to suit different use cases.
Released
June 2024
Input
—
Context
—
Stable Code Instruct 3B is a 3 billion parameter instruction-tuned language model specialized for code generation and software development tasks. It is designed to be lightweight and efficient, offering strong performance for code completion, explanation, and debugging while running on consumer-grade hardware.
Released
May 2024
Input
Free (open source)
Context
4096 tokens
Stable Code 3B is a 3 billion parameter large language model specialized for code generation and completion. It is designed to be lightweight and efficient, offering strong performance on code tasks while being deployable on consumer-grade hardware. It is part of Stability AI's lineup of open models for developers.
Released
January 2024
Input
Free (open source)
Context
8192 tokens
Stable Audio 2.0 is a generative audio model that produces high-quality, full-length music tracks (up to 3 minutes) from text prompts and optional reference audio. It introduces audio-to-audio generation, allowing users to transform existing audio clips, and supports stereo output at 44.1 kHz. The model is designed for musicians, content creators, and developers seeking efficient AI-powered music production.
Released
April 2024
Input
—
Context
—
Stable LM 2 1.6B is a compact, efficient language model designed for on-device and edge deployment. It offers strong performance for its size, with a focus on accessibility and low computational cost.
Released
March 2024
Input
Free (open source)
Context
4096 tokens
Stable LM 2 12B is a 12-billion parameter language model developed by Stability AI, designed for efficient text generation and understanding. It is part of the Stable LM 2 family, offering a balance of performance and speed for a wide range of natural language tasks. The model is available under an open-source license, making it accessible for research and commercial use.
Released
March 2024
Input
Free (open source)
Context
4096 tokens
Stable Audio 1.0 is a latent diffusion model for generating high-quality, variable-length audio from text prompts. It can produce music, sound effects, and ambient audio up to 90 seconds in length. The model is designed for creative professionals and developers seeking controllable audio generation.
Released
September 2023
Input
—
Context
—
Stable Video Diffusion is a generative AI model that converts static images into short video clips. It is built on the latent diffusion architecture and is designed for research and creative applications, offering controllable video generation from a single input image.
Released
November 2023
Input
Free (open source)
Context
—
Stable Diffusion XL (SDXL) is a text-to-image generation model that produces high-resolution, photorealistic images with improved composition and detail compared to its predecessor. It uses a larger UNet backbone and a two-stage pipeline (base + refiner) for enhanced image quality. SDXL preceded the Stable Diffusion 3 family as Stability AI's flagship image model, offering stronger prompt adherence and aesthetic quality than earlier Stable Diffusion releases.
Released
July 2023
Input
Free (open source)
Context
—
AI21 Labs
9 active models · 2 archived
Jamba 1.6 Large is AI21 Labs' flagship model, combining a hybrid SSM-Transformer architecture for high performance and efficiency. It excels at long-context tasks with a 256K token context window and supports structured outputs like JSON mode and function calling. This model is designed for enterprise-grade reasoning, coding, and multilingual applications.
Released
February 2025
Input
$5.00 per 1M tokens
Context
256K tokens
AI21 Maestro is a state-of-the-art large language model by AI21 Labs, designed for complex reasoning and instruction following. It excels in tasks requiring deep understanding and generation of text, positioning itself as a competitive alternative to models like GPT-4 and Claude 3. Maestro is part of AI21's Jamba family, leveraging a hybrid architecture combining Mamba and Transformer components.
Released
May 2024
Input
$5.00 per 1M tokens
Context
256K tokens
Jurassic-2 Ultra is the most powerful model in AI21 Labs' Jurassic-2 family, designed for complex reasoning, long-form content generation, and advanced instruction following. It targets tasks requiring deep understanding and nuanced output.
Released
May 2023
Input
$5.00 per 1M tokens
Context
8192 tokens
Jurassic-2 Light is a fast, cost-effective language model from AI21 Labs, optimized for high-throughput applications requiring quick responses. It is the smallest and most efficient model in the Jurassic-2 family, offering a balance of speed and quality for simple tasks.
Released
March 2023
Input
$0.30 per 1M tokens
Context
8192 tokens
Jurassic-2 Mid is a mid-sized language model from AI21 Labs, offering a balance of performance and cost. It is part of the Jurassic-2 family, designed for a wide range of natural language tasks with high accuracy and speed. The model is optimized for efficient inference and is suitable for applications requiring quick responses.
Released
March 2023
Input
$0.50 per 1M tokens
Context
8192 tokens
Jamba 1.6 Mini is a lightweight, efficient large language model from AI21 Labs, optimized for speed and cost-effectiveness. It is part of the Jamba family, which uses a hybrid SSM-Transformer architecture for long context handling. This model is designed for high-throughput, low-latency applications where budget is a concern.
Released
August 2024
Input
$0.20 per 1M tokens
Context
256K tokens
Jamba 1.5 Large is a state-of-the-art large language model from AI21 Labs, built on a hybrid SSM-Transformer architecture that combines Mamba structured state space models with traditional transformer layers. It offers a 256K token context window and excels in long-context tasks, multilingual support, and structured output generation.
Released
August 2024
Input
$5.00 per 1M tokens
Context
256K tokens
Jamba 1.5 Mini is a lightweight, efficient large language model from AI21 Labs, designed for high-speed inference and cost-effective deployment. It leverages a hybrid architecture combining Mamba (state space model) and Transformer components, achieving strong performance on a variety of tasks with a 256K token context window. As the smaller sibling of Jamba 1.5 Large, it offers a balance of speed and capability for production use cases.
Released
August 2024
Input
$0.20 per 1M tokens
Context
256K tokens
Jamba-Instruct is a hybrid SSM-Transformer model by AI21 Labs, combining Mamba structured state space with transformer attention for efficient long-context processing. It offers a 256K token context window and is optimized for instruction-following tasks, making it suitable for enterprise applications requiring high throughput and low latency.
Released
March 2024
Input
$0.50 per 1M tokens
Context
256K tokens
Black Forest Labs
9 active models
FLUX.2 is Black Forest Labs' flagship image generation model, succeeding FLUX.1. It introduces enhanced prompt adherence, improved image quality, and faster generation speeds. The model excels in rendering complex scenes with accurate text and fine details.
Released
November 2024
Input
—
Context
—
FLUX.2 [klein] is a fast, efficient image generation model from Black Forest Labs, optimized for speed and lower computational cost while maintaining high-quality outputs. It is part of the FLUX.2 family, offering a balance between performance and resource usage.
Released
November 2024
Input
—
Context
—
FLUX1.1 [pro] is a text-to-image model from Black Forest Labs that improves on FLUX.1 [pro] in speed, image quality, and prompt adherence. It builds upon the FLUX.1 architecture and is aimed at professional creative workflows.
Released
October 2024
Input
—
Context
—
FLUX1.1 [pro] Ultra is the highest-resolution model in the FLUX1.1 line from Black Forest Labs. It builds upon FLUX1.1 [pro] with enhanced resolution and fidelity, making it suited to professional creative work.
Released
October 2024
Input
—
Context
—
FLUX.1 Fill is a state-of-the-art inpainting and outpainting model by Black Forest Labs, capable of seamlessly filling in missing or extending parts of images with high coherence. It builds on the FLUX.1 architecture, offering superior image completion quality compared to previous models.
Released
November 2024
Input
—
Context
—
FLUX.1 Tools [dev] is a suite of image generation tools built on the FLUX.1 architecture, offering capabilities like inpainting, outpainting, and depth-conditional generation. It is designed for developers who need flexible and controllable image synthesis with high fidelity.
Released
November 2024
Input
Free (open source)
Context
—
FLUX.1 [dev] is a 12-billion-parameter text-to-image model that excels in generating high-quality, diverse images with exceptional prompt adherence. It is the open-weight variant of the FLUX.1 suite, offering a balance between performance and accessibility for developers and researchers.
Released
August 2024
Input
Free (open source)
Context
—
FLUX.1 Kontext is a family of image generation and editing models by Black Forest Labs that accept both text and image inputs, allowing in-context edits to an existing image while keeping characters and objects consistent. It builds upon the FLUX.1 architecture with enhanced capabilities for handling detailed prompts and maintaining consistency across generated elements.
Released
August 2024
Input
Free (open source)
Context
—
FLUX.1 Kontext [dev] is the open-weight variant of FLUX.1 Kontext from Black Forest Labs, aimed at developers and researchers. It offers the same text-and-image-driven editing approach in a model that can be run locally.
Released
June 2025
Input
—
Context
—
6 active models
Grok 4.3 is xAI's flagship frontier model, offering state-of-the-art reasoning, coding, and multimodal capabilities. It is designed for complex problem-solving and real-time data analysis, leveraging a massive context window and advanced tool use. As the latest iteration in the Grok series, it sets the benchmark for xAI's offerings.
Released
May 2025
Input
$3.00 per 1M tokens
Context
1M tokens
Grok 3 is a large language model from xAI, offering strong reasoning, coding, and multimodal capabilities. It is designed to be a highly capable and truthful AI assistant, with a focus on real-time knowledge and a humorous, engaging personality. Grok 3 represents a significant leap over its predecessor, Grok 2, in terms of performance and features.
Released
February 2025
Input
$3.00 per 1M tokens
Context
1M tokens
Grok 2 is a language model by xAI, offering advanced reasoning, coding, and multimodal capabilities. It features a 128K context window and is designed for complex problem-solving and real-time information retrieval via X (formerly Twitter). At release, Grok 2 was xAI's flagship model, competing with frontier models from OpenAI and Google.
Released
August 2024
Input
$2.00 per 1M tokens
Context
128K tokens
Grok Build 0.1 is an early preview model from xAI, designed for fast and efficient text generation. It is a lightweight variant of the Grok family, optimized for quick responses and lower computational cost. This model serves as a stepping stone in xAI's development of more advanced AI systems.
Released
May 2024
Input
$0.15 per 1M tokens
Context
8,192 tokens
Grok Voice API is a real-time speech-to-text and text-to-speech API from xAI, enabling voice interactions with the Grok assistant. It supports multiple languages and low-latency streaming, designed for conversational AI applications.
Released
February 2025
Input
$0.06 per minute (speech-to-text)
Context
—
Grok Imagine API is an image generation model by xAI that creates images from text prompts. It is designed for fast, high-quality image synthesis and integrates with the Grok ecosystem. This model represents xAI's entry into multimodal generation, complementing their text-based Grok models.
Released
December 2024
Input
—
Context
—
5 active models
DeepSeek-V4-Flash is DeepSeek's primary flagship model, optimized for fast and cost-effective inference. It offers a massive 1M token context window and supports multimodal inputs including images and audio. As the latest iteration, it balances high performance with significantly lower pricing compared to its predecessor.
Released
May 2025
Input
$0.05 per 1M tokens
Context
1M tokens
DeepSeek-V4-Pro is a high-capability model from DeepSeek, aimed at reasoning, coding, and general-knowledge tasks. It builds on previous DeepSeek models with improved capability and efficiency.
Input
$2.00 per 1M tokens
Context
128K tokens
DeepSeek V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It achieves top-tier performance on benchmarks like MMLU and HumanEval, rivaling leading closed-source models. It is designed for efficient inference and supports a 128K context window.
Released
December 2024
Input
$0.27 per 1M tokens
Context
128K tokens
DeepSeek Coder V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in coding tasks. It supports 338 programming languages and features a 128K context window, making it one of the most capable code-specific models available.
Released
June 2024
Input
$0.14 per 1M tokens
Context
128K tokens
DeepSeek R1 is a reasoning-focused large language model that excels in complex problem-solving, mathematics, and coding tasks. It uses a Mixture-of-Experts architecture with 671B total parameters (37B activated) and features a 128K token context window. The model is open-source under the MIT license and offers competitive pricing.
Released
January 2025
Input
$0.14 per 1M tokens (cache hit), $0.55 per 1M tokens (cache miss)
Context
128K tokens
41 active models · 8 archived
Gemini 3.5 Flash is a fast, cost-efficient multimodal model from Google, optimized for high-throughput tasks. It supports text, image, audio, and video inputs, making it suitable for a wide range of applications. As part of the Gemini family, it balances performance and affordability.
Input
$0.075 per 1M tokens
Context
1M tokens
Gemini 3.5 Live Translate is a specialized model from Google designed for real-time speech translation and transcription. It combines low-latency streaming with high accuracy across multiple languages, making it ideal for live conversations and events. This model is part of Google's Gemini family, optimized for speed and efficiency in translation tasks.
Input
$0.50 per 1M tokens
Context
32K tokens
Gemini 3.1 Flash Live is a low-latency multimodal model from Google optimized for real-time interactions, including live audio and video streaming. It is part of the Gemini 3.1 Flash family, designed for fast, cost-effective performance with a 1M token context window.
Released
May 2025
Input
$0.15 per 1M tokens
Context
1M tokens
Veo 3.1 Lite Preview is a lightweight, fast video generation model from Google, optimized for rapid prototyping and iterative content creation. It offers a more cost-effective and quicker alternative to the full Veo 3.1 model while maintaining high-quality video output. This preview version allows developers to experiment with video generation capabilities before the stable release.
Released
May 2025
Input
—
Context
—
Veo 3.1 Preview is Google's latest video generation model, capable of producing high-quality, realistic videos from text prompts. It offers enhanced temporal consistency, improved motion understanding, and supports longer video durations compared to its predecessor. This model is part of Google's Vertex AI and Gemini API ecosystem, targeting creative professionals and developers.
Released
May 2025
Input
—
Context
—
Gemini 3.1 Flash TTS is a text-to-speech model from Google that generates natural-sounding speech from text input. It is part of the Gemini Flash family, optimized for low-latency and cost-effective audio generation. The model supports multiple voices and languages, making it suitable for various voice applications.
Input
—
Context
—
Gemini 3.1 Flash-Lite is a lightweight, cost-efficient model in Google's Gemini lineup, optimized for high-throughput, low-latency tasks. It offers a balance of speed and quality for simple, high-volume applications.
Input
$0.075 per 1M tokens
Context
1M tokens
Gemini 3.1 Pro is Google's most advanced large language model, offering state-of-the-art performance across a wide range of tasks including reasoning, coding, and multimodal understanding. It features an extremely long context window and is designed for complex enterprise applications.
Input
$1.25 per 1M tokens
Context
2M tokens
Gemma 3 is a lightweight, state-of-the-art open model from Google, built from the same research and technology used to create the Gemini models. It is designed for responsible AI development and offers strong performance across a variety of text generation tasks, with support for function calling and structured outputs.
Released
March 2025
Input
Free (open source)
Context
32K tokens
Gemma 3 12B is a lightweight, open-source language model from Google, designed for efficient deployment on consumer hardware. It offers strong performance for its size, with support for multilingual text generation and a 128K context window. It is part of the Gemma 3 family, which includes 1B, 4B, 12B, and 27B parameter variants.
Released
March 2025
Input
Free (open source)
Context
128K tokens
Gemma 3 1B is a lightweight, open-source language model from Google, designed for efficient on-device and edge deployment. It offers strong performance for its size, with a 1 billion parameter count and support for a 32K token context window.
Released
March 2025
Input
Free (open source)
Context
32K tokens
Gemma 3 27B is Google's latest open model in the Gemma family, offering strong performance for its size with a 128K context window and multilingual support. It is designed for efficient deployment on consumer hardware while delivering competitive results on reasoning, coding, and instruction-following tasks.
Released
March 2025
Input
Free (open source)
Context
128K tokens
Gemma 3 7B is a lightweight, state-of-the-art open model from Google, built from the same research and technology used to create the Gemini models. It is designed for efficient deployment on resource-constrained environments like laptops and edge devices, offering strong performance for text generation and understanding tasks.
Released
March 2025
Input
Free (open source)
Context
8K tokens
Imagen 3 is Google's most advanced text-to-image model, capable of generating photorealistic images with exceptional detail and composition. It offers improved understanding of natural language prompts and better handling of text rendering within images. As the successor to Imagen 2, it sets a new standard for image generation quality in Google's lineup.
Released
August 2024
Input
—
Context
—
Gemini 3 is a multimodal AI model from Google, designed to understand and generate text, images, audio, and video. It builds upon the capabilities of Gemini 2 with enhanced reasoning, coding, and creative abilities. It was the flagship of the Gemini lineup at its release.
Input
$5.00 per 1M tokens
Context
1M tokens
Gemini 3 Flash is a lightweight, fast, and cost-efficient model from Google, optimized for high-throughput tasks and real-time applications. It balances performance and speed, making it ideal for simple reasoning, summarization, and chat use cases. As part of the Gemini 3 family, it offers lower latency and reduced pricing compared to larger models.
Input
$0.10 per 1M tokens
Context
1M tokens
Gemini 2.5 Flash is a cost-efficient, low-latency multimodal model designed for high-volume production workloads. It balances speed and quality, offering strong reasoning capabilities with reduced cost compared to Gemini 2.5 Pro. It supports text, image, audio, and video inputs with a 1M token context window.
Released
May 2025
Input
$0.15 per 1M tokens
Context
1M tokens
Gemini 2.5 Flash Experimental is a fast, cost-efficient multimodal model from Google, designed for high-volume, low-latency applications. It balances performance and affordability, making it suitable for real-time tasks like chat, summarization, and content generation. As an experimental model, it offers early access to the latest capabilities in the Gemini 2.5 Flash family.
Released
May 2025
Input
$0.15 per 1M tokens
Context
1M tokens
Gemini 2.5 Flash Live Preview is a fast, cost-efficient multimodal model from Google, optimized for low-latency, real-time applications. It supports text, image, audio, and video inputs with a 1M token context window, making it ideal for interactive and streaming use cases. As a preview release, it offers cutting-edge performance at a lower price point than Gemini 2.5 Pro.
Released
May 2025
Input
$0.15 per 1M tokens
Context
1M tokens
Gemini 2.5 Flash TTS Preview is a text-to-speech model that generates natural-sounding speech from text input. It is part of the Gemini 2.5 Flash family, optimized for low-latency, high-quality audio generation. This preview model allows developers to integrate expressive speech synthesis into applications.
Released
May 2025
Input
—
Context
—
Gemini 2.5 Flash-Lite is a cost-efficient, low-latency model in the Gemini 2.5 family, designed for high-volume, simple tasks. It offers the lowest cost and fastest speed among Gemini 2.5 models while maintaining a 1M token context window. It is ideal for use cases that require quick responses and minimal processing.
Released
May 2025
Input
$0.075 per 1M tokens
Context
1M tokens
Gemini 2.5 Pro TTS Preview is a multimodal model from Google that adds text-to-speech (TTS) capabilities to the Gemini 2.5 Pro reasoning model. It can generate spoken audio responses from text, enabling natural voice interactions. This preview model is designed for applications requiring high-quality, expressive speech synthesis.
Released
May 2025
Input
$1.25 per 1M tokens
Context
1M tokens
Gemini 2.5 Pro is a Google reasoning model designed for complex tasks requiring deep thought and multi-step planning. It introduces a 'thinking' capability that allows the model to reason through problems before generating responses, leading to improved accuracy and performance on coding, math, and science benchmarks. It is the first model in the Gemini 2.5 family and set a new standard for reasoning in Google's lineup at release.
Released
March 2025
Input
$1.25 per 1M tokens
Context
1M tokens
Gemini 2.5 Pro Experimental is the experimental release of Google's Gemini 2.5 Pro reasoning model, featuring a 1M token context window and native multimodal capabilities. It excels at complex tasks like coding, mathematics, and scientific reasoning, with a 'thinking' mode that improves accuracy. At release, it represented a significant leap in reasoning performance for the Gemini lineup.
Released
March 2025
Input
$1.25 per 1M tokens
Context
1M tokens
Gemini 2.0 Flash Lite is a cost-efficient, low-latency model optimized for high-volume, text-only tasks. It offers a 1M token context window and is designed to be the most affordable option in the Gemini 2.0 Flash family, ideal for scaling AI applications.
Released
February 2025
Input
$0.075 per 1M tokens
Context
1M tokens
Gemini 2.0 Flash is Google's next-generation multimodal model offering low latency and enhanced performance across text, image, audio, and video inputs. It introduces native tool use, including Google Search grounding and code execution, making it ideal for agentic and real-time applications.
Released
December 2024
Input
$0.10 per 1M tokens
Context
1M tokens
Gemini 2.0 Flash Thinking is an experimental model that uses chain-of-thought reasoning to improve performance on complex tasks. It is designed to think through problems step-by-step, making it particularly effective for math, coding, and logic. It is part of the Gemini 2.0 Flash family, optimized for speed and efficiency.
Released
December 2024
Input
$0.10 per 1M tokens
Context
1M tokens
Veo 2 is a video generation model from Google, capable of producing high-quality, realistic videos from text prompts. It offers improved understanding of real-world physics, camera controls, and cinematic effects compared with the original Veo. Veo 2 is available through VideoFX in Google Labs.
Released
December 2024
Input
—
Context
—
Gemma 2 is a family of lightweight, state-of-the-art open models from Google, available in 2B, 9B, and 27B parameter sizes. It builds on the original Gemma with significant performance improvements, including better reasoning, coding, and safety benchmarks. Gemma 2 is designed for developers and researchers who need efficient, high-quality language models for a wide range of tasks.
Released
June 2024
Input
Free (open source)
Context
8,192 tokens
Gemma 2 27B is a lightweight, open-source language model from Google, built on the same research and technology as the Gemini models. It offers strong performance for its size, excelling in reasoning, coding, and safety benchmarks. As part of the Gemma 2 family, it provides a balance of efficiency and capability for developers and researchers.
Released
June 2024
Input
Free (open source)
Context
8,192 tokens
Gemma 2 2B is a lightweight, open-source language model from Google, designed for efficient deployment on resource-constrained devices. It is part of the Gemma 2 family, offering strong performance for its size with a focus on accessibility and customization.
Released
June 2024
Input
Free (open source)
Context
8,192 tokens
Gemma 2 9B is a lightweight, open-source language model from Google, designed for efficient text generation and understanding. It balances performance and accessibility, making it suitable for a wide range of NLP tasks with a 9 billion parameter count.
Released
June 2024
Input
Free (open source)
Context
8,192 tokens
Gemini 1.5 Flash-8B is a smaller, faster, and more cost-efficient variant of the Gemini 1.5 Flash model, optimized for high-volume, low-latency tasks. It retains multimodal capabilities (text, image, audio, video) and a 1M token context window, making it ideal for applications requiring quick responses and reduced computational cost.
Released
October 2024
Input
$0.0375 per 1M tokens
Context
1M tokens
Gemini 1.5 Flash 002 is a fast and efficient multimodal model from Google, optimized for high-volume, low-latency tasks. It balances performance and cost, making it ideal for applications requiring quick responses across text, image, audio, and video inputs. As part of the Gemini 1.5 family, it offers a large context window and competitive pricing.
Released
September 2024
Input
$0.075 per 1M tokens
Context
1,048,576 tokens
Gemini 1.5 Pro 002 is an updated release of Google's Gemini 1.5 Pro general-purpose model, featuring a 2 million token context window and native multimodal understanding across text, images, audio, and video. It offers improved performance and lower latency compared to its predecessor, making it suitable for complex reasoning and large-scale data analysis.
Released
September 2024
Input
$1.25 per 1M tokens (up to 128K tokens), $2.50 per 1M tokens (over 128K tokens)
Context
2M tokens
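The two-tier input price listed for Gemini 1.5 Pro 002 can be turned into a rough cost estimate. The sketch below is illustrative only: the function name `tiered_input_cost_usd` is hypothetical, "128K" is read as 128,000 tokens, and the rate is assumed to be selected by total prompt length and applied to the whole prompt. Check the provider's pricing page before relying on either assumption.

```python
def tiered_input_cost_usd(
    prompt_tokens: int,
    threshold_tokens: int = 128_000,  # assumption: "128K" read as 128,000 tokens
    low_rate_usd: float = 1.25,       # per 1M tokens, prompts up to the threshold
    high_rate_usd: float = 2.50,      # per 1M tokens, prompts over the threshold
) -> float:
    """Estimate input cost when the per-token rate depends on prompt length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Assumption: the rate is chosen by total prompt length and then applied
    # to every token in the prompt, not only to the tokens past the threshold.
    if prompt_tokens <= threshold_tokens:
        rate = low_rate_usd
    else:
        rate = high_rate_usd
    return prompt_tokens / 1_000_000 * rate


if __name__ == "__main__":
    # Print estimates for prompts below, at, and above the threshold.
    for n in (100_000, 128_000, 200_000):
        print(f"{n:>9,} tokens -> ${tiered_input_cost_usd(n):.4f}")
```

Passing different `low_rate_usd` and `high_rate_usd` values lets the same helper cover other entries that list a price break at a prompt-length threshold.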
Gemini 1.5 Flash is a lightweight, fast, and cost-efficient multimodal model designed for high-volume, latency-sensitive applications. It is optimized for tasks like summarization, chat, and image/video captioning, offering a balance of performance and speed. As part of the Gemini 1.5 family, it supports a 1 million token context window and native multimodal inputs.
Released
May 2024
Input
$0.075 per 1M tokens
Context
1M tokens
Gemini 1.5 Pro is a multimodal model from Google, capable of understanding and processing text, images, audio, video, and code. It features a 1 million token context window, enabling analysis of extremely long documents, videos, or codebases. It excels at complex reasoning, long-context tasks, and multimodal understanding.
Released
May 2024
Input
$3.50 per 1M tokens (text up to 128K tokens), $7.00 per 1M tokens (text over 128K tokens), $10.50 per 1M tokens (audio/image/video up to 128K tokens), $21.00 per 1M tokens (audio/image/video over 128K tokens)
Context
1,048,576 tokens
Gemini 1.0 Pro is a lightweight, cost-efficient model optimized for a wide range of text-based tasks, including summarization, classification, and extraction. It offers a balance of performance and speed, making it suitable for high-volume applications where latency and cost are critical.
Released
December 2023
Input
$0.50 per 1M tokens
Context
32K tokens
Veo is Google's first-generation generative video model, producing high-quality 1080p videos from text, image, or video prompts. It understands natural language and visual semantics to create realistic and imaginative videos with consistent style and motion.
Released
May 2024
Input
—
Context
—
Imagen is Google's text-to-image diffusion model that generates high-fidelity images from natural language descriptions. It leverages a large language model (T5-XXL) for text understanding and a diffusion model for image generation, producing photorealistic outputs. Imagen is known for its exceptional image quality and deep language comprehension.
Released
May 2022
Input
—
Context
—
Nano Banana Pro is Google's image generation and editing model, also known as Gemini 3 Pro Image. It creates and edits images from text and image prompts, with improved text rendering and finer control over the output. This model is part of Google's Gemini family.
Input
Free (open source)
Context
8K tokens
42 active models
QwQ-32B is a 32-billion-parameter reasoning model developed by Alibaba's Qwen team, designed to excel in complex problem-solving and logical reasoning tasks. It is built on Qwen2.5-32B and trained with reinforcement learning, offering strong reasoning performance at a size that is practical to self-host. The model is open-source and available on Hugging Face.
Released
March 2025
Input
Free (open source)
Context
131,072 tokens
Qwen3 0.6B is a compact, efficient language model from Alibaba's Qwen3 series, designed for lightweight deployment and fast inference. It balances performance with low resource requirements, making it suitable for edge devices and real-time applications. As the smallest model in the Qwen3 lineup, it offers strong capabilities for its size.
Released
April 2025
Input
Free (open source)
Context
32K tokens
Qwen3 1.7B is a compact, efficient language model from Alibaba's Qwen3 family, designed for fast inference and deployment on resource-constrained devices. It balances strong performance with low computational cost, making it suitable for edge computing and real-time applications.
Released
April 2025
Input
Free (open source)
Context
32K tokens
Qwen3 14B is a mid-sized reasoning model in the Qwen3 family, balancing strong performance with efficiency. It supports extended thinking (chain-of-thought) and is optimized for complex reasoning tasks while being deployable on consumer hardware.
Released
April 2025
Input
Free (open source)
Context
32K tokens
Qwen3 30B is a Mixture-of-Experts (MoE) large language model from Alibaba's Qwen3 series with about 30 billion total parameters, of which roughly 3 billion are activated per token. It is designed for efficient reasoning and general-purpose tasks, supports extended context lengths, and is available as an open-weight model.
Released
April 2025
Input
Free (open source)
Context
131,072 tokens
Qwen3 32B is the largest dense reasoning model in the Qwen3 family, balancing strong performance with efficiency. It excels in complex reasoning, coding, and multilingual tasks, and is available as an open-source model on Hugging Face.
Released
April 2025
Input
Free (open source)
Context
128K tokens
Qwen3 4B is a compact, efficient language model from Alibaba's Qwen3 series, designed for fast inference and deployment on resource-constrained devices. It balances strong performance with low computational cost, making it suitable for edge computing and real-time applications.
Released
April 2025
Input
Free (open source)
Context
32K tokens
Qwen3 72B is a large language model from Alibaba's Qwen3 series, featuring a Mixture-of-Experts (MoE) architecture with 72B total parameters and 12B activated parameters. It supports extended context lengths up to 128K tokens and offers strong performance in reasoning, coding, and multilingual tasks.
Released
May 2025
Input
Free (open source)
Context
128K tokens
Qwen3 8B is an 8-billion-parameter dense large language model from Alibaba's Qwen3 series. It supports an extended thinking (reasoning) mode that can be switched on or off, and it is optimized for efficient deployment while remaining competitive on benchmarks like MMLU and coding tasks.
Released
April 2025
Input
Free (open source)
Context
32K tokens
Qwen3 235B is Alibaba's largest and most capable MoE (Mixture-of-Experts) language model in the Qwen3 series, featuring 235 billion total parameters with 22 billion activated per token. It delivers strong performance across reasoning, coding, and multilingual tasks, rivaling leading models like DeepSeek-R1 and GPT-4o.
Released
April 2025
Input
Free (open source)
Context
131,072 tokens
Qwen3 235B-A22B is a Mixture-of-Experts (MoE) large language model with 235B total parameters and 22B activated parameters per token. It is designed for strong reasoning and multilingual capabilities, supporting 119 languages and 29 programming languages, and is part of Alibaba's Qwen3 series.
Released
April 2025
Input
Free (open source)
Context
131,072 tokens
Qwen2.5-VL 3B is a compact multimodal vision-language model from Alibaba's Qwen series, designed for efficient image and video understanding. It excels in tasks like visual question answering, document parsing, and video analysis while maintaining a small footprint for deployment on edge devices.
Released
January 2025
Input
Free (open source)
Context
128K tokens
Qwen2.5-VL 72B is Alibaba's flagship multimodal model, excelling in vision-language tasks such as image and video understanding, document parsing, and visual reasoning. It builds on the Qwen2.5 language model with enhanced visual perception and dynamic resolution support.
Released
January 2025
Input
Free (open source)
Context
131,072 tokens
Qwen2.5-VL 7B is a multimodal vision-language model from Alibaba's Qwen series, supporting image and video understanding. It excels in visual reasoning, document parsing, and real-time video analysis, offering strong performance in a compact 7B parameter size.
Released
January 2025
Input
Free (open source)
Context
131,072 tokens
Qwen2.5-Coder 14B is a code-specific large language model from Alibaba's Qwen series, designed to excel in code generation, reasoning, and fixing. It is part of the Qwen2.5-Coder family, which includes models ranging from 0.5B to 32B parameters, with the 14B variant offering a strong balance of performance and efficiency.
Released
November 2024
Input
Free (open source)
Context
32,768 tokens
Qwen2.5-Coder 32B is a code-specific large language model from Alibaba's Qwen series, designed to excel in code generation, reasoning, and software engineering tasks. It is the largest model in the Qwen2.5-Coder family, offering coding performance competitive with general-purpose frontier models like GPT-4o and Claude 3.5 Sonnet. This model is open-source and available on Hugging Face, making it accessible for both research and commercial use.
Released
November 2024
Input
Free (open source)
Context
131,072 tokens
Qwen2.5-Coder 3B is a compact code-specialized language model from Alibaba's Qwen series, designed for efficient code generation and understanding. It balances strong coding performance with a small footprint, making it suitable for resource-constrained environments. As part of the Qwen2.5-Coder family, it offers competitive results on coding benchmarks while being significantly smaller than larger variants.
Released
November 2024
Input
Free (open source)
Context
32,768 tokens
Qwen2.5-Coder 7B is a specialized code generation model from Alibaba's Qwen2.5 series, fine-tuned for programming tasks. It offers strong performance in code completion, debugging, and multi-language support, making it a competitive open-source alternative for developers.
Released
September 2024
Input
Free (open source)
Context
32K tokens
Qwen2.5 0.5B is a compact language model from Alibaba's Qwen2.5 series, designed for efficient on-device and edge deployment. With only 0.5 billion parameters, it offers strong performance for its size, supporting a 32K token context window and multilingual capabilities.
Released
September 2024
Input
Free (open source)
Context
32K tokens
Qwen2.5 1.5B is a compact language model from Alibaba's Qwen2.5 series, designed for efficient deployment on edge devices and resource-constrained environments. It offers strong performance for its size, with support for multiple languages and a 32K token context window.
Released
September 2024
Input
Free (open source)
Context
32K tokens
Qwen2.5 110B is Alibaba's largest open-source language model, featuring 110 billion parameters and a 128K token context window. It excels in coding, mathematics, and multilingual tasks, outperforming many proprietary models of similar size. The model is available under a permissive Apache 2.0 license.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Qwen2.5 14B is a large language model from Alibaba's Qwen series, offering a strong balance of performance and efficiency. It features an extended context window of 128K tokens and supports multiple languages, making it suitable for a wide range of text-based tasks. As part of the Qwen2.5 family, it delivers improved reasoning and coding capabilities over its predecessor.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Qwen2.5 32B is a large language model from Alibaba's Qwen series, offering a strong balance of performance and efficiency with 32 billion parameters. It supports a 128K token context window and excels in coding, mathematics, and multilingual tasks. This model is part of the Qwen2.5 family, which includes sizes from 0.5B to 72B, and is available as an open-source model on Hugging Face.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Qwen2.5 3B is a compact yet powerful language model from Alibaba's Qwen2.5 series, offering strong performance in coding, mathematics, and general text generation. It is designed for efficient deployment on edge devices and resource-constrained environments while maintaining competitive benchmark scores.
Released
September 2024
Input
Free (open source)
Context
32,768 tokens
Qwen2.5 72B is a large language model from Alibaba's Qwen series, featuring 72 billion parameters and a 128K token context window. It excels in coding, mathematics, and multilingual tasks, and is available as an open-source model under Apache 2.0 license.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Qwen2.5 7B is a large language model from Alibaba's Qwen series, offering strong performance in coding, mathematics, and general knowledge tasks. It features an extended context window of up to 128K tokens and supports over 29 languages, making it versatile for multilingual applications. As part of the Qwen2.5 family, it balances efficiency and capability for a wide range of use cases.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Qwen2.5-Coder 0.5B is a compact code-specific language model from Alibaba's Qwen2.5 series, designed for efficient code generation and understanding. With only 0.5 billion parameters, it offers fast inference and low resource requirements while maintaining competitive performance on coding tasks. It is part of a family that includes larger models up to 32B parameters.
Released
November 2024
Input
Free (open source)
Context
32K tokens
Qwen2.5-Coder 1.5B is a compact code-specific language model from Alibaba's Qwen series, designed for efficient code generation and understanding. It balances performance and resource usage, making it suitable for local deployment and lightweight coding tasks. As part of the Qwen2.5-Coder family, it offers strong code completion and infilling capabilities.
Released
September 2024
Input
Free (open source)
Context
32K tokens
Qwen2 0.5B is a small, efficient language model from Alibaba's Qwen2 series, designed for lightweight deployment and fast inference. It balances performance with low computational requirements, making it suitable for resource-constrained environments. As the smallest model in the Qwen2 family, it offers a compact alternative for basic text generation tasks.
Released
June 2024
Input
Free (open source)
Context
32K tokens
Qwen2 1.5B is a small, efficient language model from Alibaba's Qwen2 series, designed for lightweight deployment and fast inference. It balances performance and resource usage, making it suitable for edge devices and applications with limited compute. As one of the smaller models in the Qwen2 family, it offers strong multilingual capabilities and supports a 32K token context window.
Released
June 2024
Input
Free (open source)
Context
32K tokens
Qwen2 72B is Alibaba's largest open-source language model in the Qwen2 series, featuring 72 billion parameters and a 128K token context window. It achieves state-of-the-art performance across multiple benchmarks, excelling in reasoning, coding, and multilingual tasks. As the flagship model of the Qwen2 family, it represents Alibaba's commitment to open-source AI with competitive capabilities.
Released
June 2024
Input
Free (open source)
Context
128K tokens
Qwen2 7B is a 7-billion-parameter large language model from Alibaba's Qwen2 series, offering strong performance in multilingual tasks, coding, and reasoning. It features an extended context window of up to 128K tokens and is available as an open-source model under the Apache 2.0 license.
Released
June 2024
Input
Free (open source)
Context
128K tokens
Qwen1.5 0.5B is a compact 0.5 billion parameter language model from Alibaba's Qwen1.5 series, designed for efficient deployment on resource-constrained devices. It offers a balance of performance and speed, making it suitable for lightweight applications and experimentation.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 1.8B is a compact decoder-only language model from Alibaba's Qwen1.5 series, featuring 1.8 billion parameters. It is designed for efficient deployment on edge devices and resource-constrained environments while maintaining strong performance on various NLP tasks. As one of the smallest models in the Qwen1.5 lineup, it offers a balance of speed and capability for lightweight applications.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 110B is a large language model with 110 billion parameters, part of Alibaba's Qwen1.5 series. It excels in multilingual tasks, particularly in Chinese and English, and supports a 32K token context window. As the largest model in the Qwen1.5 lineup, it offers strong performance across a wide range of NLP benchmarks.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 14B is a transformer-based decoder-only language model from Alibaba's Qwen series, featuring 14 billion parameters. It is part of the Qwen1.5 release that introduced a stable version with enhanced performance and support for multiple languages. The model excels in tasks like text generation, coding, and reasoning, and is available under a permissive license.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 32B is a large language model from Alibaba's Qwen1.5 series, featuring 32 billion parameters. It supports a context length of 32,768 tokens and is designed for a wide range of natural language tasks. The model is open source under the Apache 2.0 license and is available in both base and chat versions.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 4B is a compact 4-billion-parameter language model from Alibaba's Qwen1.5 series, designed for efficient deployment on resource-constrained devices. It supports a 32K context window and offers multilingual capabilities, including strong performance in Chinese and English. As one of the smaller models in the Qwen1.5 lineup, it balances speed and quality for lightweight applications.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 72B is a large language model from Alibaba's Qwen series, featuring a dense decoder-only transformer architecture with 72 billion parameters. It excels in multilingual tasks, particularly Chinese and English, and offers strong performance across a wide range of benchmarks. As one of the largest models in the Qwen1.5 family, second only to the 110B variant, it provides strong capabilities for complex reasoning and generation.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen1.5 7B is a 7-billion-parameter language model from Alibaba's Qwen series, offering a balance of performance and efficiency. It supports a 32K context window and is available under the Apache 2.0 license, making it suitable for a wide range of NLP tasks.
Released
February 2024
Input
Free (open source)
Context
32K tokens
Qwen-Audio is a large audio-language model developed by Alibaba Cloud, designed to process and understand various types of audio inputs including speech, music, and environmental sounds. It extends the Qwen series by incorporating audio understanding capabilities, enabling tasks such as audio captioning, sound event detection, and speech recognition. The model is part of Alibaba's open-source Qwen family, offering a unified framework for audio and text interactions.
Released
August 2023
Input
Free (open source)
Context
8K tokens
Qwen-VL is a multimodal large language model developed by Alibaba Cloud, capable of understanding and generating text based on visual inputs such as images. It integrates vision and language understanding, enabling tasks like image captioning, visual question answering, and document understanding. As part of the Qwen series, it offers strong performance in both Chinese and English contexts.
Released
August 2023
Input
Free (open source)
Context
32K tokens
19 active models · 2 archived
Code Llama 70B is Meta's largest code-focused language model, designed for code generation, understanding, and debugging. It is built on Llama 2 and fine-tuned on code-heavy datasets, offering state-of-the-art performance among open-source code models. It supports multiple programming languages and is available in base, Python-specialized, and instruction-tuned variants.
Released
January 2024
Input
Free (open source)
Context
100K tokens
Code Llama 34B is a large language model specialized for code generation and understanding, built on top of Llama 2. It was the largest variant in the Code Llama family at the initial August 2023 release, before the 70B model followed, and offers strong performance on coding tasks. The model supports code completion and instruction following for multiple programming languages.
Released
August 2023
Input
Free (open source)
Context
16K tokens
Code Llama 13B is a large language model specialized for code generation and understanding, based on Llama 2. It is part of Meta's Code Llama family, offering a balance between performance and efficiency for code-related tasks. The model excels in generating code from natural language prompts and supports multiple programming languages.
Released
August 2023
Input
Free (open source)
Context
16K tokens
Code Llama 7B is a 7 billion parameter large language model specialized for code generation and understanding, built on top of Llama 2. It is part of Meta's Code Llama family, which includes base, Python, and Instruct variants, designed to assist with code completion, debugging, and natural language to code tasks.
Released
August 2023
Input
Free (open source)
Context
16K tokens
Llama 4 Behemoth is Meta's most powerful and largest open-source multimodal model, designed to push the boundaries of AI research and development. It excels at complex reasoning, coding, and understanding both text and images, serving as a research-focused model for the community. As the flagship of the Llama 4 family, it offers unprecedented scale and capability for open-weight models.
Released
April 2025
Input
Free (open source)
Context
1M tokens
Llama 4 Maverick is an advanced multimodal model from Meta, designed for complex reasoning and vision-language tasks. It excels in coding, math, and instruction following, with a 1M token context window and native support for images and text. Within the Llama 4 family, it sits between the lighter Scout and the larger Behemoth.
Released
April 2025
Input
Free (open source)
Context
1M tokens
Llama 4 Scout is the smallest and most efficient multimodal model in Meta's Llama 4 family, designed for fast inference and single-GPU deployment. It supports text and image inputs, making it suitable for vision-language tasks, and offers the longest context window in the family at 10M tokens. Scout balances performance with resource efficiency.
Released
April 2025
Input
Free (open source)
Context
10M tokens
Llama 3.3 70B is an open-weight large language model from Meta, offering strong performance in reasoning, coding, and multilingual tasks. It is a text-only model optimized for instruction following and safety, serving as a cost-effective alternative to larger models like Llama 3.1 405B.
Released
December 2024
Input
Free (open source)
Context
128K tokens
Llama 3.2 11B is a multimodal model that supports text and image inputs, enabling tasks like visual reasoning and document understanding. It is part of Meta's Llama 3.2 family, offering a balance of performance and efficiency for on-device and cloud applications. This model is open-source and optimized for instruction following and safety.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Llama 3.2 1B is a small, efficient language model from Meta, optimized for on-device and edge applications. It is part of the Llama 3.2 family, which includes both text-only and multimodal models, with the 1B variant focusing on lightweight text generation. This model is designed for high-speed inference and low resource consumption, making it suitable for mobile and embedded systems.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Llama 3.2 3B is a lightweight, multilingual text-only model from Meta, optimized for on-device and edge applications. It is part of the Llama 3.2 family, which includes both text-only and vision models, and offers strong performance for its size with support for 8 languages.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Llama 3.2 90B is Meta's largest multimodal model in the Llama 3.2 family, supporting text and image inputs. It excels at complex reasoning, multilingual tasks, and vision-language understanding, positioning it as a top-tier open-weight model for enterprise and research applications.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Llama 3.1 405B is Meta's largest and most capable open-source language model, featuring 405 billion parameters. It excels in general knowledge, reasoning, and multilingual tasks, and is designed for synthetic data generation and model distillation. As the flagship of the Llama 3.1 family, it offers state-of-the-art performance among open models.
Released
July 2024
Input
Free (open source)
Context
128K tokens
Llama 3.1 70B is a large language model from Meta, part of the Llama 3.1 family with a 128K context window and support for eight languages. It offers strong performance in reasoning, coding, and multilingual tasks, and is available as an open-weight model for research and commercial use.
Released
July 2024
Input
Free (open source)
Context
128K tokens
Llama 3.1 8B is a compact, efficient large language model from Meta, part of the Llama 3.1 family. It offers strong performance for its size, with a 128K token context window and multilingual support. It is designed for a wide range of text-based tasks, balancing quality and computational efficiency.
Released
July 2024
Input
Free (open source)
Context
128K tokens
Llama 3 70B is a large language model from Meta, part of the Llama 3 family, offering strong performance across a wide range of natural language tasks. It is an open-source model that balances capability with accessibility, making it suitable for both research and commercial applications. As the larger of the two Llama 3 variants (8B and 70B), it provides a compelling option for developers needing high-quality text generation without the computational demands of the later Llama 3.1 405B model.
Released
April 2024
Input
Free (open source)
Context
8K tokens
Llama 3 8B is a lightweight, open-source large language model from Meta, designed for efficient text generation and understanding. It offers strong performance for its size, rivaling larger models in many tasks, and is part of the Llama 3 family that includes 8B and 70B parameter variants. The model is optimized for a wide range of natural language processing applications with a focus on accessibility and customization.
Released
April 2024
Input
Free (open source)
Context
8K tokens
Llama 2 70B is a large language model from Meta, part of the Llama 2 family, with 70 billion parameters. It is designed for a wide range of natural language tasks and is available for both research and commercial use under a permissive license. As the largest model in the Llama 2 series, it offers strong performance on reasoning, coding, and general knowledge tasks.
Released
July 2023
Input
Free (open source)
Context
4K tokens
Llama 2 7B is a foundational open-source large language model from Meta, part of the Llama 2 family. It is a 7 billion parameter model optimized for dialogue and text generation, available for both research and commercial use. As the smallest variant in the Llama 2 series, it offers efficient performance for a wide range of natural language tasks.
Released
July 2023
Input
Free (open source)
Context
4K tokens
13 active models · 1 archived
Command A Reasoning is a specialized variant of Cohere's Command A model, optimized for complex reasoning tasks such as multi-step logic, mathematical problem-solving, and code generation. It features an extended reasoning chain that improves accuracy on tasks requiring deep deliberation, while maintaining the core strengths of the Command A family in enterprise-grade text generation and tool use.
Released
August 2025
Input
$2.50 per 1M tokens
Context
256K tokens
Command A Translate is a specialized variant of Cohere's Command A model optimized for high-quality translation across multiple languages. It leverages a 128K context window and is designed for efficient, accurate translation tasks, making it ideal for enterprise localization and multilingual content generation.
Released
August 2025
Input
$2.50 per 1M tokens
Context
128K tokens
Command R+ 08-2024 is an updated version of Cohere's high-performance generative model optimized for enterprise RAG and tool use. It offers improved efficiency and accuracy in multilingual tasks, with a focus on cost-effective deployment.
Released
August 2024
Input
$2.50 per 1M tokens
Context
128K tokens
Command R+ 08-2024 is Cohere's most powerful large language model, optimized for enterprise-grade RAG and tool use. It excels at complex reasoning, multilingual tasks, and long-context understanding with a 128K token context window. This model is designed for high-accuracy, scalable AI applications requiring robust retrieval-augmented generation.
Released
August 2024
Input
$5.00 per 1M tokens
Context
128K tokens
Command A Vision is Cohere's flagship multimodal model, combining advanced language understanding with native vision capabilities. It excels at tasks requiring both text and image analysis, such as document understanding, visual question answering, and multimodal reasoning. This model is part of the Command A family, designed for enterprise-grade performance and safety.
Released
July 2025
Input
$2.50 per 1M tokens
Context
128K tokens
Command R7B is a compact, efficient language model optimized for fast inference and low cost. It is designed for high-throughput applications requiring quick responses, such as real-time chatbots and simple text generation tasks. As part of Cohere's Command family, it balances performance with affordability.
Released
December 2024
Input
$0.15 per 1M tokens
Context
128K tokens
Command A+ is Cohere's most advanced large language model, optimized for enterprise-grade reasoning, long-context understanding, and multilingual tasks. It builds on the Command family with enhanced accuracy, lower latency, and stronger instruction following. This model is designed for complex workflows requiring high reliability and safety.
Released
May 2026
Input
$2.50 per 1M tokens
Context
256K tokens
Command R+ is Cohere's most powerful large language model, optimized for enterprise-grade RAG and tool use. It excels at complex reasoning, multilingual tasks, and long-context understanding with a 128K token context window. This model is designed for production workloads requiring high accuracy and reliability.
Released
April 2024
Input
$3.00 per 1M tokens
Context
128K tokens
Command A is Cohere's most advanced large language model, optimized for enterprise-grade agentic tasks and multilingual performance. It features a 256K context window, supports 23 languages, and excels in tool use, retrieval-augmented generation (RAG), and code generation. Command A is designed to be highly capable while remaining cost-efficient, outperforming models like GPT-4o and Llama 3.1 405B on key benchmarks.
Released
March 2025
Input
$2.50 per 1M tokens
Context
256K tokens
Command R is a large language model optimized for retrieval-augmented generation (RAG) and tool use, designed for enterprise workloads. It balances high performance with cost efficiency, offering strong reasoning and multilingual capabilities. This model is part of Cohere's Command family, positioned as a faster and more affordable alternative to Command R+.
Released
March 2024
Input
$0.50 per 1M tokens
Context
128K tokens
Command R is Cohere's flagship large language model optimized for enterprise use cases, including retrieval-augmented generation (RAG) and tool use. It balances high performance with efficiency, offering strong multilingual support and a 128K context window. Command R is designed to be a versatile workhorse for production applications.
Released
March 2024
Input
$0.50 per 1M tokens
Context
128K tokens
Command R+ is Cohere's most advanced large language model, optimized for enterprise-grade RAG (Retrieval-Augmented Generation) and tool use. It excels at complex reasoning, multi-step tool use, and long-context tasks, making it ideal for building production-ready AI assistants. It is the flagship model in Cohere's Command family, offering the highest performance and largest context window.
Released
March 2024
Input
$3.00 per 1M tokens
Context
128K tokens
Command is Cohere's flagship generative language model optimized for enterprise use cases such as text generation, summarization, and retrieval-augmented generation (RAG). It balances strong performance with cost efficiency, making it suitable for production deployments. Command is distinct for its native support of tool use and multi-step reasoning via chain-of-thought.
Released
March 2023
Input
$1.00 per 1M tokens
Context
4K tokens
11 active models · 1 archived
Phi-4 Mini is a compact, efficient language model from Microsoft, designed for high-performance reasoning and coding tasks with a 128K token context window. It is part of the Phi-4 family, offering strong capabilities in a smaller footprint suitable for edge and resource-constrained environments. The model emphasizes speed and accuracy for text generation and code understanding.
Released
February 2025
Input
Free (open source)
Context
128K tokens
Phi-4 Multimodal is a compact, efficient multimodal model from Microsoft that processes text, images, and audio inputs. It is designed for on-device and edge scenarios, offering strong performance in vision and speech tasks while maintaining a small footprint. Part of the Phi-4 family, it balances capability with low computational cost.
Released
February 2025
Input
Free (open source)
Context
128K tokens
Phi-4 is a 14B parameter small language model developed by Microsoft, focusing on high-quality reasoning and instruction following. It outperforms larger models on math and coding benchmarks, and is designed for efficient deployment on edge devices. Phi-4 is part of Microsoft's Phi family, emphasizing data quality over scale.
Released
December 2024
Input
Free (open source)
Context
16K tokens
Phi-3.5 Mini is a lightweight, state-of-the-art open model with 3.8B parameters, optimized for high-quality reasoning and instruction following. It is part of Microsoft's Phi-3 family, designed to run efficiently on edge devices while delivering strong performance on benchmarks comparable to larger models.
Released
August 2024
Input
Free (open source)
Context
128K tokens
Phi-3.5 MoE is a lightweight, state-of-the-art Mixture-of-Experts model built from 16 experts of 3.8B parameters each, with about 42B total parameters and 6.6B active parameters per token. It excels in multilingual tasks, code generation, and long-context reasoning, outperforming larger models in its class.
Released
August 2024
Input
Free (open source)
Context
128K tokens
Phi-3.5 Vision is a lightweight, state-of-the-art multimodal model that processes both text and images. It excels in reasoning over images, extracting information from charts and tables, and understanding video frames. As part of the Phi-3 family, it offers strong performance in a compact size, suitable for resource-constrained environments.
Released
August 2024
Input
Free (open source)
Context
128K tokens
Phi-3 Medium is a 14B parameter language model from Microsoft, part of the Phi-3 family, designed to deliver strong performance on a wide range of tasks while being efficient enough to run on consumer hardware. It is a compact model that rivals larger models in quality, making it suitable for resource-constrained environments and local deployment.
Released
May 2024
Input
Free (open source)
Context
128K tokens
Phi-3 Small is a 7B parameter language model from Microsoft's Phi-3 family, designed for efficient performance on edge devices and cloud inference. It achieves strong results on benchmarks while being small enough to run on smartphones, making it a cost-effective alternative to larger models.
Released
May 2024
Input
Free (open source)
Context
128K tokens
Phi-3 Mini is a small language model (3.8B parameters) from Microsoft that achieves strong performance on benchmarks comparable to larger models like Mixtral 8x7B and GPT-3.5. It is designed for efficient on-device and edge deployment, with a focus on high-quality training data and instruction tuning.
Released
April 2024
Input
Free (open source)
Context
128K tokens
Phi-2 is a 2.7 billion parameter language model developed by Microsoft Research, designed to demonstrate that smaller models can achieve competitive performance on reasoning and language understanding tasks. It is part of the Phi series, which focuses on training on high-quality synthetic data and filtered web data to maximize efficiency. Phi-2 excels in tasks like math, coding, and common sense reasoning despite its compact size.
Released
December 2023
Input
Free (open source)
Context
2K tokens
Phi-1.5 is a small language model with 1.3 billion parameters, trained on a blend of synthetic and textbook-quality data. It is designed for efficient reasoning and code generation, achieving strong performance on benchmarks despite its compact size.
Released
September 2023
Input
Free (open source)
Context
2K tokens
12 active models
Codestral 2025 is Mistral AI's latest code-focused model, optimized for code generation, completion, and reasoning. It offers a 256K context window and is designed for high-performance coding tasks with low latency. This model is part of Mistral's specialized lineup for developers.
Released
May 2025
Input
$0.20 per 1M tokens
Context
256K tokens
Pixtral 12B is Mistral AI's first multimodal model, capable of processing both text and images. It is a 12-billion parameter model that excels at tasks like document understanding, image captioning, and visual question answering. Pixtral 12B is designed to be efficient and accessible, offering strong performance in a compact size.
Released
September 2024
Input
Free (open source)
Context
128K tokens
Ministral 8B is a compact, efficient language model designed for on-device and edge computing applications. It offers strong performance for its size, with a 128K context window and support for multiple languages, making it suitable for tasks like summarization, classification, and retrieval.
Released
October 2024
Input
Free (open source)
Context
128K tokens
Mixtral 8x22B is a sparse mixture-of-experts (MoE) model with 141 billion total parameters, of which 39 billion are active per token. It is Mistral AI's largest open-weight model, offering strong performance across reasoning, coding, and multilingual tasks while maintaining efficiency through its MoE architecture.
Released
April 2024
Input
Free (open source)
Context
64K tokens
Mixtral 8x7B is a sparse mixture-of-experts (MoE) model that achieves high performance with low latency by activating only 12.9B parameters per token. It is Mistral's first open-weight MoE model, offering strong multilingual capabilities and a 32K token context window.
Released
December 2023
Input
Free (open source)
Context
32K tokens
Mistral 7B is a small yet powerful language model that outperforms larger models like Llama 2 13B on many benchmarks. It is designed for efficient deployment and fine-tuning, making it a popular choice for developers seeking high performance with lower computational costs.
Released
September 2023
Input
Free (open source)
Context
8K tokens
Mistral Small 3.1 is a fast, efficient language model optimized for low-latency applications and on-device deployment. It offers strong performance on reasoning, coding, and multilingual tasks while maintaining a small footprint. This model is part of Mistral's 'Small' lineup, balancing quality and speed for cost-sensitive use cases.
Released
March 2025
Input
$0.20 per 1M tokens
Context
128K tokens
Mistral Small 3 is a 24B-parameter dense language model optimized for low-latency, high-throughput inference, offering strong performance at a fraction of the cost of larger models. It is designed to be a fast, efficient alternative for tasks that do not require the full power of frontier models, and it is released under the Apache 2.0 license.
Released
January 2025
Input
Free (open source)
Context
32K tokens
Ministral 3B is a small, efficient language model designed for on-device and edge computing applications. It offers strong performance for its size, with a 128K token context window and support for function calling, making it suitable for low-latency, privacy-sensitive tasks.
Released
October 2024
Input
Free (open source)
Context
128K tokens
Mistral Large 2 is Mistral AI's flagship large language model, offering state-of-the-art reasoning, multilingual support, and advanced coding capabilities. It features a 128K context window and excels in nuanced tasks like instruction following and function calling, positioning itself as a strong competitor to GPT-4 and Claude 3.5 Sonnet.
Released
July 2024
Input
$3.00 per 1M tokens
Context
128K tokens
Pixtral Large is Mistral AI's most advanced multimodal model, combining a 124 billion parameter decoder with a dedicated vision encoder. It excels at understanding text, images, and documents, and is designed for complex reasoning tasks that require both visual and textual understanding.
Released
November 2024
Input
$2.00 per 1M tokens
Context
128K tokens
Mistral NeMo is a state-of-the-art 12B parameter language model developed in collaboration with NVIDIA, offering a large context window of 128K tokens. It is designed for high performance on multilingual tasks, coding, and reasoning, and is available as an open-weight model under the Apache 2.0 license. Mistral NeMo serves as a strong alternative to larger models like Llama 3 70B, providing competitive performance with lower computational requirements.
Released
July 2024
Input
Free (open source)
Context
128K tokens
4 active models
Nemotron-4 340B is NVIDIA's largest and most capable language model, designed for synthetic data generation and training of smaller models. It achieves state-of-the-art performance on benchmarks like MMLU and HumanEval, and is released under an open model license for research and commercial use.
Released
June 2024
Input
Free (open source)
Context
4K tokens
Llama 3.3 Nemotron Super 49B is a large language model developed by NVIDIA, fine-tuned from Meta's Llama 3.3 70B using NVIDIA's Nemotron-4 340B training pipeline. It offers competitive performance to leading models like GPT-4o and Llama 3.1 405B while being more efficient with 49 billion parameters.
Released
December 2024
Input
Free (open source)
Context
128K tokens
Llama 3.1 Nemotron Nano 8B is a small, efficient language model optimized for low-latency inference on NVIDIA GPUs. It is part of NVIDIA's Nemotron family, designed for edge and real-time applications where speed and resource efficiency are critical.
Released
October 2024
Input
Free (open source)
Context
128K tokens
Llama 3.1 Nemotron Ultra 253B is a large language model developed by NVIDIA, based on Meta's Llama 3.1 architecture with enhancements for improved reasoning and instruction following. It is part of NVIDIA's Nemotron model family, designed for enterprise-grade AI applications requiring high accuracy and reliability.
Released
October 2024
Input
$5.00 per 1M tokens
Context
128K tokens
Use this page as a starting point for your research, then reach out if you need help narrowing down model options for a specific workflow, team, or budget.