The editorial directory for AI tools. Curated by humans, organised by what you're actually trying to do.

3,826+ tools indexed · Updated daily

© 2026 FindAIList · All rights reserved.

Some outbound links include referral or affiliate attribution. If you buy through them, FindAIList may earn a commission at no extra cost to you.

v3.0 · editorial

Models

Editorial snapshot — verify vendor pricing before buying

Foundational
LLM Directory

Every model from every major AI lab — grouped by provider, with each lab's current flagship highlighted.

296 models · 14 providers

OpenAI

Flagship: GPT-5.5

34 active models · 28 archived

GPT-5.5

GPT-5.5 is OpenAI's most advanced flagship language model, offering state-of-the-art reasoning, creativity, and instruction following. It builds on GPT-5 with enhanced efficiency, lower latency, and improved performance across coding, analysis, and multimodal tasks. As the primary model in OpenAI's lineup, it powers ChatGPT and the API with superior accuracy and speed.

Released

May 2025

Input

$2.50 per 1M tokens

Context

256K tokens
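
The per-1M-token prices in this directory convert to a per-request cost with one multiplication. The sketch below is illustrative only: the function name `input_cost_usd` is hypothetical, "256K" is assumed to mean 256,000 tokens, and the price is the snapshot figure listed above, which should be checked against the vendor before use.

```python
def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-1M-token price."""
    return tokens * price_per_million_usd / 1_000_000


# Listed snapshot figures for GPT-5.5: $2.50 per 1M input tokens, 256K context.
# Assumption: "256K" is read as 256,000 tokens.
full_context_cost = input_cost_usd(256_000, 2.50)
print(f"${full_context_cost:.2f}")  # prints $0.64
```

Output tokens are billed separately and are not listed in this directory, so this covers the input side only.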

GPT-5

GPT-5 is the successor to GPT-4 and the predecessor of GPT-5.5, with significant improvements in reasoning, creativity, and accuracy over GPT-4. It offers enhanced multimodal capabilities, longer context windows, and better alignment with human intent. GPT-5 is designed to handle complex tasks across various domains.

Input

$10.00 per 1M tokens

Context

256K tokens

GPT-4.5

GPT-4.5 is OpenAI's largest and most capable language model, designed for complex reasoning, creative tasks, and nuanced conversation. It builds on GPT-4 with improved knowledge, emotional intelligence, and reduced hallucinations, while maintaining broad general-purpose performance.

Released

February 2025

Input

$75.00 per 1M tokens

Context

128K tokens

GPT-4.1

GPT-4.1 is OpenAI's latest flagship language model, offering improved performance across coding, instruction following, and long-context tasks. It features a 1M token context window, making it ideal for processing large documents and codebases. GPT-4.1 replaces GPT-4 Turbo as the most capable model in the GPT-4 series.

Released

April 2025

Input

$2.00 per 1M tokens

Context

1M tokens
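
Whether a document or codebase fits a given context window can be estimated before sending it. The sketch below assumes a rough heuristic of about 4 characters per token, which is an assumption and not a figure from this directory; real counts depend on the model's tokenizer. The names `estimate_tokens` and `fits_context` are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text: str, context_tokens: int = 1_000_000,
                 reserved_for_output: int = 0) -> bool:
    """True if the estimated token count leaves room for the reserved output."""
    return estimate_tokens(text) + reserved_for_output <= context_tokens


document = "word " * 100_000  # 500,000 characters of placeholder text
print(estimate_tokens(document))                     # prints 125000
print(fits_context(document))                        # prints True
print(fits_context(document, context_tokens=128_000,
                   reserved_for_output=8_000))       # prints False
```

For anything close to the limit, an exact count from the vendor's tokenizer is the safer check.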

GPT-4.1 Mini

GPT-4.1 Mini is a cost-efficient, low-latency model in the GPT-4.1 family, designed for fast and affordable inference while maintaining strong reasoning capabilities. It offers a 1M token context window and supports multimodal inputs (text and images), making it suitable for high-volume, real-time applications.

Released

April 2025

Input

$0.40 per 1M tokens

Context

1M tokens

o4 Mini

o4 Mini is a cost-efficient reasoning model from OpenAI, designed for complex multi-step tasks that require strong reasoning and coding abilities. It offers a balance of performance and speed, making it suitable for applications where deep reasoning is needed but cost is a concern. As a smaller model in the o4 family, it provides faster responses than its larger counterpart while maintaining high accuracy on benchmarks.

Released

April 2025

Input

$1.10 per 1M tokens

Context

200K tokens

GPT-4o-audio-preview

Active · multimodal

GPT-4o-audio-preview is a multimodal model that extends GPT-4o with native audio input and output capabilities, enabling real-time voice interactions and audio processing. It is designed for applications requiring low-latency speech-to-speech or audio understanding, such as voice assistants and audio transcription with reasoning.

Released

October 2024

Input

$5.00 per 1M tokens

Context

128K tokens

GPT-4o-mini-realtime-preview

Active · multimodal

GPT-4o-mini-realtime-preview is a cost-efficient, low-latency multimodal model optimized for real-time voice and text interactions. It supports audio streaming and function calling, making it ideal for conversational AI applications. As a smaller variant of GPT-4o, it balances performance and affordability for production use.

Released

October 2024

Input

$0.60 per 1M tokens

Context

128K tokens

GPT-4o-realtime-preview

Active · multimodal

GPT-4o-realtime-preview is a multimodal model from OpenAI designed for low-latency, real-time interactions, supporting text, audio, and vision inputs. It is optimized for voice conversations and live applications, offering near-instantaneous responses. This model is part of the GPT-4o family, combining advanced reasoning with real-time capabilities.

Released

October 2024

Input

$5.00 per 1M tokens

Context

128K tokens

GPT-4o-search-preview

GPT-4o-search-preview is a variant of GPT-4o that integrates web search capabilities, allowing the model to retrieve and incorporate real-time information from the internet. It is designed for tasks requiring up-to-date knowledge, such as answering questions about recent events or accessing current data. This model combines the conversational and reasoning strengths of GPT-4o with live search functionality.

Released

October 2024

Input

$5.00 per 1M tokens

Context

128K tokens

GPT-4o-mini

GPT-4o-mini is a cost-efficient small model in the GPT-4o family, designed for fast and affordable inference while maintaining strong reasoning and multimodal capabilities. It supports text and vision inputs, making it suitable for a wide range of applications that require high throughput and low latency.

Released

July 2024

Input

$0.15 per 1M tokens

Context

128K tokens

GPT-4o-mini-search-preview

GPT-4o-mini-search-preview is a cost-efficient, fast model from OpenAI that integrates web search capabilities directly into the model's responses. It is designed for applications requiring up-to-date information retrieval without external search APIs, offering a balance of performance and affordability.

Released

July 2024

Input

$0.15 per 1M tokens

Context

128K tokens

GPT-4o

Active · multimodal

GPT-4o ('omni') is OpenAI's flagship multimodal model that accepts text, image, and audio inputs and produces text, image, and audio outputs. It matches GPT-4 Turbo performance on English text and code while being significantly faster and 50% cheaper in API pricing. GPT-4o achieves state-of-the-art results on vision and multilingual benchmarks, and offers improved reasoning over non-English languages.

Released

May 2024

Input

$5.00 per 1M tokens

Context

128K tokens
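
The "50% cheaper" claim can be checked against the input prices listed in this directory ($10.00 per 1M tokens for GPT-4-turbo, $5.00 for GPT-4o). The helper name `relative_saving` is hypothetical, and the check covers input pricing only, since output prices are not listed here.

```python
def relative_saving(old_price: float, new_price: float) -> float:
    """Fraction saved when moving from old_price to new_price."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price


# Snapshot input prices per 1M tokens, as listed in this directory.
gpt4_turbo_input = 10.00
gpt4o_input = 5.00

saving = relative_saving(gpt4_turbo_input, gpt4o_input)
print(f"{saving:.0%}")  # prints 50%
```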

GPT-4-turbo

GPT-4 Turbo is a GPT-4 series model offering improved performance and a larger context window than GPT-4. It supports vision, function calling, and JSON mode, and is priced lower than its predecessor.

Released

April 2024

Input

$10.00 per 1M tokens

Context

128K tokens

GPT-4-turbo-preview

GPT-4 Turbo Preview is a preview version of OpenAI's GPT-4 Turbo model, offering improved performance and a larger context window of 128K tokens compared to GPT-4. It is designed to be more cost-effective and capable, with knowledge up to April 2023. This model is part of OpenAI's frontier tier, providing advanced reasoning and language understanding.

Released

November 2023

Input

$10.00 per 1M tokens

Context

128K tokens

GPT-4

GPT-4 is the large language model OpenAI launched as its flagship in March 2023, capable of solving difficult problems with greater accuracy and creativity than previous models. It introduced multimodal capabilities (text and image input) and exhibits human-level performance on various professional and academic benchmarks, with enhanced reasoning, safety, and steerability over earlier models.

Released

March 2023

Input

$30.00 per 1M tokens

Context

8,192 tokens

GPT-3.5-turbo

GPT-3.5-turbo is a fast and cost-effective language model optimized for conversational tasks and chat completions. It offers a balance of performance and affordability, making it suitable for a wide range of applications. As part of OpenAI's GPT-3.5 series, it provides improved instruction following and lower latency compared to earlier models.

Released

March 2023

Input

$0.50 per 1M tokens

Context

16,385 tokens

GPT-3.5

GPT-3.5 is a fast and cost-effective language model from OpenAI, optimized for conversational tasks and general-purpose text generation. It powers ChatGPT and offers a balance of performance and affordability, making it suitable for a wide range of applications. As a predecessor to GPT-4, it remains widely used for its speed and lower cost.

Released

March 2022

Input

$0.50 per 1M tokens

Context

16,385 tokens

o3-mini

o3-mini is a cost-efficient reasoning model by OpenAI that excels in STEM tasks, particularly coding and mathematics. It offers adjustable reasoning effort (low, medium, high) to balance speed and accuracy, making it a faster and cheaper alternative to o1-mini while maintaining strong performance. It supports function calling, structured outputs, and developer messages.

Released

January 2025

Input

$1.10 per 1M tokens

Context

200K tokens

o3

o3 is OpenAI's most advanced reasoning model, designed to spend more time thinking before responding. It excels at complex problem-solving, coding, and scientific reasoning, outperforming its predecessor o1 on challenging benchmarks. o3 is part of OpenAI's reasoning model series, offering enhanced capabilities for tasks requiring deep logical deduction.

Released

December 2024

Input

$10.00 per 1M tokens

Context

200K tokens

Embeddings v3 large

Active · embedding

OpenAI's most capable embedding model, offering 3072 dimensions for high-quality text embeddings. It is designed for tasks requiring nuanced semantic understanding, such as retrieval-augmented generation (RAG) and clustering. Part of the v3 family, it balances performance and cost for large-scale embedding needs.

Released

January 2024

Input

$0.13 per 1M tokens

Context

8,191 tokens
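
Retrieval with embeddings usually comes down to comparing vectors, most often by cosine similarity. The sketch below uses toy 3-dimensional vectors invented for illustration; real vectors from this model would have 3072 dimensions and come from the vendor's API, which is not called here. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       documents: Dict[str, Sequence[float]]) -> List[str]:
    """Document names ordered from most to least similar to the query."""
    return sorted(documents,
                  key=lambda name: cosine_similarity(query, documents[name]),
                  reverse=True)


# Toy vectors standing in for real embeddings (illustrative values only).
query = [1.0, 0.0, 1.0]
docs = {"pricing": [0.9, 0.1, 0.8], "weather": [0.0, 1.0, 0.1]}
print(rank_by_similarity(query, docs))  # prints ['pricing', 'weather']
```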

Embeddings v3 small

Active · embedding

OpenAI's most efficient embedding model, optimized for speed and cost while maintaining strong performance. It is part of the third-generation embedding family, offering a smaller size for lower latency and reduced computational requirements.

Released

January 2024

Input

$0.02 per 1M tokens

Context

8,191 tokens

DALL-E 3

DALL-E 3 is OpenAI's latest text-to-image generation model, capable of creating highly detailed and accurate images from natural language descriptions. It integrates natively with ChatGPT for iterative prompt refinement and offers improved composition and detail over its predecessor. The model is designed for creative and professional use, with built-in safety features to prevent harmful content.

Released

October 2023

Input

$0.040 per image (standard), $0.080 per image (HD)

Context

—

DALL-E 2

DALL-E 2 is an AI system that can create realistic images and art from a description in natural language. It offers higher resolution and lower latency than its predecessor, DALL-E, and is capable of generating images with greater fidelity and coherence. DALL-E 2 is part of OpenAI's image generation lineup, now superseded by DALL-E 3.

Released

April 2022

Input

$0.020 per image (1024x1024)

Context

—

o1-pro

o1-pro is a premium reasoning model from OpenAI that extends the o1 series with enhanced inference compute for more reliable and higher-quality responses. It is designed for complex tasks requiring deep reasoning, such as advanced coding, scientific analysis, and multi-step problem solving. As a paid tier of o1, it offers improved performance over the standard o1 model at a higher cost.

Released

December 2024

Input

$150.00 per 1M tokens

Context

200K tokens

o1

o1 is a large language model by OpenAI designed for complex reasoning tasks, using chain-of-thought reasoning to solve difficult problems in science, coding, and math. It is the first model in the o-series, emphasizing deep thinking and accuracy over speed. o1 is available in preview and mini variants, with o1-preview being the full reasoning model and o1-mini a faster, cheaper option for coding and STEM.

Released

September 2024

Input

$15.00 per 1M tokens

Context

128K tokens

o1-mini

o1-mini is a smaller, faster, and more cost-efficient reasoning model from OpenAI, designed to excel at coding and STEM tasks. It uses chain-of-thought reasoning to solve complex problems while being significantly cheaper than the full o1 model. It is part of the o1 series, which emphasizes advanced reasoning capabilities.

Released

September 2024

Input

$3.00 per 1M tokens

Context

128K tokens

o1-preview

o1-preview is a large language model from OpenAI designed for complex reasoning tasks. It uses chain-of-thought reasoning to solve difficult problems in science, coding, and math. As a preview model, it offers advanced reasoning capabilities but lacks some features of GPT-4o like vision and function calling.

Released

September 2024

Input

$15.00 per 1M tokens

Context

128K tokens

TTS-1

TTS-1 is OpenAI's text-to-speech model that converts text into natural-sounding spoken audio. It supports multiple voices and languages, offering low-latency streaming for real-time applications. As part of OpenAI's audio API, it provides a cost-effective solution for generating high-quality speech.

Released

November 2023

Input

$15.00 per 1M characters

Context

—

TTS-1-hd

TTS-1-hd is OpenAI's high-definition text-to-speech model, offering superior audio quality with more natural intonation and clarity compared to the standard TTS-1 model. It supports multiple voices and languages, making it ideal for applications requiring lifelike speech synthesis.

Released

November 2023

Input

$100.00 per 1M characters

Context

—

computer-use-preview

Active · multimodal

computer-use-preview is a specialized model from OpenAI designed to control computer interfaces by interpreting screenshots and performing actions like clicking, typing, and scrolling. It is part of the GPT-4o family and enables AI agents to interact with software applications directly, bridging the gap between language models and GUI automation.

Released

October 2024

Input

$3.00 per 1M tokens

Context

128K tokens

Moderation latest

OpenAI's most capable moderation model, designed to detect harmful content across text and images. It is the successor to the previous moderation models, offering improved accuracy and coverage for policy-violating content. This model is optimized for real-time content moderation use cases.

Released

August 2024

Input

Free

Context

8,192 tokens

Moderation stable

OpenAI's Moderation stable model is designed to detect harmful or unsafe content in text. It is optimized for accuracy and reliability in content moderation tasks, helping developers filter inappropriate material. This model is part of OpenAI's moderation endpoint and is the stable version recommended for production use.

Released

August 2023

Input

Free

Context

8,192 tokens

Whisper

Whisper is a general-purpose speech recognition model by OpenAI, capable of transcribing speech in multiple languages and translating it into English. It is trained on a large dataset of diverse audio and is robust to accents, background noise, and technical jargon. Whisper is available as an open-source model and also via OpenAI's API.

Released

September 2022

Input

$0.006 per minute (audio input)

Context

—

Archives (28) · deprecated & retired

GPT-4-1106-preview · deprecated · November 2023
GPT-4-vision-preview · legacy · November 2023
GPT-4-0613 · legacy · June 2023
GPT-4-0314 · retired · March 2023
GPT-4-32k · legacy · March 2023
GPT-3.5-turbo-1106 · legacy · November 2023
GPT-3.5-turbo-instruct · legacy · November 2023
GPT-3.5-turbo-0613 · legacy · June 2023
GPT-3.5-turbo-16k · legacy · June 2023
GPT-3.5-turbo-0301 · retired · March 2023
text-davinci-003 · retired · November 2022
GPT-3 · legacy · June 2020
Codex davinci-002 · retired · March 2023
Embeddings v2 ada · legacy · December 2022
text-davinci-002 · retired · March 2022
text-curie-001 · legacy · March 2022
text-babbage-001 · retired · September 2021
Codex cushman-001 · retired · August 2021
text-ada-001 · retired · June 2020
text-davinci-001 · retired · June 2020
babbage · legacy · March 2023
curie · legacy · March 2023
ada · legacy · December 2022
InstructGPT ada · retired · January 2022
InstructGPT babbage · deprecated · January 2022
InstructGPT curie · deprecated · January 2022
InstructGPT davinci · retired · January 2022
davinci · retired · June 2020

Anthropic

Flagship: Claude Opus 4.7

16 active models · 14 archived

Claude Opus 4.7

Claude Opus 4.7 is Anthropic's most powerful flagship model, offering state-of-the-art performance across reasoning, coding, and creative tasks. It introduces extended thinking capabilities and is designed for complex enterprise workloads requiring deep analysis and high accuracy.

Released

May 2025

Input

$15.00 per 1M tokens

Context

200K tokens

Claude Opus 4.8

Claude Opus 4.8 is Anthropic's most advanced and capable model, designed for complex reasoning, coding, and creative tasks. It represents the pinnacle of the Claude lineup with superior performance across benchmarks. This model excels in nuanced understanding, long-context analysis, and generating high-quality outputs.

Input

$15.00 per 1M tokens

Context

200K tokens

Claude Opus 4.6

Claude Opus 4.6 is Anthropic's most advanced and capable model, designed for complex reasoning, creative tasks, and nuanced conversation. It represents the pinnacle of the Claude lineup, offering superior performance across a wide range of benchmarks. This model excels in deep analysis, coding, and handling intricate instructions with high accuracy.

Input

$15.00 per 1M tokens

Context

200K tokens

Claude Sonnet 4.6

Claude Sonnet 4.6 is a hypothetical model not officially released by Anthropic. As of now, Anthropic's lineup includes Claude 3.5 Sonnet and Claude 3 Sonnet. This entry is based on a fictional version number.

Input

$3.00 per 1M tokens

Context

200K tokens

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's fastest and most cost-effective model, designed for high-throughput, low-latency applications. It offers a balance of speed and intelligence, making it ideal for real-time tasks like customer support and content moderation. As the latest iteration in the Haiku family, it provides improved performance over its predecessor while maintaining competitive pricing.

Released

October 2025

Input

$0.80 per 1M tokens

Context

200K tokens

Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic's most intelligent model, offering state-of-the-art performance across a wide range of tasks including coding, analysis, and creative writing. It balances high capability with improved speed and efficiency, making it suitable for complex enterprise applications.

Released

September 2025

Input

$3.00 per 1M tokens

Context

200K tokens

Claude Haiku 4.0

Claude Haiku 4.0 is Anthropic's fastest and most cost-effective model in the Claude 4 family, designed for high-throughput, low-latency applications. It offers a strong balance of speed and intelligence, making it ideal for simple queries, classification, and real-time chat. As the successor to Claude 3 Haiku, it delivers improved performance across benchmarks while maintaining competitive pricing.

Released

May 2025

Input

$0.80 per 1M tokens

Context

200K tokens

Claude Opus 4.0

Claude Opus 4.0 is Anthropic's most powerful and intelligent model, designed for complex reasoning, coding, and analysis tasks. It excels at nuanced understanding, creative problem-solving, and maintaining long context coherence. As the flagship model in the Claude lineup, it sets the standard for performance across a wide range of benchmarks.

Released

May 2025

Input

$15.00 per 1M tokens

Context

200K tokens

Claude Sonnet 4.0

Claude Sonnet 4.0 is Anthropic's mid-tier model balancing performance and cost, offering strong reasoning, coding, and multilingual capabilities. It is designed for high-throughput enterprise applications requiring reliable, safe AI interactions. Sonnet 4.0 sits between the faster Haiku and the more powerful Opus in the Claude 4 family.

Released

May 2025

Input

$3.00 per 1M tokens

Context

200K tokens

Claude 3.7 Sonnet

Claude 3.7 Sonnet is Anthropic's most intelligent model to date, combining rapid responses with extended thinking for complex tasks. It is a hybrid reasoning model that can produce near-instant answers or engage in deep, step-by-step reasoning, making it ideal for coding, math, and nuanced analysis.

Released

February 2025

Input

$3.00 per 1M tokens

Context

200K tokens

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic's fastest model in the Claude 3.5 family, offering low latency and high throughput for everyday tasks. It combines speed with strong performance on coding, reasoning, and language tasks, making it ideal for real-time applications and cost-sensitive deployments.

Released

October 2024

Input

$0.80 per 1M tokens

Context

200K tokens

Claude 3.5 Sonnet

Claude 3.5 Sonnet is Anthropic's most intelligent model, balancing speed and cost while outperforming Claude 3 Opus on key benchmarks. It excels at nuanced tasks like coding, writing, and analysis with improved instruction following and safety.

Released

June 2024

Input

$3.00 per 1M tokens

Context

200K tokens

Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model in the Claude 3 family, designed for near-instant responsiveness and high throughput. It excels at lightweight tasks like customer service, content moderation, and real-time chat, offering a balance of speed and intelligence at a lower cost.

Released

March 2024

Input

$0.25 per 1M tokens

Context

200K tokens

Claude 3 Opus

Claude 3 Opus is Anthropic's most powerful model, offering state-of-the-art performance on complex tasks with deep reasoning, analysis, and coding. It is the flagship model in the Claude 3 family, designed for maximum intelligence and reliability.

Released

March 2024

Input

$15.00 per 1M tokens

Context

200K tokens

Claude 3 Sonnet

Claude 3 Sonnet is a balanced model in Anthropic's Claude 3 family, offering a strong combination of speed and intelligence for most enterprise workloads. It is ideal for tasks requiring nuanced understanding, rapid responses, and reliable performance at a lower cost than Opus.

Released

March 2024

Input

$3.00 per 1M tokens

Context

200K tokens

Claude Mythos Preview

Claude Mythos Preview is Anthropic's most advanced and capable model, designed for complex reasoning, creative tasks, and nuanced dialogue. It represents a significant leap in performance across coding, mathematics, and multilingual understanding, while maintaining Anthropic's focus on safety and alignment. This preview offers early access to the next generation of Claude's capabilities.

Released

May 2025

Input

$15.00 per 1M tokens

Context

200K tokens

Archives (14) · deprecated & retired

Claude Fable 5
Claude Mythos 5
Claude 2.1 · legacy · November 2023
Claude 2 (Legacy) · legacy · July 2023
Claude 2.0 · deprecated · July 2023
Claude 1.3 · deprecated · March 2023
Claude Instant 1.2 · deprecated · August 2023
Claude 1.2 · deprecated · March 2023
Claude Instant 1.1 · deprecated · February 2024
Claude 1.1 · deprecated · March 2023
Claude Instant 1 (Legacy) · legacy · August 2023
Claude Instant 1.0 · deprecated · August 2023
Claude 1 (Legacy) · retired · March 2023
Claude 1.0 · retired · March 2023

Stability AI

Flagship: Stable Diffusion 3.5 Large

14 active models · 5 archived

Stable Diffusion 3.5 Large

Latest · Active · image

Stable Diffusion 3.5 Large is Stability AI's most advanced text-to-image model, offering 8.1 billion parameters for high-quality, photorealistic image generation. It excels in prompt adherence, typography, and generating diverse outputs, making it the flagship model in the Stable Diffusion 3.5 family.

Released

October 2024

Input

Free (open source)

Context

—

Stable LM 7B

Stable LM 7B is a compact, efficient language model designed for fast inference and accessibility. It balances performance with low computational requirements, making it suitable for a wide range of text generation tasks. As part of Stability AI's open model lineup, it offers a lightweight alternative to larger models.

Released

May 2024

Input

Free (open source)

Context

4,096 tokens

Stable Diffusion 3.5 Large Turbo

Stable Diffusion 3.5 Large Turbo is a distilled version of SD3.5 Large, offering faster inference with 4 steps instead of the standard 28, while maintaining high image quality. It is part of Stability AI's latest generation of text-to-image models, optimized for speed and efficiency.

Released

October 2024

Input

—

Context

—

Stable Diffusion 3.5 Medium

Stable Diffusion 3.5 Medium is a text-to-image model with 2.5 billion parameters, offering high-quality image generation with improved prompt adherence and typography. It is designed to run efficiently on consumer hardware, making advanced AI image generation accessible to a wider audience.

Released

October 2024

Input

Free (open source)

Context

—

Stable Diffusion 3.5 Turbo

Stable Diffusion 3.5 Turbo is a distilled version of Stable Diffusion 3.5 Large, optimized for faster inference while maintaining high image quality. It uses a Multimodal Diffusion Transformer (MMDiT) architecture and supports text-to-image generation with improved prompt adherence and typography.

Released

October 2024

Input

—

Context

—

Stable Diffusion 3

Stable Diffusion 3 is a state-of-the-art text-to-image model that excels in generating high-quality, photorealistic images with improved typography and complex prompt understanding. It builds on the success of its predecessors with a new diffusion transformer architecture and is available in multiple sizes (800M to 8B parameters) to suit different use cases.

Released

June 2024

Input

—

Context

—

Stable Code Instruct 3B

Stable Code Instruct 3B is a 3 billion parameter instruction-tuned language model specialized for code generation and software development tasks. It is designed to be lightweight and efficient, offering strong performance for code completion, explanation, and debugging while running on consumer-grade hardware.

Released

May 2024

Input

Free (open source)

Context

4,096 tokens

Stable Code 3B

Stable Code 3B is a 3 billion parameter large language model specialized for code generation and completion. It is designed to be lightweight and efficient, offering strong performance on code tasks while being deployable on consumer-grade hardware. It is part of Stability AI's lineup of open models for developers.

Released

January 2024

Input

Free (open source)

Context

8,192 tokens

Stable Audio 2.0

Stable Audio 2.0 is a generative audio model that produces high-quality, full-length music tracks (up to 3 minutes) from text prompts and optional reference audio. It introduces audio-to-audio generation, allowing users to transform existing audio clips, and supports stereo output at 44.1 kHz. The model is designed for musicians, content creators, and developers seeking efficient AI-powered music production.

Released

April 2024

Input

—

Context

—
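
The figures in this entry (tracks up to 3 minutes, stereo, 44.1 kHz) imply a fixed number of audio frames per track. The sketch below works that out and estimates the uncompressed size. The 16-bit PCM sample width is an assumption for illustration and is not stated in this directory, and `pcm_size` is a hypothetical helper name.

```python
from typing import Tuple


def pcm_size(duration_s: int, sample_rate_hz: int = 44_100,
             channels: int = 2, bytes_per_sample: int = 2) -> Tuple[int, int]:
    """Return (frames, bytes) for uncompressed PCM audio.

    bytes_per_sample=2 (16-bit) is an assumption, not a listed specification.
    """
    frames = duration_s * sample_rate_hz
    size_bytes = frames * channels * bytes_per_sample
    return frames, size_bytes


# A full-length 3-minute stereo track at 44.1 kHz.
frames, size_bytes = pcm_size(180)
print(frames, size_bytes)  # prints 7938000 31752000
```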

Stable LM 2 1.6B

Stable LM 2 1.6B is a compact, efficient language model designed for on-device and edge deployment. It offers strong performance for its size, with a focus on accessibility and low computational cost.

Released

March 2024

Input

Free (open source)

Context

4,096 tokens

Stable LM 2 12B

Stable LM 2 12B is a 12-billion parameter language model developed by Stability AI, designed for efficient text generation and understanding. It is part of the Stable LM 2 family, offering a balance of performance and speed for a wide range of natural language tasks. The model is available under an open-source license, making it accessible for research and commercial use.

Released

March 2024

Input

Free (open source)

Context

4,096 tokens

Stable Audio 1.0

Stable Audio 1.0 is a latent diffusion model for generating high-quality, variable-length audio from text prompts. It can produce music, sound effects, and ambient audio up to 90 seconds in length. The model is designed for creative professionals and developers seeking controllable audio generation.

Released

September 2023

Input

—

Context

—

Stable Video Diffusion

Stable Video Diffusion is a generative AI model that converts static images into short video clips. It is built on the latent diffusion architecture and is designed for research and creative applications, offering controllable video generation from a single input image.

Released

November 2023

Input

Free (open source)

Context

—

Stable Diffusion XL

Stable Diffusion XL (SDXL) is a text-to-image generation model that produces high-resolution, photorealistic images with improved composition and detail compared to its predecessor. It uses a larger UNet backbone and a two-stage pipeline (base + refiner) for enhanced image quality. SDXL was the flagship image model in Stability AI's lineup before the Stable Diffusion 3 family, offering stronger prompt adherence and aesthetic quality than earlier Stable Diffusion releases.

Released

July 2023

Input

Free (open source)

Context

—

Archives (5) · deprecated & retired

Stable LM 3B
Stable Diffusion 2.1 · legacy · December 2022
Stable Diffusion 2.0 · legacy · November 2022
Stable Diffusion 1.5 · legacy · October 2022
Stable Diffusion 1.4 · legacy · August 2022

AI21 Labs

Flagship: Jamba 1.6 Large

9 active models · 2 archived

Jamba 1.6 Large

Jamba 1.6 Large is AI21 Labs' flagship model, combining a hybrid SSM-Transformer architecture for high performance and efficiency. It excels at long-context tasks with a 256K token context window and supports structured outputs like JSON mode and function calling. This model is designed for enterprise-grade reasoning, coding, and multilingual applications.

Released

February 2025

Input

$5.00 per 1M tokens

Context

256K tokens

AI21 Maestro

AI21 Maestro is a state-of-the-art large language model by AI21 Labs, designed for complex reasoning and instruction following. It excels in tasks requiring deep understanding and generation of text, positioning itself as a competitive alternative to models like GPT-4 and Claude 3. Maestro is part of AI21's Jamba family, leveraging a hybrid architecture combining Mamba and Transformer components.

Released

May 2024

Input

$5.00 per 1M tokens

Context

256K tokens

Jurassic-2 Ultra

Jurassic-2 Ultra is AI21 Labs' most powerful large language model, designed for complex reasoning, long-form content generation, and advanced instruction following. It excels at tasks requiring deep understanding and nuanced output, positioning itself as a top-tier model in the Jurassic-2 family.

Released

May 2023

Input

$5.00 per 1M tokens

Context

8,192 tokens

Jurassic-2 Light

Jurassic-2 Light is a fast, cost-effective language model from AI21 Labs, optimized for high-throughput applications requiring quick responses. It is the smallest and most efficient model in the Jurassic-2 family, offering a balance of speed and quality for simple tasks.

Released

March 2023

Input

$0.30 per 1M tokens

Context

8,192 tokens

Jurassic-2 Mid

Jurassic-2 Mid is a mid-sized language model from AI21 Labs, offering a balance of performance and cost. It is part of the Jurassic-2 family, designed for a wide range of natural language tasks with high accuracy and speed. The model is optimized for efficient inference and is suitable for applications requiring quick responses.

Released

March 2023

Input

$0.50 per 1M tokens

Context

8,192 tokens

Jamba 1.6 Mini

Jamba 1.6 Mini is a lightweight, efficient large language model from AI21 Labs, optimized for speed and cost-effectiveness. It is part of the Jamba family, which uses a hybrid SSM-Transformer architecture for long context handling. This model is designed for high-throughput, low-latency applications where budget is a concern.

Released

August 2024

Input

$0.20 per 1M tokens

Context

256K tokens

Jamba 1.5 Large

Jamba 1.5 Large is a state-of-the-art large language model from AI21 Labs, built on a hybrid SSM-Transformer architecture that combines Mamba structured state space models with traditional transformer layers. It offers a 256K token context window and excels in long-context tasks, multilingual support, and structured output generation.

Released

August 2024

Input

$5.00 per 1M tokens

Context

256K tokens

Jamba 1.5 Mini

Jamba 1.5 Mini is a lightweight, efficient large language model from AI21 Labs, designed for high-speed inference and cost-effective deployment. It leverages a hybrid architecture combining Mamba (state space model) and Transformer components, achieving strong performance on a variety of tasks with a 256K token context window. As the smaller sibling of Jamba 1.5 Large, it offers a balance of speed and capability for production use cases.

Released

August 2024

Input

$0.20 per 1M tokens

Context

256K tokens

Jamba-Instruct

Jamba-Instruct is a hybrid SSM-Transformer model by AI21 Labs, combining Mamba structured state space with transformer attention for efficient long-context processing. It offers a 256K token context window and is optimized for instruction-following tasks, making it suitable for enterprise applications requiring high throughput and low latency.

Released

March 2024

Input

$0.50 per 1M tokens

Context

256K tokens

Archives (2): deprecated & retired

Jurassic-1 Large

legacy · September 2021

Jurassic-1 Jumbo

legacy · August 2021

Black Forest Labs

Flagship: FLUX.2

9 active models

FLUX.2

Latest · Active · image

FLUX.2 is Black Forest Labs' flagship image generation model, succeeding FLUX.1. It introduces enhanced prompt adherence, improved image quality, and faster generation speeds. The model excels in rendering complex scenes with accurate text and fine details.

Released

November 2024

Input

—

Context

—

FLUX.2 [klein]

FLUX.2 [klein] is a fast, efficient image generation model from Black Forest Labs, optimized for speed and lower computational cost while maintaining high-quality outputs. It is part of the FLUX.2 family, offering a balance between performance and resource usage.

Released

November 2024

Input

—

Context

—

FLUX1.1 [pro]

FLUX1.1 [pro] is the latest and most advanced text-to-image model from Black Forest Labs, offering state-of-the-art image generation with improved speed, quality, and prompt adherence. It builds upon the FLUX.1 architecture with enhanced performance and efficiency, making it ideal for professional creative workflows.

Released

October 2024

Input

—

Context

—

FLUX1.1 [pro] Ultra

FLUX1.1 [pro] Ultra is a high-resolution image generation model from Black Forest Labs, offering the highest quality and detail in the FLUX1.1 line. It builds upon FLUX1.1 [pro] with enhanced resolution and fidelity, making it ideal for professional creative work.

Released

October 2024

Input

—

Context

—

FLUX.1 Fill

FLUX.1 Fill is a state-of-the-art inpainting and outpainting model by Black Forest Labs, capable of seamlessly filling in missing or extending parts of images with high coherence. It builds on the FLUX.1 architecture, offering superior image completion quality compared to previous models.

Released

November 2024

Input

—

Context

—

FLUX.1 Tools [dev]

FLUX.1 Tools [dev] is a suite of image generation tools built on the FLUX.1 architecture, offering capabilities like inpainting, outpainting, and depth-conditional generation. It is designed for developers who need flexible and controllable image synthesis with high fidelity.

Released

November 2024

Input

Free (open source)

Context

—

FLUX.1 [dev]

FLUX.1 [dev] is a 12-billion-parameter text-to-image model that excels in generating high-quality, diverse images with exceptional prompt adherence. It is the open-weight variant of the FLUX.1 suite, offering a balance between performance and accessibility for developers and researchers.

Released

August 2024

Input

Free (open source)

Context

—

FLUX.1 Kontext

FLUX.1 Kontext is an advanced image generation model by Black Forest Labs that excels at generating images with complex scenes and multiple characters while maintaining coherent context. It builds upon the FLUX.1 architecture with enhanced capabilities for handling detailed prompts and maintaining consistency across generated elements.

Released

August 2024

Input

Free (open source)

Context

—

FLUX.1 Kontext [dev]

FLUX.1 Kontext [dev] is a text-to-image model by Black Forest Labs that excels at generating images with complex scenes and multiple characters while maintaining high fidelity to the prompt. It is part of the FLUX.1 family, offering a balance between quality and efficiency for developers.

Released

August 2024

Input

—

Context

—

xAI

Flagship: Grok 4.3

6 active models

Grok 4.3

Grok 4.3 is xAI's flagship frontier model, offering state-of-the-art reasoning, coding, and multimodal capabilities. It is designed for complex problem-solving and real-time data analysis, leveraging a massive context window and advanced tool use. As the latest iteration in the Grok series, it sets the benchmark for xAI's offerings.

Released

May 2025

Input

$3.00 per 1M tokens

Context

1M tokens

Grok 3

Grok 3 is xAI's most advanced large language model, offering state-of-the-art reasoning, coding, and multimodal capabilities. It is designed to be a highly capable and truthful AI assistant, with a focus on real-time knowledge and a humorous, engaging personality. Grok 3 represents a significant leap over its predecessor, Grok 2, in terms of performance and features.

Released

February 2025

Input

$3.00 per 1M tokens

Context

1M tokens

Grok 2

Grok 2 is a language model by xAI, offering advanced reasoning, coding, and multimodal capabilities. It features a 128K context window and is designed for complex problem-solving and real-time information retrieval via X (formerly Twitter). At release, Grok 2 was xAI's flagship model, competing with leading frontier models from OpenAI and Google.

Released

August 2024

Input

$2.00 per 1M tokens

Context

128K tokens

Grok Build 0.1

Grok Build 0.1 is an early preview model from xAI, designed for fast and efficient text generation. It is a lightweight variant of the Grok family, optimized for quick responses and lower computational cost. This model serves as a stepping stone in xAI's development of more advanced AI systems.

Released

May 2024

Input

$0.15 per 1M tokens

Context

8,192 tokens

Grok Voice API

Grok Voice API is a real-time speech-to-text and text-to-speech API from xAI, enabling voice interactions with the Grok assistant. It supports multiple languages and low-latency streaming, designed for conversational AI applications.

Released

February 2025

Input

$0.06 per minute (speech-to-text)

Context

—

Grok Imagine API

Grok Imagine API is an image generation model by xAI that creates images from text prompts. It is designed for fast, high-quality image synthesis and integrates with the Grok ecosystem. This model represents xAI's entry into multimodal generation, complementing their text-based Grok models.

Released

December 2024

Input

—

Context

—

DeepSeek

Flagship: DeepSeek-V4-Flash

5 active models

DeepSeek-V4-Flash

DeepSeek-V4-Flash is DeepSeek's primary flagship model, optimized for fast and cost-effective inference. It offers a massive 1M token context window and supports multimodal inputs including images and audio. As the latest iteration, it balances high performance with significantly lower pricing compared to its predecessor.

Released

May 2025

Input

$0.05 per 1M tokens

Context

1M tokens

DeepSeek-V4-Pro

DeepSeek-V4-Pro is the latest flagship model from DeepSeek, offering state-of-the-art performance across a wide range of tasks including reasoning, coding, and general knowledge. It builds upon the success of previous DeepSeek models with enhanced capabilities and efficiency.

Input

$2.00 per 1M tokens

Context

128K tokens

DeepSeek V3

DeepSeek V3 is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It achieves top-tier performance on benchmarks like MMLU and HumanEval, rivaling leading closed-source models. It is designed for efficient inference and supports a 128K context window.

Released

December 2024

Input

$0.27 per 1M tokens

Context

128K tokens
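
The gap between total and activated parameters in a Mixture-of-Experts model is easier to read as a ratio. The following is a minimal Python sketch using the figures quoted in the DeepSeek V3 entry above (671B total, 37B activated per token); the function name `active_fraction` is hypothetical and only illustrates the arithmetic.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of a model's parameters activated per token (0 to 1).

    Both arguments are parameter counts in billions, as quoted in the listing.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 < active_params_b <= total_params_b:
        raise ValueError("active_params_b must be in (0, total_params_b]")
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Figures from the DeepSeek V3 entry: 671B total, 37B activated per token.
    print(f"{active_fraction(671, 37):.1%}")  # prints 5.5%
```

Under these figures, roughly one parameter in eighteen takes part in processing any given token, which is why an MoE model's inference cost tracks its activated count more closely than its total size.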

DeepSeek Coder V2

DeepSeek Coder V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo in coding tasks. It supports 338 programming languages and features a 128K context window, making it one of the most capable code-specific models available.

Released

June 2024

Input

$0.14 per 1M tokens

Context

128K tokens

DeepSeek R1

DeepSeek R1 is a reasoning-focused large language model that excels in complex problem-solving, mathematics, and coding tasks. It uses a Mixture-of-Experts architecture with 671B total parameters (37B activated) and features a 128K token context window. The model is open-source under the MIT license and offers competitive pricing.

Released

January 2025

Input

$0.55 per 1M tokens (cache hit), $2.19 per 1M tokens (cache miss)

Context

128K tokens

Google

41 active models · 8 archived

Gemini 3.5 Flash

Active · multimodal

Gemini 3.5 Flash is a fast, cost-efficient multimodal model from Google, optimized for high-throughput tasks. It supports text, image, audio, and video inputs, making it suitable for a wide range of applications. As part of the Gemini family, it balances performance and affordability.

Input

$0.075 per 1M tokens

Context

1M tokens

Gemini 3.5 Live Translate

Active · multimodal

Gemini 3.5 Live Translate is a specialized model from Google designed for real-time speech translation and transcription. It combines low-latency streaming with high accuracy across multiple languages, making it ideal for live conversations and events. This model is part of Google's Gemini family, optimized for speed and efficiency in translation tasks.

Input

$0.50 per 1M tokens

Context

32K tokens

Gemini 3.1 Flash Live

Active · multimodal

Gemini 3.1 Flash Live is a low-latency multimodal model from Google optimized for real-time interactions, including live audio and video streaming. It is part of the Gemini 3.1 Flash family, designed for fast, cost-effective performance with a 1M token context window.

Released

May 2025

Input

$0.15 per 1M tokens

Context

1M tokens

Veo 3.1 Lite Preview

Veo 3.1 Lite Preview is a lightweight, fast video generation model from Google, optimized for rapid prototyping and iterative content creation. It offers a more cost-effective and quicker alternative to the full Veo 3.1 model while maintaining high-quality video output. This preview version allows developers to experiment with video generation capabilities before the stable release.

Released

May 2025

Input

—

Context

—

Veo 3.1 Preview

Veo 3.1 Preview is Google's latest video generation model, capable of producing high-quality, realistic videos from text prompts. It offers enhanced temporal consistency, improved motion understanding, and supports longer video durations compared to its predecessor. This model is part of Google's Vertex AI and Gemini API ecosystem, targeting creative professionals and developers.

Released

May 2025

Input

—

Context

—

Gemini 3.1 Flash TTS

Gemini 3.1 Flash TTS is a text-to-speech model from Google that generates natural-sounding speech from text input. It is part of the Gemini Flash family, optimized for low-latency and cost-effective audio generation. The model supports multiple voices and languages, making it suitable for various voice applications.

Input

—

Context

—

Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is a lightweight, cost-efficient model in Google's Gemini lineup, optimized for high-throughput, low-latency tasks. It offers a balance of speed and quality for simple, high-volume applications.

Input

$0.075 per 1M tokens

Context

1M tokens

Gemini 3.1 Pro

Gemini 3.1 Pro is Google's most advanced large language model, offering state-of-the-art performance across a wide range of tasks including reasoning, coding, and multimodal understanding. It features an extremely long context window and is designed for complex enterprise applications.

Input

$1.25 per 1M tokens

Context

2M tokens

Gemma 3

Gemma 3 is a lightweight, state-of-the-art open model from Google, built from the same research and technology used to create the Gemini models. It is designed for responsible AI development and offers strong performance across a variety of text generation tasks, with support for function calling and structured outputs.

Released

March 2025

Input

Free (open source)

Context

32K tokens

Gemma 3 12B

Gemma 3 12B is a lightweight, open-source language model from Google, designed for efficient deployment on consumer hardware. It offers strong performance for its size, with support for multilingual text generation and a 128K context window. It is part of the Gemma 3 family, which includes 1B, 4B, 12B, and 27B parameter variants.

Released

March 2025

Input

Free (open source)

Context

128K tokens

Gemma 3 1B

Gemma 3 1B is a lightweight, open-source language model from Google, designed for efficient on-device and edge deployment. It offers strong performance for its size, with a 1 billion parameter count and support for a 128K token context window.

Released

March 2025

Input

Free (open source)

Context

128K tokens

Gemma 3 27B

Gemma 3 27B is Google's latest open model in the Gemma family, offering strong performance for its size with a 128K context window and multilingual support. It is designed for efficient deployment on consumer hardware while delivering competitive results on reasoning, coding, and instruction-following tasks.

Released

March 2025

Input

Free (open source)

Context

128K tokens

Gemma 3 7B

Gemma 3 7B is a lightweight, state-of-the-art open model from Google, built from the same research and technology used to create the Gemini models. It is designed for efficient deployment on resource-constrained environments like laptops and edge devices, offering strong performance for text generation and understanding tasks.

Released

March 2025

Input

Free (open source)

Context

8K tokens

Imagen 3

Imagen 3 is Google's most advanced text-to-image model, capable of generating photorealistic images with exceptional detail and composition. It offers improved understanding of natural language prompts and better handling of text rendering within images. As the successor to Imagen 2, it sets a new standard for image generation quality in Google's lineup.

Released

August 2024

Input

—

Context

—

Gemini 3

Active · multimodal

Gemini 3 is Google's most advanced multimodal AI model, designed to understand and generate text, images, audio, and video. It builds upon the capabilities of Gemini 2 with enhanced reasoning, coding, and creative abilities. As the flagship model in the Gemini lineup, it offers state-of-the-art performance across a wide range of tasks.

Input

$5.00 per 1M tokens

Context

1M tokens

Gemini 3 Flash

Gemini 3 Flash is a lightweight, fast, and cost-efficient model from Google, optimized for high-throughput tasks and real-time applications. It balances performance and speed, making it ideal for simple reasoning, summarization, and chat use cases. As part of the Gemini 3 family, it offers lower latency and reduced pricing compared to larger models.

Input

$0.10 per 1M tokens

Context

1M tokens

Gemini 2.5 Flash

Active · multimodal

Gemini 2.5 Flash is a cost-efficient, low-latency multimodal model designed for high-volume production workloads. It balances speed and quality, offering strong reasoning capabilities with reduced cost compared to Gemini 2.5 Pro. It supports text, image, audio, and video inputs with a 1M token context window.

Released

May 2025

Input

$0.15 per 1M tokens

Context

1M tokens

Gemini 2.5 Flash Experimental

Active · multimodal

Gemini 2.5 Flash Experimental is a fast, cost-efficient multimodal model from Google, designed for high-volume, low-latency applications. It balances performance and affordability, making it suitable for real-time tasks like chat, summarization, and content generation. As an experimental model, it offers early access to the latest capabilities in the Gemini 2.5 Flash family.

Released

May 2025

Input

$0.15 per 1M tokens

Context

1M tokens

Gemini 2.5 Flash Live Preview

Active · multimodal

Gemini 2.5 Flash Live Preview is a fast, cost-efficient multimodal model from Google, optimized for low-latency, real-time applications. It supports text, image, audio, and video inputs with a 1M token context window, making it ideal for interactive and streaming use cases. As a preview release, it offers cutting-edge performance at a lower price point than Gemini 2.5 Pro.

Released

May 2025

Input

$0.15 per 1M tokens

Context

1M tokens

Gemini 2.5 Flash TTS Preview

Gemini 2.5 Flash TTS Preview is a text-to-speech model that generates natural-sounding speech from text input. It is part of the Gemini 2.5 Flash family, optimized for low-latency, high-quality audio generation. This preview model allows developers to integrate expressive speech synthesis into applications.

Released

May 2025

Input

—

Context

—

Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite is a cost-efficient, low-latency model in the Gemini 2.5 family, designed for high-volume, simple tasks. It offers the lowest cost and fastest speed among Gemini 2.5 models while maintaining a 1M token context window. It is ideal for use cases that require quick responses and minimal processing.

Released

May 2025

Input

$0.075 per 1M tokens

Context

1M tokens

Gemini 2.5 Pro TTS Preview

Active · multimodal

Gemini 2.5 Pro TTS Preview is a multimodal model from Google that adds text-to-speech (TTS) capabilities to the Gemini 2.5 Pro reasoning model. It can generate spoken audio responses from text, enabling natural voice interactions. This preview model is designed for applications requiring high-quality, expressive speech synthesis.

Released

May 2025

Input

$1.25 per 1M tokens

Context

1M tokens

Gemini 2.5 Pro

Gemini 2.5 Pro is Google's most advanced reasoning model, designed for complex tasks requiring deep thought and multi-step planning. It introduces a 'thinking' capability that allows the model to reason through problems before generating responses, leading to improved accuracy and performance on coding, math, and science benchmarks. It is the first model in the Gemini 2.5 family, setting a new standard for reasoning in Google's lineup.

Released

March 2025

Input

$1.25 per 1M tokens

Context

1M tokens

Gemini 2.5 Pro Experimental

Gemini 2.5 Pro Experimental is Google's most advanced reasoning model, featuring a 1M token context window and native multimodal capabilities. It excels at complex tasks like coding, mathematics, and scientific reasoning, with a 'thinking' mode that improves accuracy. This model represents a significant leap in reasoning performance for the Gemini lineup.

Released

March 2025

Input

$1.25 per 1M tokens

Context

1M tokens

Gemini 2.0 Flash Lite

Active · multimodal

Gemini 2.0 Flash Lite is a cost-efficient, low-latency model optimized for high-volume, text-only tasks. It offers a 1M token context window and is designed to be the most affordable option in the Gemini 2.0 Flash family, ideal for scaling AI applications.

Released

February 2025

Input

$0.075 per 1M tokens

Context

1M tokens

Gemini 2.0 Flash

Active · multimodal

Gemini 2.0 Flash is Google's next-generation multimodal model offering low latency and enhanced performance across text, image, audio, and video inputs. It introduces native tool use, including Google Search grounding and code execution, making it ideal for agentic and real-time applications.

Released

December 2024

Input

$0.10 per 1M tokens

Context

1M tokens

Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is an experimental model that uses chain-of-thought reasoning to improve performance on complex tasks. It is designed to think through problems step-by-step, making it particularly effective for math, coding, and logic. It is part of the Gemini 2.0 Flash family, optimized for speed and efficiency.

Released

December 2024

Input

$0.10 per 1M tokens

Context

1M tokens

Veo 2

Veo 2 is Google's most advanced video generation model, capable of producing high-quality, realistic videos from text prompts. It offers improved understanding of real-world physics, camera controls, and cinematic effects, setting a new standard in AI video generation. Veo 2 is available through VideoFX in Google Labs.

Released

December 2024

Input

—

Context

—

Gemma 2

Gemma 2 is a family of lightweight, state-of-the-art open models from Google, available in 2B, 9B, and 27B parameter sizes. It builds on the original Gemma with significant performance improvements, including better reasoning, coding, and safety benchmarks. Gemma 2 is designed for developers and researchers who need efficient, high-quality language models for a wide range of tasks.

Released

June 2024

Input

Free (open source)

Context

8,192 tokens

Gemma 2 27B

Gemma 2 27B is a lightweight, open-source language model from Google, built on the same research and technology as the Gemini models. It offers strong performance for its size, excelling in reasoning, coding, and safety benchmarks. As part of the Gemma 2 family, it provides a balance of efficiency and capability for developers and researchers.

Released

June 2024

Input

Free (open source)

Context

8,192 tokens

Gemma 2 2B

Gemma 2 2B is a lightweight, open-source language model from Google, designed for efficient deployment on resource-constrained devices. It is part of the Gemma 2 family, offering strong performance for its size with a focus on accessibility and customization.

Released

June 2024

Input

Free (open source)

Context

8,192 tokens

Gemma 2 9B

Gemma 2 9B is a lightweight, open-source language model from Google, designed for efficient text generation and understanding. It balances performance and accessibility, making it suitable for a wide range of NLP tasks with a 9 billion parameter count.

Released

June 2024

Input

Free (open source)

Context

8,192 tokens

Gemini 1.5 Flash-8B

Active · multimodal

Gemini 1.5 Flash-8B is a smaller, faster, and more cost-efficient variant of the Gemini 1.5 Flash model, optimized for high-volume, low-latency tasks. It retains multimodal capabilities (text, image, audio, video) and a 1M token context window, making it ideal for applications requiring quick responses and reduced computational cost.

Released

October 2024

Input

$0.0375 per 1M tokens

Context

1M tokens

Gemini 1.5 Flash 002

Active · multimodal

Gemini 1.5 Flash 002 is a fast and efficient multimodal model from Google, optimized for high-volume, low-latency tasks. It balances performance and cost, making it ideal for applications requiring quick responses across text, image, audio, and video inputs. As part of the Gemini 1.5 family, it offers a large context window and competitive pricing.

Released

September 2024

Input

$0.075 per 1M tokens

Context

1,048,576 tokens

Gemini 1.5 Pro 002

Active · multimodal

Gemini 1.5 Pro 002 is Google's most capable general-purpose model, featuring a 2 million token context window and native multimodal understanding across text, images, audio, and video. It offers improved performance and lower latency compared to its predecessor, making it suitable for complex reasoning and large-scale data analysis.

Released

September 2024

Input

$1.25 per 1M tokens (up to 128K tokens), $2.50 per 1M tokens (over 128K tokens)

Context

2M tokens
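
The two-tier input price listed for Gemini 1.5 Pro 002 can be turned into a rough per-prompt estimate. The following is a minimal Python sketch under two assumptions that the listing does not state: that the tier is chosen by the prompt's total length, with that rate applied to every token in the prompt, and that "128K" means 128,000 tokens. The function name `estimate_input_cost_usd` is hypothetical, and the rates are copied from the entry above, so verify them with the vendor before relying on the result.

```python
# Rates copied from the Gemini 1.5 Pro 002 entry above; verify with the vendor.
TIER_LIMIT_TOKENS = 128_000   # assumption: "128K" read as 128,000 tokens
RATE_SHORT_USD_PER_M = 1.25   # USD per 1M input tokens, prompts up to the limit
RATE_LONG_USD_PER_M = 2.50    # USD per 1M input tokens, prompts over the limit


def estimate_input_cost_usd(prompt_tokens: int) -> float:
    """Estimate the input cost of a single prompt under the two-tier price."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Assumption: the whole prompt is billed at the rate of the tier it falls in.
    if prompt_tokens <= TIER_LIMIT_TOKENS:
        rate = RATE_SHORT_USD_PER_M
    else:
        rate = RATE_LONG_USD_PER_M
    return prompt_tokens * rate / 1_000_000


if __name__ == "__main__":
    for tokens in (100_000, 200_000):
        print(tokens, round(estimate_input_cost_usd(tokens), 4))
```

Under these assumptions a 100,000-token prompt costs about $0.125 in input tokens and a 200,000-token prompt about $0.50, so crossing the tier boundary doubles the per-token rate for the entire prompt.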

Gemini 1.5 Flash

Active · multimodal

Gemini 1.5 Flash is a lightweight, fast, and cost-efficient multimodal model designed for high-volume, latency-sensitive applications. It is optimized for tasks like summarization, chat, and image/video captioning, offering a balance of performance and speed. As part of the Gemini 1.5 family, it supports a 1 million token context window and native multimodal inputs.

Released

May 2024

Input

$0.075 per 1M tokens

Context

1M tokens

Gemini 1.5 Pro

Active · multimodal

Gemini 1.5 Pro is Google's most advanced multimodal model, capable of understanding and processing text, images, audio, video, and code. It features a breakthrough 1 million token context window, enabling analysis of extremely long documents, videos, or codebases. It excels at complex reasoning, long-context tasks, and multimodal understanding.

Released

May 2024

Input

$3.50 per 1M tokens (text up to 128K tokens), $7.00 per 1M tokens (text over 128K tokens), $10.50 per 1M tokens (audio/image/video up to 128K tokens), $21.00 per 1M tokens (audio/image/video over 128K tokens)

Context

1,048,576 tokens

Gemini 1.0 Pro

Gemini 1.0 Pro is a lightweight, cost-efficient model optimized for a wide range of text-based tasks, including summarization, classification, and extraction. It offers a balance of performance and speed, making it suitable for high-volume applications where latency and cost are critical.

Released

December 2023

Input

$0.50 per 1M tokens

Context

32K tokens

Veo

Veo is Google's most capable generative video model, producing high-quality 1080p videos from text, image, or video prompts. It understands natural language and visual semantics to create realistic and imaginative videos with consistent style and motion.

Released

May 2024

Input

—

Context

—

Imagen

Imagen is Google's text-to-image diffusion model that generates high-fidelity images from natural language descriptions. It leverages a large language model (T5-XXL) for text understanding and a diffusion model for image generation, producing photorealistic outputs. Imagen is known for its exceptional image quality and deep language comprehension.

Released

May 2022

Input

—

Context

—

Nano Banana Pro

Nano Banana Pro is a lightweight, efficient language model from Google designed for on-device and edge computing applications. It offers fast inference with low computational requirements while maintaining strong performance on common NLP tasks. This model is part of Google's Gemini family, optimized for mobile and embedded systems.

Input

Free (open source)

Context

8K tokens

Archives (8): deprecated & retired

PaLM 2 Otter

deprecated · May 2024

PaLM 2

PaLM 2 Bison

PaLM 2 Gecko

Nano Banana 2

Gemini 1.0 Nano

legacy · February 2024

Gemini 1.0 Ultra

deprecated · February 2024

Nano Banana

Alibaba

42 active models

QwQ-32B

QwQ-32B is a 32-billion-parameter reasoning model developed by Alibaba's Qwen team, designed to excel in complex problem-solving and logical reasoning tasks. It is built on Qwen2.5-32B and trained with reinforcement learning, offering strong performance with modest computational requirements. The model is open-source and available on Hugging Face.

Released

March 2025

Input

Free (open source)

Context

131,072 tokens

Qwen3 0.6B

Qwen3 0.6B is a compact, efficient language model from Alibaba's Qwen3 series, designed for lightweight deployment and fast inference. It balances performance with low resource requirements, making it suitable for edge devices and real-time applications. As the smallest model in the Qwen3 lineup, it offers strong capabilities for its size.

Released

May 2025

Input

Free (open source)

Context

32K tokens

Qwen3 1.7B

Qwen3 1.7B is a compact, efficient language model from Alibaba's Qwen3 family, designed for fast inference and deployment on resource-constrained devices. It balances strong performance with low computational cost, making it suitable for edge computing and real-time applications.

Released

May 2025

Input

Free (open source)

Context

32K tokens

Qwen3 14B

Qwen3 14B is a mid-sized reasoning model in the Qwen3 family, balancing strong performance with efficiency. It supports extended thinking (chain-of-thought) and is optimized for complex reasoning tasks while being deployable on consumer hardware.

Released

May 2025

Input

Free (open source)

Context

32K tokens

Qwen3 30B

Qwen3 30B is a 30-billion-parameter large language model from Alibaba's Qwen3 series, designed for efficient reasoning and general-purpose tasks. It supports extended context lengths and is available as an open-weight model, balancing performance with computational efficiency.

Released

May 2025

Input

Free (open source)

Context

131,072 tokens

Qwen3 32B

Qwen3 32B is a mid-sized reasoning model in the Qwen3 family, balancing strong performance with efficiency. It excels in complex reasoning, coding, and multilingual tasks, and is available as an open-source model on Hugging Face.

Released

May 2025

Input

Free (open source)

Context

128K tokens

Qwen3 4B

Qwen3 4B is a compact, efficient language model from Alibaba's Qwen3 series, designed for fast inference and deployment on resource-constrained devices. It balances strong performance with low computational cost, making it suitable for edge computing and real-time applications.

Released

May 2025

Input

Free (open source)

Context

32K tokens

Qwen3 72B

Qwen3 72B is a large language model from Alibaba's Qwen3 series, featuring a Mixture-of-Experts (MoE) architecture with 72B total parameters and 12B activated parameters. It supports extended context lengths up to 128K tokens and offers strong performance in reasoning, coding, and multilingual tasks.

Released

May 2025

Input

Free (open source)

Context

128K tokens

Qwen3 8B

Qwen3 8B is an 8-billion-parameter dense large language model from Alibaba's Qwen3 series. It supports extended thinking (reasoning) and is optimized for efficiency, with competitive results for its size on benchmarks like MMLU and coding tasks.

Released

May 2025

Input

Free (open source)

Context

32K tokens

Qwen3 235B

Qwen3 235B is Alibaba's largest and most capable Mixture-of-Experts (MoE) language model, featuring 235 billion total parameters with 22 billion activated per token. It achieves state-of-the-art performance across reasoning, coding, and multilingual tasks, rivaling leading models like DeepSeek-R1 and GPT-4o.

Released

April 2025

Input

Free (open source)

Context

131,072 tokens

Qwen3 235B-A22B

Qwen3 235B-A22B is a Mixture-of-Experts (MoE) large language model with 235B total parameters and 22B activated parameters per token. It is designed for strong reasoning and multilingual capabilities, supporting 119 languages and 29 programming languages, and is part of Alibaba's Qwen3 series.

Released

April 2025

Input

Free (open source)

Context

131,072 tokens

Qwen2.5-VL 3B

Active · multimodal

Qwen2.5-VL 3B is a compact multimodal vision-language model from Alibaba's Qwen series, designed for efficient image and video understanding. It excels in tasks like visual question answering, document parsing, and video analysis while maintaining a small footprint for deployment on edge devices.

Released

January 2025

Input

Free (open source)

Context

128K tokens

Qwen2.5-VL 72B

Active · multimodal

Qwen2.5-VL 72B is Alibaba's flagship multimodal model, excelling in vision-language tasks such as image and video understanding, document parsing, and visual reasoning. It builds on the Qwen2.5 language model with enhanced visual perception and dynamic resolution support.

Released

January 2025

Input

Free (open source)

Context

131,072 tokens

Qwen2.5-VL 7B

Active · multimodal

Qwen2.5-VL 7B is a multimodal vision-language model from Alibaba's Qwen series, supporting image and video understanding. It excels in visual reasoning, document parsing, and real-time video analysis, offering strong performance in a compact 7B parameter size.

Released

January 2025

Input

Free (open source)

Context

128K tokens

Qwen2.5-Coder 14B

Qwen2.5-Coder 14B is a code-specific large language model from Alibaba's Qwen series, designed for code generation, code reasoning, and bug fixing. It is part of the Qwen2.5-Coder family, which includes models ranging from 0.5B to 32B parameters, with the 14B variant offering a strong balance of performance and efficiency.

Released

November 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5-Coder 32B

Qwen2.5-Coder 32B is a code-specific large language model from Alibaba's Qwen series, designed for code generation, code reasoning, and software engineering tasks. It is the largest model in the Qwen2.5-Coder family, offering coding performance competitive with general-purpose proprietary models like GPT-4o and Claude 3.5 Sonnet. The model's weights are openly available on Hugging Face, making it accessible for both research and commercial use.

Released

November 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5-Coder 3B

Qwen2.5-Coder 3B is a compact code-specialized language model from Alibaba's Qwen series, designed for efficient code generation and understanding. It balances strong coding performance with a small footprint, making it suitable for resource-constrained environments. As part of the Qwen2.5-Coder family, it offers competitive results on coding benchmarks while remaining far smaller than the 14B and 32B variants.

Released

November 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5-Coder 7B

Qwen2.5-Coder 7B is a specialized code generation model from Alibaba's Qwen2.5 series, fine-tuned for programming tasks. It offers strong performance in code completion, debugging, and multi-language support, making it a competitive open-source alternative for developers.

Released

November 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5 0.5B

Qwen2.5 0.5B is a compact language model from Alibaba's Qwen2.5 series, designed for efficient on-device and edge deployment. With only 0.5 billion parameters, it offers strong performance for its size, supporting a 32K token context window and multilingual capabilities.

Released

September 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5 1.5B

Qwen2.5 1.5B is a compact language model from Alibaba's Qwen2.5 series, designed for efficient deployment on edge devices and resource-constrained environments. It offers strong performance for its size, with support for multiple languages and a 32K token context window.

Released

September 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5 110B

Qwen2.5 110B is Alibaba's largest open-source language model, featuring 110 billion parameters and a 128K token context window. It excels in coding, mathematics, and multilingual tasks, outperforming many proprietary models of similar size. The model is available under a permissive Apache 2.0 license.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5 14B

Qwen2.5 14B is a large language model from Alibaba's Qwen series, offering a strong balance of performance and efficiency. It features an extended context window of 128K tokens and supports multiple languages, making it suitable for a wide range of text-based tasks. As part of the Qwen2.5 family, it delivers improved reasoning and coding capabilities over its predecessor.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5 32B

Qwen2.5 32B is a large language model from Alibaba's Qwen series, offering a strong balance of performance and efficiency with 32 billion parameters. It supports a 128K token context window and excels in coding, mathematics, and multilingual tasks. This model is part of the Qwen2.5 family, which includes sizes from 0.5B to 72B, and is available as an open-source model on Hugging Face.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5 3B

Qwen2.5 3B is a compact yet powerful language model from Alibaba's Qwen2.5 series, offering strong performance in coding, mathematics, and general text generation. It is designed for efficient deployment on edge devices and resource-constrained environments while maintaining competitive benchmark scores.

Released

September 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5 72B

Qwen2.5 72B is a large language model from Alibaba's Qwen series, featuring 72 billion parameters and a 128K token context window. It excels in coding, mathematics, and multilingual tasks, and is released as open weights under Alibaba's Qwen license.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5 7B

Qwen2.5 7B is a large language model from Alibaba's Qwen series, offering strong performance in coding, mathematics, and general knowledge tasks. It features an extended context window of up to 128K tokens and supports over 29 languages, making it versatile for multilingual applications. As part of the Qwen2.5 family, it balances efficiency and capability for a wide range of use cases.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Qwen2.5-Coder 0.5B

Qwen2.5-Coder 0.5B is a compact code-specific language model from Alibaba's Qwen2.5 series, designed for efficient code generation and understanding. With only 0.5 billion parameters, it offers fast inference and low resource requirements while maintaining competitive performance on coding tasks. It is part of a family that includes larger models up to 32B parameters.

Released

September 2024

Input

Free (open source)

Context

32K tokens

Qwen2.5-Coder 1.5B

Qwen2.5-Coder 1.5B is a compact code-specific language model from Alibaba's Qwen series, designed for efficient code generation and understanding. It balances performance and resource usage, making it suitable for local deployment and lightweight coding tasks. As part of the Qwen2.5-Coder family, it offers strong code completion and infilling capabilities.

Released

September 2024

Input

Free (open source)

Context

32K tokens

Qwen2 0.5B

Qwen2 0.5B is a small, efficient language model from Alibaba's Qwen2 series, designed for lightweight deployment and fast inference. It balances performance with low computational requirements, making it suitable for resource-constrained environments. As the smallest model in the Qwen2 family, it offers a compact alternative for basic text generation tasks.

Released

June 2024

Input

Free (open source)

Context

32K tokens

Qwen2 1.5B

Qwen2 1.5B is a small, efficient language model from Alibaba's Qwen2 series, designed for lightweight deployment and fast inference. It balances performance and resource usage, making it suitable for edge devices and applications with limited compute. As one of the smaller models in the Qwen2 family, it offers strong multilingual capabilities and supports a 32K token context window.

Released

June 2024

Input

Free (open source)

Context

32K tokens

Qwen2 72B

Qwen2 72B is Alibaba's largest open-source language model in the Qwen2 series, featuring 72 billion parameters and a 128K token context window. It achieves state-of-the-art performance across multiple benchmarks, excelling in reasoning, coding, and multilingual tasks. As the flagship model of the Qwen2 family, it represents Alibaba's commitment to open-source AI with competitive capabilities.

Released

June 2024

Input

Free (open source)

Context

128K tokens

Qwen2 7B

Qwen2 7B is a 7-billion-parameter large language model from Alibaba's Qwen2 series, offering strong performance in multilingual tasks, coding, and reasoning. It features an extended context window of up to 128K tokens and is available as an open-source model under the Apache 2.0 license.

Released

June 2024

Input

Free (open source)

Context

128K tokens

Qwen1.5 0.5B

Qwen1.5 0.5B is a compact 0.5 billion parameter language model from Alibaba's Qwen1.5 series, designed for efficient deployment on resource-constrained devices. It offers a balance of performance and speed, making it suitable for lightweight applications and experimentation.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 1.8B

Qwen1.5 1.8B is a compact decoder-only language model from Alibaba's Qwen1.5 series, featuring 1.8 billion parameters. It is designed for efficient deployment on edge devices and resource-constrained environments while maintaining strong performance on various NLP tasks. As one of the smaller models in the Qwen1.5 lineup, it offers a balance of speed and capability for lightweight applications.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 110B

Qwen1.5 110B is a large language model with 110 billion parameters, part of Alibaba's Qwen1.5 series. It excels in multilingual tasks, particularly in Chinese and English, and supports a 32K token context window. As the largest model in the Qwen1.5 lineup, it offers strong performance across a wide range of NLP benchmarks.

Released

April 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 14B

Qwen1.5 14B is a transformer-based decoder-only language model from Alibaba's Qwen series, featuring 14 billion parameters. It is part of the Qwen1.5 release that introduced a stable version with enhanced performance and support for multiple languages. The model excels in tasks like text generation, coding, and reasoning, and is available under a permissive license.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 32B

Qwen1.5 32B is a large language model from Alibaba's Qwen1.5 series, featuring 32 billion parameters. It supports a context length of 32,768 tokens and is designed for a wide range of natural language tasks. The model is open-source under Apache 2.0 license and is available in both base and chat versions.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 4B

Qwen1.5 4B is a compact 4-billion-parameter language model from Alibaba's Qwen1.5 series, designed for efficient deployment on resource-constrained devices. It supports a 32K context window and offers multilingual capabilities, including strong performance in Chinese and English. As one of the smaller models in the Qwen1.5 lineup, it balances speed and quality for lightweight applications.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 72B

Qwen1.5 72B is a dense, decoder-only large language model from Alibaba's Qwen series with 72 billion parameters. It excels in multilingual tasks, particularly Chinese and English, and offers strong performance across a wide range of benchmarks. As one of the largest models in the Qwen1.5 family, second only to the 110B variant, it targets complex reasoning and generation.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen1.5 7B

Qwen1.5 7B is a 7-billion-parameter language model from Alibaba's Qwen series, offering a balance of performance and efficiency. It supports a 32K context window and is available under Apache 2.0 license, making it suitable for a wide range of NLP tasks.

Released

February 2024

Input

Free (open source)

Context

32K tokens

Qwen-Audio

Active · multimodal

Qwen-Audio is a large audio-language model developed by Alibaba Cloud, designed to process and understand various types of audio inputs including speech, music, and environmental sounds. It extends the Qwen series by incorporating audio understanding capabilities, enabling tasks such as audio captioning, sound event detection, and speech recognition. The model is part of Alibaba's open-source Qwen family, offering a unified framework for audio and text interactions.

Released

August 2023

Input

Free (open source)

Context

8K tokens

Qwen-VL

Active · multimodal

Qwen-VL is a multimodal large language model developed by Alibaba Cloud, capable of understanding and generating text based on visual inputs such as images. It integrates vision and language understanding, enabling tasks like image captioning, visual question answering, and document understanding. As part of the Qwen series, it offers strong performance in both Chinese and English contexts.

Released

August 2023

Input

Free (open source)

Context

32K tokens

Meta

19 active models · 2 archived

Code Llama 70B

Code Llama 70B is Meta's largest code-focused language model, designed for code generation, understanding, and debugging. It is built on Llama 2 and fine-tuned on code-heavy datasets, offering state-of-the-art performance among open-source code models. It supports multiple programming languages and is available in base, Python-specialized, and instruction-tuned variants.

Released

January 2024

Input

Free (open source)

Context

100K tokens

Code Llama 34B

Code Llama 34B is a large language model specialized for code generation and understanding, built on top of Llama 2. It was the largest variant in the original August 2023 Code Llama release, ahead of the later 70B model, and offers strong performance on coding tasks. The model supports code completion and instruction following for multiple programming languages.

Released

August 2023

Input

Free (open source)

Context

16K tokens

Code Llama 13B

Code Llama 13B is a large language model specialized for code generation and understanding, based on Llama 2. It is part of Meta's Code Llama family, offering a balance between performance and efficiency for code-related tasks. The model excels in generating code from natural language prompts and supports multiple programming languages.

Released

August 2023

Input

Free (open source)

Context

16K tokens

Code Llama 7B

Code Llama 7B is a 7 billion parameter large language model specialized for code generation and understanding, built on top of Llama 2. It is part of Meta's Code Llama family, which includes base, Python, and Instruct variants, designed to assist with code completion, debugging, and natural language to code tasks.

Released

August 2023

Input

Free (open source)

Context

16K tokens

Llama 4 Behemoth

Active · multimodal

Llama 4 Behemoth is the largest model in Meta's Llama 4 family, a multimodal mixture-of-experts model that Meta previewed in April 2025 while it was still in training. It targets complex reasoning, coding, and understanding of both text and images, and Meta describes it as a teacher model for the smaller Llama 4 models.

Released

April 2025

Input

Free (open source)

Context

1M tokens

Llama 4 Maverick

Active · multimodal

Llama 4 Maverick is a natively multimodal mixture-of-experts model from Meta, designed for complex reasoning and vision-language tasks. It handles coding, math, and instruction following, with a 1M token context window and native support for images and text. It is positioned as the general-purpose workhorse of the Llama 4 family.

Released

April 2025

Input

Free (open source)

Context

1M tokens

Llama 4 Scout

Active · multimodal

Llama 4 Scout is the smaller, more efficient multimodal model in Meta's Llama 4 family, designed for fast inference on a single GPU. It supports text and image inputs, making it suitable for vision-language tasks, and offers a 10M token context window. As part of the Llama 4 family, Scout balances performance with resource efficiency.

Released

April 2025

Input

Free (open source)

Context

10M tokens

Llama 3.3 70B

Llama 3.3 70B is a 70-billion-parameter open-weight language model from Meta, offering strong performance in reasoning, coding, and multilingual tasks. It is a text-only model optimized for instruction following and safety, serving as a cost-effective alternative to larger models like Llama 3.1 405B.

Released

December 2024

Input

Free (open source)

Context

128K tokens

Llama 3.2 11B

Active · multimodal

Llama 3.2 11B is a multimodal model that supports text and image inputs, enabling tasks like visual reasoning and document understanding. It is part of Meta's Llama 3.2 family, offering a balance of performance and efficiency for on-device and cloud applications. This model is open-source and optimized for instruction following and safety.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Llama 3.2 1B

Llama 3.2 1B is a small, efficient language model from Meta, optimized for on-device and edge applications. It is part of the Llama 3.2 family, which includes both text-only and multimodal models, with the 1B variant focusing on lightweight text generation. This model is designed for high-speed inference and low resource consumption, making it suitable for mobile and embedded systems.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Llama 3.2 3B

Llama 3.2 3B is a lightweight, multilingual text-only model from Meta, optimized for on-device and edge applications. It is part of the Llama 3.2 family, which includes both text-only and vision models, and offers strong performance for its size with support for 8 languages.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Llama 3.2 90B

Llama 3.2 90B is Meta's largest multimodal model in the Llama 3.2 family, supporting text and image inputs. It excels at complex reasoning, multilingual tasks, and vision-language understanding, positioning it as a top-tier open-weight model for enterprise and research applications.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Llama 3.1 405B

Llama 3.1 405B is Meta's largest and most capable open-source language model, featuring 405 billion parameters. It excels in general knowledge, reasoning, and multilingual tasks, and is designed for synthetic data generation and model distillation. As the flagship of the Llama 3.1 family, it offers state-of-the-art performance among open models.

Released

July 2024

Input

Free (open source)

Context

128K tokens

Llama 3.1 70B

Llama 3.1 70B is a large language model from Meta, part of the Llama 3.1 family with a 128K context window and support for eight languages. It offers strong performance in reasoning, coding, and multilingual tasks, and is available as an open-weight model for research and commercial use.

Released

July 2024

Input

Free (open source)

Context

128K tokens

Llama 3.1 8B

Llama 3.1 8B is a compact, efficient large language model from Meta, part of the Llama 3.1 family. It offers strong performance for its size, with a 128K token context window and multilingual support. It is designed for a wide range of text-based tasks, balancing quality and computational efficiency.

Released

July 2024

Input

Free (open source)

Context

128K tokens

Llama 3 70B

Llama 3 70B is a large language model from Meta, part of the Llama 3 family, offering strong performance across a wide range of natural language tasks. It is an open-source model that balances capability with accessibility, making it suitable for both research and commercial applications. As the larger of the two Llama 3 variants (8B and 70B), it is the option for developers who need higher-quality text generation than the 8B model provides.

Released

April 2024

Input

Free (open source)

Context

8K tokens

Llama 3 8B

Llama 3 8B is a lightweight, open-source large language model from Meta, designed for efficient text generation and understanding. It offers strong performance for its size, rivaling larger models in many tasks, and is part of the Llama 3 family that includes 8B and 70B parameter variants. The model is optimized for a wide range of natural language processing applications with a focus on accessibility and customization.

Released

April 2024

Input

Free (open source)

Context

8K tokens

Llama 2 70B

Llama 2 70B is a large language model from Meta, part of the Llama 2 family, with 70 billion parameters. It is designed for a wide range of natural language tasks and is available for both research and commercial use under a permissive license. As the largest model in the Llama 2 series, it offers strong performance on reasoning, coding, and general knowledge tasks.

Released

July 2023

Input

Free (open source)

Context

4K tokens

Llama 2 7B

Llama 2 7B is a foundational open-source large language model from Meta, part of the Llama 2 family. It is a 7 billion parameter model optimized for dialogue and text generation, available for both research and commercial use. As the smallest variant in the Llama 2 series, it offers efficient performance for a wide range of natural language tasks.

Released

July 2023

Input

Free (open source)

Context

4K tokens

Archives (2): deprecated & retired

Llama 2 13B

legacy · July 2023

Llama 1

legacy · February 2023

Cohere

13 active models · 1 archived

command-a-reasoning-08-2025

Command A Reasoning is a specialized variant of Cohere's Command A model, optimized for complex reasoning tasks such as multi-step logic, mathematical problem-solving, and code generation. It features an extended reasoning chain that improves accuracy on tasks requiring deep deliberation, while maintaining the core strengths of the Command A family in enterprise-grade text generation and tool use.

Released

August 2025

Input

$2.50 per 1M tokens

Context

256K tokens

command-a-translate-08-2025

Command-A Translate is a specialized variant of Cohere's Command-A model optimized for high-quality translation across multiple languages. It leverages a 128K context window and is designed for efficient, accurate translation tasks, making it ideal for enterprise localization and multilingual content generation.

Released

August 2025

Input

$2.50 per 1M tokens

Context

128K tokens

command-r-08-2024

Command R 08-2024 is an updated version of Cohere's Command R generative model, optimized for enterprise RAG and tool use. It offers improved efficiency and accuracy in multilingual tasks, with a focus on cost-effective deployment.

Released

August 2024

Input

$2.50 per 1M tokens

Context

128K tokens

command-r-plus-08-2024

Command R+ 08-2024 is Cohere's most powerful large language model, optimized for enterprise-grade RAG and tool use. It excels at complex reasoning, multilingual tasks, and long-context understanding with a 128K token context window. This model is designed for high-accuracy, scalable AI applications requiring robust retrieval-augmented generation.

Released

August 2024

Input

$5.00 per 1M tokens

Context

128K tokens

command-a-vision-07-2025

Active · multimodal

Command A Vision is Cohere's flagship multimodal model, combining advanced language understanding with native vision capabilities. It excels at tasks requiring both text and image analysis, such as document understanding, visual question answering, and multimodal reasoning. This model is part of the Command A family, designed for enterprise-grade performance and safety.

Released

July 2025

Input

$2.50 per 1M tokens

Context

128K tokens

command-r7b-12-2024

Command R7B is a compact, efficient language model optimized for fast inference and low cost. It is designed for high-throughput applications requiring quick responses, such as real-time chatbots and simple text generation tasks. As part of Cohere's Command family, it balances performance with affordability.

Released

December 2024

Input

$0.15 per 1M tokens

Context

128K tokens

command-a-plus-05-2026

Command A+ is Cohere's most advanced large language model, optimized for enterprise-grade reasoning, long-context understanding, and multilingual tasks. It builds on the Command family with enhanced accuracy, lower latency, and stronger instruction following. This model is designed for complex workflows requiring high reliability and safety.

Released

May 2026

Input

$2.50 per 1M tokens

Context

256K tokens

command-r-plus-04-2024

Command R+ is Cohere's most powerful large language model, optimized for enterprise-grade RAG and tool use. It excels at complex reasoning, multilingual tasks, and long-context understanding with a 128K token context window. This model is designed for production workloads requiring high accuracy and reliability.

Released

April 2024

Input

$3.00 per 1M tokens

Context

128K tokens

command-a-03-2025

Command A is Cohere's most advanced large language model, optimized for enterprise-grade agentic tasks and multilingual performance. It features a 256K context window, supports 23 languages, and excels in tool use, retrieval-augmented generation (RAG), and code generation. Command A is designed to be highly capable while remaining cost-efficient, outperforming models like GPT-4o and Llama 3.1 405B on key benchmarks.

Released

March 2025

Input

$2.50 per 1M tokens

Context

256K tokens

command-r-03-2024

Command-R is a large language model optimized for retrieval-augmented generation (RAG) and tool use, designed for enterprise workloads. It balances high performance with cost efficiency, offering strong reasoning and multilingual capabilities. This model is part of Cohere's Command family, positioned as a faster and more affordable alternative to Command-R+.

Released

March 2024

Input

$0.50 per 1M tokens

Context

128K tokens

Command R

Command R is Cohere's flagship large language model optimized for enterprise use cases, including retrieval-augmented generation (RAG) and tool use. It balances high performance with efficiency, offering strong multilingual support and a 128K context window. Command R is designed to be a versatile workhorse for production applications.

Released

March 2024

Input

$0.50 per 1M tokens

Context

128K tokens

command-r-plus

Command R+ is a large language model from Cohere, optimized for enterprise-grade RAG (Retrieval-Augmented Generation) and tool use. It excels at complex reasoning, multi-step tool use, and long-context tasks, making it well suited to building production-ready AI assistants. At release it was the flagship model in Cohere's Command family, offering the highest performance in the lineup.

Released

March 2024

Input

$3.00 per 1M tokens

Context

128K tokens

command

Command is Cohere's flagship generative language model optimized for enterprise use cases such as text generation, summarization, and retrieval-augmented generation (RAG). It balances strong performance with cost efficiency, making it suitable for production deployments. Command is distinct for its native support of tool use and multi-step reasoning via chain-of-thought.

Released

March 2023

Input

$1.00 per 1M tokens

Context

4K tokens

Archives (1): deprecated & retired

command-light

legacy · March 2023

Microsoft

11 active models · 1 archived

Phi-4 Mini

Phi-4 Mini is a compact, efficient language model from Microsoft, designed for high-performance reasoning and coding tasks with a 128K token context window. It is part of the Phi-4 family, offering strong capabilities in a smaller footprint suitable for edge and resource-constrained environments. The model emphasizes speed and accuracy for text generation and code understanding.

Released

February 2025

Input

Free (open source)

Context

128K tokens

Phi-4 Multimodal

Active · multimodal

Phi-4 Multimodal is a compact, efficient multimodal model from Microsoft that processes text, images, and audio inputs. It is designed for on-device and edge scenarios, offering strong performance in vision and speech tasks while maintaining a small footprint. Part of the Phi-4 family, it balances capability with low computational cost.

Released

February 2025

Input

Free (open source)

Context

128K tokens

Phi-4

Phi-4 is a 14B parameter small language model developed by Microsoft, focusing on high-quality reasoning and instruction following. It outperforms larger models on math and coding benchmarks, and is designed for efficient deployment on edge devices. Phi-4 is part of Microsoft's Phi family, emphasizing data quality over scale.

Released

December 2024

Input

Free (open source)

Context

16K tokens

Phi-3.5 Mini

Phi-3.5 Mini is a lightweight, state-of-the-art open model with 3.8B parameters, optimized for high-quality reasoning and instruction following. It is part of Microsoft's Phi-3 family, designed to run efficiently on edge devices while delivering strong performance on benchmarks comparable to larger models.

Released

August 2024

Input

Free (open source)

Context

128K tokens

Phi-3.5 MoE

Phi-3.5 MoE is an efficient Mixture-of-Experts model with 16 experts of 3.8B parameters each (about 42B total) and 6.6B active parameters per token. It excels in multilingual tasks, code generation, and long-context reasoning, competing with larger dense models.

Released

August 2024

Input

Free (open source)

Context

128K tokens

Phi-3.5 Vision

Active · multimodal

Phi-3.5 Vision is a lightweight, state-of-the-art multimodal model that processes both text and images. It excels in reasoning over images, extracting information from charts and tables, and understanding video frames. As part of the Phi-3 family, it offers strong performance in a compact size, suitable for resource-constrained environments.

Released

August 2024

Input

Free (open source)

Context

128K tokens

Phi-3 Medium

Phi-3 Medium is a 14B parameter language model from Microsoft, part of the Phi-3 family, designed to deliver strong performance on a wide range of tasks while being efficient enough to run on consumer hardware. It is a compact model that rivals larger models in quality, making it suitable for resource-constrained environments and local deployment.

Released

May 2024

Input

Free (open source)

Context

128K tokens

Phi-3 Small

Phi-3 Small is a 7B parameter language model from Microsoft's Phi-3 family, designed for efficient performance on edge devices and cloud inference. It achieves strong results on benchmarks while being small enough to run on smartphones, making it a cost-effective alternative to larger models.

Released

May 2024

Input

Free (open source)

Context

128K tokens

Phi-3 Mini

Phi-3 Mini is a small language model (3.8B parameters) from Microsoft that achieves strong performance on benchmarks comparable to larger models like Mixtral 8x7B and GPT-3.5. It is designed for efficient on-device and edge deployment, with a focus on high-quality training data and instruction tuning.

Released

April 2024

Input

Free (open source)

Context

128K tokens

Phi-2

Phi-2 is a 2.7 billion parameter language model developed by Microsoft Research, designed to demonstrate that smaller models can achieve competitive performance on reasoning and language understanding tasks. It is part of the Phi series, which focuses on training on high-quality synthetic data and filtered web data to maximize efficiency. Phi-2 excels in tasks like math, coding, and common sense reasoning despite its compact size.

Released

December 2023

Input

Free (open source)

Context

2K tokens

Phi-1.5

Phi-1.5 is a small language model with 1.3 billion parameters, trained on a blend of synthetic and textbook-quality data. It is designed for efficient reasoning and code generation, achieving strong performance on benchmarks despite its compact size.

Released

September 2023

Input

Free (open source)

Context

2K tokens

Archives (1): deprecated & retired

Phi-1

legacy · June 2023

Mistral AI

12 active models

Codestral 2025

Codestral 2025 is Mistral AI's latest code-focused model, optimized for code generation, completion, and reasoning. It offers a 256K context window and is designed for high-performance coding tasks with low latency. This model is part of Mistral's specialized lineup for developers.

Released

May 2025

Input

$0.20 per 1M tokens

Context

256K tokens

Pixtral 12B

Active · multimodal

Pixtral 12B is Mistral AI's first multimodal model, capable of processing both text and images. It is a 12-billion parameter model that excels at tasks like document understanding, image captioning, and visual question answering. Pixtral 12B is designed to be efficient and accessible, offering strong performance in a compact size.

Released

September 2024

Input

Free (open source)

Context

128K tokens

Ministral 8B

Ministral 8B is a compact, efficient language model designed for on-device and edge computing applications. It offers strong performance for its size, with a 128K token context window and support for multiple languages, making it suitable for tasks like summarization, classification, and retrieval.

Released

October 2024

Input

Free (open source)

Context

128K tokens

Mixtral 8x22B

Mixtral 8x22B is a sparse mixture-of-experts (MoE) model with 141 billion total parameters, of which 39 billion are active per token. It is Mistral AI's largest open-weight model, offering strong performance across reasoning, coding, and multilingual tasks while maintaining efficiency through its MoE architecture.

Released

April 2024

Input

Free (open source)

Context

65K tokens

Mixtral 8x7B

Mixtral 8x7B is a sparse mixture-of-experts (MoE) model that achieves high performance with low latency by activating only 12.9B parameters per token. It is Mistral's first open-weight MoE model, offering strong multilingual capabilities and a 32K token context window.

Released

December 2023

Input

Free (open source)

Context

32K tokens
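The sparse mixture-of-experts routing described in the two Mixtral entries above can be illustrated with a toy sketch. It assumes 8 experts with the top 2 selected per token and a softmax over the selected router scores; the expert functions and router scores are hypothetical stand-ins, and this is not Mistral's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=2):
    """Pick the top_k experts for one token and normalise their weights."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, experts, router_scores, top_k=2):
    """Run only the selected experts and return their weighted sum."""
    return sum(w * experts[i](x) for i, w in route_token(router_scores, top_k))


# Hypothetical experts: expert k simply scales its input by (k + 1).
EXPERTS = [lambda x, k=k: x * (k + 1) for k in range(8)]

if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0, 2.0, 0.0, 0.0, 0.0, 0.0]  # made-up router output
    print(route_token(scores))
    print(moe_layer(1.0, EXPERTS, scores))
```

Only the selected experts run for a given token, which is why the active parameter count is much smaller than the total. With the figures quoted for Mixtral 8x22B, 39B of 141B parameters (roughly 28%) are active per token.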

Mistral 7B

Mistral 7B is a small yet powerful language model that outperforms larger models like Llama 2 13B on many benchmarks. It is designed for efficient deployment and fine-tuning, making it a popular choice for developers seeking high performance with lower computational costs.

Released

September 2023

Input

Free (open source)

Context

8K tokens

Mistral Small 3.1

Mistral Small 3.1 is a fast, efficient language model optimized for low-latency applications and on-device deployment. It offers strong performance on reasoning, coding, and multilingual tasks while maintaining a small footprint. This model is part of Mistral's 'Small' lineup, balancing quality and speed for cost-sensitive use cases.

Released

March 2025

Input

$0.20 per 1M tokens

Context

128K tokens

Mistral Small 3

Mistral Small 3 is a 24B-parameter dense language model optimized for low-latency, high-throughput inference, offering strong performance at a fraction of the cost of larger models. It is designed to be a fast, efficient alternative for tasks that do not require the full power of frontier models, and it is released under the Apache 2.0 license.

Released

January 2025

Input

Free (open source)

Context

32K tokens

Ministral 3B

Ministral 3B is a small, efficient language model designed for on-device and edge computing applications. It offers strong performance for its size, with a 128K token context window and support for function calling, making it suitable for low-latency, privacy-sensitive tasks.

Released

October 2024

Input

Free (open source)

Context

128K tokens

Mistral Large 2

Mistral Large 2 is Mistral AI's flagship large language model, offering state-of-the-art reasoning, multilingual support, and advanced coding capabilities. It features a 128K context window and excels in nuanced tasks like instruction following and function calling, positioning itself as a strong competitor to GPT-4 and Claude 3.5 Sonnet.

Released

July 2024

Input

$3.00 per 1M tokens

Context

128K tokens

Pixtral Large

Active · multimodal

Pixtral Large is Mistral AI's most advanced multimodal model, a 124 billion parameter system that combines a 123 billion parameter decoder with a dedicated 1 billion parameter vision encoder. It excels at understanding text, images, and documents, and is designed for complex reasoning tasks that require both visual and textual understanding.

Released

November 2024

Input

$2.00 per 1M tokens

Context

128K tokens

Mistral NeMo

Mistral NeMo is a state-of-the-art 12B parameter language model developed in collaboration with NVIDIA, offering a large context window of 128K tokens. It is designed for high performance on multilingual tasks, coding, and reasoning, and is available as an open-weight model under the Apache 2.0 license. Mistral NeMo serves as a strong alternative to larger models like Llama 3 70B, providing competitive performance with lower computational requirements.

Released

July 2024

Input

Free (open source)

Context

128K tokens

NVIDIA

4 active models

Nemotron-4 340B

Nemotron-4 340B is NVIDIA's largest language model by parameter count, designed for synthetic data generation and for training smaller models. NVIDIA reports strong results on benchmarks such as MMLU and HumanEval, and the model is released under an open model license for research and commercial use.

Released

June 2024

Input

Free (open source)

Context

4096 tokens

Llama 3.3 Nemotron Super 49B

Llama 3.3 Nemotron Super 49B is a large language model developed by NVIDIA, derived from Meta's Llama 3.3 70B and reduced to 49 billion parameters for more efficient inference. It is positioned as competitive with leading models such as GPT-4o and Llama 3.1 405B.

Released

December 2024

Input

Free (open source)

Context

128K tokens

Llama 3.1 Nemotron Nano 8B

Llama 3.1 Nemotron Nano 8B is a small, efficient language model optimized for low-latency inference on NVIDIA GPUs. It is part of NVIDIA's Nemotron family, designed for edge and real-time applications where speed and resource efficiency are critical.

Released

October 2024

Input

Free (open source)

Context

128K tokens

Llama 3.1 Nemotron Ultra 253B

Llama 3.1 Nemotron Ultra 253B is a 253 billion parameter language model developed by NVIDIA, derived from Meta's Llama 3.1 405B with enhancements for improved reasoning and instruction following. It is part of NVIDIA's Nemotron model family, designed for enterprise-grade AI applications requiring high accuracy and reliability.

Released

October 2024

Input

$5.00 per 1M tokens

Context

128K tokens
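For budgeting, the per-1M-token input prices listed on this page can be turned into a rough cost estimate. The sketch below is illustrative only: the prices are copied from the entries above and should be verified with each vendor, the function name and the monthly workload are hypothetical, and output-token pricing is not covered because this page lists input prices only.

```python
# Input prices in USD per 1M tokens, copied from the listings on this page.
# These are an editorial snapshot; verify vendor pricing before relying on them.
INPUT_PRICE_PER_MILLION = {
    "Codestral 2025": 0.20,
    "Mistral Small 3.1": 0.20,
    "Pixtral Large": 2.00,
    "Mistral Large 2": 3.00,
    "Llama 3.1 Nemotron Ultra 253B": 5.00,
}


def estimate_input_cost(model: str, input_tokens: int) -> float:
    """Return the estimated input cost in USD for the given token count."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    price = INPUT_PRICE_PER_MILLION[model]
    return price * input_tokens / 1_000_000


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload
    for name in INPUT_PRICE_PER_MILLION:
        cost = estimate_input_cost(name, monthly_tokens)
        print(f"{name}: ${cost:,.2f} per month (input only)")
```

Models listed as "Free (open source)" are left out because their cost depends on the hardware or hosting you run them on rather than a per-token price.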

Need help choosing a model stack?

Use this page as research, then reach out if you need help narrowing model options for a real workflow, team, or budget.

Request Advisory