by Cohere · Released July 2025 · Cutoff April 2025
Command A Vision is Cohere's flagship multimodal model, combining advanced language understanding with native vision capabilities. It excels at tasks requiring both text and image analysis, such as document understanding, visual question answering, and multimodal reasoning. This model is part of the Command A family, designed for enterprise-grade performance and safety.
Input cost
$2.50 per 1M tokens
Output cost
$10.00 per 1M tokens
Context window
128K tokens
Max output
4096 tokens
Modalities
License
proprietary
Enterprise applications requiring multimodal understanding, such as document analysis, visual Q&A, and content moderation.
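The pricing and limits listed above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function and constant names are hypothetical, "128K" is assumed to mean 128,000 tokens, output tokens are assumed to count against the context window, and image inputs are assumed to have already been converted to a token count (how images are tokenized is not stated here).

```python
# Figures copied from the listing above; all names are illustrative.
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 10.00
CONTEXT_WINDOW_TOKENS = 128_000  # assumption: "128K" read as 128,000
MAX_OUTPUT_TOKENS = 4096


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed max output")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the listed context window")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# 20,000 input tokens and 1,000 output tokens
print(f"${estimate_cost_usd(20_000, 1_000):.2f}")  # prints $0.06
```

At these rates an output token costs four times as much as an input token, so long generations can dominate the bill even when the prompt is much larger than the response.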