Mistral AI Launches OCR 4, Turning Unstructured Documents Into Layered Data


Mistral AI has released Mistral OCR 4, a document intelligence model that extracts text along with bounding boxes, block classifications, and confidence scores across 170 languages, aimed at advanced RAG pipelines.

On 23 June 2026, Mistral AI released Mistral OCR 4, a model that moves beyond raw text extraction to return fully structured representations of entire documents. The system natively outputs paragraph-level bounding boxes, typed-block labels (like titles, tables, equations, and signatures), and confidence scores.
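
To make the shape of such output concrete, here is a minimal Python sketch of how a downstream application might model typed blocks and route low-confidence ones to human review. The `Block` class, its field names, the 0-to-1 confidence range, and the 0.9 threshold are all illustrative assumptions, not Mistral's actual response schema.

```python
from dataclasses import dataclass


# Hypothetical schema: field names are illustrative assumptions,
# not the response format documented by Mistral.
@dataclass(frozen=True)
class Block:
    page: int
    kind: str  # e.g. "title", "table", "equation", "signature"
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def split_by_confidence(blocks, threshold=0.9):
    """Partition blocks into (accepted, needs_review) at the given threshold."""
    accepted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return accepted, needs_review


# Made-up sample data for illustration only.
sample = [
    Block(1, "title", (72.0, 60.0, 540.0, 96.0), "Quarterly Report", 0.99),
    Block(1, "table", (72.0, 120.0, 540.0, 400.0), "Revenue | Cost", 0.95),
    Block(2, "signature", (300.0, 650.0, 520.0, 700.0), "J. Doe", 0.62),
]
accepted, needs_review = split_by_confidence(sample)
```

In a real pipeline the threshold would likely be tuned per block type, since a misread signature and a misread title carry different costs.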

It directly ingests common unstructured enterprise formats, such as PDF, DOC, PPT, and OpenDocument (ODT) files, eliminating intermediate conversions. It supports 170 languages across 10 groups, and Mistral highlights improved performance on rare, specialised, and low-resource languages.

In blind evaluations across more than 600 real-world documents, independent annotators preferred OCR 4 over competing systems, yielding a 72% average win rate. It also secured the top position on automated benchmarks, scoring 85.20 on the public OlmOCRBench, 93.07 on OmniDocBench, and 0.98 on Mistral’s internal Crawl Multilingual evaluation.

OCR 4 integrates natively as the ingestion layer for Mistral’s open-source Search Toolkit, providing clean, citation-ready text units directly to RAG and agentic workflows. For regulated enterprise clients, the model can be deployed as a single, fully self-hosted container. Mistral explicitly frames OCR 4 as a document-understanding tool rather than an autonomous decision-maker, which places high-stakes automation such as medical diagnosis out of scope.
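
As a rough illustration of what a "citation-ready text unit" could look like, the Python sketch below attaches a document ID, page number, and bounding box to each extracted block so that a retrieved passage can be traced back to its source. The function `to_citation_units`, the dictionary keys, and the ID format are hypothetical and do not reflect the Search Toolkit's real interface.

```python
def to_citation_units(doc_id, blocks):
    """Turn extracted blocks into retrieval units that carry their own citation.

    Both the input and output layouts are assumptions made for this sketch.
    """
    units = []
    for index, block in enumerate(blocks):
        text = block["text"].strip()
        if not text:
            continue  # skip blocks with no usable text
        units.append(
            {
                # Hypothetical ID scheme: document, page, and block position.
                "id": f"{doc_id}#p{block['page']}-b{index}",
                "text": text,
                "citation": {
                    "doc_id": doc_id,
                    "page": block["page"],
                    "bbox": block["bbox"],
                },
            }
        )
    return units


# Made-up sample blocks for illustration only.
blocks = [
    {"page": 1, "bbox": [72, 60, 540, 96], "text": "Quarterly Report"},
    {"page": 1, "bbox": [72, 100, 540, 110], "text": "   "},
    {"page": 2, "bbox": [72, 120, 540, 400], "text": "Revenue grew in Q2."},
]
units = to_citation_units("report.pdf", blocks)
```

Keeping the bounding box with each unit means an answer generated from it can point to the exact region of the page it came from.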

Standard API access costs $4 per 1,000 pages, dropping to $2 per 1,000 pages with the Batch-API discount, whilst the Document AI tier within Mistral Studio costs $5 per 1,000 pages. The model is immediately available via the Mistral API, Mistral Studio, Amazon SageMaker, and Microsoft Foundry, with Snowflake integration coming soon.
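
For budgeting purposes, the quoted prices translate into a simple per-page calculation. The Python sketch below uses the three rates given above and assumes plain linear billing with no minimums, rounding rules, or volume tiers, none of which the announcement specifies. The tier keys and the `estimate_cost` helper are illustrative names.

```python
from decimal import Decimal

# Prices in USD per 1,000 pages, as quoted in the article.
PRICE_PER_1000 = {
    "standard": Decimal("4"),
    "batch": Decimal("2"),
    "document_ai": Decimal("5"),
}


def estimate_cost(pages, tier="standard"):
    """Estimate the cost in USD, assuming simple linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000[tier] * Decimal(pages) / Decimal(1000)


# Compare the three tiers for a hypothetical 250,000-page archive.
for tier in PRICE_PER_1000:
    print(tier, estimate_cost(250_000, tier))
```

Under these assumptions, the batch rate halves the cost of the standard API for workloads that do not need immediate results.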
