
ABBYY, IBM, NVIDIA, Red Hat, HumanSignal and the LF AI & Data Foundation have launched DocLang, an open standard designed to give AI systems a common language for understanding enterprise documents, improving interoperability while reducing processing complexity.
ABBYY has joined IBM, NVIDIA, Red Hat, HumanSignal and the LF AI & Data Foundation to launch DocLang, a new open standard that defines an AI-native representation of documents.
The initiative aims to establish a universal language that enables AI systems to understand documents more consistently and efficiently. Supporters compare its potential role in document AI to HTML’s standardisation of web content.
Positioned as an open specification rather than a proprietary format, DocLang seeks to reduce fragmentation caused by vendors creating their own document representations. The standard is intended to improve interoperability across AI models, platforms, applications and autonomous agents, while building on Docling, the open-source document processing toolkit released in 2024.
The effort addresses a longstanding challenge in enterprise AI. Most business knowledge exists in PDFs, scanned documents, spreadsheets, presentations, forms and reports that were designed for human readers rather than AI systems. As a result, AI workflows often require OCR, layout analysis, document parsing and post-processing before information can be reliably understood.
“DocLang is designed to solve one of the foundational problems in enterprise AI: documents were built for humans, not machines,” said Maxime Vermeir, Vice President of AI Strategy at ABBYY.
DocLang preserves semantic meaning, document structure, geometric layout, tables, metadata and governance controls within a standardised format. This allows AI systems to better understand relationships between content and context while supporting privacy policies, content extraction permissions, AI model training permissions and usage restrictions.
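To make the idea concrete, here is a minimal Python sketch of how a document element might carry text, structure, layout and governance metadata together. It is purely illustrative and is not the DocLang schema: the names `BoundingBox`, `Permissions`, `Element` and `extractable_text`, and the rule that a restricted element hides its whole subtree, are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical geometric layout: a page number plus a rectangle."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Permissions:
    """Hypothetical governance flags; not taken from the DocLang spec."""
    allow_extraction: bool = True
    allow_training: bool = False


@dataclass
class Element:
    """One node in an illustrative document tree."""
    role: str  # semantic role, e.g. "section", "heading", "paragraph"
    text: str
    bbox: BoundingBox
    permissions: Permissions = field(default_factory=Permissions)
    children: List["Element"] = field(default_factory=list)


def extractable_text(element: Element) -> Iterator[str]:
    """Yield text depth-first, skipping any subtree that forbids extraction."""
    if not element.permissions.allow_extraction:
        return
    if element.text:
        yield element.text
    for child in element.children:
        yield from extractable_text(child)


# A made-up three-element document used only to exercise the sketch.
page_box = BoundingBox(page=1, x0=0.0, y0=0.0, x1=595.0, y1=842.0)
doc = Element(
    role="section",
    text="",
    bbox=page_box,
    children=[
        Element("heading", "Quarterly Report", BoundingBox(1, 72, 72, 523, 100)),
        Element("paragraph", "Revenue grew.", BoundingBox(1, 72, 110, 523, 140)),
        Element(
            "paragraph",
            "Internal salary data.",
            BoundingBox(1, 72, 150, 523, 180),
            permissions=Permissions(allow_extraction=False),
        ),
    ],
)

visible = list(extractable_text(doc))
print(visible)
```

In this sketch the restricted paragraph never reaches the consumer, which is one way a format that keeps permissions alongside content could let downstream AI systems honour usage restrictions without a separate policy lookup.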
Backers believe the initiative could become a foundational layer for document intelligence and agentic AI workflows, much as HTML became for the web and Kubernetes for cloud-native infrastructure.