ABBYY, IBM, Red Hat Push Open Source AI Document Standard

Open Vendor Neutral AI Document Standard To Fix Enterprise Data Chaos Announced By ABBYY, IBM And Red Hat Under LF AI & Data Foundation

ABBYY, IBM, and Red Hat launch DocLang as an open source AI document standard to solve fragmented enterprise data and make documents AI-readable at scale.

At ABBYY Ascend 2026, ABBYY, IBM, and Red Hat announced DocLang, a universal AI-native document format, as an open-source initiative under the LF AI & Data Foundation. A dedicated working group has also been formed, and technology providers and enterprises are invited to help shape the standard.

Positioned as a vendor-neutral specification, DocLang aims to create a unified, AI-readable format for enterprise documents, addressing the long-standing disconnect between unstructured formats such as PDFs and JPEGs and modern AI systems. Backed by the Linux Foundation ecosystem, the initiative emphasises neutral governance, community-driven development, and interoperability to enable scalable AI innovation without fragmentation.

DocLang introduces an abstraction layer that converts unstructured documents into structured, machine-readable data while preserving semantic meaning and geometric layout. It encodes elements such as headings, paragraphs, and tables with precise positioning, optimised for AI tokenisation and efficient processing. The format also embeds governance controls, allowing enforcement of privacy limits, extraction rules, and model training permissions.
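
To make the idea concrete, here is a minimal Python sketch of what a document representation combining typed elements, geometric positions, and governance flags might look like. It is purely illustrative and is not the DocLang specification: every name in it (`Element`, `Governance`, `Document`, `bbox`, `allow_training`, `private_kinds`, `extractable`, `reading_order`) is hypothetical, and the coordinate convention (origin at the top-left of the page) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """One structural unit of a page (hypothetical model, not DocLang syntax)."""
    kind: str                                 # e.g. "heading", "paragraph", "table"
    text: str
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1), top-left origin assumed
    page: int = 1


@dataclass(frozen=True)
class Governance:
    """Policy flags carried with the document (hypothetical field names)."""
    allow_training: bool = False
    allow_extraction: bool = True
    private_kinds: Tuple[str, ...] = ()       # element kinds that must not be extracted


@dataclass
class Document:
    elements: List[Element] = field(default_factory=list)
    governance: Governance = field(default_factory=Governance)


def extractable(doc: Document) -> List[Element]:
    """Return only the elements the governance policy permits extracting."""
    if not doc.governance.allow_extraction:
        return []
    return [e for e in doc.elements if e.kind not in doc.governance.private_kinds]


def reading_order(elements: List[Element]) -> List[Element]:
    """Sort by page, then top-to-bottom, then left-to-right using the bounding box."""
    return sorted(elements, key=lambda e: (e.page, e.bbox[1], e.bbox[0]))


doc = Document(
    elements=[
        Element("paragraph", "Payment is due in 30 days.", (50, 120, 500, 160)),
        Element("heading", "Invoice", (50, 40, 300, 80)),
        Element("table", "Item | Amount", (50, 300, 500, 400)),
    ],
    governance=Governance(allow_training=False, private_kinds=("table",)),
)

visible = reading_order(extractable(doc))
for e in visible:
    print(e.kind, e.bbox, e.text)
```

In this sketch the policy travels with the content, so a downstream consumer can check `allow_training` or filter restricted element kinds before any text reaches a model, while the bounding boxes keep enough layout information to recover reading order.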

“DocLang is specifically engineered to address industry challenges with a minimal, standardised, and AI-native method for representing document structure, meaning, layout, and governance,” said Maxime Vermeir, Vice President, AI Strategy at ABBYY. “Being designed for efficient machine processing provides a predictable structure optimised for modern AI tokenisation and modelling techniques. Organisations will see a significant difference with more reliable interpretation, reduced hallucinations, and lower computational costs.”

Mark Collier, GM of AI & Infrastructure and ED of LF AI & Data, added, “Standards matter most when the technology landscape is moving fastest. They create the common language that allows innovation to scale without increasing fragmentation. Open, vendor-neutral specifications are the backbone of Kubernetes, cloud-native platforms, AI systems, and increasingly complex data workflows; they have enabled a high-value ecosystem while supporting interoperability, performance, and security.”

He further noted, “Efforts like DocLang are important because they help bring structure, interoperability, and trust to a part of the stack that has become critical for AI, but remains highly inconsistent across tools and environments.”

ABBYY also demonstrated DocLang integration within ABBYY FineReader beta, showcasing capabilities such as AI-powered OCR, document conversion, PDF editing, and enhanced text recognition, pointing to DocLang’s role as a foundational layer for future AI data pipelines.
