Baidu has open-sourced Unlimited-OCR, a long-document OCR model that can process over 40 pages at once using a new memory-efficient architecture, strengthening China’s push in open-source document AI.
Baidu has open-sourced Unlimited-OCR, a new optical character recognition (OCR) model capable of reading more than 40 document pages at once, marking a significant advance in open-source document AI. The project attracted more than 4,000 GitHub stars within two days of its release, reflecting strong developer interest.
At the heart of Unlimited-OCR is Reference Sliding Window Attention (R-SWA), a new attention architecture that references the full document image but only the previous 128 output tokens during text generation. Because that window over generated text has a fixed size, memory usage stays consistent and inference speed stays stable as the output grows, so the model scales to long documents without the performance degradation seen in conventional OCR systems.
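The attention pattern described above can be illustrated with a minimal sketch. This is not Baidu's implementation: the function names, the position layout and the example count of 1,000 image tokens are hypothetical, and the sketch assumes that "previous 128 output tokens" excludes the token currently being generated. It only shows which positions a generation step could see, and why the number of cached entries stops growing once the window is full.

```python
from typing import List

WINDOW = 128  # previous output tokens visible at each step, as reported in the article


def visible_positions(num_ref: int, step: int, window: int = WINDOW) -> List[int]:
    """Positions that output token `step` may attend to (hypothetical layout).

    Positions 0..num_ref-1 hold the reference (document image) tokens;
    output token t sits at position num_ref + t.
    """
    ref = list(range(num_ref))                 # the full image is always visible
    first = max(0, step - window)              # oldest output token still in the window
    prev_out = [num_ref + t for t in range(first, step)]
    return ref + prev_out


def cache_entries(num_ref: int, step: int, window: int = WINDOW) -> int:
    """Entries kept per step under the windowed scheme: bounded by num_ref + window."""
    return num_ref + min(step, window)


def full_attention_entries(num_ref: int, step: int) -> int:
    """Entries kept per step if every earlier output token stayed visible."""
    return num_ref + step


if __name__ == "__main__":
    num_ref = 1000  # assumed image-token count, chosen only for illustration
    for step in (50, 128, 5000):
        print(step, cache_entries(num_ref, step), full_attention_entries(num_ref, step))
```

Under these assumptions the windowed count levels off at the image-token count plus 128, while the full-attention count keeps rising with every generated token, which is the property the article credits for stable memory and speed on long documents.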
Although the model is built on a 3-billion-parameter architecture, only 500 million parameters are active during inference, making it lightweight and computationally efficient. It also improves document parsing by preserving tables, reading order, document structure and context. These capabilities are increasingly important for generating high-quality training data for large language models and for connecting enterprise document repositories with AI systems.
The release builds on Baidu’s widely adopted open-source PaddleOCR project and reinforces China’s growing leadership in open-source OCR infrastructure. Chinese firms including DeepSeek, Tencent and Alibaba have also released open-source OCR technologies, highlighting a broader industry shift towards foundational document AI. The trend suggests open-source innovation is expanding beyond language models to technologies that enable AI systems to efficiently understand long, complex documents.