DeepSeek Switches From OpenAI CLIP To Alibaba Open Source Model In OCR Upgrade

DeepSeek Replaces OpenAI-Linked CLIP With Alibaba Open Source Model To Boost OCR Performance

DeepSeek has boosted OCR accuracy by replacing OpenAI-linked CLIP with Alibaba’s open-source Qwen2-0.5B model, underscoring how China’s open-source AI stack is delivering real performance gains.

Chinese artificial intelligence start-up DeepSeek has upgraded its optical character recognition (OCR) system by adopting Alibaba Cloud’s open-source AI technology, delivering measurable performance gains.

The company unveiled DeepSeek-OCR 2, an updated version of its OCR model, in a research paper released on Tuesday. The new model replaces a core architectural component, CLIP (Contrastive Language-Image Pre-training), with Alibaba Cloud’s lightweight open-source Qwen2-0.5B model. CLIP was developed by Microsoft-backed OpenAI in 2021.

Benchmark tests show a 3.73 per cent performance improvement over the previous version. DeepSeek described the uplift as “a meaningful gain on an already high accuracy base.”

According to the research, replacing CLIP enabled the OCR system to process documents by following “flexible yet semantically coherent scanning patterns driven by inherent logical structures,” mimicking how humans read.
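The difference between a fixed scan and one that follows a page’s logical structure can be illustrated with a toy example. The Python sketch below is not DeepSeek’s method: it assumes a hypothetical page of text blocks with made-up coordinates and a hand-picked column boundary, and simply compares a raster (row-by-row) reading order with a rule that finishes the left column before moving to the right one, as a human reader would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical text block on a page; coordinates are illustrative only."""
    text: str
    x: int  # left edge of the block
    y: int  # top edge of the block


def raster_order(blocks):
    # Fixed scan: top to bottom, then left to right, ignoring page layout.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, column_split):
    # Layout-aware toy rule: read every block in the left column (x < column_split)
    # from top to bottom before any block in the right column.
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= column_split, b.y))]


# A hypothetical two-column page.
page = [
    Block("Left para 1", 0, 0),
    Block("Right para 1", 300, 0),
    Block("Left para 2", 0, 100),
    Block("Right para 2", 300, 100),
]

print(raster_order(page))
# ['Left para 1', 'Right para 1', 'Left para 2', 'Right para 2']
print(column_order(page, column_split=200))
# ['Left para 1', 'Left para 2', 'Right para 1', 'Right para 2']
```

On a two-column page, the raster scan interleaves the columns, while the column-aware order keeps each column’s text together. In a real system this ordering would have to be inferred from the image itself rather than supplied as a fixed `column_split` value.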

DeepSeek has open-sourced DeepSeek-OCR 2 on Hugging Face, making the model publicly available to developers worldwide. The upgrade was announced just over three months after the launch of the original DeepSeek-OCR system.

The move highlights the growing role of China’s domestic open-source ecosystem in advancing AI development, with start-ups increasingly drawing on locally developed open-source models. Alibaba Cloud is the AI and cloud computing arm of Alibaba Group Holding.

The update follows academic scrutiny of DeepSeek’s original OCR approach. Researchers from China and Japan previously reported inconsistent performance under certain conditions, noting accuracy drops in visual question-answering tasks when misleading text was introduced.

DeepSeek said it will continue refining its OCR architecture for broader applications while pushing “towards a more comprehensive vision of multimodal intelligence.”
