DeepSeek OCR 2: Visual Causal Flow for Documents

DeepSeek OCR 2 is an open-source OCR and document understanding model built for complex layouts, Markdown output, and human-like reading order.

What Is DeepSeek OCR 2?

DeepSeek OCR 2 is an open-source image-to-text and document understanding model from DeepSeek-AI.

It is designed to read documents, screenshots, PDFs, tables, formulas, and complex page layouts, which traditional OCR systems often handle poorly. Instead of scanning the page in a fixed left-to-right, top-to-bottom order, it uses Visual Causal Flow to follow the order in which a person would read the page.

Model Overview

| Item | Details |
| --- | --- |
| Model name | DeepSeek OCR 2 |
| Developer | DeepSeek-AI |
| Model type | OCR, image-to-text, document understanding, vision-language model |
| Main idea | Visual Causal Flow for human-like visual encoding |
| Core encoder | DeepEncoder V2 |
| Model size | 3B parameters |
| Tensor type | BF16 |
| License | Apache-2.0 |
| Availability | Open-source weights on Hugging Face and code on GitHub |
| Main input | Document images, screenshots, scanned pages, PDFs |
| Main output | Plain text or structured Markdown |
| Inference support | Transformers, vLLM, SGLang, Docker-based workflows |
| Best use cases | Complex OCR, document parsing, Markdown conversion, layout-aware extraction |
| Price | Free model weights; running cost depends on your own GPU or hosting provider |

Features

Visual Causal Flow

DeepSeek OCR 2 introduces Visual Causal Flow, a reading mechanism that reorders visual tokens based on document meaning.

This helps the model handle pages where the natural reading order is not the same as the pixel order.

Figure: Visual Causal Flow. Source: https://github.com/deepseek-ai/DeepSeek-OCR-2
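To see why pixel order and reading order can differ, here is a toy Python sketch of a two-column page. It is not the model's mechanism: Visual Causal Flow works on visual tokens inside the encoder, while this example only sorts hand-made text blocks. The `Block` class, the coordinates, and both helper functions are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x: float  # left edge of the block, in pixels
    y: float  # top edge of the block, in pixels


def raster_order(blocks):
    """Read top to bottom, then left to right (plain pixel order)."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, page_width):
    """Read the whole left column first, then the whole right column."""
    mid = page_width / 2
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= mid, b.y))]


# A two-column page: L1 and L2 form the left column, R1 and R2 the right one.
blocks = [
    Block("L1", x=50, y=100),
    Block("R1", x=350, y=100),
    Block("L2", x=50, y=200),
    Block("R2", x=350, y=200),
]

print(raster_order(blocks))                  # ['L1', 'R1', 'L2', 'R2']
print(column_order(blocks, page_width=600))  # ['L1', 'L2', 'R1', 'R2']
```

Raster order interleaves the two columns, which is the kind of broken text a fixed scan order produces on multi-column pages.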

Better Complex Layout Understanding

The model is built for documents with columns, tables, figures, formulas, and mixed visual structure.

This makes it a good fit for research papers, reports, scanned documents, and other pages where standard OCR often returns broken text.

Markdown Document Conversion

DeepSeek OCR 2 can convert document images into Markdown using a dedicated prompt.

This is useful when you want clean headings, paragraphs, tables, and formulas instead of raw extracted text.

Plain Text OCR Mode

For simpler tasks, the model also supports plain OCR without layout grounding.

This is useful for quick text extraction from screenshots, scanned notes, and basic document images.
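The two modes above, Markdown conversion and plain OCR, are selected by the prompt. A minimal sketch of a prompt picker follows. The prompt strings are assumptions based on the pattern shown on the public model card, so check them against the current Hugging Face page before relying on them. `PROMPTS` and `build_prompt` are hypothetical helper names.

```python
# Assumed prompt strings; verify against the official model card.
PROMPTS = {
    # Layout-aware conversion to structured Markdown.
    "markdown": "<image>\n<|grounding|>Convert the document to markdown. ",
    # Plain text extraction without layout grounding.
    "plain": "<image>\nFree OCR. ",
}


def build_prompt(mode: str = "markdown") -> str:
    """Return the prompt for the requested OCR mode."""
    try:
        return PROMPTS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PROMPTS)}"
        ) from None
```

Keeping the prompts in one table makes it easy to switch a batch job from quick text extraction to full Markdown conversion.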

Dynamic Resolution Support

DeepSeek OCR 2 supports dynamic resolution, using multiple 768×768 crops plus a 1024×1024 global view by default.

This helps the model process detailed pages without forcing every image into a single fixed resolution.
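As a rough illustration of the idea, the sketch below plans the views for one page: a grid of 768×768 crops plus one 1024×1024 global view. It is a simplified stand-in, not the repository's tiling code. The cap of 6 crops, the rule that small images get the global view only, and the names `choose_grid` and `plan_views` are all assumptions.

```python
def choose_grid(width, height, max_tiles=6):
    """Pick the (cols, rows) grid whose aspect ratio best matches the image."""
    target = width / height
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    # Closest aspect ratio wins; on a tie, prefer the grid with more tiles.
    return min(
        candidates,
        key=lambda g: (abs(g[0] / g[1] - target), -g[0] * g[1]),
    )


def plan_views(width, height, tile=768, global_size=1024, max_tiles=6):
    """Describe the crops and global view for one page image."""
    if max(width, height) <= tile:
        cols, rows = 0, 0  # assumption: small images use the global view only
    else:
        cols, rows = choose_grid(width, height, max_tiles)
    return {
        "grid": (cols, rows),
        "crops": cols * rows,
        "crop_size": (tile, tile),
        "global_view": (global_size, global_size),
    }


# An A4 page scanned at 200 dpi is about 1654x2339 pixels.
print(plan_views(1654, 2339)["grid"])  # (2, 3)
```

A portrait page ends up with a tall grid and a landscape page with a wide one, so detail is preserved without stretching every image to one fixed shape.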

Open-Source Deployment

The model is available through Hugging Face and GitHub under the Apache-2.0 license.

Developers can run it locally with Transformers, use vLLM for faster inference, or build OCR tools around the model weights.
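A local run with Transformers might look like the sketch below. Treat it as a starting point under stated assumptions: the model ID `deepseek-ai/DeepSeek-OCR-2` and the `infer` method with its keyword arguments follow the usage pattern on the model card and come from the repository's remote code, not from the Transformers library itself, so verify them against the official README. `infer_settings` and `run_ocr` are hypothetical helper names.

```python
def infer_settings(image_file, output_dir):
    """Default settings: 1024 global view, 768 crops, cropping enabled."""
    return {
        "image_file": image_file,
        "output_path": output_dir,
        "base_size": 1024,   # size of the global view
        "image_size": 768,   # size of each local crop
        "crop_mode": True,   # enable dynamic-resolution crops
        "save_results": True,
    }


def run_ocr(image_file, output_dir, prompt,
            model_id="deepseek-ai/DeepSeek-OCR-2"):
    """Load the model and run OCR on one image (requires a CUDA GPU)."""
    # Heavy imports are kept inside the function so the module
    # can be imported on machines without a GPU stack.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id, trust_remote_code=True, use_safetensors=True
    )
    model = model.eval().cuda().to(torch.bfloat16)

    # Assumption: `infer` is provided by the model's remote code.
    return model.infer(
        tokenizer, prompt=prompt, **infer_settings(image_file, output_dir)
    )
```

`trust_remote_code=True` executes code from the model repository, so review that code before running it in a sensitive environment.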

FAQ

Is DeepSeek OCR 2 open source?

Yes. DeepSeek OCR 2 is available with model weights and code, and the Hugging Face model page lists the license as Apache-2.0.

What is DeepSeek OCR 2 used for?

It is used for OCR, document parsing, image-to-text conversion, and converting complex document images into structured Markdown.

It is especially useful for PDFs, academic papers, tables, formulas, and multi-column pages.

How is DeepSeek OCR 2 different from traditional OCR?

Traditional OCR usually reads text in a fixed visual order, typically left to right and top to bottom.

DeepSeek OCR 2 tries to understand the logical structure of the page first, then follows a more human-like reading order.

Can DeepSeek OCR 2 convert PDFs to Markdown?

Yes. The official GitHub workflow includes PDF processing, and the model’s main document prompt is designed to convert documents into Markdown.
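If you want to wire this up yourself, one common pattern is to render each PDF page to an image, run OCR per page, and join the results. The sketch below is not the official workflow. It assumes PyMuPDF (`fitz`) for rendering, and `ocr_page` is a hypothetical callable you supply that takes an image path and returns Markdown.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def render_pdf_pages(pdf_path: str, out_dir: str, dpi: int = 144) -> List[str]:
    """Render every PDF page to a PNG file and return the file paths."""
    import fitz  # PyMuPDF; imported lazily so the rest works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            target = out / f"page_{number:04d}.png"
            page.get_pixmap(dpi=dpi).save(str(target))
            paths.append(str(target))
    return paths


def pages_to_markdown(
    image_paths: Iterable[str],
    ocr_page: Callable[[str], str],
    separator: str = "\n\n---\n\n",
) -> str:
    """Run OCR on each page image and join the non-empty results."""
    parts = [ocr_page(path).strip() for path in image_paths]
    return separator.join(part for part in parts if part)


# Usage with a stand-in OCR function (replace with a real model call).
fake_ocr = lambda path: f"# {path}\n"
print(pages_to_markdown(["a.png", "b.png"], fake_ocr))
```

Passing the OCR step in as a callable keeps the page handling independent of whether you run the model through Transformers, vLLM, or a hosted endpoint.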

Does DeepSeek OCR 2 support tables and formulas?

Yes. Its layout-aware design makes it suitable for tables, mathematical formulas, and structured documents.

Results still depend on image quality, layout complexity, and inference settings.

Can I run DeepSeek OCR 2 locally?

Yes. The model can be used with Hugging Face Transformers, and the official repository also provides vLLM inference guidance.

A modern NVIDIA GPU is recommended for practical local inference.

Is DeepSeek OCR 2 free?

The model weights are free to download under the Apache-2.0 license.

You still need to pay for your own GPU, server, or third-party inference provider if you deploy it at scale.
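A back-of-the-envelope estimate of that running cost takes two inputs: the hourly price of your GPU and the throughput you measure. The figures in the example are made-up placeholders, not benchmarks of DeepSeek OCR 2, and `cost_per_1000_pages` is a hypothetical helper.

```python
def cost_per_1000_pages(gpu_hourly_usd: float, pages_per_second: float) -> float:
    """Estimate the GPU cost in USD of processing 1000 pages."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    seconds_needed = 1000 / pages_per_second
    return gpu_hourly_usd * seconds_needed / 3600


# Placeholder numbers: a $2.00/hour GPU processing 2 pages per second.
print(round(cost_per_1000_pages(2.00, 2.0), 2))  # 0.28
```

Measure throughput on your own documents first, since page complexity and inference settings change it.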

Is DeepSeek OCR 2 better than DeepSeek OCR?

DeepSeek OCR 2 is the newer version. Its main change is Visual Causal Flow, which targets reading order and layout understanding.

For complex documents, this is the main reason to choose DeepSeek OCR 2 over the first version.