1. Document understanding
Real-world documents are messy, unstructured, and difficult to parse — often filled with interleaved images, illegible handwritten text, nested tables, complex mathematical notation and non-linear layouts. Gemini 3 Pro represents a major leap forward in this domain, excelling across the entire document processing pipeline — from highly accurate Optical Character Recognition (OCR) to complex visual reasoning.
Intelligent perception
To truly understand a document, a model must accurately detect and recognize text, tables, math formulas, figures and charts regardless of noise or format.
A fundamental capability is "derendering" — the ability to reverse-engineer a visual document back into structured code (HTML, LaTeX, Markdown) that would recreate it. As illustrated below, Gemini 3 demonstrates accurate perception across diverse modalities including converting an 18th-century merchant log into a complex table, or transforming a raw image with mathematical annotation into precise LaTeX code.