Glyph: Scaling Context Windows via Visual-Text Compression
Glyph is a framework for scaling context length through visual-text compression. Instead of extending token-based context windows, Glyph renders long textual sequences into images and processes them with vision-language models (VLMs). This design reframes long-context modeling as a multimodal problem, substantially reducing computational and memory costs while preserving semantic information.
Figure: (Upper) Comparison of two paradigms for long-context tasks: conventional approaches that feed plain text directly into LLMs, and the proposed VLM-based paradigm, Glyph, which renders text as compact images to achieve substantial input-token compression. (Lower) Glyph attains competitive performance on LongBench and MRCR while offering significant compression and inference speedup over its text backbone model on 128K-token inputs.
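To build intuition for why rendering text as an image compresses the input, here is a back-of-the-envelope sketch: a page of rendered text is consumed as a grid of vision patches, and the patch count can be much smaller than the text-token count. All constants below (characters per token, line width, line height, patch size) are illustrative assumptions, not Glyph's actual rendering configuration.

```python
import math

def estimate_compression(num_chars,
                         chars_per_token=4,    # rough BPE average (assumption)
                         chars_per_line=120,   # rendered line width (assumption)
                         line_height_px=16,
                         page_width_px=1024,
                         patch_px=28):         # ViT-style patch size (assumption)
    """Compare text-token count vs. visual-token count for the same content."""
    text_tokens = math.ceil(num_chars / chars_per_token)
    lines = math.ceil(num_chars / chars_per_line)
    page_height_px = lines * line_height_px
    # One visual token per image patch in a patch grid covering the page.
    visual_tokens = (math.ceil(page_width_px / patch_px)
                     * math.ceil(page_height_px / patch_px))
    return text_tokens, visual_tokens, text_tokens / visual_tokens

if __name__ == "__main__":
    # A ~128K-token document at ~4 chars/token is roughly 512K characters.
    t, v, ratio = estimate_compression(512_000)
    print(f"text tokens ~{t}, visual tokens ~{v}, compression ~{ratio:.1f}x")
```

The achievable ratio depends entirely on rendering density (font size, line spacing, page width) relative to the vision encoder's patch size; denser rendering yields higher compression at the cost of legibility for the VLM.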
Demo
We provide a ready-to-run demo script that deploys both a baseline text model (e.g., Qwen3 or GLM-4) and Glyph, enabling a direct comparison of long-context inference efficiency.
After downloading the model, to see a side-by-side comparison of the output from Qwen3 and Glyph, run:
```bash
cd demo
bash run_demo_compared.sh
```
This demo will:
- Start a text-only LLM