Glyph: Scaling Context Windows via Visual-Text Compression
Glyph is a framework for scaling context length through visual-text compression. Instead of extending token-based context windows, Glyph renders long textual sequences into images and processes them with vision-language models (VLMs). This design reframes long-context modeling as a multimodal problem, substantially reducing computational and memory costs while preserving semantic information.
Figure: (Upper) Comparison of two paradigms for long-context tasks: conventional approaches that feed plain text directly into LLMs, and the proposed VLM-based paradigm, Glyph, which renders text as compact images to achieve substantial input-token compression. (Lower) Glyph attains competitive performance on LongBench and MRCR while offering significant compression and inference speedup over its text backbone model on 128K-token inputs.
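To build intuition for why rendering text as an image compresses the input, here is a back-of-the-envelope sketch: a page of rendered text is consumed as a grid of vision patches, and the patch count can be much smaller than the text-token count. All constants below (characters per token, line width, line height, patch size) are illustrative assumptions, not Glyph's actual rendering configuration.

```python
import math

def estimate_compression(num_chars,
                         chars_per_token=4,    # rough BPE average (assumption)
                         chars_per_line=120,   # rendered line width (assumption)
                         line_height_px=16,
                         page_width_px=1024,
                         patch_px=28):         # ViT-style patch size (assumption)
    """Compare text-token count vs. visual-token count for the same content."""
    text_tokens = math.ceil(num_chars / chars_per_token)
    lines = math.ceil(num_chars / chars_per_line)
    page_height_px = lines * line_height_px
    # One visual token per image patch in a patch grid covering the page.
    visual_tokens = (math.ceil(page_width_px / patch_px)
                     * math.ceil(page_height_px / patch_px))
    return text_tokens, visual_tokens, text_tokens / visual_tokens

if __name__ == "__main__":
    # A ~128K-token document at ~4 chars/token is roughly 512K characters.
    t, v, ratio = estimate_compression(512_000)
    print(f"text tokens ~{t}, visual tokens ~{v}, compression ~{ratio:.1f}x")
```

The achievable ratio depends entirely on rendering density (font size, line spacing, page width) relative to the vision encoder's patch size; denser rendering yields higher compression at the cost of legibility for the VLM.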
Demo
We provide a ready-to-run demo script that deploys both a baseline text model (e.g., Qwen3 or GLM-4) and Glyph, enabling a direct comparison of long-context inference efficiency.
After downloading the model, to see a side-by-side comparison of the output from Qwen3 and Glyph, run:
```bash
cd demo
bash run_demo_compared.sh
```
This demo will:
- Start a text-only LLM