VOID: Video Object and Interaction Deletion

VOID removes objects from videos along with all interactions they induce on the scene — not just secondary effects like shadows and reflections, but physical interactions like objects falling when a person is removed. It is built on top of CogVideoX and fine-tuned for video inpainting with interaction-aware mask conditioning.

Example: If a person holding a guitar is removed, VOID also removes the person's effect on the guitar — causing it to fall naturally.


🤖 Models

VOID uses two transformer checkpoints, trained sequentially. You can run inference with Pass 1 alone or chain both passes for higher temporal consistency.

| Model | Description | HuggingFace |
| --- | --- | --- |
| VOID Pass 1 | Base inpainting model | Download |
| VOID Pass 2 | Warped-noise refinement model | Download |

Place checkpoints anywhere and pass the path via --config.video_model.transformer_path (Pass 1) or --model_checkpoint (Pass 2).
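As a rough illustration of how the two checkpoint flags fit together, the sketch below assembles one argument list per pass. Only the two flag names come from this README. The helper `build_void_args`, the example checkpoint file names, and the assumption that each flag takes its value as a separate token are hypothetical. The entry-point scripts are deliberately left out because they are not named here.

```python
from pathlib import Path
from typing import List, Optional


def build_void_args(
    pass1_ckpt: str,
    pass2_ckpt: Optional[str] = None,
    check_exists: bool = True,
) -> List[List[str]]:
    """Return one argument list per pass, in the order the passes would run.

    Hypothetical helper: only the two flag names are taken from the README.
    """
    if check_exists:
        for ckpt in (pass1_ckpt, pass2_ckpt):
            if ckpt is not None and not Path(ckpt).exists():
                raise FileNotFoundError(f"Checkpoint not found: {ckpt}")

    # Pass 1 receives its checkpoint through the config override flag.
    passes = [["--config.video_model.transformer_path", pass1_ckpt]]

    # Pass 2 is optional; it is appended only when a second checkpoint is given.
    if pass2_ckpt is not None:
        passes.append(["--model_checkpoint", pass2_ckpt])
    return passes


if __name__ == "__main__":
    # Placeholder paths; substitute wherever the checkpoints were saved.
    demo = build_void_args(
        "ckpts/pass1.safetensors",
        "ckpts/pass2.safetensors",
        check_exists=False,
    )
    for number, args in enumerate(demo, start=1):
        print(f"Pass {number}:", " ".join(args))
```

Each returned list would be appended to whatever command launches the corresponding pass. Omitting `pass2_ckpt` yields a single list, matching the Pass 1 only workflow.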

▶️ Quick Start

The fastest way to try VOID is the included notebook, which handles setup, downloads the models, runs inference on a sample video, and displays the result.
