VOID: Video Object and Interaction Deletion
VOID removes objects from videos along with every interaction they induce on the scene: not only secondary effects such as shadows and reflections, but also physical interactions, such as objects falling when a person is removed. It is built on top of CogVideoX and fine-tuned for video inpainting with interaction-aware mask conditioning.
Example: if a person holding a guitar is removed, VOID also removes the person's effect on the guitar, so the guitar falls naturally.
🤖 Models
VOID uses two transformer checkpoints, trained sequentially. You can run inference with Pass 1 alone or chain both passes for higher temporal consistency.
| Model | Description | HuggingFace |
|---|---|---|
| VOID Pass 1 | Base inpainting model | Download |
| VOID Pass 2 | Warped-noise refinement model | Download |
Place checkpoints anywhere and pass the path via --config.video_model.transformer_path (Pass 1) or --model_checkpoint (Pass 2).
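As a rough illustration of how the two checkpoint flags fit together, the sketch below builds the checkpoint arguments for a Pass 1 only run or for a chained Pass 1 + Pass 2 run. Only the two flag names come from the text above. The helper name `checkpoint_args` and the example paths are hypothetical, and the inference scripts that consume these flags are deliberately not named here, so treat this as a sketch rather than the project's actual launcher.

```python
import os
from typing import List, Optional

# Flag names as given in the README text; everything else is illustrative.
PASS1_FLAG = "--config.video_model.transformer_path"
PASS2_FLAG = "--model_checkpoint"


def checkpoint_args(pass1_ckpt: str, pass2_ckpt: Optional[str] = None) -> List[List[str]]:
    """Return one argument list per pass to run (hypothetical helper).

    Pass 1 is always included. Pass 2 is appended only when a second
    checkpoint path is supplied, i.e. when both passes are chained.
    """
    runs = [[PASS1_FLAG, os.path.expanduser(pass1_ckpt)]]
    if pass2_ckpt is not None:
        runs.append([PASS2_FLAG, os.path.expanduser(pass2_ckpt)])
    return runs


if __name__ == "__main__":
    # Hypothetical checkpoint locations; substitute wherever you placed them.
    for args in checkpoint_args("/ckpt/void_pass1", "/ckpt/void_pass2"):
        print(" ".join(args))
```

Each returned list would be appended to the command line of the corresponding pass. Keeping Pass 2 optional mirrors the choice described above between running Pass 1 alone and chaining both passes.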
▶️ Quick Start
The fastest way to try VOID is the included notebook, which handles setup, downloads the models, runs inference on a sample video, and displays the result.