
VOID: Video Object and Interaction Deletion

Why This Matters

VOID introduces a method for removing objects from videos together with the physical interactions they cause, rather than only the objects themselves. Its interaction-aware inpainting keeps the edited scene physically consistent. For example, an object that the removed person was holding falls instead of floating in place. This makes it useful for content creators, filmmakers, and video editors who need high-quality object removal.

Key Takeaways


VOID removes objects from videos along with all interactions they induce on the scene — not just secondary effects like shadows and reflections, but physical interactions like objects falling when a person is removed. It is built on top of CogVideoX and fine-tuned for video inpainting with interaction-aware mask conditioning.

Example: If a person holding a guitar is removed, VOID also removes the person's effect on the guitar — causing it to fall naturally.


🤖 Models

VOID uses two transformer checkpoints, trained sequentially. You can run inference with Pass 1 alone or chain both passes for higher temporal consistency.

Model | Description | HuggingFace
VOID Pass 1 | Base inpainting model | Download
VOID Pass 2 | Warped-noise refinement model | Download

Place checkpoints anywhere and pass the path via --config.video_model.transformer_path (Pass 1) or --model_checkpoint (Pass 2).
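The checkpoint flags above can be wired into a Pass 1 only run or a chained Pass 1 + Pass 2 run roughly as in this minimal Python sketch. Only the two flag names come from the text. The script names `pass1_inference.py` and `pass2_inference.py` and the helper `build_commands` are hypothetical placeholders. The real entry points will likely also need input-video, mask, and output arguments that this excerpt does not describe.

```python
import sys
from typing import List, Optional

# Hypothetical entry points: the article does not name the inference scripts,
# so replace these placeholders with the repository's actual files.
PASS1_SCRIPT = "pass1_inference.py"
PASS2_SCRIPT = "pass2_inference.py"


def build_commands(pass1_ckpt: str, pass2_ckpt: Optional[str] = None) -> List[List[str]]:
    """Return argv lists for Pass 1 alone, or for Pass 1 followed by Pass 2.

    Only the two checkpoint flags come from the article. Input video, mask,
    and output arguments are not described there and are omitted here.
    """
    # Pass 1: the base inpainting transformer, selected via a config override.
    commands = [[
        sys.executable, PASS1_SCRIPT,
        f"--config.video_model.transformer_path={pass1_ckpt}",
    ]]
    if pass2_ckpt is not None:
        # Optional Pass 2: the warped-noise refinement model, run after Pass 1.
        commands.append([
            sys.executable, PASS2_SCRIPT,
            f"--model_checkpoint={pass2_ckpt}",
        ])
    return commands


if __name__ == "__main__":
    # Example paths only; checkpoints can live anywhere on disk.
    for cmd in build_commands("checkpoints/void_pass1", "checkpoints/void_pass2"):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` in order, so a failing Pass 1 stops the chain before refinement starts. Keeping commands as argv lists instead of shell strings also avoids quoting problems when checkpoint paths contain spaces.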

▶️ Quick Start

The fastest way to try VOID is the included notebook. It handles setup, downloads the models, runs inference on a sample video, and displays the result.
