While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational cost severely hinders practical deployment. Building a highly optimized, task-specific specialist is a promising alternative; however, extreme structural compression inevitably creates a severe representation bottleneck. To overcome this, we propose Moebius, an efficient, lightweight inpainting framework. We redesign the diffusion backbone around the Local-λ Mix Interaction (LλMI) block. Composed of Local-λ and Interactive-λ modules, the block summarizes spatial context and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically reducing the parameter count. To unlock the full representational capacity of this compact architecture, we pair it with an adaptive multi-granularity distillation strategy. Operating entirely in the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments on natural-image and portrait benchmarks show that this combination enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev while using less than 2% of its parameters (0.22B vs. 11.9B) and delivering a more than 15× speedup in total inference time, setting a new efficiency standard for high-fidelity inpainting.
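The abstract does not describe the internals of the Local-λ and Interactive-λ modules. The λ naming suggests a relation to the lambda layers of LambdaNetworks, so what follows is a minimal, hypothetical sketch of that general idea, not the Moebius block itself: keys are softmax-normalized over the n context positions, and λ = softmax(K)ᵀV is a k × v matrix whose size does not depend on n, so every query is transformed by the same fixed-size linear map. All function names are illustrative, and plain Python lists stand in for tensors.

```python
import math

# Plain nested lists stand in for tensors: m[i][j] is row i, column j.
Matrix = list[list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(m x p) @ (p x q) -> (m x q)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax_over_positions(keys: Matrix) -> Matrix:
    """Softmax each key channel (column) over the n context positions (rows)."""
    n, k = len(keys), len(keys[0])
    out = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col_max = max(keys[i][j] for i in range(n))  # subtract max for stability
        exps = [math.exp(keys[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    return out


def content_lambda(keys: Matrix, values: Matrix) -> Matrix:
    """Hypothetical sketch: summarize n positions into one fixed k x v matrix.

    keys: n x k, values: n x v  ->  lambda: k x v (independent of n).
    """
    return matmul(transpose(softmax_over_positions(keys)), values)


def apply_lambda(queries: Matrix, lam: Matrix) -> Matrix:
    """Each query row (length k) is mapped through the same k x v matrix."""
    return matmul(queries, lam)


# Usage: 50 context positions are compressed into a 2 x 3 summary.
keys = [[0.01 * i, -0.02 * i] for i in range(50)]   # n = 50, k = 2
values = [[float(i), 1.0, 0.0] for i in range(50)]  # v = 3
lam = content_lambda(keys, values)
out = apply_lambda([[1.0, 0.0], [0.5, 0.5]], lam)    # 2 queries -> 2 x 3
print(len(lam), len(lam[0]), len(out), len(out[0]))
```

Because λ has the same shape whether the context holds 4 or 4,000 positions, applying it to each query costs O(k·v) regardless of context length, which is the kind of property that makes a fixed-size summary attractive for a compact backbone.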
Moebius: 0.2B image inpainting model with 10B-level performance
Why This Matters
Moebius is a lightweight image inpainting model that rivals large industrial models in output quality at a fraction of their computational cost. Its LλMI-based architecture and adaptive latent-space distillation strategy deliver high-fidelity results with less than 2% of the parameters of FLUX.1-Fill-Dev, making advanced image editing more accessible and practical for a wider range of applications. The result points toward smaller, specialized models that balance output quality against resource consumption for developers and end users alike.
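The abstract says only that the distillation losses are balanced dynamically; it does not give the rule. Purely as an illustrative assumption, the sketch below uses one common heuristic: weight each term by the inverse of an exponential moving average (EMA) of its magnitude, so that terms on very different scales contribute comparably. The class name and the loss names (`latent_output`, `latent_feature`) are hypothetical, and floats stand in for framework tensors; in a real training loop the weights would be computed from detached values so they receive no gradients.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveLossBalancer:
    """Hypothetical inverse-magnitude loss balancer (not the authors' scheme).

    Each term is weighted by 1 / EMA(|loss|), then the weights are rescaled to
    sum to the number of terms, so terms on very different scales contribute
    comparably to the total.
    """

    momentum: float = 0.9
    eps: float = 1e-8
    ema: dict[str, float] = field(default_factory=dict)

    def weights(self, losses: dict[str, float]) -> dict[str, float]:
        """Update the running magnitudes and return one weight per term."""
        for name, value in losses.items():
            mag = abs(value)
            if name not in self.ema:
                self.ema[name] = mag  # first observation initializes the EMA
            else:
                self.ema[name] = self.momentum * self.ema[name] + (1 - self.momentum) * mag
        inv = {name: 1.0 / (self.ema[name] + self.eps) for name in losses}
        total = sum(inv.values())
        return {name: len(losses) * v / total for name, v in inv.items()}

    def combine(self, losses: dict[str, float]) -> float:
        """Weighted sum of the terms (also updates the running magnitudes)."""
        w = self.weights(losses)
        return sum(w[name] * losses[name] for name in losses)


# Usage with two hypothetical latent-space distillation terms.
balancer = AdaptiveLossBalancer()
w = balancer.weights({"latent_output": 1.0, "latent_feature": 0.01})
print({name: round(v, 4) for name, v in w.items()})
```

With `latent_output = 1.0` and `latent_feature = 0.01`, the first update gives the smaller term roughly 100× the weight of the larger one, so both contribute about equally to the combined loss.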
Key Takeaways
- Moebius rivals the inpainting quality of the 11.9B-parameter FLUX.1-Fill-Dev with less than 2% of its parameters (0.22B).
- It is more than 15× faster in total inference time, making it better suited to latency-sensitive applications.
- Its LλMI block summarizes spatial context and global semantic priors into fixed-size matrices, preserving complex latent interactions despite extreme compression.
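As a quick arithmetic check of the parameter figures quoted in the abstract (0.22B for Moebius vs. 11.9B for FLUX.1-Fill-Dev):

```latex
\frac{0.22\,\mathrm{B}}{11.9\,\mathrm{B}} \approx 0.0185 = 1.85\% < 2\%,
\qquad
\frac{11.9\,\mathrm{B}}{0.22\,\mathrm{B}} \approx 54
```

In other words, Moebius uses roughly 54× fewer parameters than the generalist model it is compared against.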