We propose a method for creating a matte (the per-pixel foreground color and alpha) of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte. Automatic, trimap-free methods are appearing, but are not of comparable quality. In our trimap-free approach, we ask the user to take an additional photo of the background without the subject at the time of capture. This step requires a small amount of foresight but is far less time-consuming than creating a trimap. We train a deep network with an adversarial loss to predict the matte. We first train a matting network with a supervised loss on ground-truth data with synthetic composites. To bridge the domain gap to real imagery with no labeling, we train another matting network guided by the first network and by a discriminator that judges the quality of composites. We demonstrate results on a wide variety of photos and videos and show significant improvement over the state of the art.
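To make the term "matte" concrete, the sketch below applies the standard compositing equation I = alpha * F + (1 - alpha) * B, which places a foreground F with opacity alpha over a new background B. It is a minimal illustration in plain Python, not the authors' code; the function name `composite` and the nested-list image layout are assumptions made for this example.

```python
def composite(foreground, alpha, background):
    """Blend a foreground over a background using a per-pixel alpha matte.

    Hypothetical helper for illustration only. Images are nested lists:
    rows of pixels, each pixel a tuple of channel values in [0, 1].
    `alpha` holds one opacity value in [0, 1] per pixel.
    """
    out = []
    for f_row, a_row, b_row in zip(foreground, alpha, background):
        out_row = []
        for f_px, a, b_px in zip(f_row, a_row, b_row):
            # Standard compositing equation, applied to each channel:
            # I = alpha * F + (1 - alpha) * B
            out_row.append(
                tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f_px, b_px))
            )
        out.append(out_row)
    return out


# A 1x3 image: a red foreground over a blue background, with the matte
# going from fully opaque to half transparent to fully transparent.
red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)
result = composite([[red, red, red]], [[1.0, 0.5, 0.0]], [[blue, blue, blue]])
```

Fractional alpha values, such as the middle pixel here, are what allow soft boundaries like hair to blend naturally into a new background.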
[Paper Arxiv], to appear in CVPR 2020
title={Background Matting: The World is Your Green Screen},
author = {Soumyadip Sengupta and Vivek Jayaram and Brian Curless and Steve Seitz and Ira Kemelmacher-Shlizerman},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
Blog Post, with simplified methods and discussions
Captured videos for Background Matting
We capture 50 videos of subjects performing different motions with fixed and hand-held cameras in both indoor and outdoor settings. We also capture the background as the subject leaves the scene. We will soon release this data to help future research on Background Matting.
Comparison with existing methods
We show qualitative comparisons with background subtraction, semantic segmentation (DeepLabv3+), and alpha matting techniques. For alpha matting, we compare with state-of-the-art (i) trimap-based methods, Context Aware Matting (CAM) and Index Matting (IM), where the trimap is automatically created from segmentation, and (ii) the automatic matting algorithm Late Fusion Matting (LFM). Our algorithm is first trained on the synthetic-composite Adobe dataset with supervision (Ours Adobe) and then on unlabelled real data with self-supervision and adversarial loss (Ours Real). We also show that training on real data improves matting quality.
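For readers unfamiliar with the simplest of these baselines, the sketch below shows a naive form of background subtraction: a pixel is marked as foreground when it differs enough from the captured background photo. This is an illustrative assumption, not the baseline implementation used in the comparison; the function name `background_subtraction_mask` and the threshold value are hypothetical.

```python
def background_subtraction_mask(image, background, threshold=0.1):
    """Return a binary foreground mask from an image and a background photo.

    Hypothetical helper for illustration only. Images are nested lists:
    rows of pixels, each pixel a tuple of channel values in [0, 1].
    The threshold of 0.1 is an arbitrary assumption.
    """
    mask = []
    for img_row, bg_row in zip(image, background):
        mask_row = []
        for img_px, bg_px in zip(img_row, bg_row):
            # Largest absolute difference over the color channels.
            diff = max(abs(i - b) for i, b in zip(img_px, bg_px))
            # Hard decision: 1.0 for foreground, 0.0 for background.
            mask_row.append(1.0 if diff > threshold else 0.0)
        mask.append(mask_row)
    return mask


# First pixel nearly matches the background; second pixel clearly differs.
image = [[(0.2, 0.2, 0.2), (0.9, 0.1, 0.1)]]
background = [[(0.2, 0.2, 0.25), (0.2, 0.2, 0.2)]]
mask = background_subtraction_mask(image, background)
```

A hard threshold like this produces only 0 or 1 per pixel, so it cannot represent the fractional alpha values that a matte needs along soft boundaries.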
The authors thank their labmates from the UW GRAIL lab, Ellie Bridge and Andrey Ryabstev, for their support in data capturing and helpful discussions. This work was supported by NSF/Intel Visual and Experimental Computing Award #1538618, the UW Reality Lab, Facebook, Google, and Futurewei.