This paper presents the HandsOff framework, which produces synthetic images with corresponding pixel-level labels without requiring additional human annotations. By unifying synthetic dataset generation with GAN inversion, HandsOff uses a small number of real labeled images and the rich latent representations of GANs to train a label generator. Pairing this label generator with a GAN generator gives access to an unlimited number of image-label pairs! |
Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on fewer than 50 pre-existing labeled images. Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes. Our method achieves state-of-the-art performance in semantic segmentation, keypoint detection, and depth estimation compared to prior dataset generation approaches and transfer learning baselines. We additionally showcase its ability to address broad challenges in model development that stem from fixed, hand-annotated datasets, such as the long-tail problem in semantic segmentation. |
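The workflow described above (invert a few labeled real images, train a label generator on the generator's features, then sample new latents) can be pictured with a deliberately tiny toy. This is a minimal sketch under strong assumptions, not the authors' implementation: the "generator" is a made-up linear map, inversion is plain least squares, the label generator is a per-pixel logistic regression, and every name (`features`, `render`, `invert`, `train_label_generator`, `w_true`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
P, F, D = 64, 8, 16  # pixels per image, features per pixel, latent size

# Toy stand-in for a pretrained GAN generator (hypothetical, linear for simplicity).
B = rng.normal(size=(P, F, D)) / np.sqrt(D)  # latent -> per-pixel features
c = rng.normal(size=F)                       # per-pixel features -> pixel intensity
w_true = rng.normal(size=F)                  # hidden rule standing in for human labels

def features(z):
    """Per-pixel generator features for a batch of latents, shape (N, P, F)."""
    return np.einsum("pfd,nd->npf", B, z)

def render(z):
    """Images for a batch of latents, shape (N, P)."""
    return features(z) @ c

def invert(images):
    """Toy inversion: least-squares latents that reproduce the given images."""
    M = np.einsum("f,pfd->pd", c, B)  # image = M @ z for this linear generator
    z, *_ = np.linalg.lstsq(M, images.T, rcond=None)
    return z.T

def train_label_generator(feats, labels, steps=500, lr=0.5):
    """Per-pixel logistic regression from generator features to binary labels."""
    X, y = feats.reshape(-1, F), labels.reshape(-1)
    w = np.zeros(F)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# 1) A small set of pre-existing labeled "real" images (here: 16).
z_real = rng.normal(size=(16, D))
real_images = render(z_real)
real_labels = (features(z_real) @ w_true > 0).astype(float)

# 2) Invert the real images, then train the label generator on the recovered features.
z_inv = invert(real_images)
w = train_label_generator(features(z_inv), real_labels)

# 3) Sample fresh latents to obtain new image-label pairs with no new annotation.
z_new = rng.normal(size=(200, D))
synthetic_images = render(z_new)
synthetic_labels = (features(z_new) @ w > 0).astype(float)

# Compare against the hidden labeling rule (only possible in this toy setting).
reference = (features(z_new) @ w_true > 0).astype(float)
label_accuracy = (synthetic_labels == reference).mean()
```

In this toy, inversion is exact because the generator is linear; with a real GAN, inversion is approximate and the label generator would operate on much richer intermediate features.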
|
Examples of synthetically generated images and corresponding labels (segmentation masks, keypoints, depth maps) across four uniquely challenging domains. |
A key feature unlocked by HandsOff is the ability to control the composition of the training set. We use this control to improve long-tail part segmentation. As we increase the number of training images that contain the long-tail part, not only do our generated labels improve, but the label generator's uncertainty in assigning long-tail labels also decreases. |
The accompanying video highlights the improvement in synthesizing images with the glasses label, a relatively rare class in the dataset. As the number of training images containing glasses increases, the generated labels identify glasses more accurately and with lower uncertainty. |
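To make the idea of controlling training-set composition concrete, here is a small illustrative sketch. It is not taken from the paper: the helper `compose_training_set`, the class id `GLASSES`, and the use of per-pixel entropy as the uncertainty measure are all assumptions chosen for illustration.

```python
import numpy as np

GLASSES = 3  # hypothetical class id for the rare part

def compose_training_set(label_maps, rare_class, n_rare, n_total, seed=0):
    """Pick n_total image indices, exactly n_rare of which contain rare_class."""
    rng = np.random.default_rng(seed)
    has_rare = np.array([(m == rare_class).any() for m in label_maps])
    rare_idx = np.flatnonzero(has_rare)
    common_idx = np.flatnonzero(~has_rare)
    if n_rare > len(rare_idx) or n_total - n_rare > len(common_idx):
        raise ValueError("not enough images to satisfy the requested composition")
    chosen = np.concatenate([
        rng.choice(rare_idx, size=n_rare, replace=False),
        rng.choice(common_idx, size=n_total - n_rare, replace=False),
    ])
    return np.sort(chosen)

def pixel_entropy(probs, eps=1e-12):
    """Per-pixel entropy (in nats) of class probabilities with shape (..., C)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

# Tiny demo: 10 label maps of 4x4 pixels, the first 4 of which contain glasses.
label_maps = [np.zeros((4, 4), dtype=int) for _ in range(10)]
for m in label_maps[:4]:
    m[0, 0] = GLASSES

subset = compose_training_set(label_maps, GLASSES, n_rare=3, n_total=6)
n_with_glasses = sum((label_maps[i] == GLASSES).any() for i in subset)

# A peaked prediction has lower entropy than a uniform one over four classes.
confident = pixel_entropy(np.array([0.97, 0.01, 0.01, 0.01]))
uncertain = pixel_entropy(np.array([0.25, 0.25, 0.25, 0.25]))
```

Raising `n_rare` while holding `n_total` fixed is one simple way to study how the share of rare-class images affects label quality and uncertainty.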
|
|
Austin Xu, Mariya I. Vasileva, Achal Dave, Arjun Seshadri. HandsOff: Labeled Dataset Generation with No Additional Human Annotations. CVPR 2023 Highlight. (hosted on arXiv) |
|
Acknowledgements |