FaceSafe: An Inpainting Pipeline for Privacy-Compliant Scalable Image Datasets

December 1, 2025

Accepted to DIG-BUG @ ICML 2025

Authors: Sydney Su, Lening Nick Cui, Ananya Salian, Roger You, Hao Qi Cui

Large-scale web-scraped datasets have contributed significantly to progress in deep learning, yet the extensive presence of biometric data, such as faces, raises legitimate legal, ethical, and privacy concerns. Existing approaches address this either by removing sensitive images entirely, which often sacrifices downstream performance, or by purchasing licensed images. We present FaceSafe, a novel privacy-preserving transformation pipeline that uses a diffusion-based inpainting model to systematically replace detected faces in images with synthetic variants conditioned on different demographic attributes, yielding a privacy-preserving dataset. Evaluated on 12,000 transformed images from LAION-400M and CelebA-HQ, FaceSafe eliminates privacy risks without significant loss of image quality or diversity.
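The abstract describes the pipeline only at a high level: detect faces, then inpaint each face region with a diffusion model conditioned on demographic attributes. The sketch below illustrates that control flow under stated assumptions and is not the authors' implementation. `FaceBox`, `make_mask`, `build_prompt`, `anonymize`, the prompt template, the attribute names, and the uniform per-face attribute sampling are all hypothetical. The face detector and the diffusion inpainter are passed in as plain callables, replaced here by toy stand-ins, rather than tied to any particular library.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel intensities
Mask = List[List[int]]   # 1 = pixel to inpaint, 0 = keep the original pixel


@dataclass(frozen=True)
class FaceBox:
    """Face bounding box in pixel coordinates (hypothetical detector output)."""
    x: int
    y: int
    w: int
    h: int


def make_mask(height: int, width: int, box: FaceBox, pad: int = 0) -> Mask:
    """Binary mask covering one face box, grown by `pad` pixels and clipped to the image."""
    x0, y0 = max(0, box.x - pad), max(0, box.y - pad)
    x1, y1 = min(width, box.x + box.w + pad), min(height, box.y + box.h + pad)
    return [[1 if (y0 <= r < y1 and x0 <= c < x1) else 0 for c in range(width)]
            for r in range(height)]


def build_prompt(attrs: Dict[str, str]) -> str:
    """Turn sampled demographic attributes into a text prompt (hypothetical template)."""
    return "a photo of a face, " + ", ".join(f"{k}: {v}" for k, v in sorted(attrs.items()))


def anonymize(
    image: Image,
    detect: Callable[[Image], Sequence[FaceBox]],
    inpaint: Callable[[Image, Mask, str], Image],
    attribute_space: Dict[str, Sequence[str]],
    seed: int = 0,
    pad: int = 0,
) -> Tuple[Image, List[str]]:
    """Replace every detected face with a synthetic one conditioned on sampled attributes."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the caller's image is never mutated
    prompts = []
    for box in detect(image):
        # Assumption: attributes are sampled uniformly and independently for each face.
        attrs = {name: rng.choice(list(values)) for name, values in attribute_space.items()}
        prompt = build_prompt(attrs)
        out = inpaint(out, make_mask(height, width, box, pad), prompt)
        prompts.append(prompt)
    return out, prompts


def toy_detect(img: Image) -> List[FaceBox]:
    return [FaceBox(x=1, y=1, w=2, h=2)]  # pretend a single face was found


def toy_inpaint(img: Image, mask: Mask, prompt: str) -> Image:
    # Stand-in for a diffusion inpainting model: paint masked pixels white.
    return [[255 if m else p for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


img = [[0] * 4 for _ in range(4)]
new_img, prompts = anonymize(
    img, toy_detect, toy_inpaint,
    {"age": ["young adult", "older adult"], "gender": ["female", "male"]}, seed=1)
print(prompts)
```

Passing `detect` and `inpaint` in as functions keeps the orchestration independent of the model choice; in a real pipeline `inpaint` would wrap an actual diffusion inpainting model, and the mask padding is one knob such a pipeline might expose for blending the synthetic face into its surroundings.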
