Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
Published in Advances in Neural Information Processing Systems, 2023
Recommended citation: George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In Advances in Neural Information Processing Systems, volume 36, 2023
We study image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. We conduct the largest human experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations, and that diffusion models are unfairly punished by common metrics based on Inception. We show that DINOv2-ViT-L/14 is the best alternative to Inception.
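Metrics such as FID are computed in the feature space of a pretrained encoder, so replacing Inception with another network such as DINOv2-ViT-L/14 changes what the metric measures. The sketch below shows the standard Fréchet distance between Gaussians fitted to two feature sets. It assumes the features have already been extracted by some encoder, so the same function works for Inception or DINOv2 features. This is an illustration of the textbook formula, not the authors' released code.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs are (n_samples, feature_dim) arrays assumed to come from the
    same pretrained feature extractor (e.g. Inception-v3 for FID, or DINOv2).
    """
    # Fit a Gaussian (mean, covariance) to each set of features
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)

    diff = mu1 - mu2
    # Matrix square root of sigma1 @ sigma2; drop tiny imaginary parts
    # that can appear from numerical error
    covmean = np.real(linalg.sqrtm(sigma1 @ sigma2))

    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))
```

Because the extractor is kept outside the function, comparing Inception-based and DINOv2-based scores only requires feeding in features from the different encoders. The metric formula itself stays the same.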