Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

Published in Advances in Neural Information Processing Systems, 2023

Recommended citation: George Stein, Jesse C. Cresswell, Rasa Hosseingzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In Advances in Neural Information Processing Systems, volume 36, 2023

We study image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. We conduct the largest human experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations, and that diffusion models are unfairly punished by common metrics based on Inception. We show that DINOv2-ViT-L/14 is the best alternative to Inception.

[Paper] [PDF] [Code]