The Artificial Analysis Text to Image Leaderboard & Arena aims to evaluate and compare text-to-image generation models, both open-source and proprietary, based on how well their outputs align with human preferences. The initiative uses a crowdsourcing approach to gather human preference data, allowing key models to be compared directly and providing insight into the relative performance of leading image models.
The Artificial Analysis Image Arena gathers human preference data at scale through crowdsourcing. Participants are shown a prompt alongside two generated images and select the one that best matches the prompt. This process yields over 700 images per model, covering diverse styles and categories. The resulting preferences are used to calculate an ELO score for each model, producing a comparative ranking.
The ELO scores in the Artificial Analysis Text to Image Leaderboard & Arena are derived from the ELO rating system, which updates ratings based on the outcomes of pairwise comparisons. Each time a human participant picks one of two images, the winning model's rating rises and the losing model's rating falls, with the size of the adjustment depending on how surprising the outcome was given the two models' current ratings. Over many comparisons, the ratings converge to a stable ranking of the models.
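A minimal sketch of how such pairwise preferences could feed an ELO update, assuming the standard ELO formula with a 400-point scale and a K-factor of 32 (the leaderboard's actual parameters and implementation are not published here):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred, under the standard ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float,
               a_preferred: bool, k: float = 32) -> tuple[float, float]:
    """Update both models' ratings after one human pairwise comparison.

    The adjustment is proportional to how surprising the outcome was:
    an upset win moves ratings more than an expected win.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_preferred else 0.0
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start at the same rating; after A is preferred once,
# A gains exactly what B loses (zero-sum update).
a, b = update_elo(1000.0, 1000.0, a_preferred=True)
```

Running many such updates over the crowd-collected preference pairs yields the comparative ranking described above; higher-rated models gain little from beating much lower-rated ones, which keeps the scores stable.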