Abstract
Fine-grained image-caption alignment is a crucial component of robust visuo-linguistic compositional reasoning, enabling models to perform effectively in socially critical contexts such as visual risk assessment and cultural context reasoning. We introduce MiSCHiEF (Minimal-pairs in Safety & Culture for Holistic Evaluation of Fine-grained alignment), a benchmark comprising two datasets: MiS (Minimal-pairs in Safety) and a culture-focused counterpart. Evaluation on our benchmark reveals that models are generally better at confirming correct image-caption pairs than at rejecting incorrect ones, and that they achieve higher accuracy when selecting the correct caption from two highly similar candidates for a given image.