
Accepted to CogInterp @ NeurIPS 2025

DecepBench: Benchmarking Multimodal Deception Detection

Vittesh Maganti, Nysa Lalye, Ethan Braverman

Abstract

As AI systems become more sophisticated, the ability to detect deceptive or manipulative language becomes increasingly important for safety. We introduce DecepBench, a benchmark designed to evaluate the capacity of language models to identify deceptive statements across multiple dimensions including intentional misdirection, selective omission, and strategic ambiguity. DecepBench comprises 8,000 examples sourced from negotiation transcripts, political discourse, and synthetic scenarios. Our evaluation reveals that current models perform poorly on subtle forms of deception, highlighting a critical gap in AI safety research. We provide detailed error analysis and propose directions for improving deception detection capabilities.
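The abstract describes evaluating models across deception categories such as selective omission and strategic ambiguity. As a minimal sketch of how such per-category scoring could work (the example records, field names, and the `predict` stand-in below are all hypothetical illustrations, not DecepBench's actual schema or evaluation code):

```python
from collections import defaultdict

# Hypothetical example records; DecepBench's actual data format is not shown here.
examples = [
    {"text": "We never said the fee was waived.",  "category": "selective_omission",  "label": 1},
    {"text": "The meeting starts at 3pm.",         "category": "none",                "label": 0},
    {"text": "Results may vary for some users.",   "category": "strategic_ambiguity", "label": 1},
]

def predict(text):
    """Stand-in for a model call; here it naively flags hedging keywords."""
    return int(any(word in text.lower() for word in ("never", "may")))

def per_category_accuracy(examples, predict):
    """Score predictions separately for each deception category."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["category"]] += 1
        correct[ex["category"]] += int(predict(ex["text"]) == ex["label"])
    return {cat: correct[cat] / total[cat] for cat in total}

print(per_category_accuracy(examples, predict))
```

Reporting accuracy per category rather than in aggregate is what lets an evaluation like this surface the gap the authors describe: a model can score well overall while failing on the subtler categories.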

Citation

Vittesh Maganti, Nysa Lalye, Ethan Braverman. "DecepBench: Benchmarking Multimodal Deception Detection". Accepted to CogInterp @ NeurIPS 2025.

