
Accepted to CogInterp @ NeurIPS 2025

DecepBench: Benchmarking Multimodal Deception Detection

Vittesh Maganti, Nysa Lalye, Ethan Braverman

Abstract

As AI systems become more sophisticated, the ability to detect deceptive or manipulative language becomes increasingly important for safety. We introduce DecepBench, a benchmark designed to evaluate the capacity of language models to identify deceptive statements across multiple dimensions including intentional misdirection, selective omission, and strategic ambiguity. DecepBench comprises 8,000 examples sourced from negotiation transcripts, political discourse, and synthetic scenarios. Our evaluation reveals that current models perform poorly on subtle forms of deception, highlighting a critical gap in AI safety research. We provide detailed error analysis and propose directions for improving deception detection capabilities.
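The abstract describes evaluating models across deception categories such as selective omission and strategic ambiguity. As a minimal sketch of how such per-category scoring could work (the example records, field names, and the `predict` stand-in below are all hypothetical illustrations, not DecepBench's actual schema or evaluation code):

```python
from collections import defaultdict

# Hypothetical example records; DecepBench's actual data format is not shown here.
examples = [
    {"text": "We never said the fee was waived.",  "category": "selective_omission",  "label": 1},
    {"text": "The meeting starts at 3pm.",         "category": "none",                "label": 0},
    {"text": "Results may vary for some users.",   "category": "strategic_ambiguity", "label": 1},
]

def predict(text):
    """Stand-in for a model call; here it naively flags hedging keywords."""
    return int(any(word in text.lower() for word in ("never", "may")))

def per_category_accuracy(examples, predict):
    """Score predictions separately for each deception category."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["category"]] += 1
        correct[ex["category"]] += int(predict(ex["text"]) == ex["label"])
    return {cat: correct[cat] / total[cat] for cat in total}

print(per_category_accuracy(examples, predict))
```

Reporting accuracy per category rather than in aggregate is what lets an evaluation like this surface the gap the authors describe: a model can score well overall while failing on the subtler categories.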

Citation

Vittesh Maganti, Nysa Lalye, Ethan Braverman. "DecepBench: Benchmarking Multimodal Deception Detection". Accepted to CogInterp @ NeurIPS 2025.

