

DecepBench: Benchmarking Multimodal Deception Detection


December 1, 2025


Accepted to CogInterp @ NeurIPS 2025

Authors: Vittesh Maganti, Nysa Lalye, Ethan Braverman

As AI systems become more sophisticated, the ability to detect deceptive or manipulative language becomes increasingly important for safety. We introduce DecepBench, a benchmark designed to evaluate the capacity of language models to identify deceptive statements across multiple dimensions, including intentional misdirection, selective omission, and strategic ambiguity. DecepBench comprises 8,000 examples sourced from negotiation transcripts, political discourse, and synthetic scenarios. Our evaluation reveals that current models perform poorly on subtle forms of deception, highlighting a critical gap in AI safety research. We provide detailed error analysis and propose directions for improving deception detection capabilities.
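As a rough illustration of how an evaluation over a benchmark like this might be wired up, here is a minimal sketch in Python. The file name decepbench.jsonl, the record fields (statement, context, deception_type, is_deceptive), and the detect callable are all illustrative assumptions, not the benchmark's released format.

```python
import json
from collections import defaultdict

def load_examples(path):
    """Load one JSON object per line. Assumed (hypothetical) schema:
    a "statement", its surrounding "context", a ground-truth boolean
    "is_deceptive" label, and a "deception_type" tag such as
    "misdirection", "omission", or "ambiguity"."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def evaluate(examples, detect):
    """Score a detector (any callable mapping (statement, context) -> bool)
    and break accuracy down by deception type."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        pred = detect(ex["statement"], ex["context"])
        total[ex["deception_type"]] += 1
        correct[ex["deception_type"]] += int(pred == ex["is_deceptive"])
    return {t: correct[t] / total[t] for t in total}

if __name__ == "__main__":
    examples = load_examples("decepbench.jsonl")  # hypothetical file name
    # Trivial baseline: flag every statement as deceptive.
    scores = evaluate(examples, lambda statement, context: True)
    for deception_type, acc in sorted(scores.items()):
        print(f"{deception_type}: {acc:.2%}")
```

Reporting accuracy per deception type, rather than a single aggregate number, is what would surface the gap the abstract describes: a detector can look strong on blatant misdirection while still failing on omission and ambiguity.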

Begin Your Journey

The application takes 10 minutes and is reviewed on a rolling basis. We look for strong technical signal (projects, coursework, or competition results) and genuine curiosity to do real research.

If admitted, you will join a structured pipeline with direct mentorship to take your work from ideation to top conference submission at venues like NeurIPS, ACL, and EMNLP.