Accepted to Wordplay @ EMNLP 2025 (Poster)
Authors: Lang Xiong, Raina Gao, Alyssa Jeong
We introduce Sarc7, a benchmark that distinguishes seven types of sarcasm (self-deprecating, brooding, deadpan, polite, obnoxious, raging, and manic), built by annotating entries of the MUStARD dataset. Sarc7 supports two tasks: (1) multi-class sarcasm classification, in which the model is given a sarcastic utterance and its dialogue context and predicts the dominant sarcasm type among the seven annotated categories, and (2) sarcasm generation, in which the model generates a sarcastic utterance consistent with one of the seven types. We evaluate classification with zero-shot, few-shot, and chain-of-thought (CoT) prompting, as well as a novel emotion-based prompting technique. Emotion-based prompting yields the highest macro-averaged F1 score, 0.3664 (Gemini 2.5), and outperforms CoT for several models. On the generation task, human evaluators preferred emotion-based generations 38.46% more often than zero-shot baselines.
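
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-averaged F1 score over the seven sarcasm types could be computed. It is an illustration only and not the authors' evaluation code: the function name `macro_f1`, the label strings, and the choice to score a class with no gold or predicted instances as 0 are all assumptions made for this example.

```python
from collections import Counter

# The seven sarcasm types named in the abstract (string spellings are assumed).
SARCASM_TYPES = [
    "self-deprecating", "brooding", "deadpan", "polite",
    "obnoxious", "raging", "manic",
]


def macro_f1(gold, predicted, labels=SARCASM_TYPES):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)

    # Every class counts equally, regardless of how many examples it has.
    return sum(per_class) / len(labels)


gold = ["deadpan", "deadpan", "polite", "manic"]
pred = ["deadpan", "polite", "polite", "manic"]
score = macro_f1(gold, pred)
```

Because each class contributes equally to the average, rare sarcasm types weigh as much as frequent ones, so a model that only handles the common types well will still score low.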

