Fine-Tuning Language Models for Ethical Ambiguity

December 1, 2024

Accepted to SoLaR @ NeurIPS 2024

Authors: Pranav Senthilkumar, Visshwa Bala, Prisha Jain, Aneesa Maity

Current approaches to aligning large language models (LLMs) with human values often treat ethical decisions as binary classifications. We argue that this approach fails to capture the nuanced nature of real-world ethical dilemmas. We introduce an ethical ambiguity fine-tuning framework that teaches LLMs to recognize, articulate, and reason about situations where multiple valid ethical perspectives exist. Our method leverages a curated dataset of morally ambiguous scenarios annotated with diverse stakeholder perspectives. Experiments show that models fine-tuned with our approach demonstrate improved nuance in ethical reasoning while maintaining safety guardrails.
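The post does not include code, but the data-curation step above can be sketched in a few lines. The snippet below is an illustrative sketch only: the field names, prompt template, and the `build_sft_pairs` helper are assumptions for exposition, not the authors' actual schema. It shows how a morally ambiguous scenario annotated with multiple stakeholder perspectives might be flattened into a supervised fine-tuning pair whose target articulates each perspective rather than picking a single answer.

```python
# Hypothetical sketch: converting stakeholder-annotated ethical dilemmas
# into (prompt, target) pairs for supervised fine-tuning. The schema and
# prompt wording are assumptions, not the paper's actual format.

def build_sft_pairs(scenario):
    """Build a (prompt, target) pair that asks the model to surface and
    justify each annotated stakeholder perspective."""
    prompt = (
        f"Scenario: {scenario['text']}\n"
        "Identify the competing ethical perspectives and explain why "
        "each is defensible."
    )
    perspectives = "\n".join(
        f"- {p['stakeholder']}: {p['position']}"
        for p in scenario["perspectives"]
    )
    target = (
        "This dilemma admits multiple valid perspectives:\n" + perspectives
    )
    return [(prompt, target)]


# A toy annotated scenario in the assumed schema.
example = {
    "text": (
        "A nurse must decide whether to break patient confidentiality "
        "to warn a third party of possible harm."
    ),
    "perspectives": [
        {
            "stakeholder": "patient",
            "position": "confidentiality is the basis of trust in care",
        },
        {
            "stakeholder": "third party",
            "position": "a duty to prevent foreseeable harm may override secrecy",
        },
    ],
}

pairs = build_sft_pairs(example)
print(pairs[0][0])
print(pairs[0][1])
```

Pairs produced this way could then be fed to any standard instruction-tuning pipeline; the key design choice is that the target enumerates perspectives instead of committing to one verdict, which is what distinguishes this framing from binary-label alignment data.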
