Accepted to SoLaR @ NeurIPS 2024

Fine-Tuning Language Models for Ethical Ambiguity

Pranav Senthilkumar, Visshwa Bala, Prisha Jain, Aneesa Maity

Abstract

Current approaches to aligning large language models (LLMs) with human values often treat ethical decisions as binary classifications. We argue that this approach fails to capture the nuanced nature of real-world ethical dilemmas. We introduce an ethical ambiguity fine-tuning framework that teaches LLMs to recognize, articulate, and reason about situations where multiple valid ethical perspectives exist. Our method leverages a curated dataset of morally ambiguous scenarios annotated with diverse stakeholder perspectives. Experiments show that models fine-tuned with our approach demonstrate improved nuance in ethical reasoning while maintaining safety guardrails.

Citation

Pranav Senthilkumar, Visshwa Bala, Prisha Jain, Aneesa Maity. "Fine-Tuning Language Models for Ethical Ambiguity". Accepted to SoLaR @ NeurIPS 2024.

Details

Conference: SoLaR @ NeurIPS 2024
Authors: 4