Skip to main content

Spring Deadline: Sunday, March 1 @ 11:59pm PT. Click here to apply.

Advancing Uto-Aztecan Language Technologies: A Case Study on Comanche

Advancing Uto-Aztecan Language Technologies: A Case Study on Comanche

December 1, 2025

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investiga...

Accepted to AmericasNLP @ NAACL 2025

Authors: Jesus Alvarez C, Daua D. Karajeanes, Ashley Celeste Prado, John Ruttan

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. The findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion.

Begin Your Journey

The application takes 10 minutes and is reviewed on a rolling basis. We look for strong technical signal—projects, coursework, or competition results—and a genuine curiosity to do real research.

If admitted, you will join a structured pipeline with direct mentorship to take your work from ideation to top conference submission at venues like NeurIPS, ACL, and EMNLP.