Accepted to SoLaR @ NeurIPS 2024

NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with LLMs

William Tan

Abstract

We present NusaMT-7B, a 7-billion-parameter multilingual machine translation model designed for Southeast Asian languages. Although the region is home to over 1,200 languages, existing translation systems offer limited support for most of them. NusaMT-7B covers 23 Southeast Asian languages, including many low-resource languages such as Javanese, Sundanese, and Khmer. We introduce novel training techniques for handling low-resource language pairs and demonstrate state-of-the-art performance on the FLORES benchmark for the covered languages, with particular gains for underrepresented language pairs.

Citation

William Tan. "NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with LLMs". Accepted to SoLaR @ NeurIPS 2024.

Details

Conference: Accepted to SoLaR @ NeurIPS 2024
Authors: 1 author (William Tan)
