GCC AI Research

Study on the paradox of ‘low-resource’ languages wins Outstanding Paper Award at EMNLP

MBZUAI · Significant research

Summary

A study co-authored by researchers from UC Berkeley, University of the Witwatersrand, Lelapa AI, and MBZUAI received the Outstanding Paper Award at EMNLP 2024. The paper critiques the term "low-resource" languages in NLP, highlighting its limitations in capturing the diverse challenges faced by different languages. The authors propose a more detailed analysis of resourcedness to encourage targeted support for languages currently underserved by technology. Why it matters: The research challenges assumptions in NLP and promotes more nuanced approaches to supporting the world's many languages, including Arabic, in AI systems.

Related

Faculty win EACL 2023 outstanding paper

MBZUAI

MBZUAI faculty Alham Fikri Aji, Timothy Baldwin, and Fajri Koto won an Outstanding Paper Award at EACL 2023 for their paper "NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages." The paper introduces the first parallel resource for 10 Indonesian low-resource languages to boost performance in sentiment analysis and machine translation. The dataset is available on HuggingFace. Why it matters: This work highlights MBZUAI's commitment to advancing NLP research in low-resource languages, which can help preserve linguistic diversity and improve access to digital resources for speakers of underrepresented languages.

New method reveals major cross-lingual gaps in language models

MBZUAI

Researchers at MBZUAI have developed a new automatic method to examine cross-lingual abilities in multilingual language models, testing 10 models across 16 languages. They combined beam search with language-model-based simulation to generate 6,000 bilingual question pairs, and found significant performance drops compared to English, even in high-resource languages like Chinese. The method introduces perturbations to test the models' ability to transfer knowledge rather than rely on memorization. Why it matters: This research highlights critical gaps in cross-lingual AI, providing a framework for developing more equitable and effective multilingual models, especially for Arabic and other under-represented languages.

Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks

arXiv

This paper benchmarks multilingual and monolingual LLM performance across Arabic, English, and Indic languages, examining the effects of model compression techniques such as pruning and quantization. Multilingual models outperform language-specific counterparts, demonstrating cross-lingual transfer. Quantization maintains accuracy while improving efficiency, but aggressive pruning compromises performance, particularly in larger models. Why it matters: The findings highlight strategies for scalable and fair multilingual NLP, addressing hallucination and generalization errors in low-resource languages.

Challenges in low-resourced NLP: an Irish case study

MBZUAI

Dr. Teresa Lynn from Dublin City University (DCU) discussed the challenges in developing NLP tools for Irish, a low-resource language facing digital extinction. She highlighted the lack of speech and language applications and fundamental language resources for Irish. Lynn also mentioned her work at DCU on the GaelTech project and her involvement in the European Language Equality project. Why it matters: The development of NLP tools for low-resource languages like Irish is crucial for preserving linguistic diversity and preventing digital marginalization in the AI era.