Dr. Teresa Lynn from Dublin City University (DCU) discussed the challenges in developing NLP tools for Irish, a low-resource language facing digital extinction. She highlighted the lack of speech and language applications and fundamental language resources for Irish. Lynn also mentioned her work at DCU on the GaelTech project and her involvement in the European Language Equality project. Why it matters: The development of NLP tools for low-resource languages like Irish is crucial for preserving linguistic diversity and preventing digital marginalization in the AI era.
Mausam, head of Yardi School of AI at IIT Delhi and affiliate professor at University of Washington, will discuss Neuro-Symbolic AI. The talk will cover recent research threads with applications in NLP, probabilistic decision-making, and constraint satisfaction. Mausam's research explores neuro-symbolic machine learning, computer vision for radiology, NLP for robotics, multilingual NLP, and intelligent information systems. Why it matters: Neuro-Symbolic AI is gaining importance as it combines the strengths of neural and symbolic approaches, potentially leading to more robust and explainable AI systems.
Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.
A new paper coauthored by researchers at The University of Melbourne and MBZUAI explores disagreement in human annotation for AI training. The paper treats disagreement as a signal (human label variation or HLV) rather than noise, and proposes new evaluation metrics based on fuzzy set theory. These metrics adapt accuracy and F-score to cases where multiple labels may plausibly apply, aligning model output with the distribution of human judgments. Why it matters: This research addresses a key challenge in NLP by accounting for the inherent ambiguity in human language, potentially leading to more robust and human-aligned AI systems.
TII and LightOn have partnered to build the NOOR Platform for exascale computing, aimed at developing foundation models. The collaboration will leverage LightOn's expertise in large language models, with the first output being the largest Arabic language model to date. The platform will provide high-quality data pipelines and facilitate extreme-scale distributed training and serving. Why it matters: This partnership aims to establish Abu Dhabi as a center of AI excellence and boost the UAE's ambitions in high-tech innovation and NLP research.