MBZUAI Professor Chih-Jen Lin gave a keynote at the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval in Taipei. Lin's address, titled ‘On the “Rough Use” of Machine Learning Techniques’, focused on instances where machine learning techniques are employed inappropriately, using examples from graph representation learning and deep neural networks. He advocated for the development of high-quality, user-friendly software to improve the practical application of machine learning and mitigate misuse. Why it matters: Showcases MBZUAI's faculty expertise and contributions to the discussion on responsible AI research and deployment on a global stage.
This paper introduces an enhanced Dense Passage Retrieval (DPR) framework tailored for Arabic text retrieval. The core innovation is an Attentive Relevance Scoring (ARS) mechanism that improves semantic relevance modeling between questions and passages, replacing standard interaction methods. The method integrates pre-trained Arabic language models and architectural refinements, achieving improved retrieval and ranking accuracy for Arabic question answering. Why it matters: This work addresses the underrepresentation of Arabic in NLP research by providing a novel approach and publicly available code to improve Arabic text retrieval, which can benefit various applications like Arabic search engines and question-answering systems.
This paper presents six experiments evaluating personalization and user tracking in web search engine results. The experiments compare search results across VPN location (including the UAE versus other locations), logged-in status, network type, search engine, browser, and trained Google accounts. The study measures total hits, first hit, and correlation between hits to identify patterns of personalization. Why it matters: The findings shed light on the extent of filter bubble effects and potential biases in search results for users in the UAE and globally.
This paper introduces a unified deep autoregressive model (UAE) for cardinality estimation that learns joint data distributions from both data and query workloads. It uses differentiable progressive sampling with the Gumbel-Softmax trick to incorporate supervised query information into the deep autoregressive model. Experiments show that UAE achieves better accuracy and efficiency than state-of-the-art methods.
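
For readers unfamiliar with the Gumbel-Softmax trick mentioned above, the following is a minimal, generic sketch of the sampling step in plain Python. It is not the paper's implementation: the function name `gumbel_softmax_sample`, the `temperature` default, and the example probabilities are all assumptions chosen for illustration. The idea is to add Gumbel noise to the log-probabilities of a categorical distribution and pass the result through a temperature-scaled softmax, which yields a "soft" one-hot vector instead of a hard, non-differentiable sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Return one relaxed (soft one-hot) sample for the given logits.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1).
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)

    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a made-up categorical distribution over three values.
probs = [0.7, 0.2, 0.1]
logits = [math.log(p) for p in probs]
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the output closer to a hard one-hot vector, while higher temperatures smooth it toward uniform. In a deep learning framework the same computation would be expressed with tensor operations so that gradients can flow through the sample to the model parameters.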