Researchers from MBZUAI, the University of Washington, and other institutions presented studies at EMNLP 2024 exploring how LLMs represent cultures. One of them, a survey, analyzed dozens of recent studies on LLMs and culture and proposed a new framework for future research. The survey found that NLP has no widely accepted definition of 'culture', which makes it hard to interpret claims about how models represent culture through language. Why it matters: The finding exposes a key gap in the field and underscores the need for a more rigorous, consistent understanding of culture in AI, especially as LLMs are deployed more widely around the world.
A new paper from MBZUAI introduces JEEM, a benchmark dataset for evaluating vision-language models on their understanding of images grounded in four Arabic-speaking societies (Jordan, the UAE, Egypt, and Morocco) and on their ability to use local dialects. The dataset comprises 2,178 images and 10,890 question-answer pairs, an average of five per image, reflecting everyday life and culturally specific scenes. An evaluation of several Arabic-capable models (Maya, PALO, Peacock, AIN, AyaV) and GPT-4o found that the models can generate fluent language but struggle with genuine understanding, consistency, and relevance, especially when cultural context is important. Why it matters: The results highlight how hard it is to build AI systems that understand and interact with diverse cultures, and they point to the need for culturally grounded datasets and evaluation metrics.
MBZUAI researchers release JEEM, a new benchmark dataset for evaluating vision-language models on Arabic dialects. The dataset covers image captioning and visual question answering tasks using images from Jordan, UAE, Egypt, and Morocco. Results show models struggle with cultural understanding and relevance despite fluent language generation.