MBZUAI researchers introduce Web2Code, a new large-scale dataset and evaluation framework for training and benchmarking multimodal LLMs on webpage understanding and HTML code generation. The dataset includes webpage images, HTML code, and QA pairs about webpage content. Experiments demonstrate the dataset's utility in webpage understanding, code generation, and general visual domain tasks. Code and data are available on GitHub.
The paper introduces AraGPT2, a suite of pre-trained transformer models for Arabic language generation, with the largest model (AraGPT2-mega) containing 1.46 billion parameters. Trained on a large Arabic corpus of internet text and news, AraGPT2-mega demonstrates strong performance in synthetic news generation and zero-shot question answering. To address the risk of misuse, the authors also released a discriminator model with 98% accuracy in detecting AI-generated text. Why it matters: This release of both the model and discriminator fills a critical gap in Arabic NLP and encourages further research and applications in the field.
The UAE's National Programme for Coders will train 20,000 students in coding across eight universities, including MBZUAI and Khalifa University. The programme also includes 500 training opportunities at local and international companies. Amazon, Huawei, and IBM will launch digital libraries providing resources on AI, data science, and other technologies. Why it matters: This initiative aims to bolster the UAE's AI talent pool and enhance graduates' competitiveness in the job market through practical coding skills.