MBZUAI researchers introduce Web2Code, a new large-scale dataset and evaluation framework for training and benchmarking multimodal LLMs on webpage understanding and HTML code generation. The dataset includes webpage images, HTML code, and QA pairs about webpage content. Experiments demonstrate the dataset's utility for webpage understanding, code generation, and general visual tasks; the code and data are available on GitHub.
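To make the dataset's structure concrete, here is a minimal sketch of one record that pairs a webpage image with its HTML and QA pairs, and of how such a record could be expanded into instruction-tuning examples. The field names (`image_path`, `html`, `qa_pairs`), the instruction wording, and the helper `to_instruction_examples` are all hypothetical illustrations, not Web2Code's actual schema or loader.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    """A question about the rendered webpage and its reference answer."""
    question: str
    answer: str


@dataclass
class WebpageSample:
    """Hypothetical record: a screenshot, its HTML source, and QA pairs.
    Field names are illustrative assumptions, not the released schema."""
    image_path: str  # path to the webpage screenshot
    html: str        # HTML source that renders to the screenshot
    qa_pairs: list[QAPair] = field(default_factory=list)


def to_instruction_examples(sample: WebpageSample) -> list[dict]:
    """Expand one sample into instruction-tuning examples: one
    HTML-generation task plus one understanding task per QA pair."""
    examples = [{
        "image": sample.image_path,
        "instruction": "Generate the HTML code for this webpage.",
        "response": sample.html,
    }]
    for qa in sample.qa_pairs:
        examples.append({
            "image": sample.image_path,
            "instruction": qa.question,
            "response": qa.answer,
        })
    return examples


# Usage with a toy sample (not taken from the dataset).
sample = WebpageSample(
    image_path="page_0001.png",
    html="<html><body><h1>Welcome</h1></body></html>",
    qa_pairs=[QAPair("What is the main heading?", "Welcome")],
)
examples = to_instruction_examples(sample)
print(len(examples))  # 2
```

Keeping code generation and question answering in one record reflects the two task families the summary describes; a real pipeline would also load and preprocess the image itself.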
At NeurIPS, MBZUAI researchers presented Web2Code, a dataset suite designed to improve multimodal LLM performance on webpage analysis and HTML generation. The suite includes a fine-tuning dataset and two benchmark datasets. Instruction tuning on Web2Code improved performance on these specialized tasks without degrading general capabilities. Why it matters: By supplying targeted training data, the suite addresses a key limitation of current multimodal LLMs and could improve productivity in web design and development.
A study compared the vulnerability of C programs generated by nine state-of-the-art large language models (LLMs) prompted zero-shot. The researchers introduced FormAI-v2, a dataset of 331,000 C programs generated by these LLMs; formal verification detected vulnerabilities in at least 62.07% of them. The research highlights the need for risk assessment and validation before LLM-generated code is deployed in production.
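The phrase "at least" fits how verifier-based counts typically work: a program is counted as vulnerable only when the checker reports a concrete violation, so timeouts and inconclusive runs pull the figure down rather than up. The sketch below illustrates that counting under stated assumptions: the three verdict labels and the function name are hypothetical, and the toy data is not from the study.

```python
from collections import Counter

# Hypothetical verifier outcomes per generated program:
#   "vulnerable" - the checker found a property violation
#   "safe"       - no violation found within the verification bound
#   "unknown"    - timeout or resource limit, so no verdict
def vulnerability_rate_lower_bound(verdicts: list[str]) -> float:
    """Share of programs with a confirmed violation. Programs without a
    verdict are counted as not vulnerable, so the true rate can only be
    equal or higher: the result is a lower bound."""
    if not verdicts:
        raise ValueError("no programs to evaluate")
    counts = Counter(verdicts)
    return counts["vulnerable"] / len(verdicts)


# Toy data for illustration only.
toy = ["vulnerable", "safe", "unknown", "vulnerable", "safe"]
rate = vulnerability_rate_lower_bound(toy)
print(f"{rate:.2%}")  # 40.00%
```

Here two of five programs have a confirmed violation; if the "unknown" program were later shown to be vulnerable, the rate would rise, never fall.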
G42, an Abu Dhabi-based artificial intelligence company, partnered with creative innovation company R/GA to launch alpha.G42.ai, a generative interface designed to turn traditional websites into dynamic, conversational systems. In this prototype, an agent powered by integrated large language models (LLMs) generates and curates personalized content for each visitor in real time. The system ingests various types of content as a knowledge base and synthesizes it into tailored responses for users who interact by voice or text, replacing static content management. Why it matters: The project, from a major UAE AI firm, explores a new approach to web interfaces that could influence how digital content is delivered worldwide.
This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of specialized functions. An LLM generates a reasoning program for each claim, and ProgramFC executes it by delegating each sub-task to a corresponding handler, which improves explainability and data efficiency. Experiments on fact-checking datasets show that ProgramFC outperforms baseline methods; the code and data are publicly available.
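The idea of a reasoning program can be sketched in a few lines: a complex claim becomes a short sequence of calls, each delegated to a handler for one sub-task, and a final step combines the results. In this minimal sketch the handler names (`question`, `verify`, `predict`), the toy knowledge base, and the claim are hypothetical; the program is hand-written here, whereas in ProgramFC an LLM generates it, and real handlers would be models rather than dictionary lookups.

```python
# Toy knowledge sources standing in for sub-task handlers (hypothetical;
# a real system would call a QA model and a fact-verification model).
TOY_ANSWERS = {"Who directed Film A?": "Director B"}
TOY_FACTS = {"Director B was born in Country C.", "Film A was released in 1999."}


def question(q: str) -> str:
    """Question-answering handler: returns an answer string."""
    return TOY_ANSWERS.get(q, "unknown")


def verify(statement: str) -> bool:
    """Fact-verification handler: True if the statement is supported."""
    return statement in TOY_FACTS


def predict(*facts: bool) -> str:
    """Reasoning handler: combine sub-task results into a verdict."""
    return "SUPPORTED" if all(facts) else "REFUTED"


def check_claim(year: str, country: str) -> str:
    """Reasoning program for the claim: 'Film A, released in <year>,
    was directed by someone born in <country>.'"""
    fact_1 = verify(f"Film A was released in {year}.")
    director = question("Who directed Film A?")  # intermediate answer
    fact_2 = verify(f"{director} was born in {country}.")
    return predict(fact_1, fact_2)


print(check_claim("1999", "Country C"))  # SUPPORTED
print(check_claim("1999", "Country D"))  # REFUTED
```

Because each step is an explicit call with an inspectable result, a reader can trace exactly which sub-claim caused a REFUTED verdict, which is the explainability benefit the summary mentions.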