MBZUAI researchers developed FarSight, a plugin to reduce hallucinations in Multimodal Large Language Models (MLLMs). FarSight targets a failure mode in which an MLLM loses focus on relevant image details while generating text, so that an early mistake compounds into "snowball" hallucinations. Tests on models such as LLaVA-1.5-7B showed that FarSight reduces these initial mistakes, which in turn lowers overall hallucinations. Why it matters: Improving the reliability of MLLMs is crucial for applications that require high accuracy, and it makes these models more useful in real-world scenarios.
LongShOTBench is a new benchmark for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness and by using human-validated samples. The work also presents LongShOTAgent, an agentic system for analyzing long videos. Both the benchmark and the agent demonstrate the challenges that state-of-the-art MLLMs face on long videos.
Presight CEO Thomas Pramotedham is highlighted for his instrumental role in developing and deploying artificial intelligence solutions for national-scale intelligence initiatives within the UAE. Presight, a leading UAE-based AI company, applies advanced AI to public sector operations, security, and smart city infrastructure. The company's efforts are critical to advancing the UAE's strategic vision for digital transformation and data-driven governance across national sectors. Why it matters: This demonstrates how local AI leaders and companies are directly implementing sophisticated AI systems to support national strategic goals and enhance government capabilities within the UAE.