GCC AI Research

I see what you’re saying: the Abu Dhabi AI researchers making video dubbing sync

MBZUAI · Notable

Summary

Researchers at MBZUAI have developed Auto-DUB, a system that uses deep learning, natural language processing (NLP), and computer vision (CV) to improve audio-visual dubbing, particularly for educational videos. Its three-step process generates subtitles, creates an audio representation, and synchronizes the audio with the speaker's lip movements. The system aims to overcome language barriers in e-learning by providing accurate translations and lip-synced audio. Why it matters: This research addresses a critical need in online education by making content more accessible to non-native English speakers, potentially expanding access to global educational resources in the Arab world.

Keywords

MBZUAI · dubbing · e-learning · NLP · CV

Related

NYU Abu Dhabi translates speech into sign language using AI - The National

The National

Researchers at NYU Abu Dhabi have developed an AI system capable of translating spoken language into sign language. This innovative technology aims to enhance communication accessibility for individuals who are deaf or hard-of-hearing. The system leverages advancements in artificial intelligence, likely combining natural language processing for speech understanding and computer vision for sign generation. Why it matters: This development has the potential to significantly improve inclusion and communication for deaf communities within the Middle East and globally, bridging critical communication gaps.

MBZUAI team wins top prize at inaugural Arabic Natural Language Processing Conference

MBZUAI

An MBZUAI team won the best paper award at the inaugural Arabic Natural Language Processing Conference for their work on processing Arabic speech. Their study establishes a new approach to the complexities of spoken Arabic, which differs significantly from the written text that language models typically process. The approach aims to enable new tools for Arabic speakers by addressing challenges such as intonation and the continuous nature of speech. Why it matters: This award highlights the importance of specialized research in Arabic NLP, as mainstream LLMs often face limitations in accurately processing the nuances of Arabic speech.

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

arXiv

MBZUAI researchers introduce PG-Video-LLaVA, a large multimodal model with pixel-level grounding capabilities for videos, integrating audio cues for enhanced understanding. The model uses an off-the-shelf tracker and grounding module to localize objects in videos based on user prompts. PG-Video-LLaVA is evaluated on video question-answering and grounding benchmarks, using Vicuna instead of GPT-3.5 for reproducibility.