Skip to content
GCC AI Research

Search

Results for "metadata extraction"

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs

arXiv ·

KAUST researchers introduced MOLE, a framework leveraging LLMs for automated metadata extraction from scientific papers. The system processes documents in multiple formats and validates outputs, targeting datasets beyond Arabic. A new benchmark dataset has been released to evaluate progress in metadata extraction.

Masader: Metadata Sourcing for Arabic Text and Speech Data Resources

arXiv ·

Researchers created Masader, the largest public catalog for Arabic NLP datasets, containing 200 datasets annotated with 25 attributes. They developed a metadata annotation strategy applicable to other languages. The paper highlights issues within current Arabic NLP datasets and suggests recommendations. Why it matters: This curated dataset directory helps lower the barrier to entry for Arabic NLP research and development.