Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Lexical Markup Framework and TEI Lex-0
This paper presents a methodology for digitizing and encoding the Al-Mawrid Arabic-English dictionary using the ISO Lexical Markup Framework (LMF) and the TEI Lex-0 guidelines. The work resolves structural ambiguities and inconsistencies in the source dictionary, reaching 91% structural parsing accuracy and high precision and recall for information extraction (for example, 85% precision for synonyms). It also discusses the limitations of TEI Lex-0 for Arabic-specific phenomena and explores integration with Linguistic Linked Open Data (LLOD). Why it matters: This work provides a standardized computational lexicon for Arabic, addressing a significant gap in Arabic lexical infrastructure and offering a reproducible workflow for retro-digitization efforts in Arabic NLP and Digital Humanities.
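
To make the encoding target concrete, the following is a minimal sketch of what a single entry might look like when serialized in a TEI Lex-0 style. It is not taken from the paper: the helper names (`tei`, `build_entry`), the entry identifier, and the sample headword are assumptions chosen for illustration, and the element choices (`form type="lemma"`, `gramGrp`/`gram type="pos"`, `cit type="translationEquivalent"`) follow common TEI Lex-0 conventions rather than the authors' actual schema.

```python
import xml.etree.ElementTree as ET

# Namespaces: TEI for the elements, the built-in XML namespace for xml:id / xml:lang.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element (hypothetical helper)."""
    return f"{{{TEI_NS}}}{tag}"


def build_entry(entry_id: str, lemma: str, pos: str, translations: list[str]) -> ET.Element:
    """Build one bilingual entry in a TEI Lex-0 style (illustrative, not the paper's schema)."""
    entry = ET.Element(
        tei("entry"),
        {f"{{{XML_NS}}}id": entry_id, f"{{{XML_NS}}}lang": "ar"},
    )

    # Headword: <form type="lemma"><orth>...</orth></form>
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    ET.SubElement(form, tei("orth")).text = lemma

    # Part of speech: <gramGrp><gram type="pos">...</gram></gramGrp>
    gram_grp = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram_grp, tei("gram"), {"type": "pos"}).text = pos

    # A single sense holding the English translation equivalents.
    sense = ET.SubElement(entry, tei("sense"), {f"{{{XML_NS}}}id": f"{entry_id}.s1"})
    for text in translations:
        cit = ET.SubElement(
            sense,
            tei("cit"),
            {"type": "translationEquivalent", f"{{{XML_NS}}}lang": "en"},
        )
        ET.SubElement(cit, tei("quote")).text = text

    return entry


# Illustrative data only; this is not an entry extracted from Al-Mawrid.
entry = build_entry("kitab", "كتاب", "noun", ["book"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A flat structure like this covers only the simplest case. Arabic-specific information such as roots, patterns, or broken plurals, which the summary flags as a limitation of TEI Lex-0, would need additional elements or typed attributes that this sketch does not attempt to model.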