GCC AI Research

Results for "racism"

ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

arXiv

ArabDiscrim is a new corpus of 293,000 public Arabic Facebook posts from 2014 to 2024, curated to capture discussion of racism and discrimination. Unlike prior Twitter-centric datasets, it incorporates platform-native engagement signals, 200 curated terms with morphological regex families, and 20 discrimination axes. The resource also provides explicit attribution patterns and is released under a restricted research-use license for ethical compliance.

Why it matters: The dataset offers an ecologically valid foundation for fairness-oriented, platform-aware Arabic natural language processing, extending coverage beyond existing Twitter-based resources.