GCC AI Research

Results for "LLM extraction"

An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

arXiv ·

Researchers proposed a four-stage NLP framework combining schema-constrained LLM extraction, Sentence-BERT (SBERT) alignment with ESCO, an adjudication protocol, and a verification mechanism for curriculum-labor market alignment. The framework was instantiated for the ABET-accredited BSc Computer Science program at the United Arab Emirates University (UAEU), extracting 400 competency records from the study plan and aligning them with 30 job postings. The extractor achieved a Cohen's kappa of 0.79 on the skill slot and surfaced interpretable supply-demand gaps in general, transversal, algorithms, and software engineering skills, with a minimal gap in AI and data science. Why it matters: This framework provides a robust, NLP-driven method to identify crucial skill gaps in higher education curricula, directly supporting quality assurance and workforce development initiatives in the region.
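
For readers unfamiliar with the agreement statistic quoted above, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration, not the paper's evaluation code; the function name `cohens_kappa` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): the annotators agree on 3 of 4 items.
extractor = ["skill", "skill", "other", "other"]
human = ["skill", "other", "other", "other"]
print(cohens_kappa(extractor, human))  # 0.5
```

In this toy case the raw agreement is 0.75, but half of that is expected by chance, so kappa is 0.5. A reported value of 0.79 therefore indicates agreement well above the chance level.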

Knowledge distillation and the greening of LLMs

MBZUAI ·

Researchers from MBZUAI, the University of British Columbia, and Monash University have created LaMini-LM, a collection of small language models distilled from ChatGPT. LaMini-LM is trained on a dataset of 2.58M instructions and can be deployed on consumer laptops and mobile devices. The smaller models perform almost as well as their larger counterparts and, because they can run locally, address data-security concerns. Why it matters: This work enables the deployment of LLMs in resource-constrained environments and enhances data security by reducing reliance on cloud-based LLMs.

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs

arXiv ·

KAUST researchers introduced MOLE, a framework that uses LLMs to automatically extract metadata from scientific papers. The system processes documents in multiple formats and validates its outputs, covering datasets in languages beyond Arabic. A new benchmark dataset has been released to evaluate progress in metadata extraction.
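
As a rough illustration of what validating LLM-extracted metadata can involve, the sketch below checks a JSON response against a small schema of required fields and types. It is not MOLE's actual validator: the schema fields (`name`, `year`, `languages`) and the function `validate_metadata` are hypothetical, chosen only for this example.

```python
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "year": int, "languages": list}


def validate_metadata(raw_json, schema=SCHEMA):
    """Parse an LLM response and return (record, errors).

    record is None when the response is not a JSON object.
    """
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value must be an object"]

    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}, got {actual}"
            )
    return record, errors


# Example: the model returned the year as a string instead of an integer.
response = '{"name": "ExampleSet", "year": "2024", "languages": ["ar"]}'
record, errors = validate_metadata(response)
print(errors)
```

A non-empty error list could be used to reject the record or to re-prompt the model; which of these a real pipeline does is a design choice outside the scope of this sketch.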