GCC AI Research

Results for "data compression"

Professor Marc Genton and former postdoctoral fellow win the 2017 Wilcoxon Award

KAUST ·

KAUST Professor Marc Genton and his former postdoc Stefano Castruccio jointly won the 2017 Wilcoxon Award for their paper in Technometrics. Their paper, "Compressing an ensemble with statistical models: An algorithm for global 3D spatio-temporal temperature," details a data-compression scheme for climate simulations. The method reduces data-storage requirements and speeds up climate research. Why it matters: This award highlights KAUST's contribution to statistical methods for climate modeling and big data analysis, which are particularly relevant for studying renewable energy resources in Saudi Arabia.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv ·

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method achieves less than 1% accuracy loss in perplexity for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.
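The idea of keeping a few outlier weights in higher precision while quantizing the rest to a few bits can be illustrated with a toy sketch. This is not the SpQR algorithm: picking outliers by magnitude, using one uniform quantization grid for all remaining weights, and the names `quantize_with_outliers` and `dequantize` are all assumptions made for illustration.

```python
def quantize_with_outliers(weights, bits=3, n_outliers=1):
    """Toy sketch: store a few weights exactly, quantize the rest to `bits` bits.

    Assumption: outliers are chosen by largest magnitude. The summary above
    does not say how SpQR selects them.
    """
    by_magnitude = sorted(range(len(weights)),
                          key=lambda i: abs(weights[i]), reverse=True)
    # Outliers are kept at full precision, keyed by position.
    outliers = {i: weights[i] for i in by_magnitude[:n_outliers]}
    rest = [w for i, w in enumerate(weights) if i not in outliers]
    lo, hi = (min(rest), max(rest)) if rest else (0.0, 0.0)
    levels = 2 ** bits - 1                      # e.g. 7 steps for 3 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Each non-outlier weight becomes an integer code in [0, levels].
    codes = [0 if i in outliers else round((w - lo) / scale)
             for i, w in enumerate(weights)]
    return codes, lo, scale, outliers


def dequantize(codes, lo, scale, outliers):
    """Rebuild approximate weights; outliers come back exactly."""
    return [outliers.get(i, lo + c * scale) for i, c in enumerate(codes)]


weights = [0.1, -0.2, 0.05, 8.0, 0.3, -0.1, 0.0, 0.2, -0.3, 0.15]
codes, lo, scale, outliers = quantize_with_outliers(weights, bits=3, n_outliers=1)
restored = dequantize(codes, lo, scale, outliers)
```

Without the outlier set, the single large weight would stretch the quantization range and leave the small weights sharing very few levels; separating it keeps the grid fine for the bulk of the values.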

Overcoming the curse of dimensionality

MBZUAI ·

MBZUAI Professor Fakhri Karray and co-authors from the University of Waterloo have published "Elements of Dimensionality Reduction and Manifold Learning," a textbook on methods for extracting useful components from large datasets. The book addresses the challenge of the "curse of dimensionality," where the growing size and dimensionality of datasets complicate their use in machine learning. Karray developed the material from a popular course he taught at Waterloo. Why it matters: The textbook provides a unified resource for students and researchers in machine learning and AI, addressing a foundational challenge in processing high-dimensional data that is relevant to diverse applications in the region.

CTRL: Closed-Loop Data Transcription via Rate Reduction

MBZUAI ·

A talk introduces a computational framework for learning a compact, structured representation of real-world datasets that is both discriminative and generative. It proposes learning a closed-loop transcription between the distribution of a high-dimensional multi-class dataset and an arrangement of multiple independent subspaces, known as a linear discriminative representation (LDR). The optimality of the closed-loop transcription can be characterized in closed form by an information-theoretic measure known as the rate reduction. Why it matters: The framework unifies the concepts and benefits of auto-encoding and GANs and generalizes them to learning a representation of multi-class visual data that is both discriminative and generative.
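For orientation, the rate reduction is commonly written as follows in the earlier work on maximal coding rate reduction. The summary above does not state the formula, so the notation here is an assumption: $Z \in \mathbb{R}^{d \times n}$ holds $n$ learned features of dimension $d$, $\Pi = \{\Pi_j\}_{j=1}^{k}$ are diagonal class-membership matrices, and $\epsilon$ is the allowed distortion.

```latex
\begin{aligned}
R(Z, \epsilon) &= \frac{1}{2} \log\det\!\left(I + \frac{d}{n\,\epsilon^{2}}\, Z Z^{\top}\right), \\
R_c(Z, \epsilon \mid \Pi) &= \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}
  \log\det\!\left(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\right), \\
\Delta R(Z, \Pi, \epsilon) &= R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi).
\end{aligned}
```

A large $\Delta R$ means the features as a whole span a large volume while each class stays compact, which is what a representation arranged in independent subspaces would achieve.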

KAUST and the Big Data age

KAUST ·

KAUST held a research workshop on Optimization and Big Data, gathering researchers to discuss challenges and opportunities in the field. Speakers presented novel optimization algorithms and distributed systems for handling large datasets. The workshop featured 20 speakers from KAUST, global universities, and Microsoft Research. Why it matters: The event highlights KAUST's role as a regional hub for advancing research and development in big data and optimization, crucial for AI and various computational fields.

SlimPajama-DC: Understanding Data Combinations for LLM Training

arXiv ·

Researchers at MBZUAI release SlimPajama-DC, an empirical analysis of data combinations for pretraining LLMs using the SlimPajama dataset. The study examines the impact of global vs. local deduplication and the proportions of highly-deduplicated multi-source datasets. Results show that increased data diversity after global deduplication is crucial, with the best configuration outperforming models trained on RedPajama.
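The difference between local and global deduplication can be sketched on a tiny example. This assumes exact string matching and hypothetical source names (`web`, `code`); the summary does not describe the study's actual pipeline, and real pipelines typically use fuzzy matching.

```python
def local_dedup(sources):
    """Remove duplicates within each source independently."""
    # dict.fromkeys keeps first occurrences in order.
    return {name: list(dict.fromkeys(docs)) for name, docs in sources.items()}


def global_dedup(sources):
    """Remove duplicates across all sources, keeping the first occurrence."""
    seen = set()
    result = {}
    for name, docs in sources.items():
        kept = []
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                kept.append(doc)
        result[name] = kept
    return result


# Hypothetical toy corpus: "b" appears in both sources.
sources = {
    "web": ["a", "b", "a"],
    "code": ["b", "c"],
}
local_result = local_dedup(sources)
global_result = global_dedup(sources)
```

In this toy corpus, local deduplication leaves the document `"b"` in both sources, while global deduplication keeps it only in the first source that contains it, so overlap between sources is removed as well.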

Machine Learning Integration for Signal Processing

TII ·

The Directed Energy Research Center (DERC) at the Technology Innovation Institute (TII) is integrating machine learning (ML) techniques into signal processing to accelerate research. One project used convolutional neural networks to predict COVID-19 pneumonia from chest X-rays with 97.5% accuracy. DERC researchers also demonstrated that ML-based signal and image processing can retrieve up to 68% of text information from electromagnetic emanations. Why it matters: This adoption of ML for signal processing at TII highlights the potential for advanced AI techniques to enhance research and security applications in the UAE.

Going under the hood to improve AI efficiency

MBZUAI ·

MBZUAI's computer science department, led by Xiaosong Ma, focuses on improving AI efficiency and sustainability by reducing wasted resources. Ma's background in high-performance computing informs her approach to optimizing AI workloads. She aims to collaborate with experts across different AI domains at MBZUAI to address these challenges. Why it matters: Optimizing AI efficiency is crucial for reducing the environmental impact and computational costs of increasingly complex AI models in the GCC region and globally.