Skip to content
GCC AI Research

Bring an order to the chaos: Order-Preserving IO stack for Modern Flash storage

MBZUAI · Notable

Summary

Professor Won from KAIST presented a talk at MBZUAI on ensuring storage order in modern IO stacks. He discussed separating durability and ordering mechanisms to avoid expensive transfer-and-flush methods. The talk covered order-preserving IO stacks for single-queue block devices, multi-queue IO stacks, and all-flash arrays. Why it matters: Optimizing IO stacks is crucial for improving the performance and efficiency of storage systems in AI infrastructure and data centers.

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

On Optimizing Mobile Memory, Storage, and Beyond

MBZUAI ·

Prof. Chun Jason Xue from the City University of Hong Kong presented research on optimizing mobile memory and storage by analyzing mobile application characteristics, noting their differences from server applications. The research explores system software designs inherited from the Linux kernel and identifies optimization opportunities in mobile memory and storage management. Xue's work aims to enhance user experience on mobile devices through mobile application characterization, focusing on non-volatile and flash memories. Why it matters: Optimizing mobile systems based on the unique characteristics of mobile applications can significantly improve device performance and user experience in the region.

Optimizing AI Systems through Cross-Layer Design: A Data-Centric Approach

MBZUAI ·

A Duke University professor presented a data-centric approach to optimizing AI systems by addressing the memory capacity and bandwidth bottleneck. The presentation covered collaborative optimization across algorithms, systems, architecture, and circuit layers. It also explored compute-in-memory as a solution for integrating computation and memory. Why it matters: Optimizing AI systems through a data-centric approach can improve efficiency and performance, critical for advancing AI applications in the region.

Programmable Networks for Distributed Deep Learning: Advances and Perspectives

MBZUAI ·

A presentation discusses using programmable network devices to reduce communication bottlenecks in distributed deep learning. It explores in-network aggregation and data processing to lower memory needs and increase bandwidth usage. The talk also covers gradient compression and the potential role of programmable NICs. Why it matters: Optimizing distributed deep learning infrastructure is critical for scaling AI model training in resource-constrained environments.

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv ·

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method achieves less than 1% accuracy loss in perplexity for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.