GCC AI Research

Results for "perturbation"

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv

Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering the answer choices, can shift a model's ranking by up to 8 positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
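The kind of perturbation described above can be illustrated with a minimal sketch. It is not the study's released code: the question format, the function names, and the accuracy numbers are all hypothetical and chosen only to show how reordering answer choices and comparing leaderboard positions might be set up.

```python
import random

def shuffle_choices(question, rng):
    """Return a copy of a multiple-choice item with its options reordered.

    `question` is a hypothetical dict:
    {"prompt": str, "choices": list, "answer": int (index of correct choice)}.
    """
    order = list(range(len(question["choices"])))
    rng.shuffle(order)
    return {
        "prompt": question["prompt"],
        "choices": [question["choices"][i] for i in order],
        # The correct option moves to wherever its old index landed.
        "answer": order.index(question["answer"]),
    }

def rank_models(scores):
    """Map each model name to its leaderboard position (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def max_rank_shift(scores_before, scores_after):
    """Largest change in position of any model between two leaderboards."""
    before = rank_models(scores_before)
    after = rank_models(scores_after)
    return max(abs(before[m] - after[m]) for m in before)

# Made-up accuracies for illustration only; these are not results from the paper.
original = {"model_a": 0.71, "model_b": 0.69, "model_c": 0.64}
perturbed = {"model_a": 0.66, "model_b": 0.70, "model_c": 0.68}
shift = max_rank_shift(original, perturbed)
```

In a real evaluation, each model would be scored on both the original and the shuffled items, and `max_rank_shift` would then be applied to the two resulting score tables.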

Award-winning algorithm aids observation

KAUST

KAUST researchers developed a machine learning algorithm to control a deformable mirror within the Subaru Telescope's exoplanet imaging camera, compensating for atmospheric turbulence. The algorithm, which computes a partial singular value decomposition (SVD), outperforms a standard SVD by a factor of four. The KAUST team received a best paper award at the PASC Conference for this work, and the algorithm has already been deployed at the Subaru Telescope. Why it matters: This advancement enables sharper images of exoplanets, facilitating their identification and study, and showcases the impact of optimizing core linear algebra algorithms.

Confidence sets for Causal Discovery

MBZUAI

The work presents a new framework for constructing confidence sets for causal orderings within structural equation models (SEMs). It uses a residual bootstrap procedure to test the goodness-of-fit of candidate causal orderings, thereby quantifying uncertainty in causal discovery. The method is computationally efficient, suitable for medium-sized problems, and retains its theoretical guarantees as the number of variables increases. Why it matters: This offers a new dimension of uncertainty quantification that enhances the robustness and reliability of causal inference in complex systems. The summary gives no indication of a connection to the Middle East.
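For readers unfamiliar with the residual bootstrap, the generic idea can be sketched on a single linear regression. This is only an illustration of the resampling step, not the framework's goodness-of-fit test for causal orderings; the function names and the synthetic data are assumptions made for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def residual_bootstrap_slopes(xs, ys, n_boot=500, seed=0):
    """Resample fitted residuals to approximate the slope's sampling distribution."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Keep the regressors fixed; rebuild responses from resampled residuals.
        ys_star = [f + rng.choice(residuals) for f in fitted]
        slopes.append(fit_line(xs, ys_star)[1])
    return sorted(slopes)

def percentile_interval(sorted_values, level=0.95):
    """Simple percentile interval from sorted bootstrap replicates."""
    lo = int(((1 - level) / 2) * len(sorted_values))
    hi = len(sorted_values) - 1 - lo
    return sorted_values[lo], sorted_values[hi]

# Synthetic data for illustration: y = 1 + 2x plus small Gaussian noise.
_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [1 + 2 * x + _rng.gauss(0, 0.3) for x in xs]
slopes = residual_bootstrap_slopes(xs, ys)
low, high = percentile_interval(slopes)
```

The interval `(low, high)` is a bootstrap confidence interval for the slope. A confidence set over causal orderings would need a test statistic per ordering, which this sketch does not attempt.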

On Transferability of Machine Learning Models

MBZUAI

This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.