Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

arXiv · February 19, 2024 · Significant research

Summary

This paper investigates the intrinsic self-correction capabilities of LLMs, identifying model confidence as a key latent factor. Researchers developed an "If-or-Else" (IoE) prompting framework to guide LLMs in assessing their own confidence and improving self-correction accuracy. Experiments demonstrate that the IoE-based prompt enhances the accuracy of self-corrected responses, with code available on GitHub.

Keywords

LLM · self-correction · confidence · prompting · IoE framework

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv · Feb 1

Researchers from the National Center for AI in Saudi Arabia investigated the sensitivity of Large Language Model (LLM) leaderboards to minor benchmark perturbations. They found that small changes, like choice order, can shift rankings by up to 8 positions. The study recommends hybrid scoring and warns against over-reliance on simple benchmark evaluations, providing code for further research.

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

Summary

Keywords

Related

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards