Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
2025
Global PIQA is a participatory benchmark for evaluating physically grounded commonsense reasoning in large language models across more than 100 languages and language varieties. It consists of multiple-choice questions about everyday physical situations, many tied to local foods, customs, and practices, to test whether models capture culturally specific commonsense knowledge rather than only generic patterns. The paper shows that state-of-the-art models perform well on average but still lag significantly in many lower-resource languages, and that open models generally underperform proprietary ones, highlighting persistent gaps in everyday knowledge across languages.
For this project, I contributed French examples to the Global PIQA dataset, helping to design French questions that encode local, physically grounded commonsense knowledge.
Recommended citation: Tyler A. Chang, Catherine Arnett et al. (2025). "Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures." Preprint.
