ITALIC is a benchmark evaluating language models’ understanding of Italian culture, commonsense reasoning and linguistic proficiency in a morphologically rich language.

Above are example questions from ITALIC. Note: every example is a direct translation; the original questions are in Italian. The correct option is marked by (✓).
Dataset Details
Dataset Description
We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. ITALIC spans 12 domains, exploiting public tests to score domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies.
ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. It serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.
- Curated by: CRISP research centre https://crispresearch.it/
- Language(s) (NLP): Italian
- License: MIT
Dataset Sources
- Huggingface: https://huggingface.co/datasets/Crisp-Unimib/ITALIC
- Zenodo [ADD THIS LATER]
- Paper: [ACCEPTED AT NAACL25]
Citation
BibTeX:
[COMING SOON]
APA:
[COMING SOON]