Dataset Structure

What the dataset looks like

By CRISP Research

ITALIC contains 10,000 carefully curated questions selected from an initial corpus of 2,110,643 questions.

Each question is a multiple-choice item. Questions average 87 characters in length, and the longest is 577 characters. Each question has between 2 and 5 answer options, with a median of 4. The input data total 499,963 tokens.
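The statistics above can be recomputed from the records with a short script. The sketch below assumes each record is a Python dict with the `question` and `options` fields described in the schema, and the function name `summarize` is hypothetical. Token counts are left out because the tokenizer behind the 499,963 figure is not stated. The toy records are placeholders, not real ITALIC data.

```python
from statistics import mean, median


def summarize(records):
    """Compute per-dataset summary statistics from a list of record dicts.

    Hypothetical helper: assumes each record has a `question` string and an
    `options` list, as in the schema. Token counts are omitted because the
    tokenizer used for the reported total is not specified.
    """
    q_lengths = [len(r["question"]) for r in records]  # length in characters
    n_options = [len(r["options"]) for r in records]   # number of answer choices
    return {
        "num_questions": len(records),
        "mean_question_chars": mean(q_lengths),
        "max_question_chars": max(q_lengths),
        "median_options": median(n_options),
        "min_options": min(n_options),
        "max_options": max(n_options),
    }


# Placeholder records for illustration only (not taken from the dataset).
toy_records = [
    {"question": "abcd", "options": ["x", "y"]},
    {"question": "abcdefgh", "options": ["x", "y", "z", "w"]},
    {"question": "ab", "options": ["x", "y", "z", "w", "v"]},
]
stats = summarize(toy_records)
print(stats)
```

Run over the full set of 10,000 records, the same function would be expected to reproduce the reported character and option-count figures.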

Column     Data type   Description
question   String      The text of the question
options    List        The answer choices; exactly one is correct
answer     String      The correct answer, taken from options
category   String      The cultural category the question belongs to
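A record can be checked against this schema with a small validator. This is a minimal sketch, assuming records are Python dicts keyed by the column names above; `validate_record` is a hypothetical name, and the 2 to 5 option bounds are an assumption drawn from the observed minimum and maximum rather than a documented rule. The `"toy"` category value is a placeholder.

```python
def validate_record(record):
    """Return a list of schema problems for one record (empty list if valid).

    Hypothetical helper: field names and types follow the schema table.
    The 2-5 option bounds are assumed from the observed min/max counts.
    """
    problems = []
    # Type checks; a missing field yields None, which fails isinstance.
    for field, expected in (("question", str), ("options", list),
                            ("answer", str), ("category", str)):
        if not isinstance(record.get(field), expected):
            problems.append(f"{field}: expected {expected.__name__}")
    if problems:
        return problems  # skip content checks when types are wrong
    options = record["options"]
    if not 2 <= len(options) <= 5:
        problems.append(f"options: got {len(options)}, expected 2 to 5")
    # Only one option is correct, so the answer must match exactly one option.
    if options.count(record["answer"]) != 1:
        problems.append("answer: must match exactly one option")
    return problems


# Placeholder records for illustration only.
good = {"question": "Q?", "options": ["a", "b", "c"], "answer": "b", "category": "toy"}
bad = {"question": "Q?", "options": ["a"], "answer": "z", "category": "toy"}
print(validate_record(good))  # []
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first error makes it easier to report every issue in a record at once.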