variables: 852592

This table contains all the variables of all the datasets we have at Our World in Data. Note that it includes only metadata about the variables, such as their names and default display settings, not the data itself.

Data license: CC-BY


id: 852592
name: Test scores of the AI relative to human performance
createdAt: 2024-04-02 22:09:24
updatedAt: 2024-07-08 15:19:13
timespan: 1998-2023
datasetId: 6457
display: {}
columnOrder: 0
shortName: performance
catalogPath: grapher/artificial_intelligence/2024-04-02/dynabench/dynabench#performance
schemaVersion: 2
processingLevel: minor
descriptionShort: Human performance, as the benchmark, is set to zero. The capability of each AI system is normalized to an initial performance of -1.
descriptionKey: []
descriptionProcessing: We mapped the benchmarks to their respective domains based on a review of each benchmark's primary focus and the specific capabilities it tests within AI systems:
- MNIST was mapped to "Handwriting recognition", as it tests AI systems' ability to recognize and classify handwritten digits, a fundamental task in the domain of digital image processing.
- GLUE was categorized under "Language understanding" due to its assessment of models across a variety of linguistic tasks, highlighting the general capabilities of AI in understanding human language.
- ImageNet was categorized as "Image recognition", focusing on the ability of AI systems to accurately identify and categorize images into predefined classes, showcasing the advancements in visual perception.
- SQuAD 1.1 and SQuAD 2.0 were distinguished as "Reading comprehension" and "Reading comprehension with unanswerable questions" respectively. While both benchmarks evaluate reading comprehension, SQuAD 2.0 adds an extra layer of complexity with the introduction of unanswerable questions, demanding deeper understanding and reasoning from AI models.
- BBH was aligned with "Complex reasoning", as it challenges AI with tasks that require not just logical reasoning but also creative thinking, simulating complex problem-solving scenarios.
- Switchboard was associated with "Speech recognition" due to its focus on transcribing and understanding human speech within a conversational context, evaluating AI's ability to process and respond to spoken language.
- MMLU was placed in "General knowledge tests", given its assessment across multiple disciplines and topics, requiring a broad and comprehensive understanding of language.
- HellaSwag was mapped to "Predictive reasoning" for its evaluation of AI's ability to predict logical continuations within given contexts, testing commonsense reasoning and understanding.
- HumanEval was categorized under "Code generation", focusing on AI's capability to understand programming languages and generate code that solves specific problems, highlighting skills in logical thinking and algorithmic problem-solving.
- SuperGLUE was designated as "Nuanced language interpretation" due to its advanced set of linguistic tasks that require deep understanding, reasoning, and interpretation of text, pushing the boundaries of what AI can comprehend.
- GSM8K was mapped to "Math problem-solving", as it tests AI on solving mathematical problems that involve reasoning and logical deduction, reflecting capabilities in numerical understanding and problem-solving.
type: float
sort: []
dataChecksum: a6595b55743a42288a208f1ac3835d7f
metadataChecksum: d1c49cd1390867c73746c31287476c96
Empty fields: unit, description, code, coverage, sourceId, shortUnit, originalMetadata, grapherConfigAdmin, dimensions, processingLog, titlePublic, titleVariant, attributionShort, attribution, descriptionFromProducer, licenses, license, grapherConfigETL
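The descriptionShort field fixes two reference points (the initial AI score maps to -1, human performance maps to 0) but does not spell out the formula in between. The sketch below assumes a simple linear rescaling between those two points; the function name `normalize_to_human` and the example scores are hypothetical illustrations, not values or code taken from this dataset.

```python
def normalize_to_human(score: float, initial_score: float, human_score: float) -> float:
    """Linearly rescale a raw benchmark score (assumed method, not confirmed by the source).

    The benchmark's initial AI score maps to -1 and the human baseline maps to 0;
    values above 0 mean the AI system exceeds the human baseline.
    """
    if human_score == initial_score:
        # The rescaling is undefined when both reference points coincide.
        raise ValueError("human_score and initial_score must differ")
    return (score - human_score) / (human_score - initial_score)


# Hypothetical accuracy-style benchmark: initial AI score 60, human score 90.
initial, human = 60.0, 90.0
for raw in (60.0, 75.0, 90.0, 95.0):
    print(raw, normalize_to_human(raw, initial, human))
```

With these made-up numbers, 60 maps to -1.0, 75 to -0.5, 90 to 0.0, and 95 to a positive value. Because the denominator takes its sign from the two reference points, the same formula also works for lower-is-better metrics such as an error rate: with an initial error of 20 and a human error of 5, an error of 2 yields a positive score.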

Links from other tables

1 row from variableId in chart_dimensions
0 rows from variableId in explorer_variables
1 row from variableId in origins_variables
0 rows from variableId in posts_gdocs_variables_faqs
1 row from variableId in tags_variables_topic_tags
1 row from variableId in chart_variables