variables: 736524
Data license: CC-BY
field | value
---|---
id | 736524
name | Average accuracy on all knowledge tests
unit | %
description | This benchmark assesses the average accuracy of models across all subjects based on the MMLU benchmark. The MMLU benchmark covers a wide range of 57 subjects, including STEM, humanities, social sciences, and more. It encompasses subjects of varying difficulty levels, spanning from elementary concepts to advanced professional topics. This comprehensive benchmark assesses not only world knowledge but also problem-solving abilities.
createdAt | 2023-07-03 14:54:48
updatedAt | 2024-07-08 15:20:30
datasetId | 6102
sourceId | 29574
shortUnit | %
display | { "unit": "%", "zeroDay": "2019-01-01", "shortUnit": "%", "yearIsDay": true, "numDecimalPlaces": 1 }
columnOrder | 0
shortName | performance_language_average
catalogPath | grapher/artificial_intelligence/2023-06-14/papers_with_code_benchmarks/papers_with_code_benchmarks#performance_language_average
schemaVersion | 1
descriptionKey | []
type | float
sort | []
dataChecksum | 350d88ac6e544f9a30fdfc2d60f9ca9f
metadataChecksum | a4e7bac7e8e71ce4627973ea44b60302

(All remaining columns — code, coverage, timespan, originalMetadata, grapherConfigAdmin, dimensions, processingLevel, processingLog, titlePublic, titleVariant, attributionShort, attribution, descriptionShort, descriptionFromProducer, descriptionProcessing, licenses, license, grapherConfigETL — are empty for this variable.)
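The `display` config sets `"yearIsDay": true` with `"zeroDay": "2019-01-01"`, meaning the "year" stored with each data point is a day offset from that zero day rather than a calendar year. A minimal sketch of that interpretation (the helper name and defaults here are illustrative, not part of the grapher API):

```python
from datetime import date, timedelta

def day_offset_to_date(offset: int, zero_day: str = "2019-01-01") -> date:
    """Convert a grapher "year" value to a calendar date when the
    variable's display config has yearIsDay=true: the stored integer
    counts days elapsed since zeroDay."""
    y, m, d = (int(part) for part in zero_day.split("-"))
    return date(y, m, d) + timedelta(days=offset)

print(day_offset_to_date(0))    # 2019-01-01 (the zero day itself)
print(day_offset_to_date(365))  # 2020-01-01 (2019 has 365 days)
```

With this convention, the timespan of the variable is expressed in day offsets, and charts render each offset as a date on the x-axis.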