variables: 820641
Data license: CC-BY
id | name | unit | description | createdAt | updatedAt | code | coverage | timespan | datasetId | sourceId | shortUnit | display | columnOrder | originalMetadata | grapherConfigAdmin | shortName | catalogPath | dimensions | schemaVersion | processingLevel | processingLog | titlePublic | titleVariant | attributionShort | attribution | descriptionShort | descriptionFromProducer | descriptionKey | descriptionProcessing | licenses | license | grapherConfigETL | type | sort | dataChecksum | metadataChecksum |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
820641 | Training dataset size | datapoints | | 2024-02-15 16:16:43 | 2024-07-08 16:09:53 | | | 2019-2023 | 6385 | | | { "unit": "datapoints", "numDecimalPlaces": 0 } | 0 | | | dataset_size__tokens | grapher/artificial_intelligence/2024-02-15/epoch_llms/epoch_llms#dataset_size__tokens | | 2 | minor | | | | | | | | [ "Training data size refers to the amount or quantity of data that is used to train an AI model, indicating the number of examples or instances available for the model to learn from.", "Imagine you're teaching a computer to recognize different types of fruits. The training data size would refer to the number of fruit images you show to the computer during the training process. If you show it 100 images of various fruits, then the training data size would be 100. The more training data you provide, the better the computer can learn to identify different fruits accurately.", "The size of the datasets used to train Gemini Ultra, PaLM-2, GPT-4, GPT-3.5 and code-davinci-002 models are speculative and have not been officially disclosed." ] | | | | | int | [] | 1824375b7745ec689ba0ced8672e90d5 | 1256271d3d9843a10be4ce95a17745d6 |