posts: 46569
Data license: CC-BY
id | title | slug | type | status | content | archieml | archieml_update_statistics | published_at | updated_at | gdocSuccessorId | authors | excerpt | created_at_in_wordpress | updated_at_in_wordpress | featured_image | formattingOptions | markdown | wpApiSnapshot |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
46569 | Easier data reuse and more flexible visualizations - what we in the developer team are working on | dev-team-working-on-2021 | post | publish | <!-- wp:paragraph --> <p>In the last few weeks, we – the developers here at Our World in Data – have done a lot of brainstorming and planning to flesh out what capabilities we want to add to the Our World In Data site. Some of these features will make it easier to reuse our data, some will make it easier to view data from different angles.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>We are currently <a href="https://ourworldindata.org/jobs">hiring for two technical roles</a> to help us build what we are describing below - so if you are interested yourself, have a look at the two job profiles or forward them to people you know who might be a good fit (the application <strong>deadline is Dec 5th</strong>) — thanks!</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>The improvements we are currently planning can be sorted into 3 main topics: </p> <!-- /wp:paragraph --> <!-- wp:list --> <ul><li><a href="#easier-data-reuse">Making it easier to reuse our data</a></li><li><a href="#richer-data-model">Using a richer data model</a></li><li><a href="#more-flexible-data-visualization">Enabling more flexible visualizations</a></li></ul> <!-- /wp:list --> <!-- wp:heading {"level":3} --> <h3>Easier data reuse</h3> <!-- /wp:heading --> <!-- wp:columns --> <div class="wp-block-columns"><!-- wp:column --> <div class="wp-block-column"><!-- wp:paragraph --> <p>If you are working with data yourself, then you are probably aware that on every chart on our site you can switch to the “Download” tab and download a CSV file with the underlying data. This has a few shortcomings. For one, only the data that our authors end up using in charts is easily accessible like this. But we have a much bigger catalog of more than 100,000 indicators in our internal database that we would like to open up. 
Ernst, our head of product design, likens this to a museum that only has a small part of its collection on display.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>We are now working to make all this data available, and to do so in a form that is convenient to use for data scientists. We are creating a public index so that you can quickly discover if we have data on a certain area available, and then fetch that data as a tidy data frame in a modern file format (like Apache feather or parquet) that is easy to consume from Python, R or Observable notebooks. You can try out an experimental version of this index in Python already using the <a rel="noreferrer noopener" href="https://github.com/owid/owid-catalog-py" target="_blank">owid-catalog-py</a> package.</p> <!-- /wp:paragraph --> <!-- wp:heading {"level":4} --> <h4>Reusable metadata</h4> <!-- /wp:heading --> <!-- wp:paragraph --> <p>A large part of the work that our data team and our authors are doing is to curate data and add metadata. This is important because the data we collect comes from different sources, both from large institutions like the World Bank and the WHO, but also from individual researchers. As a reader, when deciding how much to trust the data, it helps immensely to understand where it came from.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>Harmonizing this data on a technical level, so it uses the same date formats and country names, enables joining all this data together. But only by also recording information about how this data was collected, and its limitations, can real insight be drawn from it. 
We are now standardizing the metadata that we are collecting and will always serve it alongside the data files in JSON format, so that all this curation work can be reused in addition to the data that we already reshare.</p> <!-- /wp:paragraph --></div> <!-- /wp:column --> <!-- wp:column --> <div class="wp-block-column"><!-- wp:image {"id":46589,"sizeSlug":"large","linkDestination":"none"} --> <figure class="wp-block-image size-large"><img src="https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-725x550.png" alt="Shows the download data option" class="wp-image-46589"/></figure> <!-- /wp:image --></div> <!-- /wp:column --></div> <!-- /wp:columns --> <!-- wp:heading {"level":3} --> <h3>Richer data model</h3> <!-- /wp:heading --> <!-- wp:paragraph --> <p>Another benefit of moving away from our closed internal database as the central data store is that we will be able to leverage richer data models. To understand why this is important, you should know that we currently bring all individual data points into one large MySQL table that has just 4 columns: Year/Date, Entity/Country, Variable, Value. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>This has worked well for us for a long time, since most of the data we are interested in is heavily aggregated, so country and year were good enough and kept things simple. But we now want to enable richer data models - our COVID-19 data effort already stretched the current model with the need for daily data, and so adding different granularities of time is one powerful change. But we also want to be able to break down critical indicators by sex or age group if the upstream data source provides this. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>Currently, when we want to include data like this we have to create new, independent variables. 
For example, deaths from smoking may end up becoming many variables like “Deaths - smoking - female - age 15-25” instead of just one with many dimensions. Authors then have to remember which variables to show next to each other in an article or chart. By making it possible to store additional dimensions other than year and country, we will be able to do this automatically and allow users to switch between levels of detail. </p> <!-- /wp:paragraph --> <!-- wp:columns --> <div class="wp-block-columns"><!-- wp:column --> <div class="wp-block-column"><!-- wp:heading {"level":4} --> <h4>Drill down into the details</h4> <!-- /wp:heading --> <!-- wp:paragraph --> <p>We are also planning to add proper support for hierarchies within dimensions so that we will be able to do proper drill-down and drill-up in our charts. If you look at this chart on child mortality, you’ll see that this shows data for the entire world and split by continent. In the top-left corner, you can find the “Add country” button to change this selection and show individual countries. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>This view you see initially is a good starting point, but has two issues. First, it had to be manually configured this way. Second, if you click on the “Add country” button then the continents and individual countries are all just shown as one long list, sorted alphabetically. 
In the future we will be able to show different sections in the country selector for different groupings automatically, but we’ll also be able to do this for other dimensions like cause of death, so you can get a broad picture first and then dive into the details.</p> <!-- /wp:paragraph --></div> <!-- /wp:column --> <!-- wp:column --> <div class="wp-block-column"><!-- wp:html --> <iframe src="https://ourworldindata.org/grapher/child-mortality-around-the-world" loading="lazy" style="width: 100%; height: 600px; border: 0px none;"></iframe> <!-- /wp:html --></div> <!-- /wp:column --></div> <!-- /wp:columns --> <!-- wp:columns --> <div class="wp-block-columns"><!-- wp:column --> <div class="wp-block-column"><!-- wp:heading {"level":4} --> <h4>Visualizing uncertainty</h4> <!-- /wp:heading --> <!-- wp:paragraph --> <p>Finally, we are planning to add metadata information to express the relationship between variables. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>One of the first areas where we want to use this feature is to add proper support for confidence intervals. Visualizing the uncertainty inherent in data or projections is very important, but at the moment we rarely do it because it currently poses all sorts of UI problems. By making our grapher understand these relationships, we’ll be able to use the visual hints for confidence intervals that are widely used in data visualization.</p> <!-- /wp:paragraph --></div> <!-- /wp:column --> <!-- wp:column --> <div class="wp-block-column"><!-- wp:html --> <iframe src="https://ourworldindata.org/grapher/daily-new-estimated-covid-19-infections-icl-model" loading="lazy" style="width: 100%; height: 600px; border: 0px none;"></iframe> <!-- /wp:html --></div> <!-- /wp:column --></div> <!-- /wp:columns --> <!-- wp:heading {"level":3} --> <h3>More flexible data visualization</h3> <!-- /wp:heading --> <!-- wp:paragraph --> <p>The final area for technical improvements is our visualization tool. 
Some items on our roadmap in this area are technically pretty simple, but we think they will give our readers and authors interesting new capabilities. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>For one, we want to make the content of our grapher charts more flexible, for example by allowing authors to create slideshows of static images or to use other visualization libraries than our handcrafted grapher. Our existing chart drawing code works very well to provide a limited set of chart types with a lot of standardized features, but sometimes a more bespoke setup would be useful. One example of this is an idea that we are currently working on to visualize war casualties over several hundred years where we need to visualize conflicting estimates, different kinds of casualties etc. all over a very long time period.</p> <!-- /wp:paragraph --> <!-- wp:heading {"level":4} --> <h4>Reusing our visualization tool</h4> <!-- /wp:heading --> <!-- wp:paragraph --> <p>We also want to make our grapher easier to reuse for other projects. Currently, the code base of the charting component is quite entangled with the internal administration UI that is used to create them, and it assumes many particularities about the infrastructure it runs on. We want to split this out so that the grapher becomes a separate NPM package that can be used in other projects. Since all our code is open source, we also hope that this will make it easier for others to contribute back to our charting system.</p> <!-- /wp:paragraph --> <!-- wp:heading {"level":4} --> <h4>Open exploration & contextual information</h4> <!-- /wp:heading --> <!-- wp:paragraph --> <p>With the advances described above, we also want to make the grapher more open to exploration - this means the users should be able to search for and visualize all the variables in our catalog. Our philosophy here is that we want our metadata to describe our data so well that generating high quality visualizations requires no further config. 
We hope that the ability to create arbitrary scatter plots or line charts will be another interesting look into our data collection.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>When exploring data like this, it is important to understand what exactly you are looking at. Our authors spend a lot of time thinking about how best to explain critical concepts like for example “International constant $” that need to be understood to interpret our charts correctly. </p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>Currently, these definitions live either in prose in some part of our website or are squeezed into the subtitles of our charts. We want to experiment with a new additional canvas next to our charts that will be used to summarize important concepts that are needed to understand a specific chart. This might go as far as showing secondary charts or third party content, all with the aim to make sure that our readers draw valid conclusions from our content.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>Finally, we are thinking of creating a new way for our authors to create their articles. We currently use a headless WordPress installation as our storage for articles, but our authors prefer to do the actual writing in Google Docs. In an ideal world they would be able to stay in Google Docs but be able to embed and configure interactive charts in that same window, without going through the detour of WordPress to have a more seamless editing experience and experience less friction in their daily work.</p> <!-- /wp:paragraph --> <!-- wp:heading {"level":3} --> <h3>Closing thoughts</h3> <!-- /wp:heading --> <!-- wp:paragraph --> <p>As you can see, we have many ideas in the pipeline and are excited to be working on them. 
Thanks to the many donations we receive, we are able to grow our technical team and increase the speed at which we are working on this.</p> <!-- /wp:paragraph --> <!-- wp:paragraph --> <p>If you are interested in joining us or contributing in some other way, then please get in touch! Likewise, if you’re interested in being an early adopter of our technical work, please <a href="http://eepurl.com/c7qucj" target="_blank" rel="noreferrer noopener">subscribe to our beta mailing list</a>.</p> <!-- /wp:paragraph --> <!-- wp:owid/prominent-link {"title":"Full-Stack Engineer","linkUrl":"https://ourworldindata.org/full-stack-engineer","className":"is-style-thin"} --> <!-- wp:paragraph --> <p>Remote (US East & EU/African timezones preferred)</p> <!-- /wp:paragraph --> <!-- /wp:owid/prominent-link --> <!-- wp:owid/prominent-link {"title":"Data Engineer","linkUrl":"https://ourworldindata.org/data-engineer","className":"is-style-thin"} --> <!-- wp:paragraph --> <p>Remote (US East & EU/African timezones preferred)</p> <!-- /wp:paragraph --> <!-- /wp:owid/prominent-link --> <!-- wp:html --> <div class="blog-info"> <p>An update on what we are working on and how we plan to improve the Our World In Data website.</p> </div> <!-- /wp:html --> <!-- wp:paragraph --> <p></p> <!-- /wp:paragraph --> | { "id": "wp-46569", "slug": "dev-team-working-on-2021", "content": { "toc": [], "body": [ { "type": "text", "value": [ { "text": "In the last few weeks, we \u2013\u00a0the developers here at Our World in Data \u2013 have done a lot of brainstorming and planning to flesh out what capabilities we want to add to the Our World In Data site. 
Some of these features will make it easier to reuse our data, some will make it easier to view data from different angles.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "We are currently ", "spanType": "span-simple-text" }, { "url": "https://ourworldindata.org/jobs", "children": [ { "text": "hiring for two technical roles", "spanType": "span-simple-text" } ], "spanType": "span-link" }, { "text": " to help us build what we are describing below - so if you are interested yourself, have a look at the two job profiles or forward them to people you know who might be a good fit (the application ", "spanType": "span-simple-text" }, { "children": [ { "text": "deadline is Dec 5th", "spanType": "span-simple-text" } ], "spanType": "span-bold" }, { "text": ") \u2014 thanks!", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "The improvements we are currently planning can be sorted into 3 main topics:\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "list", "items": [ { "type": "text", "value": [ { "url": "#easier-data-reuse", "children": [ { "text": "Making it easier to reuse our data", "spanType": "span-simple-text" } ], "spanType": "span-link" } ], "parseErrors": [] }, { "type": "text", "value": [ { "url": "#richer-data-model", "children": [ { "text": "Using a richer data model", "spanType": "span-simple-text" } ], "spanType": "span-link" } ], "parseErrors": [] }, { "type": "text", "value": [ { "url": "#more-flexible-data-visualization", "children": [ { "text": "Enabling more flexible visualizations", "spanType": "span-simple-text" } ], "spanType": "span-link" } ], "parseErrors": [] } ], "parseErrors": [] }, { "text": [ { "text": "Easier data reuse", "spanType": "span-simple-text" } ], "type": "heading", "level": 2, "parseErrors": [] }, { "left": [ { "type": "text", "value": [ { "text": "If you are working with data yourself, then you are probably 
aware that on every chart on our site you can switch to the \u201cDownload\u201d tab and download a CSV file with the underlying data. This has a few shortcomings. For one, only the data that our authors end up using in charts is easily accessible like this. But we have a much bigger catalog of more than 100,000 indicators in our internal database that we would like to open up. Ernst, our head of product design, likens this to a museum that only has a small part of its collection on display.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "We are now working to make all this data available, and to do so in a form that is convenient to use for data scientists. We are creating a public index so that you can quickly discover if we have data on a certain area available, and then fetch that data as a tidy data frame in a modern file format (like Apache feather or parquet) that is easy to consume from Python, R or Observable notebooks. You can try out an experimental version of this index in Python already using the ", "spanType": "span-simple-text" }, { "url": "https://github.com/owid/owid-catalog-py", "children": [ { "text": "owid-catalog-py", "spanType": "span-simple-text" } ], "spanType": "span-link" }, { "text": " package.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "text": [ { "text": "Reusable metadata", "spanType": "span-simple-text" } ], "type": "heading", "level": 4, "parseErrors": [] }, { "type": "text", "value": [ { "text": "A large part of the work that our data team and our authors are doing is to curate data and add metadata. This is important because the data we collect comes from different sources, both from large institutions like the World Bank and the WHO, but also from individual researchers. 
As a reader, when deciding how much to trust the data, it helps immensely to understand where it came from.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "Harmonizing this data on a technical level, so it uses the same date formats and country names, enables joining all this data together. But only by also recording information about how this data was collected, and its limitations, can real insight be drawn from it. We are now standardizing the metadata that we are collecting and will always serve it alongside the data files in JSON format, so that all this curation work can be reused in addition to the data that we already reshare.", "spanType": "span-simple-text" } ], "parseErrors": [] } ], "type": "sticky-right", "right": [ { "alt": "Shows the download data option", "size": "wide", "type": "image", "filename": "Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1.png", "parseErrors": [] } ], "parseErrors": [] }, { "text": [ { "text": "Richer data model", "spanType": "span-simple-text" } ], "type": "heading", "level": 2, "parseErrors": [] }, { "type": "text", "value": [ { "text": "Another benefit of moving away from our closed internal database as the central data store is that we will be able to leverage richer data models. To understand why this is important, you should know that we currently bring all individual data points into one large MySQL table that has just 4 columns: Year/Date, Entity/Country, Variable, Value.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "This has worked well for us for a long time, since most of the data we are interested in is heavily aggregated, so country and year were good enough and kept things simple. 
But we now want to enable richer data models - our COVID-19 data effort already stretched the current model with the need for daily data, and so adding different granularities of time is one powerful change. But we also want to be able to break down critical indicators by sex or age group if the upstream data source provides this.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "Currently, when we want to include data like this we have to create new, independent variables. For example, deaths from smoking may end up becoming many variables like \u201cDeaths - smoking - female - age 15-25\u201d instead of just one with many dimensions. Authors then have to remember which variables to show next to each other in an article or chart. By making it possible to store additional dimensions other than year and country, we will be able to do this automatically and allow users to switch between levels of detail.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "left": [ { "text": [ { "text": "Drill down into the details", "spanType": "span-simple-text" } ], "type": "heading", "level": 4, "parseErrors": [] }, { "type": "text", "value": [ { "text": "We are also planning to add proper support for hierarchies within dimensions so that we will be able to do proper drill-down and drill-up in our charts. If you look at this chart on child mortality, you\u2019ll see that this shows data for the entire world and split by continent. In the top-left corner, you can find the \u201cAdd country\u201d button to change this selection and show individual countries.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "This view you see initially is a good starting point, but has two issues. First, it had to be manually configured this way. 
Second, if you click on the \u201cAdd country\u201d button then the continents and individual countries are all just shown as one long list, sorted alphabetically. In the future we will be able to show different sections in the country selector for different groupings automatically, but we\u2019ll also be able to do this for other dimensions like cause of death, so you can get a broad picture first and then dive into the details.", "spanType": "span-simple-text" } ], "parseErrors": [] } ], "type": "sticky-right", "right": [ { "url": "https://ourworldindata.org/grapher/child-mortality-around-the-world", "type": "chart", "parseErrors": [] } ], "parseErrors": [] }, { "left": [ { "text": [ { "text": "Visualizing uncertainty", "spanType": "span-simple-text" } ], "type": "heading", "level": 4, "parseErrors": [] }, { "type": "text", "value": [ { "text": "Finally, we are planning to add metadata information to express the relationship between variables.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "One of the first areas where we want to use this feature is to add proper support for confidence intervals. Visualizing the uncertainty inherent in data or projections is very important, but at the moment we rarely do it because it currently poses all sorts of UI problems. 
By making our grapher understand these relationships, we\u2019ll be able to use the visual hints for confidence intervals that are widely used in data visualization.", "spanType": "span-simple-text" } ], "parseErrors": [] } ], "type": "sticky-right", "right": [ { "url": "https://ourworldindata.org/grapher/daily-new-estimated-covid-19-infections-icl-model", "type": "chart", "parseErrors": [] } ], "parseErrors": [] }, { "text": [ { "text": "More flexible data visualization", "spanType": "span-simple-text" } ], "type": "heading", "level": 2, "parseErrors": [] }, { "type": "text", "value": [ { "text": "The final area for technical improvements is our visualization tool. Some items on our roadmap in this area are technically pretty simple, but we think they will give our readers and authors interesting new capabilities.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "For one, we want to make the content of our grapher charts more flexible, for example by allowing authors to create slideshows of static images or to use other visualization libraries than our handcrafted grapher. Our existing chart drawing code works very well to provide a limited set of chart types with a lot of standardized features, but sometimes a more bespoke setup would be useful. One example of this is an idea that we are currently working on to visualize war casualties over several hundred years where we need to visualize conflicting estimates, different kinds of casualties etc. all over a very long time period.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "text": [ { "text": "Reusing our visualization tool", "spanType": "span-simple-text" } ], "type": "heading", "level": 3, "parseErrors": [] }, { "type": "text", "value": [ { "text": "We also want to make our grapher easier to reuse for other projects. 
Currently, the code base of the charting component is quite entangled with the internal administration UI that is used to create them, and it assumes many particularities about the infrastructure it runs on. We want to split this out so that the grapher becomes a separate NPM package that can be used in other projects. Since all our code is open source, we also hope that this will make it easier for others to contribute back to our charting system.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "text": [ { "text": "Open exploration & contextual information", "spanType": "span-simple-text" } ], "type": "heading", "level": 3, "parseErrors": [] }, { "type": "text", "value": [ { "text": "With the advances described above, we also want to make the grapher more open to exploration - this means the users should be able to search for and visualize all the variables in our catalog. Our philosophy here is that we want our metadata to describe our data so well that generating high quality visualizations requires no further config. We hope that the ability to create arbitrary scatter plots or line charts will be another interesting look into our data collection.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "When exploring data like this, it is important to understand what exactly you are looking at. Our authors spend a lot of time thinking about how best to explain critical concepts like for example \u201cInternational constant $\u201d that need to be understood to interpret our charts correctly.\u00a0", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "Currently, these definitions live either in prose in some part of our website or are squeezed into the subtitles of our charts. We want to experiment with a new additional canvas next to our charts that will be used to summarize important concepts that are needed to understand a specific chart. 
This might go as far as showing secondary charts or third party content, all with the aim to make sure that our readers draw valid conclusions from our content.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "Finally, we are thinking of creating a new way for our authors to create their articles. We currently use a headless WordPress installation as our storage for articles, but our authors prefer to do the actual writing in Google Docs. In an ideal world they would be able to stay in Google Docs but be able to embed and configure interactive charts in that same window, without going through the detour of WordPress to have a more seamless editing experience and experience less friction in their daily work.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "text": [ { "text": "Closing thoughts", "spanType": "span-simple-text" } ], "type": "heading", "level": 2, "parseErrors": [] }, { "type": "text", "value": [ { "text": "As you can see, we have many ideas in the pipeline and are excited to be working on them. Thanks to the many donations we receive, we are able to grow our technical team and increase the speed at which we are working on this.", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "type": "text", "value": [ { "text": "If you are interested in joining us or contributing in some other way, then please get in touch! 
Likewise, if you\u2019re interested in being an early adopter of our technical work, please ", "spanType": "span-simple-text" }, { "url": "http://eepurl.com/c7qucj", "children": [ { "text": "subscribe to our beta mailing list", "spanType": "span-simple-text" } ], "spanType": "span-link" }, { "text": ".", "spanType": "span-simple-text" } ], "parseErrors": [] }, { "url": "https://ourworldindata.org/full-stack-engineer", "type": "prominent-link", "title": "Full-Stack Engineer", "description": "Remote (US East & EU/African timezones preferred)", "parseErrors": [] }, { "url": "https://ourworldindata.org/data-engineer", "type": "prominent-link", "title": "Data Engineer", "description": "Remote (US East & EU/African timezones preferred)", "parseErrors": [] }, { "type": "text", "value": [ { "text": "An update on what we are working on and how we plan to improve the Our World In Data website.", "spanType": "span-simple-text" } ], "parseErrors": [] } ], "type": "article", "title": "Easier data reuse and more flexible visualizations - what we in the developer team are working on", "authors": [ "Daniel Bachler" ], "dateline": "November 30, 2021", "sidebar-toc": false, "featured-image": "" }, "createdAt": "2021-11-29T16:28:46.000Z", "published": false, "updatedAt": "2021-11-30T09:41:09.000Z", "revisionId": null, "publishedAt": "2021-11-30T09:41:08.000Z", "relatedCharts": [], "publicationContext": "listed" } |
{ "errors": [ { "name": "unexpected wp component tag", "details": "Found unhandled wp:comment tag list" }, { "name": "unexpected wp component tag", "details": "Found unhandled wp:comment tag image" } ], "numBlocks": 28, "numErrors": 2, "wpTagCounts": { "html": 3, "list": 1, "image": 1, "column": 6, "columns": 3, "heading": 9, "paragraph": 26, "owid/prominent-link": 2 }, "htmlTagCounts": { "p": 27, "h3": 4, "h4": 5, "ul": 1, "div": 10, "figure": 1, "iframe": 2 } } |
2021-11-30 09:41:08 | 2024-02-22 02:05:43 | 1YJMExQ_wBosb3_-J91XiQrzwAfHhlWtI1oHI7UOfplQ | [ "Daniel Bachler" ] |
2021-11-29 16:28:46 | 2021-11-30 09:41:09 | {} |
In the last few weeks, we – the developers here at Our World in Data – have done a lot of brainstorming and planning to flesh out what capabilities we want to add to the Our World In Data site. Some of these features will make it easier to reuse our data, some will make it easier to view data from different angles. We are currently [hiring for two technical roles](https://ourworldindata.org/jobs) to help us build what we are describing below - so if you are interested yourself, have a look at the two job profiles or forward them to people you know who might be a good fit (the application **deadline is Dec 5th**) — thanks! The improvements we are currently planning can be sorted into 3 main topics: * [Making it easier to reuse our data](#easier-data-reuse) * [Using a richer data model](#richer-data-model) * [Enabling more flexible visualizations](#more-flexible-data-visualization) ## Easier data reuse If you are working with data yourself, then you are probably aware that on every chart on our site you can switch to the “Download” tab and download a CSV file with the underlying data. This has a few shortcomings. For one, only the data that our authors end up using in charts is easily accessible like this. But we have a much bigger catalog of more than 100,000 indicators in our internal database that we would like to open up. Ernst, our head of product design, likens this to a museum that only has a small part of its collection on display. We are now working to make all this data available, and to do so in a form that is convenient to use for data scientists. We are creating a public index so that you can quickly discover if we have data on a certain area available, and then fetch that data as a tidy data frame in a modern file format (like Apache feather or parquet) that is easy to consume from Python, R or Observable notebooks. 
You can already try out an experimental version of this index in Python using the [owid-catalog-py](https://github.com/owid/owid-catalog-py) package.

#### Reusable metadata

A large part of the work that our data team and our authors do is to curate data and add metadata. This is important because the data we collect comes from different sources: large institutions like the World Bank and the WHO, but also individual researchers. As a reader deciding how much to trust the data, it helps immensely to understand where it came from.

Harmonizing this data on a technical level, so that it uses the same date formats and country names, makes it possible to join all this data together. But real insight can only be drawn from it if we also record how the data was collected and what its limitations are. We are now standardizing the metadata that we collect and will always serve it alongside the data files in JSON format, so that all this curation work can be reused in addition to the data that we already reshare.

<Image filename="Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1.png" alt="Shows the download data option"/>

## Richer data model

Another benefit of moving away from our closed internal database as the central data store is that we will be able to use richer data models. To understand why this is important, you should know that we currently bring all individual data points into one large MySQL table that has just 4 columns: Year/Date, Entity/Country, Variable, Value.

This has worked well for us for a long time: most of the data we are interested in is heavily aggregated, so country and year were good enough and kept things simple. But we now want to enable richer data models. Our COVID-19 data effort already stretched the current model with its need for daily data, so adding different granularities of time is one powerful change.
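To make the four-column layout concrete, here is a minimal Python sketch. The `DataPoint` type, the `series` helper, and all values are made up for illustration; they are assumptions, not the actual OWID database schema or data.

```python
from typing import Dict, List, NamedTuple, Union


class DataPoint(NamedTuple):
    """One row of the four-column layout: time, entity, variable, value."""
    time: Union[int, str]  # a year such as 2019, or an ISO date string for daily data
    entity: str            # usually a country
    variable: str          # the indicator name
    value: float


# Hypothetical rows with invented values, for illustration only.
rows: List[DataPoint] = [
    DataPoint(2018, "Austria", "Deaths - smoking", 100.0),
    DataPoint(2019, "Austria", "Deaths - smoking", 90.0),
    DataPoint(2019, "Kenya", "Deaths - smoking", 80.0),
    # A breakdown has no column of its own, so it is folded into the variable name.
    DataPoint(2019, "Austria", "Deaths - smoking - female - age 15-25", 1.0),
]


def series(data: List[DataPoint], variable: str, entity: str) -> Dict[Union[int, str], float]:
    """Return {time: value} for one variable and one entity."""
    return {r.time: r.value for r in data if r.variable == variable and r.entity == entity}


austria = series(rows, "Deaths - smoking", "Austria")
print(austria)
```

In this layout, every extra breakdown becomes another independent variable name that has to be matched up by hand.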
But we also want to be able to break down critical indicators by sex or age group if the upstream data source provides this.

Currently, when we want to include data like this, we have to create new, independent variables. For example, deaths from smoking may end up becoming many variables like “Deaths - smoking - female - age 15-25” instead of just one variable with many dimensions. Authors then have to remember which variables to show next to each other in an article or chart. By making it possible to store additional dimensions other than year and country, we will be able to do this automatically and allow users to switch between levels of detail.

#### Drill down into the details

We are also planning to add proper support for hierarchies within dimensions so that we can offer proper drill-down and drill-up in our charts. If you look at this chart on child mortality, you’ll see that it shows data for the entire world and for each continent. In the top-left corner, you can find the “Add country” button to change this selection and show individual countries.

The view you see initially is a good starting point, but it has two issues. First, it had to be manually configured this way. Second, if you click on the “Add country” button, the continents and individual countries are all shown as one long alphabetical list. In the future we will be able to show different sections in the country selector for different groupings automatically, and we’ll also be able to do this for other dimensions like cause of death, so you can get a broad picture first and then dive into the details.

<Chart url="https://ourworldindata.org/grapher/child-mortality-around-the-world"/>

#### Visualizing uncertainty

Finally, we are planning to add metadata that expresses the relationships between variables.

One of the first areas where we want to use this feature is to add proper support for confidence intervals.
Visualizing the uncertainty inherent in data or projections is very important, but at the moment we rarely do it because it poses all sorts of UI problems. By making our grapher understand these relationships, we’ll be able to use the visual hints for confidence intervals that are widely used in data visualization.

<Chart url="https://ourworldindata.org/grapher/daily-new-estimated-covid-19-infections-icl-model"/>

## More flexible data visualization

The final area for technical improvements is our visualization tool. Some items on our roadmap in this area are technically pretty simple, but we think they will give our readers and authors interesting new capabilities.

For one, we want to make the content of our grapher charts more flexible, for example by allowing authors to create slideshows of static images or to use visualization libraries other than our handcrafted grapher. Our existing chart drawing code works very well at providing a limited set of chart types with a lot of standardized features, but sometimes a more bespoke setup would be useful. One example is an idea we are currently working on: visualizing war casualties over several hundred years, where we need to show conflicting estimates, different kinds of casualties, and so on, all over a very long time period.

### Reusing our visualization tool

We also want to make our grapher easier to reuse in other projects. Currently, the code base of the charting component is quite entangled with the internal administration UI that is used to create charts, and it assumes many particularities about the infrastructure it runs on. We want to split this out so that the grapher becomes a separate NPM package that can be used in other projects. Since all our code is open source, we also hope that this will make it easier for others to contribute back to our charting system.
### Open exploration & contextual information

With the advances described above, we also want to make the grapher more open to exploration: users should be able to search for and visualize all the variables in our catalog. Our philosophy here is that we want our metadata to describe our data so well that generating high-quality visualizations requires no further configuration. We hope that the ability to create arbitrary scatter plots or line charts will offer another interesting look into our data collection.

When exploring data like this, it is important to understand what exactly you are looking at. Our authors spend a lot of time thinking about how best to explain critical concepts, such as “International constant $”, that need to be understood to interpret our charts correctly.

Currently, these definitions either live in prose somewhere on our website or are squeezed into the subtitles of our charts. We want to experiment with an additional canvas next to our charts that summarizes the important concepts needed to understand a specific chart. This might go as far as showing secondary charts or third-party content, all with the aim of making sure that our readers draw valid conclusions from our content.

Finally, we are thinking of creating a new way for our authors to write their articles. We currently use a headless WordPress installation as our storage for articles, but our authors prefer to do the actual writing in Google Docs. In an ideal world, they would be able to stay in Google Docs and embed and configure interactive charts in that same window, without the detour through WordPress, giving them a more seamless editing experience and less friction in their daily work.

## Closing thoughts

As you can see, we have many ideas in the pipeline and are excited to be working on them.
Thanks to the many donations we receive, we are able to grow our technical team and increase the speed at which we are working on this. If you are interested in joining us or contributing in some other way, then please get in touch! Likewise, if you’re interested in being an early adopter of our technical work, please [subscribe to our beta mailing list](http://eepurl.com/c7qucj). ### Full-Stack Engineer Remote (US East & EU/African timezones preferred) https://ourworldindata.org/full-stack-engineer ### Data Engineer Remote (US East & EU/African timezones preferred) https://ourworldindata.org/data-engineer An update on what we are working on and how we plan to improve the Our World In Data website. | { "id": 46569, "date": "2021-11-30T09:41:08", "guid": { "rendered": "https://owid.cloud/?p=46569" }, "link": "https://owid.cloud/dev-team-working-on-2021", "meta": { "owid_publication_context_meta_field": [] }, "slug": "dev-team-working-on-2021", "tags": [], "type": "post", "title": { "rendered": "Easier data reuse and more flexible visualizations – what we in the developer team are working on" }, "_links": { "self": [ { "href": "https://owid.cloud/wp-json/wp/v2/posts/46569" } ], "about": [ { "href": "https://owid.cloud/wp-json/wp/v2/types/post" } ], "author": [ { "href": "https://owid.cloud/wp-json/wp/v2/users/52", "embeddable": true } ], "curies": [ { "href": "https://api.w.org/{rel}", "name": "wp", "templated": true } ], "replies": [ { "href": "https://owid.cloud/wp-json/wp/v2/comments?post=46569", "embeddable": true } ], "wp:term": [ { "href": "https://owid.cloud/wp-json/wp/v2/categories?post=46569", "taxonomy": "category", "embeddable": true }, { "href": "https://owid.cloud/wp-json/wp/v2/tags?post=46569", "taxonomy": "post_tag", "embeddable": true } ], "collection": [ { "href": "https://owid.cloud/wp-json/wp/v2/posts" } ], "wp:attachment": [ { "href": "https://owid.cloud/wp-json/wp/v2/media?parent=46569" } ], "version-history": [ { "href": 
"https://owid.cloud/wp-json/wp/v2/posts/46569/revisions", "count": 18 } ], "predecessor-version": [ { "id": 46602, "href": "https://owid.cloud/wp-json/wp/v2/posts/46569/revisions/46602" } ] }, "author": 52, "format": "standard", "status": "publish", "sticky": false, "content": { "rendered": "\n<p>In the last few weeks, we \u2013 the developers here at Our World in Data \u2013 have done a lot of brainstorming and planning to flesh out what capabilities we want to add to the Our World In Data site. Some of these features will make it easier to reuse our data, some will make it easier to view data from different angles.</p>\n\n\n\n<p>We are currently <a href=\"https://ourworldindata.org/jobs\">hiring for two technical roles</a> to help us build what we are describing below – so if you are interested yourself, have a look at the two job profiles or forward them to people you know who might be a good fit (the application <strong>deadline is Dec 5th</strong>) \u2014 thanks!</p>\n\n\n\n<p>The improvements we are currently planning can be sorted into 3 main topics: </p>\n\n\n\n<ul><li><a href=\"#easier-data-reuse\">Making it easier to reuse our data</a></li><li><a href=\"#richer-data-model\">Using a richer data model</a></li><li><a href=\"#more-flexible-data-visualization\">Enabling more flexible visualizations</a></li></ul>\n\n\n\n<h3>Easier data reuse</h3>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<p>If you are working with data yourself, then you are probably aware that on every chart on our site you can switch to the \u201cDownload\u201d tab and download a CSV file with the underlying data. This has a few shortcomings. For one, only the data that our authors end up using in charts is easily accessible like this. But we have a much bigger catalog of more than 100,000 indicators in our internal database that we would like to open up. 
Ernst, our head of product design, likens this to a museum that only has a small part of its collection on display.</p>\n\n\n\n<p>We are now working to make all this data available, and to do so in a form that is convenient to use for data scientists. We are creating a public index so that you can quickly discover if we have data on a certain area available, and then fetch that data as a tidy data frame in a modern file format (like Apache feather or parquet) that is easy to consume from Python, R or Observable notebooks. You can try out an experimental version of this index in Python already using the <a rel=\"noreferrer noopener\" href=\"https://github.com/owid/owid-catalog-py\" target=\"_blank\">owid-catalog-py</a> package.</p>\n\n\n\n<h4>Reusable metadata</h4>\n\n\n\n<p>A large part of the work that our data team and our authors are doing is to curate data and add metadata. This is important because the data we collect comes from different sources, both from large institutions like the World Bank and the WHO, but also from individual researchers. As a reader, when deciding how much to trust the data, it helps immensely to understand where it came from.</p>\n\n\n\n<p>Harmonizing this data on a technical level, so it uses the same date formats and country names, enables joining all this data together. But only by also recording information about how this data was collected, and its limitations, can real insight be drawn from it. 
We are now standardizing the metadata that we are collecting and will always serve it alongside the data files in JSON format, so that all this curation work can be reused in addition to the data that we already reshare.</p>\n</div>\n\n\n\n<div class=\"wp-block-column\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"725\" height=\"550\" src=\"https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-725x550.png\" alt=\"Shows the download data option\" class=\"wp-image-46589\" srcset=\"https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-725x550.png 725w, https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-400x304.png 400w, https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-150x114.png 150w, https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1-768x583.png 768w, https://owid.cloud/app/uploads/2021/11/Screenshot-2021-11-29-at-17-37-44-Extreme-poverty-how-far-have-we-come-how-far-do-we-still-have-to-go-1-1.png 800w\" sizes=\"(max-width: 725px) 100vw, 725px\" /></figure>\n</div>\n</div>\n\n\n\n<h3>Richer data model</h3>\n\n\n\n<p>Another benefit of moving away from our closed internal database as the central data store is that we will be able to leverage richer data models. To understand why this is important, you should know that we currently bring all individual data points into one large MySQL table that has just 4 columns: Year/Date, Entity/Country, Variable, Value. 
</p>\n\n\n\n<p>This has worked well for us for a long time, since most of the data we are interested in is heavily aggregated, so country and year were good enough and kept things simple. But we now want to enable richer data models – our COVID-19 data effort already stretched the current model with the need for daily data, and so adding different granularities of time is one powerful change. But we also want to be able to break down critical indicators by sex or age group if the upstream data source provides this. </p>\n\n\n\n<p>Currently, when we want to include data like this we have to create new, independent variables. For example, deaths from smoking may end up becoming many variables like \u201cDeaths – smoking – female – age 15-25\u201d instead of just one with many dimensions. Authors then have to remember which variables to show next to each other in an article or chart. By making it possible to store additional dimensions other than year and country, we will be able to do this automatically and allow users to switch between levels of detail. </p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<h4>Drill down into the details</h4>\n\n\n\n<p>We are also planning to add proper support for hierarchies within dimensions so that we will be able to do proper drill-down and drill-up in our charts. If you look at this chart on child mortality, you\u2019ll see that this shows data for the entire world and split by continent. In the top-left corner, you can find the \u201cAdd country\u201d button to change this selection and show individual countries. </p>\n\n\n\n<p>This view you see initially is a good starting point, but has two issues. First, it had to be manually configured this way. Second, if you click on the \u201cAdd country\u201d button then the continents and individual countries are all just shown as one long list, sorted alphabetically. 
In the future we will be able to show different sections in the country selector for different groupings automatically, but we\u2019ll also be able to do this for other dimensions like cause of death, so you can get a broad picture first and then dive into the details.</p>\n</div>\n\n\n\n<div class=\"wp-block-column\">\n<iframe src=\"https://ourworldindata.org/grapher/child-mortality-around-the-world\" loading=\"lazy\" style=\"width: 100%; height: 600px; border: 0px none;\"></iframe>\n</div>\n</div>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\">\n<h4>Visualizing uncertainty</h4>\n\n\n\n<p>Finally, we are planning to add metadata information to express the relationship between variables. </p>\n\n\n\n<p>One of the first areas where we want to use this feature is to add proper support for confidence intervals. Visualizing the uncertainty inherent in data or projections is very important, but at the moment we rarely do it because it currently poses all sorts of UI problems. By making our grapher understand these relationships, we\u2019ll be able to use the visual hints for confidence intervals that are widely used in data visualization.</p>\n</div>\n\n\n\n<div class=\"wp-block-column\">\n<iframe src=\"https://ourworldindata.org/grapher/daily-new-estimated-covid-19-infections-icl-model\" loading=\"lazy\" style=\"width: 100%; height: 600px; border: 0px none;\"></iframe>\n</div>\n</div>\n\n\n\n<h3>More flexible data visualization</h3>\n\n\n\n<p>The final area for technical improvements is our visualization tool. Some items on our roadmap in this area are technically pretty simple, but we think they will give our readers and authors interesting new capabilities. </p>\n\n\n\n<p>For one, we want to make the content of our grapher charts more flexible, for example by allowing authors to create slideshows of static images or to use other visualization libraries than our handcrafted grapher. 
Our existing chart drawing code works very well to provide a limited set of chart types with a lot of standardized features, but sometimes a more bespoke setup would be useful. One example of this is an idea that we are currently working on to visualize war casualties over several hundred years where we need to visualize conflicting estimates, different kinds of casualties etc. all over a very long time period.</p>\n\n\n\n<h4>Reusing our visualization tool</h4>\n\n\n\n<p>We also want to make our grapher easier to reuse for other projects. Currently, the code base of the charting component is quite entangled with the internal administration UI that is used to create them, and it assumes many particularities about the infrastructure it runs on. We want to split this out so that the grapher becomes a separate NPM package that can be used in other projects. Since all our code is open source, we also hope that this will make it easier for others to contribute back to our charting system.</p>\n\n\n\n<h4>Open exploration & contextual information</h4>\n\n\n\n<p>With the advances described above, we also want to make the grapher more open to exploration – this means the users should be able to search for and visualize all the variables in our catalog. Our philosophy here is that we want our metadata to describe our data so well that generating high quality visualizations requires no further config. We hope that the ability to create arbitrary scatter plots or line charts will be another interesting look into our data collection.</p>\n\n\n\n<p>When exploring data like this, it is important to understand what exactly you are looking at. Our authors spend a lot of time thinking about how best to explain critical concepts like for example \u201cInternational constant $\u201d that need to be understood to interpret our charts correctly. </p>\n\n\n\n<p>Currently, these definitions live either in prose in some part of our website or are squeezed into the subtitles of our charts. 
We want to experiment with a new additional canvas next to our charts that will be used to summarize important concepts that are needed to understand a specific chart. This might go as far as showing secondary charts or third party content, all with the aim to make sure that our readers draw valid conclusions from our content.</p>\n\n\n\n<p>Finally, we are thinking of creating a new way for our authors to create their articles. We currently use a headless WordPress installation as our storage for articles, but our authors prefer to do the actual writing in Google Docs. In an ideal world they would be able to stay in Google Docs but be able to embed and configure interactive charts in that same window, without going through the detour of WordPress to have a more seamless editing experience and experience less friction in their daily work.</p>\n\n\n\n<h3>Closing thoughts</h3>\n\n\n\n<p>As you can see, we have many ideas in the pipeline and are excited to be working on them. Thanks to the many donations we receive, we are able to grow our technical team and increase the speed at which we are working on this.</p>\n\n\n\n<p>If you are interested in joining us or contributing in some other way, then please get in touch! 
Likewise, if you\u2019re interested in being an early adopter of our technical work, please <a href=\"http://eepurl.com/c7qucj\" target=\"_blank\" rel=\"noreferrer noopener\">subscribe to our beta mailing list</a>.</p>\n\n\n <block type=\"prominent-link\" style=\"is-style-thin\">\n <link-url>https://ourworldindata.org/full-stack-engineer</link-url>\n <title>Full-Stack Engineer</title>\n <content>\n\n<p>Remote (US East & EU/African timezones preferred)</p>\n\n</content>\n <figure></figure>\n </block>\n\n <block type=\"prominent-link\" style=\"is-style-thin\">\n <link-url>https://ourworldindata.org/data-engineer</link-url>\n <title>Data Engineer</title>\n <content>\n\n<p>Remote (US East & EU/African timezones preferred)</p>\n\n</content>\n <figure></figure>\n </block>\n\n\n<div class=\"blog-info\">\n<p>An update on what we are working on and how we plan to improve the Our World In Data website.</p>\n</div>\n\n\n\n<p></p>\n", "protected": false }, "excerpt": { "rendered": "", "protected": false }, "date_gmt": "2021-11-30T09:41:08", "modified": "2021-11-30T09:41:09", "template": "", "categories": [ 207 ], "ping_status": "closed", "authors_name": [ "Daniel Bachler" ], "modified_gmt": "2021-11-30T09:41:09", "comment_status": "closed", "featured_media": 0, "featured_media_paths": { "thumbnail": null, "medium_large": null } } |