Support your COVID-19 projects with easier data access
(This post was also published on medium.com)
The COVID-19 pandemic has profoundly shaken the world. No other event in recent history had led to such a revolution in our lifestyle, nor had forced countries to halt their economic activity by implementing nationwide lockdowns. As the crisis was unfolding rapidly, standard economic indicators were initially of little use. Alternative sources could offer a helping hand, but they often require digging through the piles of non-standardized data. This can be a huge problem if you want to quickly come up with a view on the topic. With this blog post we are going to show you how to get some of the alternative-data indicators just in the right shape, to support your COVID-19 project.
An example: Covid-19 and electricity
In general, GDP and economic aggregates are easily accessible and most users are already familiar in how to retrieve and process them. What if instead one would want to gauge the real impact of COVID-19 by looking at main inputs of economic activity, for instance electricity production? As a matter of fact, the economic slowdown forced by lockdown measures should have had direct implications on energy demand and consumption. Let's check if this is visible in hard data for two European countries severely hit by the pandemic (Italy and the United Kingdom). Our goal is to track monthly evolution of electricity consumption in 2020 and compare it to the patterns observed over the last 5 years.
The data set
Let's talk about data. Accurate energy consumption data for EU countries can be found on ENTSO-E Transparency Platform. Nevertheless, data are published on an hourly basis, with little flexibility on the time span and country selection (for instance the length of the queried time series cannot exceed one year). That means some extra number crunching after pulling the data sets. Fortunately, you can offload all this to rejustify.
Rejustify
Rejustify allows for pulling data from multiple sources and from various original shapes, into your pre-designed data template. In this way, you can easily fit the data into your charting tools, or simply update the series. Data template is a simple data frame, where non-empty dimensions highlight the details which may be used for matching, and empty dimensions are the placeholders for the ENTSOE data to be filled in. In this example we will use R for demonstration, but similar logic can be applied in Google Sheets and Python.
First, you would need to create a free account at rejustify.com. Once you receive your token you can access rejustify engine from the platform you prefer the most.
# install rejustify package (if not installed) install.packages("remotes") remotes::install_github("rejustify/r-package") # load the package require(rejustify) # login to rejustify - create a free account at rejustify.com/account setCurl(learn=TRUE) register(token = "YOUR_TOKEN", email = "YOUR_EMAIL") # ENTSOE data template df <- data.frame(month = format(seq(as.Date("2018-1-1"), as.Date("2020-6-1"), "months"), "%m-%Y"), `entsoe,Italy` = NA, `entsoe,United Kingdom` = NA, check.names = FALSE, stringsAsFactors = FALSE)
We start off by creating a data set with three columns: the first one, month, contains the time dimension, whereas each of the remaining two columns refer to the country and indicator (i.e. energy consumption) we want to retrieve data for.
Figure 1. The starting data frame.
Communication with rejustify takes two steps. In the first one, the engine must understand your data. It is done through the function analyze(). Its goal is to provide the basic characteristics of the data set and offer best-fitting data sources for the empty dimensions.
# get the data structure from rejustify engine st <- analyze(df) # adjust (if needed) st <- adjust(st, id = 1, items = list("class" = "time", "feature" = "month", "format" = "%m-%Y")) st <- adjust(st, id = c(2,4), items = list('class' = 'general', 'feature' = NA)) st <- adjust(st, column = c(2,3), items = list('provider' = 'REJUSTIFY', 'table' = 'ENTSOE-CONS-MONTHLY'))
Figure 2. The output from analyze.
We notice that some of the suggestions returned by analyze() can be improved. For example, feature quarter is not the most accurate representation of the time dimension. Rather, our data template includes monthly series. Rejustify engine learns your data habits and with every use it will be better tailored to your preferences. We can make such corrections using the adjust() function. Similarly, we want to specify where we would like to pull the monthly energy consumption data from, hence by using adjust() we select the ENTSOE-CONS-MONTHLY table. Note that the rejustify engine will take stock of your adjustments thanks to its learning functionality. That is, rejustify will recognize all changes to the proposed data structure and use them to train the AI algorithms so to provide more accurate suggestions.
At this point, we use the function fill() to fill the missing data for each country in our data set by data points delivered by provider/table.
# fill the missing values with ENTSOE data rdf <- fill(df, st)
Visualize the data
In a few steps we have created a data set for monthly energy consumption by country. We can explore these data further graphically. As the energy consumption fluctuates across the year (for instance due to changes in heating/cooling demand), let's rebase the figures to January each year.
# load plot library require(ggplot2) # convert to numbers rdf[["data"]][,2] <- as.numeric(rdf[["data"]][,2]) rdf[["data"]][,3] <- as.numeric(rdf[["data"]][,3]) # standardize values to January = 1 for(i in nrow(rdf[["data"]]):1) { rdf[["data"]][i,-1] <- rdf[["data"]][i,-1] / rdf[["data"]][rdf[["data"]][["month"]] == paste0("01-",substring(rdf[["data"]]["month"]][[i]],4,7)),-1] } # group the months and years rdf[["data"]][["year"]] <- substring(rdf[["data"]][["month"]],4,7) rdf[["data"]][["month"]] <- substring(rdf[["data"]][["month"]],1,2) # plot for Italy ggplot(data=rdf[["data"]], aes(x=month, y = `entsoe,Italy`, group = year)) + geom_line(aes(color=year)) # plot for UK ggplot(data=rdf[["data"]], aes(x=month, y = `entsoe,United Kingdom`, group = year)) + geom_line(aes(color=year))
Figure 3. Energy consumption by country.
The charts exhibit very clear seasonal patterns that are quite consistent throughout the years. In a Southern European country like Italy, energy consumption peaks in the Summer, probably because of the intense usage of cooling systems. On the other hand, in the UK we observe the opposite trend, with peaks in winter months because of heating. Turning the attention to 2020, one can immediately visualize the enormous effects of COVID-19 and related lockdowns: month on month, energy consumption is on average remarkably lower than in the previous years. This image is probably even more powerful than any plot of GDP or other economic aggregates, since it is the tangible confirmation of the stop to production and economic activity at large due to the lockdown measures.
In short, the impact of lockdowns, visible in the energy consumption patterns, goes significantly beyond the seasonal trends.
Learn more about rejustify
This is just a simple demonstration of some of the capabilities of the rejustify engine that are accessible with a free account. More details about rejustify can be found on rejustify.com, including information on data providers as well as tutorials and case studies for each software supported by rejustify. If you are interested in learning more, do not forget to subscribe to our general newsletter to stay tuned about product updates and new functionalities that will be introduced in the upcoming months.