Working with satellite data - part 1
I already emphasized that Big Data changes the way data analysts think and work. Even hedge funds and stock traders adjust their positions in reaction to information delivered through alternative data channels. For instance, the flight information that a jet belonging to Occidental Petroleum landed at an airport in Omaha, helped the traders anticipate that the company was about to receive an extra capital injection of USD 10bn from Warren Buffett (read FT article). While many of the alternative data sets are premium, there is a fairly large amount available to the public. For quite some time I have been curious to explore them in detail and, as it turned out, a small proof-of-concept project offered an opportunity to do it.
The goal was to check if the satellite images can alleviate some of the data shortcomings in developing countries. As a proof of concept, I was interested to see the correlation between the night light intensity and GDP for African countries. The hypothesis here states that more light emission at night should be associated with higher output. Should there be a significant relation, one could use geospatial data to track the basic economic characteristics of countries in the region. While it is simplistic, and far from claiming any causal links, the main point behind this is to build technology to access and work with geospatial data. For greater transparency, let's split the whole process into two steps. In the first one, we deal with the geospatial data, and in the second one we merge the GDP data and run some basic analysis.
The data on night light intensity is gathered and made available by NOAA (National Oceanic and Atmospheric Administration). The sets for various time horizons as available here. My example year will be 2013, delivered in the set F182013. The folder contains a few files, but the relevant one corresponds to the stable light averages. Overall, the data set is a TIFF image for cloud-free composites of all the available archived smooth resolution data for a given calendar year. It is in the form of a grid reporting the intensity of lights as a six-bit digital number, for every 30 arc-second output pixel (which is approximately 0.86 square kilometres at the equator) spanning -180 to 180 degrees longitude and -65 to 75 degrees latitude. The six-bit scale is effectively a range between 0 and 63, where 0 means no light and 63 means maximum light. While it does not convey a clear message in a unit of measure, pixel comparison can inform on the distribution of human activity world-wide.
As you can imagine, the light captured by satellites at night does not only correspond to human activity. It also includes glare, moonlight and lighting features from the aurora, for instance. It can also be blurred by the noise in the image-taking process, or by non-human events, like forest fires. All of these have been cleaned by NOAA. The image is presented in Figure 1.
Figure 1. Stable night lights in 2013. Source: NOAA.
The data does not have a clear shape yet. For practical reasons, it will be useful to add the basic topographical information, for instance, to distinguish between regions. Luckily, many of such contour sets are available, and there are even online tools to create customized sets. I will use the maps in the geojson format at 50m scale. It can be downloaded here.
The processing of geo data becomes very easy with the
rasterpackage. It is a great tool to load, transform and analyse even very large sets. Code below aims to illustrate the basic concepts behind the package and how easy it is to extract valuable information from satellite data. In the example below we will use packages
raster,
tabularaster,
dplyrand
sf.
Firstly, let's load the two files (adjust the paths as needed).
lights_World <- raster("files/F182013.v4c_web.stable_lights.avg_vis.tif") map_Africa <- st_read("files/Africa_50M.geojson")
Secondly, to reduce the size of the files to make them lighter to handle, I would crop only the lights which fall into the contours of Africa. To do this, I need to add mask the contours on the raster object, and the crop the relevant region.
lights_Africa <- mask(lights_World, mask = map_Africa) lights_Africa <- crop(lights_Africa, extent(map_Africa))
At this stage, the object
lights_Africacontains an image of night light intensity for Africa. We can plot it using the standard command
plot, or using a more sophisticated
levelplotfrom
rasterVispackage.
Figure 2. Stable night lights in 2013 in Africa. Source: NOAA.
As the last step, we will calculate the sum of night lights per administrative regions in Africa. To do it in an efficient way (although it still takes some time), we will use the
extractfunction to extract the polygons from the map object, and for each one of them we will calculate the sum of light intensity by adding up the values in respective cells (or pixels) falling into given administrative region.
map_Africa <- st_cast(map_Africa, "POLYGON") map_objects <- cellnumbers(lights_Africa, map_Africa) map_objects %>% mutate(light = raster::extract(lights_Africa, map_objects[["cell_"]])) %>% group_by(object_) %>% summarise(sum_light = sum(light, na.rm = TRUE)) -> light_map_Africa map_Africa$lights <- light_map_Africa[["sum_light"]] #create the data frame df <- data.frame( 'country' = map_Africa[["admin"]], 'lights' = map_Africa[["lights"]])
As some of the countries have multiple administrative regions, we collapse the data country name.
df %>% group_by(country) %>% summarise(lights = sum(lights)) -> df
Figure 3. Total stable night lights in 2013 among African countries. Source: NOAA.