(This post was also published on medium.com)

The COVID-19 pandemic has profoundly shaken the world. No other event in recent history has led to such a revolution in our lifestyle or forced countries to halt their economic activity through nationwide lockdowns. As the crisis unfolded rapidly, standard economic indicators were initially of little use. Alternative data sources can help, but they often require digging through piles of non-standardized data, which is a serious obstacle if you need to form a view on the topic quickly. In this blog post we show you how to get some alternative-data indicators into just the right shape to support your COVID-19 project.

### An example: Covid-19 and electricity

In general, GDP and economic aggregates are easi...

I have already emphasized that Big Data changes the way data analysts think and work. Even hedge funds and stock traders adjust their positions in reaction to information delivered through alternative data channels. For instance, flight-tracking data showing that a jet belonging to Occidental Petroleum had landed at an airport in Omaha helped traders anticipate that the company was about to receive a USD 10bn capital injection from Warren Buffett (see the FT article). While many alternative data sets are premium, a fairly large amount is available to the public. For quite some time I have been curious to explore them in detail and, as it turned out, a small proof-of-concept project offered an opportunity to do so.

The goal was to check if the satell...

Stata has several built-in limits in its engine. They mostly ensure efficient memory allocation and, overall, make commands run faster. For the majority of applications the limits are large enough that the user will not even notice them (the full list is documented under `help limits` in Stata). It is, however, the one-in-a-million applications that can make Stata routines quite cumbersome.

The limit I ran into recently concerns the maximum macro length (or possibly the maximum command length; it is difficult to tell which). Even in Stata MP, a macro can hold at most 4,227,143 characters in Stata 14. Stata 15 raises this limit nearly fourfold, but since that did not fix the problem, I suspect the culprit is the command length rather than the macro length.

I had to use an SQL query to select the records with certain identifiers. The average size of the identifier was...
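When a long list of identifiers has to go into an SQL query, one common way around length limits is to split the list into batches so that each generated `WHERE ... IN (...)` clause stays under a character budget. The Python sketch below illustrates that idea; it is not necessarily how the problem was solved in the post, and the helper name `batch_in_clauses`, the default column name `id` and the `max_chars` budget are all hypothetical. Identifiers are inlined as quoted string literals, so with untrusted input a parameterized query would be the safer choice.

```python
from typing import Iterable, List


def batch_in_clauses(ids: Iterable[str], max_chars: int, column: str = "id") -> List[str]:
    """Split identifiers into `column IN (...)` clauses no longer than max_chars each.

    Hypothetical helper: `column` and `max_chars` are illustrative assumptions.
    Each identifier becomes an SQL string literal (single quotes are doubled).
    """
    prefix = f"{column} IN ("
    suffix = ")"
    clauses: List[str] = []
    current: List[str] = []
    length = len(prefix) + len(suffix)  # length of the clause being built

    for raw in ids:
        literal = "'" + raw.replace("'", "''") + "'"
        # +1 for the comma separator when the batch already has entries
        extra = len(literal) + (1 if current else 0)
        if current and length + extra > max_chars:
            # Budget exceeded: close the current batch and start a new one
            clauses.append(prefix + ",".join(current) + suffix)
            current, length = [], len(prefix) + len(suffix)
            extra = len(literal)
        if length + extra > max_chars:
            raise ValueError(f"identifier {raw!r} alone exceeds max_chars")
        current.append(literal)
        length += extra

    if current:
        clauses.append(prefix + ",".join(current) + suffix)
    return clauses
```

Each clause can then be run as a separate query, e.g. `"SELECT * FROM records WHERE " + clause` with `records` as a placeholder table name, and the results appended afterwards.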

I recently came across the problem of testing whether the expectation of one variable, call it $Y$, varies along the distribution of another variable, say $X$. The problem can be approached from several angles, including a parametric quantile approach; however, I decided to use one of the most flexible methods, and one of my favorites: the bootstrap.

The idea is quite simple. Imagine two random variables $Y$ and $X$. (For the exact definition of a random variable, the Wikipedia page is a good place to start.) Given their observed realisations $\{(Y_i,X_i): i=1,\dots,n\}$, the goal is to test whether the conditional average of $Y$ is statistically different from its unconditional average. We can approximate the former by estimating the mean of $Y$ over different parts of the distribution of $X$. For instance, we can test if expectations of $Y$ d...
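To make the idea concrete, here is a minimal Python sketch of one possible version of such a test. It assumes that the "parts of the distribution of $X$" are quantile bins $B_k$ (quartiles by default) and uses a percentile bootstrap: resample the pairs $(Y_i, X_i)$ with replacement, recompute $\hat d_k = \bar Y_{B_k} - \bar Y$ for every bin $k$, and read a bin whose $1-\alpha$ interval excludes zero as evidence that the conditional mean differs from the unconditional one. The function name and its parameters (`n_bins`, `n_boot`, `alpha`, `seed`) are illustrative, this is not necessarily the procedure from the original post, and no correction for testing several bins at once is applied.

```python
import numpy as np


def bootstrap_bin_mean_diff(y, x, n_bins=4, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CIs for E[Y | X in bin k] - E[Y], bins at quantiles of X.

    Sketch under assumptions: pairs (Y_i, X_i) are resampled jointly and the
    quantile bin edges are recomputed in every bootstrap sample.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    levels = np.linspace(0, 1, n_bins + 1)[1:-1]  # interior quantile levels

    def stat(yy, xx):
        edges = np.quantile(xx, levels)
        # Bin index 0..n_bins-1; values equal to an edge go to the upper bin
        labels = np.searchsorted(edges, xx, side="right")
        overall = yy.mean()
        out = np.full(n_bins, np.nan)  # NaN marks an (unlikely) empty bin
        for k in range(n_bins):
            in_bin = labels == k
            if in_bin.any():
                out[k] = yy[in_bin].mean() - overall
        return out

    point = stat(y, x)
    n = len(y)
    boot = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observation pairs
        boot[b] = stat(y[idx], x[idx])
    lower = np.nanquantile(boot, alpha / 2, axis=0)
    upper = np.nanquantile(boot, 1 - alpha / 2, axis=0)
    return point, lower, upper
```

Recomputing the bin edges inside each bootstrap draw lets the interval reflect the uncertainty in the quantiles of $X$ as well as in the means of $Y$; keeping the edges fixed at their full-sample values would be a simpler alternative.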