April 2019

There are many packages dealing with density estimation in R. They offer several advantages over manually coded algorithms, including bandwidth-selection procedures and more complex features of density estimation, such as derivative estimation or higher-order kernels. Some of them are also coded in native C, which should speed up the calculations and improve memory management. Nevertheless, many of these extra features often go unused in simple applications of density estimation. That leaves the question open: which algorithm is the fastest?

I try to look at the broadest possible set of R packages dealing with density estimation, including
. There are some other packages which I skip here, as I wanted to make sure I estimate the density at a given point, and not across the domain, to make the numbers comparable. (As a s...
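
For reference, a manually coded estimator of the kind mentioned above can be very short. The sketch below is written in C rather than R and evaluates the estimate at a single point with a fixed bandwidth. The function name `kde_at_point`, the choice of the Epanechnikov kernel, and the error convention are illustrative assumptions, not the code benchmarked in the post.

```c
#include <stddef.h>

/* Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0. */
static double epanechnikov(double u)
{
    if (u < -1.0 || u > 1.0)
        return 0.0;
    return 0.75 * (1.0 - u * u);
}

/* Kernel density estimate at a single point x0:
 *   f_hat(x0) = 1 / (n * h) * sum_i K((x0 - x[i]) / h)
 * x  : sample of length n
 * h  : bandwidth, must be positive
 * Returns -1.0 if n == 0 or h <= 0 (hypothetical error convention). */
double kde_at_point(const double *x, size_t n, double x0, double h)
{
    if (n == 0 || h <= 0.0)
        return -1.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += epanechnikov((x0 - x[i]) / h);

    return sum / ((double)n * h);
}
```

Calling `kde_at_point(x, n, x0, h)` costs one pass over the sample per evaluation point, which is the baseline that a packaged routine has to beat when only a single point is needed.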

March 2019

Data access is often a nightmare, especially with irregular data shapes or multiple data types. APIs, or application programming interfaces, offer simple access gates to information resources in their native structures, and are therefore a powerful tool to quickly boost many research projects.

In a nutshell, an API is a gate through which a user may access the resources or data located on a server in a quick and friendly way. APIs have a generic address, typically in the form of an HTTP address, and endpoints. Endpoints direct the user to the specific parts of the database (like tables) that the user may need to access. APIs typically require an authentication key, called a token, which gives the server an access control mechanism. Sometimes you need to pay for a token, but oftentimes some limited functionality is offered for free.

To demonstrate the performance of the API, I will access the trading database of

R programming
February 2019

I recently came across the problem of testing whether the expectations of one variable, call it $Y$, vary along the distribution of another variable, say $X$. The problem can be approached from several angles, including a parametric quantile approach. However, I decided to use one of the most flexible methods, and actually one of my favorites: the bootstrap.

The idea is quite simple. Imagine two random variables $Y$ and $X$. (For a precise definition of a random variable, the Wikipedia page has a lot of useful information.) Given their observed realisations $\{(Y_i,X_i):i=1,\ldots,n\}$, the goal is to test if the conditional average of $Y$ is statistically different from its unconditional average. We can approximate the former by estimating the mean of $Y$ for different parts of the distribution of $X$. For instance, we can test if expectations of $Y$ d...
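
One way to write the comparison down, as a sketch only: the region $A$ of the distribution of $X$, the indicator notation, the number of replications $B$ and the percentile rule below are illustrative assumptions, not necessarily the exact setup used in the post. The null hypothesis and the sample quantities are

$$H_0:\ \mathbb{E}[Y \mid X \in A] = \mathbb{E}[Y],$$

$$\hat{\mu}_A = \frac{\sum_{i=1}^{n} Y_i \,\mathbf{1}\{X_i \in A\}}{\sum_{i=1}^{n} \mathbf{1}\{X_i \in A\}}, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \hat{\Delta} = \hat{\mu}_A - \bar{Y}.$$

A pairs bootstrap would then draw $n$ pairs $(Y_i^*, X_i^*)$ with replacement from the observed sample, recompute the difference on each draw, and repeat this $B$ times:

$$\hat{\Delta}^*_b = \hat{\mu}^*_{A,b} - \bar{Y}^*_b, \qquad b = 1, \ldots, B.$$

Under a percentile rule, $H_0$ is rejected at level $\alpha$ if zero lies outside the interval spanned by the $\alpha/2$ and $1-\alpha/2$ empirical quantiles of $\{\hat{\Delta}^*_b\}_{b=1}^{B}$.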

January 2019

The popularity of binary choice models makes their applications more and more sophisticated, reaching beyond simple low-dimensional settings. The implications of having plenty of independent variables may be severe, making the likelihood function highly irregular or even impossible to maximise. Even if a local maximum can be found, it is likely to be unstable, or finding it may take a lot of time and computing power, especially if the independent variables are simple dummies.

function tries to address some of these issues. For instance, the algorithm will check which observations do not bring extra information to the model and leave them out of the estimation. However, this step can take too long for some Big Data problems. A quick solution is to tell Stata in advance which observations to exclude when running the probit. Here is a simple plug-and-play code snippet.

Firstly, I define a generic setup wit...

December 2018

It may happen that even if the C programs compile without errors, there is no program output in the console. The reason can be that the output of the program is buffered and therefore not displayed to the end user immediately. It is difficult to say exactly why this happens, as it can be system- or compiler-specific. However, there is a simple remedy.

To unbuffer the output, you will need to flush it with
function after the buffer is used. Taking the Granger causality Diks-Wolski codes as an example (you can find them
here), find the first occurrence of
function and just flush it right after
printf("Input file (X): "); 
In the example above
is the standard output buffer. It is the default file descriptor where a program can write its o...
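
As a minimal sketch of this remedy, assuming the function meant above is the standard C library call `fflush` applied to `stdout`: the helper name `prompt` is hypothetical and is not part of the Diks-Wolski codes.

```c
#include <stdio.h>

/* Print a prompt and flush stdout right away, so the text is shown
 * before the program blocks waiting for input.
 * Returns 0 on success and EOF if printing or flushing failed. */
int prompt(const char *message)
{
    if (printf("%s", message) < 0)
        return EOF;
    return fflush(stdout);   /* push the buffered text to the console */
}
```

With this helper, the line from the example becomes `prompt("Input file (X): ");`. The same effect is obtained by leaving the original `printf` call untouched and adding `fflush(stdout);` on the next line.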

December 2018

Recently, I have received many questions on how one can run the C code to replicate my research results and to extend it to other applications. While C makes the computations very efficient, it is not the most friendly language for end users, especially those with little programming experience. Nevertheless, the overall process is quite simple and I outline it below. It should work on any OS.

The C programming language is a set of commands and structures which make algorithms intuitive and understandable for humans. Although it is low-level, meaning that its operations are very closely linked to the core computer architecture, machines cannot understand C programs directly. To make C programs understandable for your computer, you will need to compile them.

There are multiple C compilers available, some of them even online. I typically use the GCC compiler (

M. Wolski
Marcin Wolski, PhD
European Investment Bank
E-mail: M.Wolski (at) eib.org
Phone: +352 43 79 88708
