Inspired by an excellent series of posts by John Kitchin on the Python package autograd for automatic differentiation (e.g. More Auto-differentiation Goodness for Science and Engineering), this post revisits some earlier work on maximum likelihood estimation in Python and investigates the use of auto-differentiation. As pointed out in this article, auto-differentiation "can be thought of as performing a non-standard interpretation of a computer program where this interpretation involves augmenting the standard computation with the calculation of various derivatives."
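To make the quoted idea concrete, here is a minimal sketch of "augmenting the standard computation" using dual numbers, where every value carries its derivative along with it. This is forward-mode differentiation written from scratch for illustration only; it is not how autograd is implemented, and the names `Dual`, `sin`, `f`, and `derivative` are made up for this sketch.

```python
import math


class Dual:
    """A value paired with its derivative (forward-mode sketch)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Plain numbers are treated as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # The product rule is carried along with the ordinary product.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule for sin: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def f(x):
    # Written like an ordinary function of x.
    return x * x + 3 * sin(x)


def derivative(func, x0):
    # Seed the input with derivative 1, then read off the output's derivative.
    return func(Dual(x0, 1.0)).deriv


x0 = 1.5
auto = derivative(f, x0)
exact = 2 * x0 + 3 * math.cos(x0)  # hand-derived, used only as a check
print(auto, exact)
```

The function `f` is written once, with no derivative code in it; the derivative falls out of running the same program on augmented values. No symbolic expression is built and no step size is chosen.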
Auto-differentiation is neither symbolic differentiation nor numerical approximation using finite difference methods. What auto-differentiation provides is code augmentation: code for the derivatives of your functions is supplied free of charge. In this post, we will be using the autograd package in Python after defining a function in the usual numpy way. Another auto-differentiation choice in Python is the Theano package, which is used by PyMC3, a Bayesian probabilistic programming package that I use in my research and teaching. There are probably other implementations in Python, as auto-differentiation is becoming a must-have in the machine learning field. Implementations also exist in C/C++, R, Matlab, and probably other languages.
The three primary reasons for incorporating auto-differentiation capabilities into your research are:
- In nearly all cases, your code will run faster. For some problems, much faster.
- For difficult problems, your model is likely to converge closer to the true parameter values and may be less sensitive to starting values.
- Your model will provide more accurate calculations for things like gradients and Hessians (so your standard errors will be more accurately calculated).
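The accuracy point can be illustrated by looking at what finite differences do on their own. The sketch below, with an arbitrarily chosen test function `f` and its hand-derived derivative `f_prime` as the reference, measures the error of a forward difference for several step sizes. It uses only the standard library and is meant as an illustration, not a benchmark.

```python
import math


def f(x):
    # Arbitrary smooth test function chosen for illustration.
    return math.exp(x) * math.sin(x)


def f_prime(x):
    # Exact derivative, used as the reference value.
    return math.exp(x) * (math.sin(x) + math.cos(x))


def forward_difference(func, x, h):
    # Standard one-sided finite-difference approximation.
    return (func(x + h) - func(x)) / h


x0 = 1.0
errors = {}
for h in (1e-2, 1e-5, 1e-8, 1e-12):
    errors[h] = abs(forward_difference(f, x0, h) - f_prime(x0))
    print(f"h = {h:.0e}   absolute error = {errors[h]:.2e}")
```

With a large step the approximation suffers from truncation error, and with a very small step floating-point rounding typically takes over, so the result depends on a well-chosen step size. Auto-differentiation applies the chain rule exactly at each operation, so there is no step size to tune.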
With auto-differentiation, gone are the days of deriving analytical derivatives and programming them into your estimation routine. In this short note, we show a simple example of auto-differentiation, expand on that for maximum likelihood estimation, and show that for problems where likelihood calculations are expensive, or for which many parameters are being estimated, there can be dramatic speed-ups.