#+BEGIN_COMMENT .. title: Estimating Custom Maximum Likelihood Models in Python (and Matlab) .. slug: estimating-custom-mle .. date: 2017-05-06 9:15:50 UTC+01:00 .. tags: ipython, matlab, maximum likelihood .. status: .. has_math: yes .. category: .. link: .. description: .. type: text #+END_COMMENT In this post I show various ways of estimating "generic" maximum likelihood models in python. For each, we'll recover standard errors. We will implement a simple ordinary least squares model like this $$\mathbf{y = x\beta +\epsilon}$$ where $\epsilon$ is assumed distributed i.i.d. normal with mean 0 and variance $\sigma^2$. In our simple model, there is only a constant and one slope coefficient ($\beta = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix}$). For this model, we would probably never bother going to the trouble of manually implementing maximum likelihood estimators as we show in this post. However, for more complicated models for which there is no established package or command, there are benefits to knowing how to build your own likelihood function and use it for estimation. It is also worthwhile noting that most of the methods shown here don't use analytical gradiants or hessians, so are likely (1) to have longer execution times and (2) to be less precise than methods where known analytical gradiants and hessians are built into the estimation method. I might explore those issues in a later post. *tl;dr*: There are numerous ways to estimate custom maximum likelihood models in Python, and what I find is: 1. For the most features, I recommend using the [[statsmodels][=Genericlikelihoodmodel= class from Statsmodels]] even if it is the least intuitive way for programmers familiar with Matlab. If you are comfortable with object oriented programming you should definitely go this route. 2. For fastest run times and computationally expensive problems Matlab will most likely be significantly even with lots of code optimizations. #+HTML: #+BEGIN_SRC python :session :exports none :results none :eval never-export # this is a dummy code line to initiate anaconda # environment and all banner printouts print("Hello World") #+END_SRC Python packages we'll use for this post: #+BEGIN_SRC python :session :results output :exports code :eval never-export import pymc3 as pm3 import numpy as np import numdifftools as ndt import pandas as pd from scipy.stats import norm import statsmodels.api as sm from statsmodels.base.model import GenericLikelihoodModel from scipy.optimize import minimize #+END_SRC #+RESULTS: First we generate data for this model here and add a column for the model constant: #+BEGIN_SRC python :session :results output :exports both :eval never-export N = 10000 x = 10 + 2*np.random.randn(N) y = 5 + x + np.random.randn(N) df = pd.DataFrame({'y':y, 'x':x}) df['constant'] = 1 df.head() #+END_SRC #+RESULTS: : x y constant : 0 4.857731 10.623466 1 : 1 11.285848 16.136891 1 : 2 8.454859 14.482064 1 : 3 8.673113 15.358859 1 : 4 10.655889 14.795186 1 #+BEGIN_SRC python :session :exports none :eval never-export # save data out for matlab # 1 x # 2 y # 3 constant df.to_csv('/tmp/data.csv',header=False,index=False) #+END_SRC #+RESULTS: ** OLS Estimation Since this is such a simple and universally used model, there are numerous packages available for estimating it. For this problem, you would undoubtedly want to use one of these existing packages. For example, =statsmodels= has an =OLS= method. We use it here for benchmarking purposes for comparing our maximum likelihood estimation of the same model below. #+BEGIN_SRC python :session :exports both :eval never-export sm.OLS(df.y,df[['constant','x']]).fit().summary() #+END_SRC #+RESULTS: #+begin_example OLS Regression Results ============================================================================== Dep. Variable: y R-squared: 0.803 Model: OLS Adj. R-squared: 0.803 Method: Least Squares F-statistic: 4.086e+04 Date: Wed, 08 Mar 2017 Prob (F-statistic): 0.00 Time: 11:19:36 Log-Likelihood: -14124. No. Observations: 10000 AIC: 2.825e+04 Df Residuals: 9998 BIC: 2.827e+04 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ constant 4.9956 0.050 99.532 0.000 4.897 5.094 x 0.9982 0.005 202.135 0.000 0.989 1.008 ============================================================================== Omnibus: 1.709 Durbin-Watson: 2.000 Prob(Omnibus): 0.426 Jarque-Bera (JB): 1.731 Skew: 0.022 Prob(JB): 0.421 Kurtosis: 2.953 Cond. No. 51.8 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. #+end_example ** Matlab Matlab is a tool used by many econometricians for estimating generic likelihood models so I include it here for comparison purposes. The file =neg_loglike.m= defines the log-likelihood function (negative log-likelihood): #+BEGIN_SRC matlab :exports code :tangle /home/robhicks/Dropbox/matlab_functions/temp/neg_loglike.m ::eval never-export function [ll] = neg_loglike(theta,Y,X) mu = X*theta(1:2); ll = -1*sum(log(normpdf(Y,mu,theta(3)))); end #+END_SRC #+RESULTS: When this code is run in Matlab: #+NAME: matlab_results #+BEGIN_SRC matlab :session :exports code :results raw :eval never-export data = csvread('/tmp/data.csv'); X = data(:,[3 1]); Y = data(:,2); % Optimization Settings opt = optimset('GradObj','off',... 'Hessian','off',... 'LargeScale','off',... 'Display','off',... 'MaxIter', 100000,... 'MaxFunEvals', 1000000, ... 'TolFun', 1.000e-006,... 'TolX', 1.000e-006,... 'FunValCheck','off',... 'DerivativeCheck','off',... 'Diagnostics','off',... 'Algorithm','trust-region-reflective'); % starting values for estimation theta_start=.5*ones(size(X,2)+1,1); [theta,fval,flag,out,grad,hess] = fminunc(@(theta) neg_loglike(theta,Y,X), ... theta_start, opt); se = sqrt(diag(inv(hess))); ans = [theta se] #+END_SRC #+RESULTS: matlab_results | 4.9956 | 0.050186 | | 0.99823 | 0.004938 | | 0.99344 | 0.0070268 | We get the following parameter estimates: #+BEGIN_SRC python :session :var ml_results = matlab_results :results output :exports results :eval never-export df_ml = pd.DataFrame(ml_results,columns=['Parameter','Standard Error']) df_ml.index = ['Intercept','Slope','Sigma'] df_ml.head() #+END_SRC #+RESULTS: : Parameter Standard Error : Intercept 4.99560 0.050186 : Slope 0.99823 0.004938 : Sigma 0.99344 0.007027 ** Python Handcoded using Scipy Of all the python methods, this one is most similar to Matlab above. Defining the log-likelihood (in this case as in matlab a negative log-likelihood since there is no maximize function): #+BEGIN_SRC python :session :results output :exports code :eval never-export def neg_loglike(theta): mu = theta[0] + theta[1]*x return -1*norm(mu, theta[2]).logpdf(y).sum() #+END_SRC #+RESULTS: We'll optimize the log-likelihood over our parameters using =minimize= from python's =scipy.optimize= package: #+BEGIN_SRC python :session :results output :exports both :eval never-export theta_start = np.array([1,1,1]) res = minimize(neg_loglike, theta_start, method = 'Nelder-Mead', options={'disp': True}) #+END_SRC #+RESULTS: : Optimization terminated successfully. : Current function value: 14123.593389 : Iterations: 109 : Function evaluations: 193 The default method, =BFGS= sometimes fails to converge, so I usually use =Nelder-Mead=. Unfortunately, =Nelder-Mead= doesn't return an estimated Hessian, so we need to calculate it ourselves using the =numdifftools= package: #+BEGIN_SRC python :session :results output :exports both :eval never-export Hfun = ndt.Hessian(neg_loglike, full_output=True) hessian_ndt, info = Hfun(res['x']) se = np.sqrt(np.diag(np.linalg.inv(hessian_ndt))) results = pd.DataFrame({'parameters':res['x'],'std err':se}) results.index=['Intercept','Slope','Sigma'] results.head() #+END_SRC #+RESULTS: <> ** Extending Statsmodels This method is perhaps the best way to proceed, but unless you are well-versed in object oriented programming is likely to be confusing. *In my view it is worth the trouble to setup your custom maximum likelihood problem like this*, by leveraging the machinery of statsmodels for working with our custom likelihood function. Since we have $K+1$ parameters, proceed carefully, since by default =statsmodels= assumes the number of parameters is equal to the column dimension of your independent variable. First, define the log-likelihood function (note this is not the negative log-likelihood): #+BEGIN_SRC python :session :exports code :eval never-export def _ll_ols(y, X, beta, sigma): mu = X.dot(beta) return norm(mu,sigma).logpdf(y).sum() #+END_SRC #+RESULTS: : 0.9936952590942383 Then, using the log-likelihood define our custom likelihood class (I'll call it =MyOLS=). Note that there are two *key* parts to the code below: 1. The function =nloglikeobs=, is only acting as a "traffic cop" and spits the parameters into $\beta$ and $\sigma$ coefficients and calls the likelihood function =_ll_ols= above. 2. The =fit= function is where we inform statsmodels that our model has $K+1$ variables. We'll provide a name for the additional variable ($\sigma$) and provide default starting values. #+BEGIN_SRC python :session :results output :exports code :eval never-export class MyOLS(GenericLikelihoodModel): def __init__(self, endog, exog, **kwds): super(MyOLS, self).__init__(endog, exog, **kwds) def nloglikeobs(self, params): sigma = params[-1] beta = params[:-1] ll = _ll_ols(self.endog, self.exog, beta, sigma) return -ll def fit(self, start_params=None, maxiter=10000, maxfun=5000, **kwds): # we have one additional parameter and we need to add it for summary self.exog_names.append('sigma') if start_params == None: # Reasonable starting values start_params = np.append(np.zeros(self.exog.shape[1]), .5) return super(MyOLS, self).fit(start_params=start_params, maxiter=maxiter, maxfun=maxfun, ,**kwds) #+END_SRC #+RESULTS: Once the model structure is setup, we can use =fit()= for estimating the model. =statsmodels= handles all of the optimizations and hessian calculations for us. #+BEGIN_SRC python :session :results output :exports both :eval never-export sm_ols_manual = MyOLS(df.y,df[['constant','x']]).fit() print(sm_ols_manual.summary()) #+END_SRC #+RESULTS: #+begin_example Optimization terminated successfully. Current function value: 1.412359 Iterations: 212 Function evaluations: 386 MyOLS Results ============================================================================== Dep. Variable: y Log-Likelihood: -14124. Model: MyOLS AIC: 2.825e+04 Method: Maximum Likelihood BIC: 2.827e+04 Date: Wed, 08 Mar 2017 Time: 11:20:09 No. Observations: 10000 Df Residuals: 9998 Df Model: 1 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ constant 4.9956 0.050 99.541 0.000 4.897 5.094 x 0.9982 0.005 202.154 0.000 0.989 1.008 sigma 0.9934 0.007 141.420 0.000 0.980 1.007 ============================================================================== #+end_example The great thing about using the =statsmodels= infrastructure is that you have access to a lot of functionality once your model is setup and estimated. For example, suppose you want 95% confidence intervals, after estimation, we can use this command with the model object: #+BEGIN_SRC python :session :results output :exports both :eval never-export ci = sm_ols_manual.conf_int(alpha=.05) resdf = pd.DataFrame(ci,columns=['lower','upper']) resdf.index = ['constant','x','sigma'] resdf.head() #+END_SRC #+RESULTS: : lower upper : constant 4.897258 5.093985 : x 0.988556 1.007912 : sigma 0.979679 1.007215 The are lots of other methods available to your generic model instance, and it is worth the effort getting it running. <> ** PyMC3 The Bayesian Statistics Package =PyMC3= can also find the Hessian and Maximum Posterior values, which for Flat priors should give us something nearly identical to Maximum Likelihood. Note, the Hessian produced by =PyMC3= using =approx_hessian= is what you should use. The command =find_hessian= doesn't yield a valid hessian matrix for any of the cases I have tried. One potential advantage of using =PyMC3= is that the hessian could be calculated off of analytical gradiants and if this is the case would likely yield more accurate standard errors than any of the other methods presented in this post (including Matlab).[fn:1] Setting the model up in =PyMC3= #+BEGIN_SRC python :session :results output :exports code :eval never-export with pm3.Model() as model: b0 = pm3.Flat('constant',testval=5) b1 = pm3.Flat('x',testval=1) sigma = pm3.Flat('sigma',testval = 1) mu = b0 + b1 * df.x like = pm3.Normal('like',mu = mu, sd=sigma, observed=df.y) #+END_SRC #+RESULTS: Noting that with Flat priors the posterior is exactly proportional to the likelihood, we can use the =find_MAP= function #+BEGIN_SRC python :session :results output :exports both :eval never-export with model: # obtain MLE beta_mle = pm3.find_MAP() # obtain hessian at MAP hessian = pm3.approx_hessian(beta_mle) #+END_SRC #+RESULTS: : Optimization terminated successfully. : Current function value: 14123.593388 : Iterations: 7 : Function evaluations: 11 : Gradient evaluations: 11 The method =pm3.approx_hessian= uses =numdifftools= to calculate the standard errors evaluated at the parameter vector you specify. We evaluate it at the MAP estimates which coincides with MLE estimates for this model setup. Note, the only gotcha is that the =pm3.approx_hessian= changes the order of parameters, so we need to match the standard errors we calculate from the hessian to the estimates carefully. In this case, they are in reverse order. Putting together we have #+BEGIN_SRC python :session :results output :exports both :eval never-export se_1 = np.sqrt(np.diag(np.linalg.inv(hessian))) se = se_1[::-1] b = np.array([beta_mle['constant'], beta_mle['x'], beta_mle['sigma']]) results_pymc = pd.DataFrame({'parameters':b,'std err':se}) results_pymc.index=['Intercept','Slope','Sigma'] results_pymc.head() #+END_SRC #+RESULTS: : parameters std err : Intercept 4.995611 0.050186 : Slope 0.998235 0.004938 : Sigma 0.993442 0.007025 ** Comparison Unsurprisingly given the simplistic nature of the model estimated here, all of the results are nearly identical. I did an informal time (timed single run) comparison and here are the results. Note that we are using =PyMC3= in unintended ways and it wasn't built to optimize execution times for this type of problem. #+CAPTION: Single-Run Time Comparison #+ATTR_HTML: :border 20 :width 70% :frame border :class striped table-striped :style margin-left: auto; margin-right: auto; | Program | Execution Time (Seconds) | | Matlab with =fminunc= | .122 | | Python with =minimize= | .9937 | | Python =statsmodels= | .9848 | | Python =PyMC3= | 1.377 | Matlab is significantly faster for this problem. ** Computing Environment :noexport: #+BEGIN_SRC python :session :results raw :exports none :eval never-export import pandas as pd import tabulate import sys import subprocess df = pd.DataFrame() ## Kernel kernel_v = subprocess.Popen(["uname", "-sr"], stdout=subprocess.PIPE).communicate()[0] kernel_v = kernel_v.decode('utf-8').rstrip() df = df.append(pd.DataFrame([['OS Kernel',kernel_v.replace('_','-')]],columns=['Package','Version'])) ## GCC temp = subprocess.Popen(["gcc", "--version"], stdout=subprocess.PIPE).communicate()[0] temp = temp.decode('utf-8') gcc_v = temp.split('\n', 1)[0].rsplit(None, 1)[-1] df = df.append(pd.DataFrame([['gcc',gcc_v]],columns=['Package','Version'])) ## Python ver_num = str(sys.version_info[0])+"."+str(sys.version_info[1])+"."+str(sys.version_info[2]) df = df.append(pd.DataFrame([['python',ver_num]],columns=['Package','Version'])) ## Packages pkg_names = ['cython','numpy','pandas','theano','pymc3','statsmodels','numdifftools','scipy'] for i in pkg_names: exec('import '+i) ver_num = eval(i+'.__version__') data = [[i,ver_num]] this_data = pd.DataFrame(data,columns=['Package','Version']) df = df.append(this_data) tabulate.tabulate(df[['Package','Version']], headers=df.columns.values[:2],showindex=False,tablefmt="orgtbl") #+END_SRC #+ATTR_HTML: :border 20 :width 70% :frame border :class striped table-striped :style margin-left: auto; margin-right: auto; #+RESULTS: [fn:1] Without futher investigation, I can't tell if PyMC3 uses analytical gradiants for calculating the hessian.