# ECON 407: Maximum Likelihood Estimation Companion

This companion document to the Panel Data notes shows how to implement many of the models we discuss in R and Stata.

The maximum likelihood method seeks the model parameters $\pmb{\theta}^{MLE}$ that maximize the joint likelihood (the likelihood function, $L$) of observing $\mathbf{y}$, conditional on a structural model that is a function of independent variables $\mathbf{x}$ and model parameters $\pmb{\theta}$:
$$\pmb{\theta}^{MLE} = \underset{\pmb{\theta}}{\operatorname{argmax}}\ L = \underset{\pmb{\theta}}{\operatorname{argmax}} \prod_{i=1}^N prob(y_i|\mathbf{x}_i,\pmb{\theta})$$
In a simple ordinary least squares setting with a linear model, the model parameters are $\pmb{\theta}=[\pmb{\beta},\sigma]$, giving the likelihood contribution for observation $i$
$$prob(y_i|\mathbf{x}_i,\pmb{\beta},\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(y_i - \mathbf{x}_i\pmb{\beta})^2}{2\sigma^2}}$$
Typically, for computational purposes, we work with the log-likelihood function rather than the likelihood (the product of many small probabilities quickly underflows numerically):
$$log(L) = \sum_{i=1}^N log(prob(y_i|\mathbf{x}_i,\pmb{\theta}))$$
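The correspondence between the hand-written normal density and the log-likelihood above can be checked numerically. A minimal sketch using simulated data (the names `y_sim`, `x_sim`, etc. are illustrative, not from the dataset used below):

```{r}
# Check: the hand-written normal log-likelihood from the formula above
# matches R's built-in dnorm(..., log = TRUE), summed over observations
set.seed(42)
x_sim <- rnorm(100)
beta  <- 2
sigma <- 1.5
y_sim <- x_sim * beta + rnorm(100, sd = sigma)
ll_manual <- sum(log(1 / sqrt(2 * pi * sigma^2)) -
                 (y_sim - x_sim * beta)^2 / (2 * sigma^2))
ll_dnorm  <- sum(dnorm(y_sim, mean = x_sim * beta, sd = sigma, log = TRUE))
all.equal(ll_manual, ll_dnorm)   # TRUE
```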

In the code below we show how to implement a simple regression model using generic maximum likelihood estimation in Stata and R. While this is a toy example with existing estimators in Stata (`regress`) and R (`lm`), we develop it here for demonstration purposes.

## Load Data and Run a Simple OLS Regression Model

```{r message=FALSE, error=FALSE, warning=FALSE}
# foreign allows us to open Stata datasets
library(foreign)
library(maxLik)
library(stargazer)
```

Loading our data:

```{r}
tk4.df = subset(tk.df, time == 4)
tk4.df$ones = 1
attach(tk4.df)
```

This is the standard OLS estimation result in Stata:

```
. regress ln_wage educ pexp pexp2 broken_home

      Source |       SS       df       MS              Number of obs =    1034
-------------+------------------------------           F(  4,  1029) =   51.36
       Model |  37.3778146     4  9.34445366           Prob > F      =  0.0000
    Residual |   187.21445  1029  .181938241           R-squared     =  0.1664
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  224.592265  1033  .217417488           Root MSE      =  .42654

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0852725   .0092897     9.18   0.000     .0670437    .1035014
        pexp |   .2035214   .0235859     8.63   0.000     .1572395    .2498033
       pexp2 |  -.0124126   .0022825    -5.44   0.000    -.0168916   -.0079336
 broken_home |  -.0087254   .0357107    -0.24   0.807    -.0787995    .0613488
       _cons |   .4603326    .137294     3.35   0.001     .1909243    .7297408
------------------------------------------------------------------------------
```

## Implementing Maximum Likelihood Estimation in Stata

Since the normal pdf is undefined for negative variances, we transform the variance term by exp so that any real number attempted by the maximization routine is evaluated as a positive number. Consequently, to recover the maximum likelihood estimate of the model $\sigma^2$, we need to exponentiate the reported coefficient post-estimation. Here is the code defining the likelihood function and estimating the maximum likelihood model:

```
. program define ols_mle
  1.    * This program defines the contribution of the likelihood function
  .     * for each observation in a simple OLS model
  .     args lnf xb log_sigma2
  2.    qui replace `lnf' = ln(1/sqrt(2*_pi*exp(`log_sigma2'))) - (($ML_y1 - `xb')^2)/(2*exp(`log_sigma2'))
  3.  end
```

```
. ml clear

. * Tell the ML program we want a linear function (lf) modeling the mean
. * of y as a linear function of the variables on the right hand side of
. * the equals.  log_sigma2: just models log(sigma^2) as a constant
. ml model lf ols_mle (ln_wage=educ pexp pexp2 broken_home) (log_sigma2:)

. ml max

initial:       log likelihood = -3426.2813
alternative:   log likelihood = -2118.3989
rescale:       log likelihood = -1754.9153
rescale eq:    log likelihood = -765.29913
Iteration 0:   log likelihood = -765.29913
Iteration 1:   log likelihood = -652.29309
Iteration 2:   log likelihood = -584.46047
Iteration 3:   log likelihood = -583.66381
Iteration 4:   log likelihood = -583.66289
Iteration 5:   log likelihood = -583.66289

                                                  Number of obs   =       1034
                                                  Wald chi2(4)    =     206.44
Log likelihood = -583.66289                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
        educ |   .0852725   .0092672     9.20   0.000     .0671092    .1034358
        pexp |   .2035214   .0235288     8.65   0.000     .1574058     .249637
       pexp2 |  -.0124126    .002277    -5.45   0.000    -.0168755   -.0079497
 broken_home |  -.0087254   .0356243    -0.24   0.807    -.0785477     .061097
       _cons |   .4603326   .1369617     3.36   0.001     .1918926    .7287726
-------------+----------------------------------------------------------------
log_sigma2   |
       _cons |  -1.708935   .0439799   -38.86   0.000    -1.795134   -1.622736
------------------------------------------------------------------------------
```

Note that in our program, $\sigma^2$ is equal to the exponential of what Stata reports (the transformation guarantees the variance is positive):

```{r}
exp(-1.708935)
```
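As a cross-check using only numbers reported in the output above: the maximum likelihood estimate of $\sigma^2$ divides the residual sum of squares by $N$ (rather than by $N-K$, as OLS does), so it should match SSR$/N$ from the `regress` output:

```{r}
# sigma^2 recovered from the Stata ML output vs. SSR/N from regress
exp(-1.708935)     # ML estimate of sigma^2
187.21445 / 1034   # residual SS divided by N (not N - K)
```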

## Implementing Maximum Likelihood Estimation in R

Here we define the (negative) log-likelihood function, since R's `optim` minimizes by default. Note the first step checks whether the candidate model variance satisfies $\sigma^2 \le 0$. If so, the function returns `NA`, and the optimization routine tries another candidate parameter vector.

```{r}
y = ln_wage
X = matrix(c(educ, pexp, pexp2, broken_home, ones), nrow = length(educ))
ols.lnlike <- function(theta) {
  # theta[1] is sigma^2; the remaining elements are regression coefficients
  if (theta[1] <= 0) return(NA)
  -sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}
```

Note that we can evaluate the log-likelihood for any parameter vector:

```{r}
ols.lnlike(c(1, 1, 1, 1, 1, 1))
```

### Manual Optimization (the hard way)

While this method is the hard way, it demonstrates the mechanics of optimization in R. To find the best (optimal) parameters, we use R's `optim` function, which needs as arguments:

1. starting values
2. optimization method (for unconstrained problems BFGS is a good choice)
3. function to be minimized
4. any data that the function needs for evaluation.

Also, we can ask `optim` to return the Hessian, which we'll use to calculate standard errors.

```{r}
mle_res.optim = optim(c(1, .1, .1, .1, .1, .1), method = "BFGS",
                      fn = ols.lnlike, hessian = TRUE)
summary(mle_res.optim)
```
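Since the dataset used above is not bundled with this document, here is a self-contained sketch of the same recipe on simulated data (all names are made up for illustration). The `optim`-based MLE should recover coefficients very close to `lm`:

```{r}
# Simulated check: minimizing the negative normal log-likelihood with optim
# reproduces the lm() coefficient estimates
set.seed(1)
n  <- 500
x1 <- rnorm(n)
ys <- 1 + 0.5 * x1 + rnorm(n, sd = 0.8)
Xs <- cbind(x1, 1)                       # regressor plus a column of ones
negll <- function(theta) {
  if (theta[1] <= 0) return(NA)          # theta[1] is sigma^2
  -sum(dnorm(ys, mean = Xs %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}
fit <- optim(c(1, 0, 0), fn = negll, method = "BFGS", hessian = TRUE)
fit$par            # sigma^2, slope, intercept
coef(lm(ys ~ x1))  # compare: intercept, slope
```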

Our model parameters are

```{r}
b = round(mle_res.optim$par, 5)
b
```

and the standard errors are

```{r}
se = round(sqrt(diag(solve(mle_res.optim$hessian))), 5)
se
```

Putting it together for comparison to Stata:

```{r results='asis'}
labels = c('Sigma2', 'educ', 'pexp', 'pexp2', 'broken_home', 'Constant')
tabledat = data.frame(matrix(c(labels, b, se), nrow = length(b)))
names(tabledat) = c("Variable", "Parameter", "Standard Error")
stargazer(tabledat, summary = FALSE, title = "Maximum Likelihood Results from R",
          type = "html", rownames = FALSE)
```


### Using maxLik (the easy way)

Unlike `optim` above, the maxLik package expects the log-likelihood itself, not $-1\times$ the log-likelihood. So let's redefine the likelihood:

```{r}
ols.lnlike <- function(theta) {
  if (theta[1] <= 0) return(NA)
  sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}
```

Using the maxLik package streamlines this process considerably.

```{r}
# starting values with names (a named vector)
start = c(1, .1, .1, .1, .1, .1)
names(start) = c("sigma2", "educ", "pexp", "pexp2", "broken_home", "Constant")
mle_res <- maxLik(logLik = ols.lnlike, start = start, method = "BFGS")
summary(mle_res)
```

My personal opinion is that if you do a lot of this type of work (writing custom log-likelihoods), you are better off using Matlab, R, or Python than Stata. If you use R, my strong preference is `maxLik`, as demonstrated above, over `mle` or `mle2`, since the latter two require that you define your likelihood function with a separate argument for each parameter to be optimized over.

## Visualizing the Likelihood Function

Let's graph the log-likelihood for the `educ` coefficient, holding

1. all data at observed values
2. other parameters constant at optimal values

I am going to call this a "marginal log-likelihood" since we are only allowing one parameter to vary (rather than all 6). Here is what it looks like:

```{r}
coeff2 = mle_res.optim$par
altcoeff <- seq(.06, .11, 0.001)
LL <- NULL
# calculate -1 * log-likelihood for various values of the education
# parameter and store the results in LL:
for (m in altcoeff) {
  coeff2[2] <- m                 # a candidate value for the education parameter
  LLt <- -1 * ols.lnlike(coeff2)
  LL <- rbind(LL, LLt)
}
plot(altcoeff, LL, type = "l", xlab = "Education Coefficient", ylab = "-LogL",
     main = "Marginal Log-Likelihood Function for \n Education Coefficient")
abline(v = mle_res.optim$par[2], col = "red")
abline(h = mle_res.optim$value, col = "red")
```
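One use of such a plot: a horizontal line drawn at $\min(-\log L) + \chi^2_{1,.95}/2$ cuts the curve at an approximate 95% likelihood-ratio interval for the varying coefficient. A self-contained sketch on simulated data (names are illustrative; the other parameters are held fixed, as in the "marginal" plot above, so this is a simplification of a full profile likelihood):

```{r}
# Approximate likelihood-ratio interval for a slope, holding the intercept
# and error standard deviation fixed at their (here, known) values
set.seed(7)
xs <- rnorm(200)
ys <- 1 + 0.5 * xs + rnorm(200)
negll_b <- function(b) -sum(dnorm(ys, mean = 1 + b * xs, sd = 1, log = TRUE))
grid <- seq(0.2, 0.8, by = 0.005)
nll  <- sapply(grid, negll_b)
cutoff <- min(nll) + qchisq(0.95, 1) / 2
inside <- grid[nll <= cutoff]
range(inside)   # approximate 95% interval for the slope
```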