# ECON 407: Companion to Maximum Likelihood

This companion to the maximum likelihood discussion shows how to implement many of our topics in R and Stata.

The maximum likelihood method seeks to find model parameters $\pmb{\theta}^{MLE}$ that maximize the joint likelihood (the likelihood function, $L$) of observing $\mathbf{y}$ conditional on some structural model that is a function of independent variables $\mathbf{x}$ and model parameters $\pmb{\theta}$: $$\pmb{\theta}^{MLE} = \underset{\pmb{\theta}}{\operatorname{argmax}}\, L = \underset{\pmb{\theta}}{\operatorname{argmax}} \prod_{i=1}^N prob(y_i|\mathbf{x}_i,\pmb{\theta})$$ In a simple ordinary least squares setting with a linear model, the model parameters are $\pmb{\theta}=[\pmb{\beta},\sigma]$, making the likelihood contribution of observation $i$ $$prob(y_i|\mathbf{x}_i,\pmb{\beta},\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(y_i - \mathbf{x}_i\pmb{\beta})^2}{2\sigma^2}}$$ Typically, for computational purposes, we work with the log-likelihood function rather than the likelihood: $$log(L) = \sum_{i=1}^N log(prob(y_i|\mathbf{x}_i,\pmb{\theta}))$$
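To make the formulas concrete, here is a small numerical check (a Python sketch on simulated data, not part of the course code) that the log-likelihood obtained from the density formula above matches the sum of log normal densities computed by a library routine:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
beta, sigma = 2.0, 0.5
y = x * beta + rng.normal(scale=sigma, size=n)

# log-likelihood from the formula: sum over i of log N(y_i | x_i*beta, sigma^2)
resid = y - x * beta
ll_formula = np.sum(np.log(1.0 / np.sqrt(2 * np.pi * sigma**2))
                    - resid**2 / (2 * sigma**2))

# the same quantity via scipy's normal log-density
ll_scipy = np.sum(norm.logpdf(y, loc=x * beta, scale=sigma))

assert np.isclose(ll_formula, ll_scipy)
```

Working in logs turns the product over observations into a sum, which is far more stable numerically.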

In the code below we show how to implement a simple regression model using generic maximum likelihood estimation in Stata and R. While this is a toy example with existing estimators in Stata (regress) and R (lm), we develop it here for demonstration purposes.

We'll be estimating a standard OLS model using maximum likelihood. For benchmarking purposes, the standard OLS estimation results in Stata are given below:

webuse set "http://rlhick.people.wm.edu/econ407/data"
webuse tobias_koop
keep if time==4
regress ln_wage educ pexp pexp2 broken_home

set more off
(prefix now "http://rlhick.people.wm.edu/econ407/data")
(16,885 observations deleted)

      Source |       SS           df       MS      Number of obs   =     1,034
-------------+----------------------------------   F(4, 1029)      =     51.36
       Model |  37.3778146         4  9.34445366   Prob > F        =    0.0000
    Residual |   187.21445     1,029  .181938241   R-squared       =    0.1664
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  224.592265     1,033  .217417488   Root MSE        =    .42654

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0852725   .0092897     9.18   0.000     .0670437    .1035014
        pexp |   .2035214   .0235859     8.63   0.000     .1572395    .2498033
       pexp2 |  -.0124126   .0022825    -5.44   0.000    -.0168916   -.0079336
 broken_home |  -.0087254   .0357107    -0.24   0.807    -.0787995    .0613488
       _cons |   .4603326    .137294     3.35   0.001     .1909243    .7297408
------------------------------------------------------------------------------


## Maximum Likelihood in Stata

To setup the model, we need to program in the normal log-likelihood function. Here is the code defining the likelihood function and estimating the maximum likelihood model:

program define ols_mle
* This program defines the contribution of the likelihood function
* for each observation in a simple OLS model
args lnf xb sigma
qui replace `lnf' = ln(1/sqrt(2*_pi*(`sigma')^2)) - (($ML_y1 - `xb')^2)/(2*(`sigma')^2)
end

Having defined the likelihood, we use Stata's maximum likelihood capabilities. Note that the code tells the maximum likelihood program

- we want a linear-form (lf) evaluator modeling the mean
- y is a linear function of the independent variables
- sigma is modeled as a constant

ml model lf ols_mle (ln_wage=educ pexp pexp2 broken_home) (sigma:)
ml max

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -6232.9439
rescale:       log likelihood = -1697.4414
rescale eq:    log likelihood = -722.18386
Iteration 0:   log likelihood = -722.18386
Iteration 1:   log likelihood = -606.03669
Iteration 2:   log likelihood = -583.66648
Iteration 3:   log likelihood = -583.66289
Iteration 4:   log likelihood = -583.66289

                                                Number of obs     =      1,034
                                                Wald chi2(4)      =     206.44
Log likelihood = -583.66289                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
        educ |   .0852725   .0092672     9.20   0.000     .0671092    .1034358
        pexp |   .2035214   .0235288     8.65   0.000     .1574058     .249637
       pexp2 |  -.0124126    .002277    -5.45   0.000    -.0168755   -.0079497
 broken_home |  -.0087254   .0356243    -0.24   0.807    -.0785477     .061097
       _cons |   .4603326   .1369617     3.36   0.001     .1918926    .7287726
-------------+----------------------------------------------------------------
sigma        |
       _cons |   .4255097   .0093569    45.48   0.000     .4071704    .4438489
------------------------------------------------------------------------------

## Maximum Likelihood in R

Load the data and put it into an R dataframe:

library(foreign)
library(maxLik)
tk.df = read.dta("http://rlhick.people.wm.edu/econ407/data/tobias_koop.dta")
tk4.df = subset(tk.df, time == 4)
tk4.df$ones = 1
attach(tk4.df)


Next, construct the dependent variable, the design matrix, and the negative log-likelihood function (negative because R's optim minimizes by default):

y = ln_wage
X = matrix(c(educ,pexp,pexp2,broken_home,ones),nrow = length(educ))
ols.lnlike <- function(theta) {
if (theta[1] <= 0) return(NA)
-sum(dnorm(y, mean = X %*% theta[-1], sd = theta[1], log = TRUE))
}


Notice, we can evaluate this function at any parameter value (where the first parameter is $\sigma$):

ols.lnlike(c(1,1,1,1,1,1))

1305304.09136282


### Manual Optimization (the hard way)

While this is the hard way, it demonstrates the mechanics of optimization in R. To find the optimal parameters, we use R's optim function to recover the estimated parameter vector. It needs as arguments

1. starting values
2. optimization method (for unconstrained problems BFGS is a good choice)
3. function to be minimized
4. any data that the function needs for evaluation.

Also, we can ask optim to return the hessian, which we'll use for calculating standard errors (the square roots of the diagonal of the inverse hessian). The code below calls optim on our function with starting values c(1,.1,.1,.1,.1,.1), uses the BFGS optimization method, and asks for the hessian to be returned. It runs the maximum likelihood model and assembles results in a table:

mle_res.optim = optim(c(1,.1,.1,.1,.1,.1), method="BFGS", fn=ols.lnlike, hessian=TRUE)
# parameter estimates
b = round(mle_res.optim$par,5)
# std errors
se = round(sqrt(diag(solve(mle_res.optim$hessian))),5)
labels = c('sigma','educ','pexp','pexp2','broken_home','Constant')
tabledat = data.frame(matrix(c(labels,b,se),nrow=length(b)))
names(tabledat) = c("Variable","Parameter","Standard Error")
print(tabledat)

     Variable Parameter Standard Error
1       sigma   0.42551        0.00936
2        educ   0.08527        0.00927
3        pexp   0.20352        0.02353
4       pexp2  -0.01241        0.00228
5 broken_home  -0.00873        0.03562
6    Constant   0.46033        0.13696
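For readers following along in another language, the same recipe — minimize the negative log-likelihood with BFGS, then take square roots of the diagonal of the inverse Hessian for standard errors — can be sketched in Python (illustrative only; the data are simulated, not the Tobias-Koop sample):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([rng.normal(size=n), np.ones(n)])  # one regressor + constant
beta_true = np.array([1.5, -0.5])
y = X @ beta_true + rng.normal(scale=0.8, size=n)

def neg_lnlike(theta):
    # theta[0] is sigma, theta[1:] are the regression coefficients
    sigma, beta = theta[0], theta[1:]
    if sigma <= 0:
        return np.inf
    return -np.sum(norm.logpdf(y, loc=X @ beta, scale=sigma))

res = minimize(neg_lnlike, x0=np.array([1.0, 0.1, 0.1]), method="BFGS")
# BFGS returns an *approximation* to the inverse Hessian, so these
# standard errors are rough; an exact numerical Hessian is more reliable
se = np.sqrt(np.diag(res.hess_inv))

# the ML coefficients should match closed-form OLS
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(res.x[1:], b_ols, atol=5e-3)
```

In the R code above, solve(hessian) plays the role of the inverse Hessian; optim's hessian=TRUE requests an exact numerical Hessian at the optimum rather than the BFGS approximation used in this sketch.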


### Using maxLik (the easy way)

Unlike optim above, the maxLik package maximizes, so it needs the log-likelihood itself, not -1*log-likelihood. We also reparameterize in terms of the error variance: the first element of theta is now $\sigma^2$ rather than $\sigma$, hence the sqrt inside dnorm. So let's redefine the likelihood:

ols.lnlike <- function(theta) {
if (theta[1] <= 0) return(NA)
sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}

# starting values with names (a named vector)
start = c(1,.1,.1,.1,.1,.1)
names(start) = c("sigma2","educ","pexp","pexp2","broken_home","Constant")
mle_res <- maxLik( logLik = ols.lnlike, start = start, method="BFGS" )
print(summary(mle_res))

--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 117 iterations
Return code 0: successful convergence
Log-Likelihood: -583.6629
6  free parameters
Estimates:
Estimate Std. error t value  Pr(> t)
sigma2       0.181058   0.007963  22.737  < 2e-16 ***
educ         0.085273   0.009262   9.207  < 2e-16 ***
pexp         0.203521   0.023524   8.652  < 2e-16 ***
pexp2       -0.012413   0.002277  -5.452 4.99e-08 ***
broken_home -0.008725   0.035626  -0.245 0.806525
Constant     0.460333   0.136859   3.364 0.000769 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
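Note that the first row of this table reports the error variance, not the standard deviation: sigma2 here is simply the square of the sigma estimated earlier (ML estimates are invariant to reparameterization). A quick check in Python using the point estimates reported above:

```python
import numpy as np

sigma_hat = 0.4255097   # sigma from the Stata ml / optim runs above
sigma2_hat = 0.181058   # sigma2 from the maxLik run above

# invariance of MLE to reparameterization: sigma2 = sigma^2
assert np.isclose(sigma_hat**2, sigma2_hat, atol=1e-4)
```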


My personal opinion is that if you do a lot of this type of work (writing custom log-likelihoods), you are better off using Matlab, R, or Python than Stata. If you use R, my strong preference is maxLik, as demonstrated above, over mle or mle2, since the latter two require that you define your likelihood function with a separate argument for each parameter to be optimized over.

## Visualizing the Likelihood Function

Let's graph likelihood for the educ coefficient, holding

1. all data at observed values
2. other parameters constant at optimal values

I am going to call this a "marginal log-likelihood" since we are only allowing one parameter to vary (rather than all 6). Here is what it looks like:

coeff2 = mle_res.optim$par
altcoeff <- seq(.06,.11,0.001)
LL <- NULL
# calculates the negative LL for various values of the education parameter
# and stores them in LL (note: this uses the original negative
# log-likelihood definition of ols.lnlike from the optim section):
for (m in altcoeff) {
  coeff2[2] <- m # consider a candidate parameter value for education
  LLt <- ols.lnlike(coeff2)
  LL <- rbind(LL,LLt)
}
plot(altcoeff, LL, type="l", xlab="Education Coefficient", ylab="-LogL",
     main="Marginal Log-Likelihood Function for \n Education Coefficient")
abline(v=mle_res.optim$par[2], col="red")
abline(h=mle_res.optim$value, col="red")
dev.copy(png,'/tmp/marginal_mle.png')
dev.off()


Notice that as we move away from the estimated value (the vertical red line), the negative log-likelihood rises — equivalently, the log-likelihood declines — as we lose joint probability from the estimation sample.
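The same exercise is easy to reproduce elsewhere. The Python sketch below (simulated data, not the Tobias-Koop sample) profiles the log-likelihood over a grid for one slope coefficient, holding the other parameters at their ML estimates, and confirms the peak sits at the grid point closest to the MLE:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = X @ np.array([0.8, 0.2]) + rng.normal(scale=0.4, size=n)

def neg_lnlike(theta):
    # theta[0] is sigma, theta[1:] are the regression coefficients
    sigma, beta = theta[0], theta[1:]
    return np.inf if sigma <= 0 else -np.sum(norm.logpdf(y, loc=X @ beta, scale=sigma))

mle = minimize(neg_lnlike, np.array([1.0, 0.1, 0.1]), method="BFGS").x

# vary only the slope coefficient; keep sigma and the constant at their MLEs
grid = np.linspace(mle[1] - 0.05, mle[1] + 0.05, 101)
ll = []
for b in grid:
    theta = mle.copy()
    theta[1] = b
    ll.append(-neg_lnlike(theta))  # store the log-likelihood itself

# the profiled log-likelihood peaks at the center of the grid (the MLE)
assert int(np.argmax(ll)) == 50
```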