clear
set more off

* Load the mroz dataset; lfp (labor force participation) is the dependent variable:
webuse set "http://rlhick.people.wm.edu/econ407/data"
webuse mroz

* rescale faminc to units of 10,000 dollars
replace faminc=faminc/10000


* suppose we think that wife's education is endogenous in a labor force participation equation
* We could run this:

ivprobit lfp kl6 k618 faminc (we=wmed)
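
* A quick check of instrument relevance (an added sketch, not part of the
* original analysis): regress we on the exogenous variables and the instrument,
* then test the excluded instrument wmed. A weak first stage would make both
* ivprobit and the control-function logit further down unreliable.
regress we kl6 k618 faminc wmed
test wmed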

* Search Google and you'll find many people asking for ivlogit: no such command exists in Stata.
* The reason is that, unlike the probit case, the likelihood function for a logit with an
* endogenous regressor has no convenient closed form. A two-step approach is possible, but the
* instrumented value of we is itself an estimate (a random variable). Simply using the predicted
* values as a regressor without accounting for that sampling variability leads to incorrect
* standard errors (typically underestimated).

* To get around this, bootstrap:

* first define an eclass program that runs the first-stage (relevancy) equation and then
* includes the residual in the original logit equation to recover the correct
* b's
capture program drop ivlogit
program ivlogit, eclass
          version 11
          * only the coefficient vector needs a temporary name; resid is created
          * and dropped as an ordinary variable inside the program
          tempname ivbeta
          
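         * first stage: regress the endogenous variable we on the exogenous
         * variables and the instrument wmed, then save the residuals
         reg we kl6 k618 faminc wmed
         predict resid, residuals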
 
         * now include the first-stage residual in the logit regression (a control
         * function) to recover the correct betas from the iv regression
         logit lfp kl6 k618 faminc we resid
         matrix `ivbeta' = e(b)
         * need to drop resid so the next replicate can execute the predict command
         drop resid
         ereturn post `ivbeta'
         ereturn local cmd="bootstrap"
end
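
* Optional sanity check (an added sketch): run the program once on the full
* sample and list the posted coefficient vector before handing it to bootstrap.
* Only e(b) is posted, so no usable standard errors exist at this point.
ivlogit
matrix list e(b)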

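* note that once we rescale for the different error variances (probit coefficients are
* roughly .6 times logit coefficients, so divide the ivprobit results by about .6), we get
* almost exactly the same means and confidence intervals with our ivlogit command. You might
* argue that our approach is superior to ivprobit because it accounts for *any* general error
* structure and does not rely on normality.
* The default bootstrap table reports normal-based confidence intervals, which forces normality: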
bootstrap _b, reps(100): ivlogit
* This recycles the previous results and reports the .025 and .975 percentiles of the
* bootstrap distribution rather than forcing symmetry via the std deviation. It is the
* preferred method (called the percentile method):
estat bootstrap, percentile
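* With only 100 replications the percentile limits are noisy. A sketch of a more
* stable, reproducible run; the seed value and the 1000 replications are
* illustrative assumptions, not values from the original analysis:
bootstrap _b, reps(1000) seed(12345): ivlogit
estat bootstrap, percentile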
* To test for exogeneity of we, we would need to do some further programming work
* (save the estimates and variance-covariance matrix) and then do a Hausman test against
* a standard logit model that assumes we is exogenous (from the regression below).
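* A simpler rough check, offered as a sketch under stated assumptions rather than
* as a replacement for the Hausman test: in the control-function setup the
* coefficient on resid should be zero when we is exogenous. Assuming the bootstrap
* results from the ivlogit run above are still the active estimates, a Wald test
* on resid uses the bootstrap variance:
test resid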