#+BEGIN_COMMENT
.. title: ECON 407: Companion to IV Regression
.. slug: econ_407_notes_iv_companion
.. date: 2016-07-11 09:31:21 UTC-05:00
.. tags: Endogeneity, ECON407
.. link:
.. description: Companion Chapter for IV Regression
.. type: text
#+END_COMMENT
#+OPTIONS: toc:nil

#+BEGIN_EXPORT html
#+END_EXPORT

This page has been moved to [[https://econ.pages.code.wm.edu/407/notes/docs/index.html][https://econ.pages.code.wm.edu/407/notes/docs/index.html]] and is no longer being maintained here.

This companion document to our chapter on endogeneity briefly explores the endogeneity problem and shows how to estimate this class of models in ~R~ and ~stata~. Recall that the OLS estimator requires

$$
E(\mathbf{x'\epsilon}) = 0
$$

The code below shows how to overcome estimation problems when this assumption fails but we can identify an instrument, which lets us use instrumental variables regression (IV regression).

First, let's open the data in both R and Stata, noting that we are using a cross-sectional version of the Tobias and Koop data that focuses on 1983.

Load data and summarize:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
keep if time==4
sum
#+END_SRC

#+RESULTS:
#+begin_example
(prefix now "https://rlhick.people.wm.edu/econ407/data")
(16,885 observations deleted)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          id |      1,034    1090.952    634.8917          4       2177
        educ |      1,034    12.27466    1.566838          9         19
     ln_wage |      1,034    2.138259    .4662805        .42       3.59
        pexp |      1,034     4.81528    2.190298          0         12
        time |      1,034           4           0          4          4
-------------+---------------------------------------------------------
     ability |      1,034    .0165957    .9209635      -3.14       1.89
       meduc |      1,034    11.40329    3.027277          0         20
       feduc |      1,034    11.58511    3.735833          0         20
 broken_home |      1,034    .1692456    .3751502          0          1
    siblings |      1,034    3.200193    2.126575          0         15
-------------+---------------------------------------------------------
       pexp2 |      1,034    27.97969    22.59879          0        144
#+end_example

and in R:

#+BEGIN_SRC R :session :eval never-export :exports code :results none replace
# Packages for reading Stata files, robust inference, and IV estimation
library(foreign)
library(sandwich)
library(lmtest)
library(boot)
library(AER)
library(car)
library(ivpack)
# Read the Stata data set and keep the time == 4 cross-section
tk.df = read.dta("https://rlhick.people.wm.edu/econ407/data/tobias_koop.dta")
tk4.df = subset(tk.df, time == 4)
# Attach so that columns can be referenced by name in the model formulas
attach(tk4.df)
#+END_SRC

** OLS

If we ignore any potential endogeneity problem, we can estimate the model by OLS as described in the OLS chapter companion. Here are the results from Stata:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
reg ln_wage pexp pexp2 educ broken_home
#+END_SRC

#+RESULTS:
#+begin_example
      Source |       SS           df       MS      Number of obs   =     1,034
-------------+----------------------------------   F(4, 1029)      =     51.36
       Model |  37.3778146         4  9.34445366   Prob > F        =    0.0000
    Residual |   187.21445     1,029  .181938241   R-squared       =    0.1664
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  224.592265     1,033  .217417488   Root MSE        =    .42654

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pexp |   .2035214   .0235859     8.63   0.000     .1572395    .2498033
       pexp2 |  -.0124126   .0022825    -5.44   0.000    -.0168916   -.0079336
        educ |   .0852725   .0092897     9.18   0.000     .0670437    .1035014
 broken_home |  -.0087254   .0357107    -0.24   0.807    -.0787995    .0613488
       _cons |   .4603326    .137294     3.35   0.001     .1909243    .7297408
------------------------------------------------------------------------------
#+end_example

where education has the following elasticity:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
margins, dyex(educ) continuous
#+END_SRC

#+RESULTS:
#+begin_example
Average marginal effects                        Number of obs     =      1,034
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/ex w.r.t. : educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/ex   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   1.046691   .1140274     9.18   0.000     .8229385    1.270444
------------------------------------------------------------------------------
#+end_example

Running the OLS regression in R is done in a similar manner (I am suppressing the output for the sake of brevity).

#+BEGIN_SRC R :session :eval never-export :exports code :results none replace
ols.lm = lm(ln_wage ~ pexp + pexp2 + broken_home + educ)
#+END_SRC

* The endogeneity problem

Suppose we are worried that education is endogenous, that is, correlated with the population regression errors. In that case the OLS estimates of $\beta$ are biased. We hypothesize that the variable =feduc= is a good instrument, having all the properties we describe in detail in the notes document.
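
To make the problem concrete, here is a minimal sketch on simulated data (not the Tobias and Koop sample). All names (=z=, =u=, =x=, =y=) and the true slope of 0.5 are assumptions chosen for illustration; =z= plays the role that =feduc= plays in this chapter. The sketch compares OLS with two-stage least squares computed by hand in base R.

#+BEGIN_SRC R :eval never-export :exports code
# Simulated illustration only; every name and parameter here is hypothetical.
set.seed(407)
n <- 5000
z <- rnorm(n)                      # instrument: shifts x, excluded from y
u <- rnorm(n)                      # unobserved factor (e.g. ability)
x <- 0.8 * z + u + rnorm(n)        # endogenous regressor: depends on u
y <- 1 + 0.5 * x + u + rnorm(n)    # true slope is 0.5; u sits in the error

# OLS ignores the correlation between x and the error term
b_ols <- unname(coef(lm(y ~ x))["x"])

# 2SLS by hand: regress x on z, then y on the first-stage fitted values
x_hat <- fitted(lm(x ~ z))
b_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

# Equivalent closed form for the exactly identified case: (Z'X)^{-1} Z'y
X <- cbind(1, x)
Z <- cbind(1, z)
b_iv <- solve(t(Z) %*% X, t(Z) %*% y)[2]

round(c(ols = b_ols, tsls = b_2sls, iv = b_iv), 3)
#+END_SRC

The hand-rolled second stage reproduces only the 2SLS point estimates. Its reported standard errors are wrong because the residuals are computed from the fitted values of =x= rather than from =x= itself, so use =ivreg= or =ivregress= for inference.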
** Estimation in Stata

In Stata, we use this code:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc)
#+END_SRC

#+RESULTS:
#+begin_example
Instrumental variables (2SLS) regression          Number of obs   =      1,034
                                                  Wald chi2(4)    =     138.19
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1277
                                                  Root MSE        =     .43528

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1495027   .0320009     4.67   0.000     .0867821    .2122233
        pexp |    .214752   .0246553     8.71   0.000     .1664285    .2630755
       pexp2 |  -.0117453   .0023508    -5.00   0.000    -.0163529   -.0071377
 broken_home |   .0244713   .0397189     0.62   0.538    -.0533763     .102319
       _cons |  -.4064389   .4356072    -0.93   0.351    -1.260213    .4473354
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   pexp pexp2 broken_home feduc
#+end_example

Note two things:

1. The estimated effect of education has nearly doubled compared to OLS: the coefficient rises from 0.085 to 0.150, and the elasticity reported by =margins= rises from 1.05 to 1.84.
2. I cannot find an R command that exactly replicates the results above, because =ivregress= does not by default apply the small-sample correction to the variance-covariance matrix that the R =ivreg= command applies.

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
margins, dyex(educ) continuous
#+END_SRC

#+RESULTS:
#+begin_example
Average marginal effects                        Number of obs     =      1,034
Model VCE    : Unadjusted

Expression   : Linear prediction, predict()
dy/ex w.r.t. : educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   1.835095   .3928002     4.67   0.000     1.065221     2.60497
------------------------------------------------------------------------------
#+end_example

** Estimation in R

This runs the estimation in R. The regressors appear before the =|= in the formula and the full instrument list (the exogenous regressors plus =feduc=) appears after it.

#+BEGIN_SRC R :session :eval never-export :exports both :results output replace
ivmodel <- ivreg(ln_wage ~ pexp + pexp2 + broken_home + educ |
                   pexp + pexp2 + broken_home + feduc)
summary(ivmodel)
#+END_SRC

#+RESULTS:
#+begin_example

Call:
ivreg(formula = ln_wage ~ pexp + pexp2 + broken_home + educ |
    pexp + pexp2 + broken_home + feduc)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8472 -0.2326  0.0194  0.2541  1.6113

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.406439   0.436664  -0.931    0.352
pexp         0.214752   0.024715   8.689  < 2e-16 ***
pexp2       -0.011745   0.002357  -4.984 7.30e-07 ***
broken_home  0.024471   0.039815   0.615    0.539
educ         0.149503   0.032079   4.661 3.57e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4363 on 1029 degrees of freedom
Multiple R-Squared: 0.1277,	Adjusted R-squared: 0.1243
Wald test: 34.38 on 4 and 1029 DF,  p-value: < 2.2e-16
#+end_example

As an FYI, ~R~ is reporting non-robust standard errors with a small-sample (degrees-of-freedom) correction. They match the output from this Stata command:

#+BEGIN_SRC stata :exports code :eval never-export
ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), small
#+END_SRC

Stata's =ivregress= output with robust standard errors (suppressed) is obtained from

#+BEGIN_SRC stata :exports code :eval never-export
ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
#+END_SRC

Here is the robust version of the model in =R=:

#+BEGIN_SRC R :session :eval never-export :exports both :results output replace
summary(ivmodel,vcov=sandwich)
#+END_SRC

#+RESULTS:
#+begin_example

Call:
ivreg(formula = ln_wage ~ pexp + pexp2 + broken_home + educ |
    pexp + pexp2 + broken_home + feduc)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8472 -0.2326  0.0194  0.2541  1.6113

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.406439   0.440450  -0.923    0.356
pexp         0.214752   0.023863   8.999  < 2e-16 ***
pexp2       -0.011745   0.002359  -4.978 7.53e-07 ***
broken_home  0.024471   0.033503   0.730    0.465
educ         0.149503   0.032908   4.543 6.20e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4363 on 1029 degrees of freedom
Multiple R-Squared: 0.1277,	Adjusted R-squared: 0.1243
Wald test: 37.63 on 4 and 1029 DF,  p-value: < 2.2e-16
#+end_example

* Inference

We have more work to do:

1. Test for relevant and strong instruments
2. Test for endogeneity
3. Test for overidentification (not relevant for this example)

In Stata, we issue these commands:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
estat firststage
#+END_SRC

#+RESULTS:
#+begin_example
  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial
      Variable |   R-sq.       R-sq.        R-sq.     F(1,1029)   Prob > F
  -------------+------------------------------------------------------------
          educ |  0.2416      0.2387       0.0878       98.9915    0.0000
  --------------------------------------------------------------------------

  Minimum eigenvalue statistic = 98.9915

  Critical Values                      # of endogenous regressors:    1
  Ho: Instruments are weak             # of excluded instruments:     1
  ---------------------------------------------------------------------
                                     |    5%     10%     20%     30%
  2SLS relative bias                 |         (not available)
  -----------------------------------+---------------------------------
                                     |    10%     15%     20%     25%
  2SLS Size of nominal 5% Wald test  |    16.38    8.96    6.66    5.53
  LIML Size of nominal 5% Wald test  |    16.38    8.96    6.66    5.53
  ---------------------------------------------------------------------
#+end_example

Note that, since the number of instruments equals the number of endogenous variables, the model is exactly identified and there are no overidentifying restrictions to test, which Stata confirms:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
estat overid
#+END_SRC

#+RESULTS:
: no overidentifying restrictions
: r(498);

In R, we do it this way:

#+BEGIN_SRC R :session :eval never-export :exports both :results output replace
summary(ivmodel,vcov=sandwich,diagnostics = TRUE)
#+END_SRC

#+RESULTS:
#+begin_example

Call:
ivreg(formula = ln_wage ~ pexp + pexp2 + broken_home + educ |
    pexp + pexp2 + broken_home + feduc)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8472 -0.2326  0.0194  0.2541  1.6113

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.406439   0.440450  -0.923    0.356
pexp         0.214752   0.023863   8.999  < 2e-16 ***
pexp2       -0.011745   0.002359  -4.978 7.53e-07 ***
broken_home  0.024471   0.033503   0.730    0.465
educ         0.149503   0.032908   4.543 6.20e-06 ***

Diagnostic tests:
                  df1  df2 statistic p-value
Weak instruments    1 1029    80.649  <2e-16 ***
Wu-Hausman          1 1028     4.376  0.0367 *
Sargan              0   NA        NA      NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4363 on 1029 degrees of freedom
Multiple R-Squared: 0.1277,	Adjusted R-squared: 0.1243
Wald test: 37.63 on 4 and 1029 DF,  p-value: < 2.2e-16
#+end_example

These results tell us that our instrument is relevant and strong: the weak-instruments F statistic of 80.65 is far above the conventional rule-of-thumb value of 10. They also suggest that education is likely endogenous, since the Wu-Hausman test rejects exogeneity at the 5% level (p = 0.0367). The Sargan row is empty because the model is exactly identified.
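
To show what the first two diagnostics measure, here is a minimal sketch that computes them by hand in base R. It uses simulated data with hypothetical names (=z= for the instrument, =w= for an exogenous control, =x= for the endogenous regressor) and non-robust tests, so it is not meant to reproduce the figures above, which use =vcov=sandwich=.

#+BEGIN_SRC R :eval never-export :exports code
# Simulated illustration only; every name and parameter here is hypothetical.
set.seed(407)
n <- 5000
z <- rnorm(n)                                # excluded instrument
w <- rnorm(n)                                # exogenous control
u <- rnorm(n)                                # unobserved factor
x <- 0.8 * z + 0.3 * w + u + rnorm(n)        # endogenous regressor
y <- 1 + 0.5 * x + 0.2 * w + u + rnorm(n)    # outcome; u sits in the error

# Relevance: F test for excluding z from the first-stage regression
fs_restricted <- lm(x ~ w)
fs_full <- lm(x ~ w + z)
first_stage_F <- anova(fs_restricted, fs_full)$F[2]

# Endogeneity (regression-based Wu-Hausman): add the first-stage residuals
# to the structural equation and test whether their coefficient is zero
v_hat <- resid(fs_full)
augmented <- lm(y ~ x + w + v_hat)
wu_hausman_p <- summary(augmented)$coefficients["v_hat", "Pr(>|t|)"]

c(first_stage_F = first_stage_F, wu_hausman_p = wu_hausman_p)
#+END_SRC

A large first-stage F indicates a relevant instrument, and a small p-value on =v_hat= is evidence against treating =x= as exogenous.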