Application of the Tobit and Heckman Sample Selection Model
This page has been moved to https://econ.pages.code.wm.edu/407/notes/docs/index.html and is no longer being maintained here.
Tobit Application
The following code shows some of the mechanics of running a Tobit Model, as well as ways we can use the model results after estimation. This uses a "toy dataset" that has only one independent variable.
clear
webuse set "https://rlhick.people.wm.edu/econ407/data"
webuse toy_tobit
sum
(prefix now "https://rlhick.people.wm.edu/econ407/data")

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       index |      5,000      2499.5     1443.52          0       4999
           y |      5,000    5.082118    .9811196   3.858572   8.662185
           x |      5,000   -.0008725    1.001673  -3.620101   3.748413
The first 5 rows of data look like this:
list in 1/5
     +--------------------------------+
     | index           y            x |
     |--------------------------------|
  1. |     0   4.2630254   -1.2008912 |
  2. |     1   4.6382766     .1366034 |
  3. |     2    4.776782    1.1964091 |
  4. |     3   6.8389654    .71531347 |
  5. |     4   5.2471644    .39031073 |
     +--------------------------------+
Of particular interest is our censored dependent variable \(\mathbf{y}\). The histogram is
hist y, frequency bin(20) graphregion(color(white)) ///
title("Histogram of the Censored Dependent Variable") xtitle("y")
graph export "/tmp/toy_tobit_hist.eps", replace
The scatterplot also shows how the censored values have been "stacked" at the lower censoring point and how lower values of \(\mathbf{y}\) are pulled toward it.
scatter x y, graphregion(color(white)) title("Scatterplot of x and y") ///
msize(tiny) xtitle("x") ytitle("y") xlab(0(.5)9)
graph export "/tmp/toy_tobit_scatter.eps", replace
(Figure: scatterplot of x and y, toytobitscatter.png)
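As a quick check on this stacking, we can count how many observations sit exactly at the lower censoring point; this count should match the number of left-censored observations that tobit reports below.

* Quick check: how many observations are stacked at the lower censoring point?
quietly summarize y
count if y == r(min)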
To run the Tobit model, we issue the following commands:
egen a = min(y)
display "Lower Censoring Point (a): ", a
tobit y x, ll
Lower Censoring Point (a):  3.8585718

Refining starting values:

Grid node 0:   log likelihood = -6924.13

Fitting full model:

Iteration 0:   log likelihood =   -6924.13
Iteration 1:   log likelihood = -6813.1161
Iteration 2:   log likelihood = -6810.4363
Iteration 3:   log likelihood = -6810.4337
Iteration 4:   log likelihood = -6810.4337

Tobit regression                                Number of obs     =      5,000
                                                   Uncensored     =      4,198
Limits: lower = 3.86                               Left-censored  =        802
        upper = +inf                               Right-censored =          0

                                                LR chi2(1)        =    1074.36
                                                Prob > chi2       =     0.0000
Log likelihood = -6810.4337                     Pseudo R2         =     0.0731

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5062554     .01488    34.02   0.000     .4770841    .5354267
       _cons |   4.986882   .0147063   339.10   0.000     4.958051    5.015713
-------------+----------------------------------------------------------------
    var(e.y) |    1.02864   .0232449                       .9840649    1.075235
------------------------------------------------------------------------------
Outputs from the Tobit Model
Once we have run the model, we can use the results to create parameter values that we might want to use for verifying our understanding of the Tobit Model.
Parameter | Stata Command |
---|---|
\(\sigma\) | sqrt(_b[/var(e.y)]) |
\(log(L)\) | e(ll) |
\(\beta_{name}\) | _b[name] |
\(\mathbf{x}\hat{\beta}\) | predict xb, xb |
Example code using these special results will create new variables in your data space:
gen sigma = sqrt(_b[/var(e.y)])
gen logLike = e(ll)
gen beta_x = _b[x]
predict xb, xb
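As a sketch of how these pieces fit together, we can rebuild the Tobit log likelihood by hand from the sigma and xb variables created above and compare it to e(ll). The names censored and ll_i below are just illustrative.

* Sketch: rebuild the Tobit log likelihood by hand using sigma and xb from above
quietly summarize y
scalar a_low = r(min)                        // exact lower censoring point
gen byte censored = (y == a_low)             // observations stacked at the limit
gen double ll_i = cond(censored, ///
    ln(normal((a_low - xb)/sigma)), ///          censored contribution: ln Pr(y* <= a)
    ln(normalden((y - xb)/sigma)) - ln(sigma))   // uncensored contribution: ln density
quietly summarize ll_i
display "hand-built log likelihood: " r(sum)
display "reported by tobit, e(ll):  " e(ll)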
The special variables created above can also be used to replicate a number of the outputs available as postestimation commands. The following table lists the various expected values from the Tobit model, the formulas they are based on, and the Stata commands for generating marginal effects and predicted values, respectively.
Expected Value | Formula | Stata Command |
---|---|---|
\(E[y \shortmid y\hspace{.03in} obs]\) | \(\mathbf{x}_i \beta^{T} + \sigma \frac{\phi \left(\frac{a-\mathbf{x}_i \beta^{T}}{\sigma} \right)}{1-\Phi \left(\frac{a-\mathbf{x}_i \beta^{T}}{\sigma}\right)}\) | margins, predict(e(a,.)) |
 | | predict ycond, e(a,.) |
\(Prob(\text{Not Censored})\) | \(1-\Phi \left(\frac{a-\mathbf{x}_i \beta^{T}}{\sigma} \right)\) | margins, predict(pr(a,.)) |
 | | predict probobs, pr(a,.) |
\(E[y]\) | \(\Phi \left (\frac{a-\mathbf{x}_i \beta^{T}}{\sigma} \right ) a + \left (1-\Phi \left(\frac{a-\mathbf{x}_i \beta^{T}}{\sigma}\right ) \right) \left [\mathbf{x}_i \beta^{T} + \sigma \frac{\phi \left( \frac{a-\mathbf{x}_i \beta^{T}}{\sigma} \right)}{1-\Phi \left (\frac{a-\mathbf{x}_i \beta^{T}}{\sigma} \right)} \right ]\) | margins, predict(ystar(a,.)) |
 | | predict yhat, ystar(a,.) |
\(E[y^*]\) | \(\mathbf{x}_i \beta^{T}\) | margins, predict(xb) |
 | | predict xb, xb |
Note that \(\beta^T\) denotes the estimates of \(\beta\) from the Tobit Model.
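For example, a minimal sketch of the first row of the table, built from the a, sigma, and xb variables created above and compared against Stata's own prediction (the names mills, ycond_hand, and ycond_tobit are just illustrative):

* Sketch: replicate E[y | y observed] by hand and compare to predict, e(a,.)
gen double mills = normalden((a - xb)/sigma) / (1 - normal((a - xb)/sigma))
gen double ycond_hand = xb + sigma*mills
predict double ycond_tobit, e(a,.)
list ycond_hand ycond_tobit in 1/5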
Heckman Application
The following code shows how to run a Heckman model using data that has completely non-overlapping variables in \(\mathbf{x}\) (the independent variables in the amounts equation) and \(\mathbf{w}\) (the independent variables in the selection equation).
clear
webuse set "https://rlhick.people.wm.edu/econ407/data"
webuse toy_heckman
sum
(prefix now "https://rlhick.people.wm.edu/econ407/data")

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       index |      5,000      2499.5     1443.52          0       4999
           y |      3,166   -1.504517    .9267993  -4.643568   1.950691
           x |      5,000    .0038611    .9948256  -3.184449   4.071575
           z |      5,000       .6332    .4819795          0          1
           w |      5,000    .0051353    .9901703  -3.564716   3.639859
Note that in this dataset there are missing values for \(\mathbf{y}\) as a result of our selection mechanism. The first five observations look like this:
list in 1/5
     +--------------------------------------------------+
     | index            y            x   z            w |
     |--------------------------------------------------|
  1. |     0            .   -2.9543519   0   -2.1461785 |
  2. |     1   -1.4679083    .92509399   1    1.5765399 |
  3. |     2            .    .98621375   0    .05838758 |
  4. |     3   -1.8399264    1.1407735   1    .70403742 |
  5. |     4   -1.4611701    .42070096   1    .23800567 |
     +--------------------------------------------------+
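As a quick sanity check on the selection mechanism, the missing values of \(\mathbf{y}\) should line up exactly with the selection indicator z, which we can verify in this toy dataset:

* Quick check: y is missing exactly when the selection indicator z is zero
count if missing(y)
assert missing(y) == (z == 0)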
Using this data, we estimate a very simple Heckman Model having only one variable each in the amounts and selection equations.
heckman y x, select(z = w)
Iteration 0:   log likelihood = -6421.6346
Iteration 1:   log likelihood = -6419.2129
Iteration 2:   log likelihood = -6419.1934
Iteration 3:   log likelihood = -6419.1934

Heckman selection model                         Number of obs     =      5,000
(regression model with sample selection)              Selected    =      3,166
                                                      Nonselected =      1,834

                                                Wald chi2(1)      =     611.64
Log likelihood = -6419.193                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           x |   .4847046   .0195987    24.73   0.000     .4462918    .5231174
       _cons |  -2.009821    .025109   -80.04   0.000    -2.059034   -1.960608
-------------+----------------------------------------------------------------
z            |
           w |   .9816843   .0262184    37.44   0.000     .9302972    1.033071
       _cons |   .4758687   .0213349    22.30   0.000     .4340531    .5176844
-------------+----------------------------------------------------------------
     /athrho |   1.106114   .0630869    17.53   0.000     .9824655    1.229762
    /lnsigma |   .0147216   .0168441     0.87   0.382    -.0182922    .0477353
-------------+----------------------------------------------------------------
         rho |   .8026843   .0224399                      .7541312    .8425102
       sigma |    1.01483   .0170939                      .9818741    1.048893
      lambda |   .8145885   .0334737                      .7489813    .8801956
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =   214.58   Prob > chi2 = 0.0000
As discussed in class, the command reports (at the bottom of the output) a likelihood ratio test statistic for whether \(\rho=0\) (in which case OLS could be applied) or not (in which case the Heckman Model should be applied).
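As a sketch of where this statistic comes from, note that under \(\rho=0\) the likelihood factors into a probit for selection plus a linear regression on the selected sample, so we can rebuild the test by hand (the scalar names below are illustrative):

* Sketch: rebuild the LR test of rho = 0 by hand
scalar ll_heck = e(ll)               // log likelihood from the heckman fit above
quietly regress y x                  // amounts equation on the selected sample only
scalar ll_ols = e(ll)
quietly probit z w                   // selection equation on the full sample
scalar ll_probit = e(ll)
scalar lr = 2*(ll_heck - (ll_ols + ll_probit))
display "LR chi2(1) = " lr "    Prob > chi2 = " chi2tail(1, lr)
quietly heckman y x, select(z = w)   // re-fit so e() again holds the heckman results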
Outputs from the Heckman Model
Once we have run the model, we can use the results for creating parameter values that we might want to use for verifying our understanding of the Heckman Model.
Parameter | Stata Command |
---|---|
\(\sigma\) | e(sigma) |
\(\rho\) | e(rho) |
\(log(L)\) | e(ll) |
\(\beta_{name}\) | _b[name] |
\(\gamma_{name}\) | _b[z:name] (where z is the name of your selection dependent variable) |
\(\mathbf{x}\hat{\beta}\) | predict xb, xb |
\(\mathbf{w}\hat{\gamma}\) | predict zg, xbsel |
One thing worth noting is that there are two sets of parameters being maintained in the background by Stata. To extract individual ones, given our variable names, use syntax like this:
gen beta_x = _b[x]
di "beta x coefficient: "beta_x
gen gamma_w = _b[z:w]
di "gamma w coefficient:", gamma_w
beta x coefficient: .48470458
gamma w coefficient: .98168427
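Similarly, the transformed parameters reported at the bottom of the heckman output can be recovered from the raw estimates; a minimal sketch, assuming your Stata version stores the auxiliary parameters as _b[/athrho] and _b[/lnsigma] (as the output above suggests):

* Sketch: recover rho, sigma, and lambda from /athrho and /lnsigma
display "rho    = tanh(athrho)  = " tanh(_b[/athrho])
display "sigma  = exp(lnsigma)  = " exp(_b[/lnsigma])
display "lambda = rho*sigma     = " tanh(_b[/athrho])*exp(_b[/lnsigma])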
Usually for this class, you will be working with the linear predictor, so you'll be calculating \(\mathbf{w}\gamma\) and \(\mathbf{x}\beta\) using
predict zg, xbsel
predict xb, xb
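To verify these linear predictors, a short sketch builds them by hand from the coefficient vectors; the equation names y and z follow the heckman output above, and the _hand variable names are illustrative:

* Sketch: build the linear predictors by hand and compare to predict's versions
gen double xb_hand = _b[y:x]*x + _b[y:_cons]
gen double zg_hand = _b[z:w]*w + _b[z:_cons]
summarize xb xb_hand zg zg_hand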
The following table lists the corresponding expected values from the Heckman model, the formulas they are based on, and the Stata commands for generating marginal effects and predicted values, respectively.

Expected Value | Formula | Stata Command |
---|---|---|
\(E[y \shortmid y\hspace{.03in} obs]\) | \(\mathbf{x}_i \beta^{H} + \rho \sigma_{\epsilon} \frac{\phi(\mathbf{w}_i \gamma)}{\Phi(\mathbf{w}_i \gamma)}\) | margins, predict(ycond) |
 | | predict ycond, ycond |
\(Prob(\text{Not Censored})\) | \(\Phi(\mathbf{w}_i \gamma)\) | margins, predict(psel) |
 | | predict probobs, psel |
\(E[y]\) | \(\Phi(\mathbf{w}_i \gamma)\left[\mathbf{x}_i \beta^H + \rho \sigma_{\epsilon} \frac{\phi(\mathbf{w}_i \gamma)}{\Phi(\mathbf{w}_i \gamma)}\right]\) | margins, predict(yexpected) |
 | | predict yhat, yexpected |
\(E[y^*]\) | \(\mathbf{x}_i \beta^{H}\) | margins, predict(xb) |
 | | predict xb, xb |
Note that \(\beta^H\) denotes the estimates of \(\beta\) from the Heckman Model.
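As a check on the first row of the table, a minimal sketch rebuilds \(E[y \shortmid y\hspace{.03in} obs]\) from the zg and xb variables created above together with the stored e(rho) and e(sigma); the variable names imr, ycond_hand, and ycond_check are illustrative:

* Sketch: replicate E[y | y observed] by hand and compare to predict, ycond
gen double imr = normalden(zg) / normal(zg)          // inverse Mills ratio
gen double ycond_hand = xb + e(rho)*e(sigma)*imr
predict double ycond_check, ycond
list ycond_hand ycond_check in 1/5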
For example, if we want to predict \(E[y \shortmid y\hspace{.03in} obs]\) for each observation we can just issue this command:
predict ycond, ycond
list y ycond in 1/10
     +------------------------+
     |          y       ycond |
     |------------------------|
  1. |          .   -1.771267 |
  2. | -1.4679083   -1.518555 |
  3. |          .   -1.130811 |
  4. | -1.8399264   -1.269633 |
  5. | -1.4611701   -1.473895 |
     |------------------------|
  6. | -1.4463141   -1.561146 |
  7. | -2.0634319   -1.475339 |
  8. | -.82478092   -2.157227 |
  9. | -.75221966   -1.741791 |
 10. |          .   -1.650602 |
     +------------------------+
This shows that we can use the model to predict outcomes for every observation, including those that were not selected (where \(\mathbf{y}\) is missing).
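A quick comparison of the average prediction across selected and non-selected observations makes the point (the variable name selected is illustrative):

* Predictions exist for every observation, whether or not y was observed
gen byte selected = !missing(y)
tabstat ycond, by(selected) statistics(mean n)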