# Heckman Code

The Python code generating the toy data for the figures above is given below. This version examines Case 4.1.
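For reference, the simulated data follow the textbook Heckman selection model. In the notation of the code below, $\gamma$ = `gamma_t`, $\beta$ = `beta_t`, $\rho$ = `rho_t`, and $\sigma_\varepsilon$ = `sigma_e_t`; since the code sets $\sigma_\varepsilon = 1$, the off-diagonal covariance term equals the correlation $\rho$:

```latex
\begin{aligned}
z_i^* &= w_i'\gamma + u_i, &\qquad z_i &= \mathbf{1}\{z_i^* > 0\},\\
y_i^* &= x_i'\beta + \varepsilon_i, &\qquad y_i &= y_i^* \ \text{observed only when } z_i = 1,\\
(u_i, \varepsilon_i)' &\sim \mathcal{N}\!\left(\mathbf{0},\;
  \begin{bmatrix} 1 & \rho \\ \rho & \sigma_\varepsilon^2 \end{bmatrix}\right).
\end{aligned}
```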

```
# start a connected Stata 17 session
from pystata import config
config.init('be')
```
```
  ___  ____  ____  ____  ____ ©
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       BE—Basic Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Single-user  perpetual
Serial number: 301706306291
  Licensed to: Rob Hicks
               College of William and Mary

Notes:
      1. Unicode is supported; see help unicode_advice.
```
```
import numpy as np
import pandas as pd

# true parameters
rho_t = np.array([0.8])
rho_x_w_t = np.array([0.8])
gamma_t = np.array([0.5, 1.0])
beta_t = np.array([-2.0, 0.5])
sigma_e_t = np.array([1.0])
N = 5000

# generate toy data consistent with heckman:
# generate potentially correlated x, w data
mean_x_w = np.array([0, 0])
cov_x_w = np.array([[1, rho_x_w_t[0]], [rho_x_w_t[0], 1]])
w, x = np.random.multivariate_normal(mean_x_w, cov_x_w, N).T

# add constant to first position and convert to DataFrame
w_ = pd.DataFrame(np.c_[np.ones(N), w], columns=['Constant (Selection)', 'Slope (Selection)'])
x_ = pd.DataFrame(np.c_[np.ones(N), x], columns=['Constant', 'Slope'])

# generate errors
mean_u_eps = np.array([0, 0])
cov_u_eps = np.array([[1, rho_t[0]], [rho_t[0], sigma_e_t[0]]])
u, epsilon = np.random.multivariate_normal(mean_u_eps, cov_u_eps, N).T

# generate latent zstar
zstar = w_.dot(gamma_t) + u
# generate observed z (indicator = 1 if zstar is positive)
z = zstar > 0

# generate latent ystar
ystar = x_.dot(beta_t) + epsilon
y = ystar.copy()
# generate observed y [if z=0, set y to NaN]
y[~z] = np.nan

stata_data = pd.DataFrame(np.c_[y, z, x, w], columns=['y', 'z', 'x', 'w'])
stata_data.head()
```
```
          y    z         x         w
0 -1.234971  1.0  0.705421  1.638497
1       NaN  0.0 -0.705640 -0.496328
2 -2.111768  1.0  0.630919  0.375758
3 -2.386395  1.0 -0.841180 -0.265511
4 -2.297429  1.0  0.637365  0.953034
```
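As a quick sanity check (not part of the original listing; a standalone re-simulation with a fixed seed), we can confirm the draws behave as intended: the selection rate should be near $\Phi(0.5/\sqrt{2}) \approx 0.64$, and the empirical correlations should be near their true values of 0.8.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000
rho, rho_xw, sigma_e = 0.8, 0.8, 1.0
gamma = np.array([0.5, 1.0])

# regressors and errors, drawn as in the snippet above
w, x = rng.multivariate_normal([0, 0], [[1, rho_xw], [rho_xw, 1]], N).T
u, eps = rng.multivariate_normal([0, 0], [[1, rho], [rho, sigma_e]], N).T

# selection indicator: gamma0 + gamma1*w + u > 0
z = (gamma[0] + gamma[1] * w + u) > 0
print("selection rate:", z.mean())                    # ~ 0.64
print("corr(x, w):    ", np.corrcoef(x, w)[0, 1])     # ~ 0.8
print("corr(u, eps):  ", np.corrcoef(u, eps)[0, 1])   # ~ 0.8
```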

We can estimate a two-step Heckman model in Python using an unmerged branch of statsmodels (this replicates the Stata two-step results).

```
import heckman

res = heckman.Heckman(y, x_, w_).fit(method='twostep')
print(res.summary())
```
```
       Heckman Regression Results
=======================================
Dep. Variable:                        y
Model:                          Heckman
Method:                Heckman Two-Step
Date:                  Wed, 12 May 2021
Time:                          12:22:11
No. Total Obs.:                    5000
No. Censored Obs.:                 1847
No. Uncensored Obs.:               3153
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Constant      -1.9881      0.037    -53.394      0.000      -2.061      -1.915
Slope          0.4999      0.024     20.649      0.000       0.452       0.547
========================================================================================
                           coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------------
Constant (Selection)     0.4736      0.022     21.891      0.000       0.431       0.516
Slope (Selection)        0.9805      0.028     35.587      0.000       0.927       1.035
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
IMR (Lambda)     0.7412      0.062     12.026      0.000       0.620       0.862
=====================================
rho:                            0.753
sigma:                          0.985
=====================================

First table are the estimates for the regression (response) equation.
Second table are the estimates for the selection equation.
Third table is the estimate for the coef of the inverse Mills ratio (Heckman's Lambda).
```

And in Stata, we can estimate the Full Information Maximum Likelihood model over the toy dataset as follows:

```
%%stata -d stata_data
heckman y x, select(z=w)
```
```
Iteration 0:   log likelihood = -6363.4231
Iteration 1:   log likelihood = -6360.8593
Iteration 2:   log likelihood = -6360.8397
Iteration 3:   log likelihood = -6360.8397

Heckman selection model                         Number of obs     =      5,000
(regression model with sample selection)              Selected    =      3,153
                                                      Nonselected =      1,847

                                                Wald chi2(1)      =     721.77
Log likelihood =  -6360.84                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
y            |
           x |   .5161336   .0192116    26.87   0.000     .4784796    .5537877
       _cons |  -2.024205   .0244713   -82.72   0.000    -2.072167   -1.976242
-------------+----------------------------------------------------------------
z            |
           w |   .9748141   .0260147    37.47   0.000     .9238261    1.025802
       _cons |   .4769289   .0213639    22.32   0.000     .4350564    .5188013
-------------+----------------------------------------------------------------
     /athrho |   1.116945   .0613091    18.22   0.000     .9967814    1.237109
    /lnsigma |   .0034814    .016585     0.21   0.834    -.0290246    .0359874
-------------+----------------------------------------------------------------
         rho |   .8065037   .0214307                      .7602391     .844629
       sigma |   1.003487   .0166429                      .9713925    1.036643
      lambda |   .8093163     .03187                      .7468522    .8717805
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 229.21       Prob > chi2 = 0.0000
```
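The `/athrho` and `/lnsigma` rows are the unconstrained parameters Stata actually maximizes over; the derived `rho`, `sigma`, and `lambda` rows follow from the inverse transforms, which we can verify against the output above:

```python
import numpy as np

athrho, lnsigma = 1.116945, 0.0034814  # from the Stata output above
rho = np.tanh(athrho)                  # inverse of the arctanh transform
sigma = np.exp(lnsigma)                # inverse of the log transform
lam = rho * sigma                      # Heckman's lambda
print(rho, sigma, lam)                 # matches Stata's derived rows
```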

1. Note: we are using Stata 17's Jupyter Notebook integration via the `%%stata` magic.