ECON 407: Companion to Panel Data Models

This page has been moved to https://econ.pages.code.wm.edu/407/notes/docs/index.html and is no longer being maintained here.

This document, a companion to the Panel Data series of lecture notes, provides a brief description of how to implement panel data models in both R and Stata. We will load with Tobias and Koop but this time will use the entire dataset since we are now ready to exploit the panel nature of the full dataset. Load data and summarize:

library(foreign)
library(lmtest)
library(plm)
tk.df = read.dta("https://rlhick.people.wm.edu/econ407/data/tobias_koop.dta")
attach(tk.df)

We also need to load the data into stata

webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
sum
set more off
(prefix now "https://rlhick.people.wm.edu/econ407/data")

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          id |     17,919    1081.852    629.8459          1       2178
        educ |     17,919    12.67604    1.922433          9         20
     ln_wage |     17,919    2.296821    .5282364        .07       4.57
        pexp |     17,919    8.362688    4.127502          0         22
        time |     17,919    8.196719    3.956042          0         14
-------------+---------------------------------------------------------
     ability |     17,919     .052374    .9261294      -4.04       2.01
       meduc |     17,919     11.4719    2.988851          0         20
       feduc |     17,919    11.70925    3.766923          0         20
 broken_home |     17,919     .153859    .3608236          0          1
    siblings |     17,919    3.156203    2.120989          0         18
-------------+---------------------------------------------------------
       pexp2 |     17,919    86.96986    75.26336          0        484

We will be estimating a very simple returns to education equation and will ignore any endogeneity issues for the purposes of illustration.

Pooled OLS

Here are the commands to run pooled OLS in stata. After we run the model, we store the model estimates for doing hypothesis testing later.

regress ln_wage educ pexp pexp2 broken_home 
est store bpool

      Source |       SS           df       MS      Number of obs   =    17,919
-------------+----------------------------------   F(4, 17914)     =    928.58
       Model |  858.620393         4  214.655098   Prob > F        =    0.0000
    Residual |  4141.10561    17,914  .231165882   R-squared       =    0.1717
-------------+----------------------------------   Adj R-squared   =    0.1715
       Total |  4999.72601    17,918  .279033709   Root MSE        =     .4808

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0890684    .001939    45.93   0.000     .0852677    .0928691
        pexp |   .0907662   .0034375    26.40   0.000     .0840284     .097504
       pexp2 |   -.003046   .0001895   -16.07   0.000    -.0034175   -.0026745
 broken_home |  -.0561547   .0100131    -5.61   0.000    -.0757813    -.036528
       _cons |   .6822885   .0286842    23.79   0.000     .6260648    .7385122
------------------------------------------------------------------------------

In R, this is simply an OLS model and is easily run using

wage.ols = lm(ln_wage~educ+pexp+pexp2+broken_home)
summary(wage.ols)

Call:
lm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.53137 -0.27441  0.02012  0.31121  2.11371 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.6822885  0.0286842  23.786  < 2e-16 ***
educ         0.0890684  0.0019390  45.934  < 2e-16 ***
pexp         0.0907662  0.0034375  26.405  < 2e-16 ***
pexp2       -0.0030460  0.0001895 -16.071  < 2e-16 ***
broken_home -0.0561547  0.0100131  -5.608 2.08e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4808 on 17914 degrees of freedom
Multiple R-squared:  0.1717,	Adjusted R-squared:  0.1715 
F-statistic: 928.6 on 4 and 17914 DF,  p-value: < 2.2e-16

or we could do it using the plm package after setting up our panel dataset:

tk.panel.df = pdata.frame(tk.df, index = c("id","time"))
wage.pool = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df,model="pooling")
summary(wage.pool)
Pooling Model

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "pooling")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-2.531374 -0.274411  0.020121  0.311211  2.113707 

Coefficients:
               Estimate  Std. Error  t-value  Pr(>|t|)    
(Intercept)  0.68228847  0.02868415  23.7863 < 2.2e-16 ***
educ         0.08906836  0.00193904  45.9343 < 2.2e-16 ***
pexp         0.09076619  0.00343747  26.4050 < 2.2e-16 ***
pexp2       -0.00304601  0.00018953 -16.0715 < 2.2e-16 ***
broken_home -0.05615469  0.01001310  -5.6081 2.076e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    4999.7
Residual Sum of Squares: 4141.1
R-Squared:      0.17173
Adj. R-Squared: 0.17155
F-statistic: 928.576 on 4 and 17914 DF, p-value: < 2.22e-16

Random Effects

In stata, we run the following code. Notice, we use xtset to inform stata of the panel data individual (id) and time (time) identifiers. Also, save the results for analysis later:

xtset id time
xtreg ln_wage educ pexp pexp2 broken_home , re
est store bre
       panel variable:  id (unbalanced)
        time variable:  time, 0 to 14, but with gaps
                delta:  1 unit

Random-effects GLS regression                   Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-sq:                                           Obs per group:
     within  = 0.2283                                         min =          1
     between = 0.1489                                         avg =        8.2
     overall = 0.1714                                         max =         15

                                                Wald chi2(4)      =    5026.37
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0901376   .0034376    26.22   0.000     .0834001    .0968751
        pexp |   .1027822   .0026045    39.46   0.000     .0976776    .1078868
       pexp2 |  -.0036384   .0001426   -25.52   0.000    -.0039179   -.0033589
 broken_home |  -.0646814   .0228582    -2.83   0.005    -.1094827   -.0198801
       _cons |   .5807788   .0441847    13.14   0.000     .4941784    .6673792
-------------+----------------------------------------------------------------
     sigma_u |  .37244396
     sigma_e |  .32867768
         rho |  .56218095   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Or, the Swamy-Aurora version of the random effects model (closest to what R uses):

xtreg ln_wage educ pexp pexp2 broken_home , sa th
est store bre_sa

Random-effects GLS regression                   Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-sq:                                           Obs per group:
     within  = 0.2283                                         min =          1
     between = 0.1492                                         avg =        8.2
     overall = 0.1714                                         max =         15

                                                Wald chi2(4)      =    5004.33
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------- theta --------------------
  min      5%       median        95%      max
0.3201   0.4517     0.6885     0.7595   0.7672

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0904542   .0033583    26.93   0.000     .0838721    .0970363
        pexp |   .1025766   .0026118    39.27   0.000     .0974575    .1076957
       pexp2 |  -.0036296   .0001431   -25.37   0.000      -.00391   -.0033492
 broken_home |  -.0642368    .022005    -2.92   0.004    -.1073659   -.0211077
       _cons |   .5782614   .0432297    13.38   0.000     .4935328      .66299
-------------+----------------------------------------------------------------
     sigma_u |  .35445648
     sigma_e |  .32867768
         rho |  .53768242   (fraction of variance due to u_i)
------------------------------------------------------------------------------

And in R the we have similar code:

wage.re = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df,model="random")
summary(wage.re)
Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "random")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Effects:
                 var std.dev share
idiosyncratic 0.1080  0.3287 0.469
individual    0.1223  0.3498 0.531
theta:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3152  0.6847  0.7151  0.7007  0.7478  0.7642 

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.36051 -0.15586  0.03123  0.00648  0.19569  2.21809 

Coefficients:
               Estimate  Std. Error  t-value Pr(>|t|)    
(Intercept)  0.57764593  0.04297643  13.4410  < 2e-16 ***
educ         0.09053539  0.00333714  27.1296  < 2e-16 ***
pexp         0.10252019  0.00261397  39.2202  < 2e-16 ***
pexp2       -0.00362720  0.00014322 -25.3263  < 2e-16 ***
broken_home -0.06411881  0.02178336  -2.9435  0.00325 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2693.3
Residual Sum of Squares: 1960.1
R-Squared:      0.27508
Adj. R-Squared: 0.27492
F-statistic: 1675.26 on 4 and 17914 DF, p-value: < 2.22e-16

Fixed Effects

In stata, the easiest model to run is the "between" estimator. Note, stata automatically drops covariates that do not vary within an individual's observations so that the model runs, but it leaves the variable in the regression results output:

xtreg ln_wage educ pexp pexp2 broken_home , fe
est store bfe
note: broken_home omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-sq:                                           Obs per group:
     within  = 0.2286                                         min =          1
     between = 0.1331                                         avg =        8.2
     overall = 0.1663                                         max =         15

                                                F(3,15738)        =    1554.76
corr(u_i, Xb)  = 0.0053                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0761987   .0059414    12.83   0.000     .0645529    .0878444
        pexp |   .1070331   .0027728    38.60   0.000      .101598    .1124681
       pexp2 |  -.0038188    .000149   -25.63   0.000    -.0041109   -.0035268
 broken_home |          0  (omitted)
       _cons |   .7679618   .0716835    10.71   0.000     .6274539    .9084696
-------------+----------------------------------------------------------------
     sigma_u |  .40417965
     sigma_e |  .32867768
         rho |   .6019421   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2177, 15738) = 10.41                Prob > F = 0.0000

In R we have the same result (although R plm surpresses the model constant in a way more in line with our class notes). R also automatically drops covariates not varying within a cross sectional unit and it drops it from the results (it doesn't tell you explicitly the way stata does).

wage.fe = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df,model="within")
summary(wage.fe)
Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "within")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-2.452999 -0.145082  0.010925  0.167903  2.543757 

Coefficients:
         Estimate  Std. Error t-value  Pr(>|t|)    
educ   0.07619867  0.00594136  12.825 < 2.2e-16 ***
pexp   0.10703306  0.00277281  38.601 < 2.2e-16 ***
pexp2 -0.00381882  0.00014899 -25.632 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2204
Residual Sum of Squares: 1700.2
R-Squared:      0.22862
Adj. R-Squared: 0.12176
F-statistic: 1554.76 on 3 and 15738 DF, p-value: < 2.22e-16

Hypothesis Testing and Model Comparison

We have various tests to run:

OLS (Pooled) versus Random Effects

In stata, right after running the random effects model, run

quietly: xtreg ln_wage educ pexp pexp2 broken_home , re
xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_wage[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 ln_wage |   .2790337       .5282364
                       e |    .108029       .3286777
                       u |   .1387145        .372444

        Test:   Var(u) = 0
                             chibar2(01) = 19338.65
                          Prob > chibar2 =   0.0000

In R,

plmtest(wage.pool,type='bp')

	Lagrange Multiplier Test - (Breusch-Pagan) for unbalanced panels

data:  ln_wage ~ educ + pexp + pexp2 + broken_home
chisq = 19339, df = 1, p-value < 2.2e-16
alternative hypothesis: significant effects

OLS (Pooled) versus Fixed Effects

In stata, this is tested by default and included in the fe output (at the end):

------------------------------------------------------------------------------
F test that all u_i=0:     F(2177, 15738) =    10.41         Prob > F = 0.0000

In R, use this (note the slight difference in the F statistic (and degrees of freedom) due to stata using a model constant):

pFtest(wage.fe, wage.pool) 

	F test for individual effects

data:  ln_wage ~ educ + pexp + pexp2 + broken_home
F = 10.384, df1 = 2176, df2 = 15738, p-value < 2.2e-16
alternative hypothesis: significant effects

Random Effects versus Fixed Effects

In stata, install xtoverid and use this after the fixed effects regression:

xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re
Sargan-Hansen statistic  31.892  Chi-sq(3)    P-value = 0.0000

or, you can use the Hausman test explictly. Let's run Hausman on the Swamy-Arora version of random effects:

hausman bfe bre_sa

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      bfe         bre_sa       Difference          S.E.
-------------+----------------------------------------------------------------
        educ |    .0761987     .0904542       -.0142555        .0049012
        pexp |    .1070331     .1025766        .0044565         .000931
       pexp2 |   -.0038188    -.0036296       -.0001892        .0000415
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       43.88
                Prob>chi2 =      0.0000

in R, we use the Hausman test:

phtest(wage.fe, wage.re)

	Hausman Test

data:  ln_wage ~ educ + pexp + pexp2 + broken_home
chisq = 49.388, df = 3, p-value = 1.079e-10
alternative hypothesis: one model is inconsistent

Note the stata output is more informative.

Endogeneity

IV regression on panel data (where x4 is endogenous and instrumented by z1) can be performed by

xtivreg y x1 x2 x3 (x4=z1), fe

and in R by

summary(plm(y~x1+x2+x3+x4 | . + x1+x2+x3+z1, data=panel.df,model="within"))

Note: Random Effects with instruments seemingly doesn't work in R (doesn't come close to matching stata's xtivreg command). Fixed effects (shown above), not verified.

Robust Standard Errors

In stata:

xtreg ln_wage educ pexp pexp2, fe robust

Fixed-effects (within) regression               Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-sq:                                           Obs per group:
     within  = 0.2286                                         min =          1
     between = 0.1331                                         avg =        8.2
     overall = 0.1663                                         max =         15

                                                F(3,2177)         =     535.99
corr(u_i, Xb)  = 0.0053                         Prob > F          =     0.0000

                                 (Std. Err. adjusted for 2,178 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0761987   .0099113     7.69   0.000     .0567621    .0956352
        pexp |   .1070331   .0042128    25.41   0.000     .0987716    .1152945
       pexp2 |  -.0038188   .0002193   -17.42   0.000    -.0042489   -.0033888
       _cons |   .7679618   .1208608     6.35   0.000     .5309472    1.004976
-------------+----------------------------------------------------------------
     sigma_u |  .40417965
     sigma_e |  .32867768
         rho |   .6019421   (fraction of variance due to u_i)
------------------------------------------------------------------------------

In R and noting that only the standard errors, t-stats, and p values change, we have very similar results. Note the vcovHC and coeftest functions are being drawn from the plm package and not sandwich:

coeftest(wage.fe,vcov=vcovHC(wage.fe,type="sss"))

t test of coefficients:

         Estimate  Std. Error  t value  Pr(>|t|)    
educ   0.07619867  0.00991101   7.6883 1.579e-14 ***
pexp   0.10703306  0.00421265  25.4076 < 2.2e-16 ***
pexp2 -0.00381882  0.00021928 -17.4155 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1