{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this note, I'll explore the Ipython statsmodels package for estimating linear regression models (OLS). The goal is to completely map stata commands for reg into something implementable in Ipython.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#load python libraries for this ipython notebook:\n", "%matplotlib inline\n", "import pandas as pd # pandas data manipulation library\n", "import statsmodels.formula.api as smf # for the ols and robust ols model\n", "import statsmodels.api as sm\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note this is from the Tobias and Koop dataset on the returns to education. The full dataset is a panel. I have turned this into a cross section where time==4 (for demonstration purposes only)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# load data and create dataframe\n", "tobias_koop=pd.read_csv('http://rlhick.people.wm.edu/econ407/data/tobias_koop_t_4.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the pandas data frame created, we can easily run some models. First let's look at the data, run summary statistics, and plot the histogram of our dependent variable, ln_wage." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ideducln_wagepexptimeabilitymeducfeducbroken_homesiblingspexp2
0 4 12 2.14 2 4 0.26 12 10 1 4 4
1 6 15 1.91 4 4 0.44 12 16 0 2 16
2 8 13 2.32 8 4 0.51 12 15 1 2 64
3 11 14 1.64 1 4 1.82 16 17 1 2 1
4 12 13 2.16 6 4-1.30 13 12 0 5 36
\n", "
" ], "text/plain": [ " id educ ln_wage pexp time ability meduc feduc broken_home \\\n", "0 4 12 2.14 2 4 0.26 12 10 1 \n", "1 6 15 1.91 4 4 0.44 12 16 0 \n", "2 8 13 2.32 8 4 0.51 12 15 1 \n", "3 11 14 1.64 1 4 1.82 16 17 1 \n", "4 12 13 2.16 6 4 -1.30 13 12 0 \n", "\n", " siblings pexp2 \n", "0 4 4 \n", "1 2 16 \n", "2 2 64 \n", "3 2 1 \n", "4 5 36 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tobias_koop.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ideducln_wagepexptimeabilitymeducfeducbroken_homesiblingspexp2
count 1034.000000 1034.000000 1034.000000 1034.000000 1034 1034.000000 1034.000000 1034.000000 1034.000000 1034.000000 1034.000000
mean 1090.951644 12.274662 2.138259 4.815280 4 0.016596 11.403288 11.585106 0.169246 3.200193 27.979691
std 634.891728 1.566838 0.466280 2.190298 0 0.920963 3.027277 3.735833 0.375150 2.126575 22.598790
min 4.000000 9.000000 0.420000 0.000000 4 -3.140000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 537.250000 12.000000 1.820000 3.000000 4 -0.550000 11.000000 10.000000 0.000000 2.000000 9.000000
50% 1081.500000 12.000000 2.120000 5.000000 4 0.170000 12.000000 12.000000 0.000000 3.000000 25.000000
75% 1666.500000 13.000000 2.450000 6.000000 4 0.720000 12.000000 14.000000 0.000000 4.000000 36.000000
max 2177.000000 19.000000 3.590000 12.000000 4 1.890000 20.000000 20.000000 1.000000 15.000000 144.000000
\n", "
" ], "text/plain": [ " id educ ln_wage pexp time ability \\\n", "count 1034.000000 1034.000000 1034.000000 1034.000000 1034 1034.000000 \n", "mean 1090.951644 12.274662 2.138259 4.815280 4 0.016596 \n", "std 634.891728 1.566838 0.466280 2.190298 0 0.920963 \n", "min 4.000000 9.000000 0.420000 0.000000 4 -3.140000 \n", "25% 537.250000 12.000000 1.820000 3.000000 4 -0.550000 \n", "50% 1081.500000 12.000000 2.120000 5.000000 4 0.170000 \n", "75% 1666.500000 13.000000 2.450000 6.000000 4 0.720000 \n", "max 2177.000000 19.000000 3.590000 12.000000 4 1.890000 \n", "\n", " meduc feduc broken_home siblings pexp2 \n", "count 1034.000000 1034.000000 1034.000000 1034.000000 1034.000000 \n", "mean 11.403288 11.585106 0.169246 3.200193 27.979691 \n", "std 3.027277 3.735833 0.375150 2.126575 22.598790 \n", "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n", "25% 11.000000 10.000000 0.000000 2.000000 9.000000 \n", "50% 12.000000 12.000000 0.000000 3.000000 25.000000 \n", "75% 12.000000 14.000000 0.000000 4.000000 36.000000 \n", "max 20.000000 20.000000 1.000000 15.000000 144.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tobias_koop.describe()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAEACAYAAABGYoqtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGaJJREFUeJzt3X+wZGV95/H3hQk/3MxwmVGIwyoXhhBTxnitRKO4iRdM\nlYlbwWg01m52N3e1Ym0tUQaDG5NQcUNSKSIUQV3Nmrg60UTRMiwlyIplmJNgZLIj6yWaYnfFMLA1\nI8siwowLxJ0w+8dzLtP03L736b79nOd877xfVV1zz+nT/Xzm6R9Pn+d7TjdIkiRJkiRJkiRJkiRJ\nkiRJas0AtwOHgIeA97brtgC3AI+31585cJvLgG8B+4HXdRlWktS9eeBk4PuAO4GfAa4EPgrMAu8G\nPthuuwM4AHw/8DLgm8CpHeeVJFVwBrAHeCmwBPxwu34r8GD792XAtQO3+QzwT7sKKEk61gkdtHEY\neIC0J7EHOAu4r73uYeCk9rIduH/gdve220qSKulikNgEzAE/CvzsiG1mOsghSRrTpo7auR+4gTRQ\n7CcNGncB24DvAn/frj974DbnArcO39H27duPHDhwoHBcSdpwvgGcN+6NSu5JbAdeApxCCvbzwH8j\n1RouJ9Uj3gnc2G5/M/BG4HzgAuDFwG3Dd3rgwAGOHDnS+8u73vWu6hk2Ss4IGc1pzr5fSAcHja3k\nnsSpwPuB55MOa/0j0t7EF4DrSUcy7QXe0G5/D3ANqW7xBHBJ+29I+/btqx0hS4ScETKCOafNnP1Q\ncpD4BmlvYNhB4NUjbnMtTz/CSZJUUReF6+PS4uJi7QhZIuSMkBHMOW3m7IeIRxUdaefXJEmZZmZm\nYIL3fPckCmmapnaELBFyRsgI5pw2c/aDg4QkaSSnmyTpOOB0kyRp6hwkCokyTxkhZ4SMYM5pM2c/\nOEhIkkayJiFJxwFrEpKkqXOQKCTKPGWEnBEygjmnzZz94CAhSRrJmoQkHQesSUiSps5BopAo85QR\nckbICOacNnP2g4OEJGkkaxKSdBywJiFJmjoHiUKizFNGyBkhI5hz2szZDw4SkqSRrElI0nHAmoQk\naeocJAqJMk8ZIed6M27ZspWZmZlOLlu2bJ3Of7qgCI85mLMvHCS04R069G3gSAeX3W1b0sZhTUIb\nXpqL7eo5M4PPT/XRpDWJTdOPIh3PNi2/GIvavPl0Dh58uHg7ktNNhUSZp4yQM0LGpAEO08XU1nqm\ntaL0pzn7wUFCkjRSyf3i7cAfAj8OfAf4PeC9wDXA2we2ez1wQ/v3ZcAVwBPAWwfWD7ImobF0XZPo\npi1rHxrPpDWJkoPEOcBLgM8BzwFuAy4EFoGvAB8f2n4HcDvwCuCZpAHiXODxoe0cJDQWBwmpnyfT\n3Qt8EngU+BqwBJzZXrdS0IuB64GvA3cAe4GLCuYrKso8ZYScETImTe0AWaL0pzn7oauaxA5gDvhS\nu/wB4CBwI/Csdt124P6B29wLnNVRPknSCro4T2Ib8HngUuCLpGmoh4ATgXcDJ5GmoK4G9gPXtbd7\nD/C3pLrGIKebNBanm6T+niexBbgZuJI0QEDaQ1j2PuCj7d/7gbMHrjsXuHWlO11cXGRubg6A2dlZ\n5ufnWVhYAI7u+rns8uDyUcvLC4WWl9eVuv/l5XapJ/3rcv+Wm6Zh165dAE+9X/bNqcBfAG8aWv8T\nwGnA6aS9hI+0688DDgDnAxcADwCnrHC/RyLYvXt37QhZIuRcb0bgCBzp4LK7w7Ymfx1EeMyPHDHn\ntDHhLm7JmsTLSYe/fgh4sr38AvBLwH3A3wFbgXe0299DOjx2D/Bp4BLSobCSpEr87iZteNYkpH4e\nAitJCs5BopBjC6b9FCFnhIxJUztAlij9ac5+cJCQJI1kTUIbnjUJyZqEJKkAB4lCosxTRsgZIWPS\n1A6QJUp/mrMfHCQkSSNZk9CGZ01CsiYhSSrAQaKQKPOUEXJGyJg0tQNkidKf5uwHBwlJ0kjWJLTh\nWZOQrElIkgpwkCgkyjxlhJwRMiZN7QBZovSnOfvBQUKSNJI1CW141iQkaxKSpAIcJAqJMk8ZIWeE\njElTO0CWKP1pzn5wkJAkjWRNQhueNQnJmoQkqQAHiUKizFNGyBkhY9LUDpAlSn+asx8cJCRJI1mT\n0IZnTUKyJiFJKsBBopAo85QRckbImDS1A2SJ0p/m7AcHCUnSSNYktOFZk5CsSUiSCig5SGwHbgYe\nBfYDb2vXbwFuAR4HbgfOHLjNZcC32u1fVzBbcVHmKSPkjJAxaWoHyBKlP83ZDyUHiZOBjwHPBV4F\nXAE8H7gceAh4NnAHcGW7/Q7gHcBLgdcD7wdOLZhPkrSGLmsSnweuAq4F/hXwN8BW4L8DZ5D2Ip4D\nvL3d/jPAB4HPDt2PNQmNxZqE1P+axA7gbNKew1nAfe36h4GT2st24P6B29zbbitJqmRTB21sAz4F\nvJlUh1jJWKPb4uIic3NzAMzOzjI/P8/CwgJwdH6w9vLyur7kGbV83XXX9bL/BpeXlpbYuXPnuu7v\nqOXlhQLLzcC6Evc/uNwuVerPLpaHX0u184xa7mt/Nk3Drl27AJ56v+yjLaS9h9cMrFsCXtj+vQ14\nsP17J/D7A9vdBLx6hfs8EsHu3btrR8gSIed6MwJH4EgHl90dtjX56yDCY37kiDmnjQnnQUvWJE4F\nPgf8MfDhgfVXAucAlwK/BpwGvAU4D/hL0kemZwI3AHPAE0P32/5/pTzWJKR+1iReDvw48CHgyfby\nz4FrSHsQB0hHMv1mu/097XV7gE8Dl3DsACFJ6lDJQeIL7f0PXj4OHCRNI51CGkQeGLjNtaQjnrYD\nf1YwW3HHzoX3U4ScETImTe0AWaL0pzn7wTOuJUkj+d1N2vCsSUj9rElIkoJzkCgkyjxlhJwRMiZN\n7QBZovSnOfuhi5PpJE3dpuXpg+I2bz6dgwcf7qQt9Y81CW14G7Um0eX/yddcfNYkJElT5yBRSJR5\nygg5I2RMmtoBMjW1A2SJ8rhHyTkpBwlJ0kjWJLThWZNYf1u+5uKzJiFJmjoHiUKizFNGyBkhY9LU\nDpCpqR0gS5THPUrOSTlISJJGsiahDc+axPrb8jUXnzUJSdLUOUgUEmWeMkLOCBmTpnaATE3tAFmi\nPO5Rck4qZ5B4DvAJ4O52eQFYLJRHktQjOfNTu4EPAJ8kDSr/mPRR5LxysVZlTUJjsSax/rZ8zcU3\naU0i5waHSL9J/QRpkHgG8CDwveM2NiUOEhqLg8T62/I1F1/JwvXfAD88sPyLwJfHbeh4E2WeMkLO\nCBmTpnaATE3tAFmiPO5Rck4q5/ck/g3wwfbvvyWNRD9XLJEkqTdydz1mgOeS9jz20d1+7kqcbtJY\nnG5af1u+5uIrOd00C/wu8B+Be4GXAK8etyFJUjw5g8QfAw8Br2qXvwX8XrFEG0SUecoIOSNkTJra\nATI1tQNkifK4R8k5qZxB4kLgfQPL+4Fzy8SRJPVJzvzUXcDPk06mOwF4LfDrwIsL5lqNNQmNxZrE\n+tvyNRffpDWJnKOb3gZ8rP37s8CLgNeP25AkKZ61pptOAM4CXglcBFwNPA/4UuFc4UWZp4yQM0LG\npKkdIFNTO0CWKI97lJyTWmuQeBL4D8A/kJ5ZDXBwjPv/BOns7K8OrLumvd/ly+sGrruMVBjfP7Re\nklRBzvzUO4BnA78PPDqwPmewuAB4HPgo8IJ23dXAV4CPD227A7gdeAXwTOAGUoH88aHtrEloLNYk\n1t+Wr7n4StYk3kx6Nv700PofzLjtl1j5iwBXCnoxcD3w9faylzTF9dmMdiRJBeQcAvs80oAwfFmP\nD5D2RG4EntWu2w7cP7DNvaR6SEhR5ikj5IyQMWlqB8jU1A6QJcrjHiXnpHL2JF7Dsfu1B4GvkU6y\nG9cHgCuBE4F3k6afFse5g8XFRebm5gCYnZ1lfn6ehYUF4OgDVnt5WV/yjFpeWlrqVZ6VlpeWlqb2\neBx9g1wotLy8rtT9Ly+zxvWrLS9N3F4fng99W57G87PEctM07Nq1C+Cp98tJ5MxP/SnpbOv/0m7/\nKuCvgB8BfoNUb1jNDtIewwtWuO4F7e1fBOwEziYVrwFuAv4AuGXoNtYkNBZrEutvy9dcfCW/u+m5\nwE8C/xL4F+3f20h7GL+dk21o+SeA04DTgbeSPtYA3Ay8ETifVPB+MXBbxv1LkgrJGSR+CPjmwPID\npD2Ar5GOQlrNF4H/CTyfdLjrpcAvAfcBfwdsJR09BXAP6fDYPcCngUtIP3QUUpR5ygg5I2RMmtoB\nMjW1A2SJ8rhHyTmpnJrER0hTPh8i7RX863bdOaz9Sf+fjJnn2vYiSeqB3PmpnwJe3v79V8DnysTJ\nYk1CY7Emsf62fM3FV/I8CUg/NDRLOo9hK6me8O1xG5MkxZJTk7iU9INDf9ouvxD4k2KJNogo85QR\nckbImDS1A2RqagfIEuVxj5JzUjl7Eu8kDQwPtMt/zfi1BklSQDnzU98EfgB4hLTnsYP0UeQ55WKt\nypqExmJNYv1t+ZqLr+R5Ev8JuKr9+w3Ap4APjtuQJCmenEHiCtL5DjcDryady/A7JUNtBFHmKSPk\njJAxaWoHyNTUDpAlyuMeJeekVqtJzJC+duOtpCObIE053VM6lCSpH1abn7oUeAvp5Lkvt+t+BPgw\n6cS695SNNpI1CY3FmsT62/I1F9+kNYnVbvBV0l5EM7T+FaRfq1vpC/u64CChsThIrL8tX3PxlShc\nn0f64Z9hd7LyDwlpQJR5ygg5I2RMmtoBMjW1A2SJ8rhHyTmp1QaJk4H/u8L677TXSZI2uNV2PZ4E\n/n7EdSeTd2RUCU43aSxON62/LV9z8ZX47qaTJk4jSdoQVtsbOLzGRauIMk9ZM+eWLVuZmZkpfulO\n02Fb69HUDpDF11A/1Joykjh06NukKZO1Lrsztxt1kTSpLj9mTYs1iQ2iu1qBNYn1tuVrLr6S390k\nSTpOOUgUEmWeMkbOpnaATE3tAJma2gGyxHhuxsk5KQcJSdJI1iRUjTWJCO0AfA9dHNC4efPpHDz4\ncPF2jlclvruprxwkNggHiQjtdNmWBfKSLFz3TJR5yhg5m9oBMjW1A2RqagfIEuO5GSfnpBwkJEkj\nOd2kapxuitBOl2053VSS002SpKlzkCgkyjxljJxN7QCZmtoBMjW1A2SJ8dyMk3NSDhKSpJFK1yQ+\nAbwS+N8c/bnTLcD1wIWk385+fXs9wGXAFcATpJ9OvWGF+7QmsUFYk4jQTpdtWZMoqa81ifcBrxpa\ndznwEPBs4A7gynb9DuAdwEtJA8f7gVML55MkraL0IPEl4NDQuouBa4BHgKuA1w6svx74Omnw2Atc\nVDhfMVHmKWPkbGoHyNTUDpCpqR0gS4znZpyck6pRkzgLuK/9+2HSL+CdBGwH7h/Y7t52W0lSJav9\nfGmXxponW1xcZG5uDoDZ2Vnm5+dZWFgAjo7qLuctL6+r137T/rvWMmtcX/r2OcuD60rc/+Aya1w/\n7duvt7285cHnx8LCQvXXR+7ysr7kWe67Xbt2ATz1fjmJLk6m2wHcyNHC9RLwi8BdwDbgbuAMYCdw\nNql4DXAT8AfALUP3Z+F6g7BwHaGdLtuycF1SXwvXcGyoz5CK11uBd5IGEICbgTcC5wMXAC8Gbusg\nXxFR5ilj5GxqB8jU1A6QqakdIEuM52acnJMqPd30RdIbPsCTpL2Fa0gF6gOk4vQb2uvvaa/bQzoE\n9pL2X0lSJX53k6pxuilCO1225XRTSX2ebpIkBeUgUUiUecoYOZvaATI1tQNkamoHyBLjuRkn56Qc\nJCRJI1mTUDXWJCK002Vb1iRKsiYhSZo6B4lCosxTxsjZ1A6QqakdIFNTO0CWGM/NODkn5SAhSRrJ\nmoSqsSYRoZ0u27ImUZI1CUnS1DlIFBJlnjJGzqZ2gExN7QCZmtoBssR4bsbJOSkHCUnSSNYkVI01\niQjtdNmWNYmSrElIkqbOQaKQKPOUMXI2tQNkamoHyNTUDpAlxnMzTs5JOUhIkkayJqFqrElEaKfL\ntqxJlGRNQpI0dQ4ShUSZp4yRs6kdIFNTO0CmpnaALDGem3FyTspBQpI0kjUJVWNNIkI7XbZlTaIk\naxKSpKlzkCgkyjxljJxN7QCZmtoBMjW1A2SJ8dyMk3NSDhKSpJGsSagaaxIR2umyLWsSJVmTkCRN\nnYNEIVHmKWPkbGoHyNTUDpCpqR0gS4znZpyck3KQkCSNZE1C1ViTiNBOl21ZkygpYk3iIeDJ9nKw\nXbcFuAV4HLgdOLNONEkS1B0k/l/b/gmkwQHgctLg8WzgDuDKOtHWL8o8ZYycTe0AmZraATI1tQNk\nifHcjJNzUn2rSVwMXAM8AlwFvLZuHEk6vtWsSTwInAR8B3gPcDXwf4DzgEfbbR4BnkXa61hmTWKD\nsCYRoZ0u27ImUdKkNYlN04+S7RXAPcAPADcCe0dsF7G4LkkbQs1B4u7236+RBokXAvuBOeAuYBvw\n3fbyNIuLi8zNzQEwOzvL/Pw8CwsLwNH5wdrLy+v6kmfU8nXXXVe1/47Oj6+2vATsHGP7lZZZ4/pp\nLDcD60rc/+Aya1y/2vIk/bme9vKXB58fw6+l4ev7sry0tMTOnTt7k2d5uWkadu3aBfDU++Ukan1K\nP4NUnP4fwPcD/xl4M/BK4BzgUuDXgNOAtwzdNsR0U9M0A2+E/VUzZ/50U8PRN5WJWspsZ70a4MKO\n2lrP/6lhvP6sM93ka2i6Jp1uqjVInA/8Gan+8CCpJnEt6Sin64GLSNNPbwAeGLptiEFCa7MmEaGd\nLtuyJlFStEFiPRwkNggHiQjtdNmWg0RJEU+m29CiHDsdI2dTO0CmpnaATE3tAFliPDfj5JyUg4Qk\naSSnm1SN000R2umyLaebSnK6SZI0dQ4ShUSZp4yRs6kdIFNTO0CmpnaALDGem3FyTspBQpI0kjUJ\nVWNNIkI7XbZlTaKkiN/dJEkDNi2/kRW3efPpHDz4cCdtRed0UyFR5ilj5GxqB8jU1A6QqakdYITD\npD2W5cvuoeXpXQ4d+vbUUsd4DU3OQUKSNJI1CVVjTSJCO1221e3/6Xh7H/E8CUnS1DlIFBJlnjJG\nzqZ2gExN7QCZmtoBMjW1A2SJ8RqanIOEJGkkaxKqxppEhHa6bMuaREnWJCRJU+cgUUiUecoYOZva\nATI1tQNkamoHyNTUDpAlxmtocg4SkqSRrEmoGmsSEdrpsi1rEiX53U2aii1btk71KwskxeZ0UyFR\n5imHc6YBosz35Rx7yU45/n+siqZ2gExN7QCZmtoBskR5rU/KQUKSNJI1CT1Nd3UC2Khz3f6f+t5O\naut4ex/xPAlJ0tQ5SBQSZZ4yRs6mdoBMTe0AmZraATI1tQNkifEampxHNwXhUUeSarAmMaG9e/fy\nW791LV1E2bz5FD75yV1sxHlh/099b6fLtqxJlOR5Eh3bs2cPt976MIcPLxZv66ST/m3xNqTjSze/\np70Rfku7j4PEhcCHgW3Ae4Er6sYZ7cQTz+fw4X824toGWJhKO5s2Xc53v/vIVO7rWA3TyllOQ/8z\nQpQ59Fj9uVDovpd/T3saGkblPHQo4mTN0/WtcD1DGiDeBjwP+DngZVUTTWypdoBMEXJGyAjmnDZz\n9kHfBol54BHgJuAA8EfA66ommlipT/7TFiFnhIxgzmkzZx/0bZA4C7hvYPnedp0kqYI+1iQG9XZC\n74QTTmBm5ia2bNm34vWPPfYVnvGMO6fS1mOPfWsq97OyfQXve1r21Q6QaV/tAJn21Q6QaV/tAJn2\n1Q5QVN/ehOeBjwAvapd/BTgT+HcD29wD7Og4lyRF9w3gvNoh1usE0hTTxaRppruBC6omkiT1ykWk\ngeIQ8LuVs0iSJEmK5kLSHsVB4HdWuP57gI8CjwF3kc6rqGGtnL8MPDlweXt30Z7yCeBB4Ksjru9L\nX66Vsw99uR24GXgU2E86p2dYH/ozJ2cf+nMGuJ00c/AQ6QTa4VppH/ozJ2cf+hPStP0eUt5hfejL\nqZghvfH+DOnJfjfHnlT3JuALwFbgrcCtXQZs5eS8BPj1jnMNu4B0MMCoN98+9CWsnbMPfXkO8Ebg\nNOCHSIPa84e26UN/5uTsQ39COmDlZOD7gDtJr6dBfehPWDtnX/rzl0kDwV+ucF1f+nLdXgR8ZWD5\n7cDVQ9vcSCpwA5xIGt3/UfloT5OT8xLgNzpLNNp5jH7z7UNfLlstZ1/6ctDnSXuTg/rUn8tWytm3\n/jyD9An4x4bW960/R+XsQ39uB/6c9GF1pT2JsfuybyfTLcs5qW5wm38g7VJvLx9tZAYYffLfrwLf\nAW6jn4eg9aEvc/WpL3cAc8AdQ+v71p+jckJ/+vMw8ADpE/pfD13Xp/5cLSfU789r2wxPjrh+7L7s\n6yAxrG/nc4yyUs4bSbv+ZwJ/AXyo00QbS5/6chvwKdLu+xMVc6xltZx96s9NpIHsR4HXVMyxltVy\n1u7PnyJ9R8iXifOeObF5nj6N8yvAu4e2uZGjD1KtXdCcnIM2A7V+OWgHq0831e7LZavlHFSzL7eQ\nPpWPejPrS3+ulXNQzf4c9KscewBIX/pz0Eo5B9Xoz9/m6YXzJ4H/OrRNH/tyIjkn1b2JNPe2jXTk\nxue7DNjKyfkyUsbvBf49sLvDfINWm+vvQ18uWy1nH/ryVNKnxDetsk0f+jMnZx/6czvwEuAU0mN/\nJ8d+qWcf+jMnZx/6c9mPsXJNog99OTUrnVR3NWkEh7Tb9zHqH8o1KufyV4lcRRqtD5EekHO7Dgh8\nkad/uriUfvblWjn70Jc/ybGf1n6B/vVnTs4+9OcOYC+pr/4X8Jvt+r7152o5+/RaX/ZSjh7d1Le+\nlCRJkiRJkiRJkiRJkiRJkiRJkiSpX/4/VF4hAvS1S6IAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tobias_koop.ln_wage.plot(kind='hist')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# OLS analysis\n", "\n", "This is the model we'll be estimating the vector \\\$$\\beta\\\$$ from\n", "\n", "$$\n", "y_i = \\beta_0 + \\beta_1 educ_i + \\beta_2 pexp_i + \\beta_3 pexp^2_i + \\beta_4 broken\\_home_i + \\epsilon_i\n", "$$\n", "\n", "For now, we will ignore any potential bias due to endogeneity, although we'll come back to that in a later post." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fitting the model in stata\n", "\n", "Here is the code for estimating the model in stata:\n", "\n", "\n", ". reg ln_wage educ pexp pexp2 broken_home\n", "\n", " Source | SS df MS Number of obs = 1034\n", "-------------+------------------------------ F( 4, 1029) = 51.36\n", " Model | 37.3778146 4 9.34445366 Prob > F = 0.0000\n", " Residual | 187.21445 1029 .181938241 R-squared = 0.1664\n", "-------------+------------------------------ Adj R-squared = 0.1632\n", " Total | 224.592265 1033 .217417488 Root MSE = .42654\n", "\n", "------------------------------------------------------------------------------\n", " ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n", "-------------+----------------------------------------------------------------\n", " educ | .0852725 .0092897 9.18 0.000 .0670437 .1035014\n", " pexp | .2035214 .0235859 8.63 0.000 .1572395 .2498033\n", " pexp2 | -.0124126 .0022825 -5.44 0.000 -.0168916 -.0079336\n", " broken_home | -.0087254 .0357107 -0.24 0.807 -.0787995 .0613488\n", " _cons | .4603326 .137294 3.35 0.001 .1909243 .7297408\n", "------------------------------------------------------------------------------\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fitting the model Ipython with statsmodels\n", "\n", "We will estimate the same models as above using statsmodels. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: ln_wage R-squared: 0.166\n", "Model: OLS Adj. R-squared: 0.163\n", "Method: Least Squares F-statistic: 51.36\n", "Date: Thu, 05 Mar 2015 Prob (F-statistic): 1.83e-39\n", "Time: 09:18:38 Log-Likelihood: -583.66\n", "No. Observations: 1034 AIC: 1177.\n", "Df Residuals: 1029 BIC: 1202.\n", "Df Model: 4 \n", "Covariance Type: nonrobust \n", "===============================================================================\n", " coef std err t P>|t| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------\n", "Intercept 0.4603 0.137 3.353 0.001 0.191 0.730\n", "educ 0.0853 0.009 9.179 0.000 0.067 0.104\n", "pexp 0.2035 0.024 8.629 0.000 0.157 0.250\n", "pexp2 -0.0124 0.002 -5.438 0.000 -0.017 -0.008\n", "broken_home -0.0087 0.036 -0.244 0.807 -0.079 0.061\n", "==============================================================================\n", "Omnibus: 55.892 Durbin-Watson: 1.761\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 112.050\n", "Skew: -0.355 Prob(JB): 4.66e-25\n", "Kurtosis: 4.448 Cond. No. 391.\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "formula = \"ln_wage ~ educ + pexp + pexp2 + broken_home\"\n", "results = smf.ols(formula,tobias_koop).fit()\n", "print(results.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# OLS Regression with robust standard errors\n", "\n", "Recall that for robust standard errors, we first recover our OLS estimates ($\\mathbf{b}$) of $\\beta$. Using $\\mathbf{b}$, construct $\\mathbf{e} = \\mathbf{y - xb}$. From there, calculate the robust variance/covariance matrix of estimated parameters as\n", "$$\n", "Var(\\mathbf{b})_{robust} = \\mathbf{(x'x)^{-1}x'diag(ee')x(x'x)^{-1}}\\times \\frac{N}{N-K}\n", "$$\n", "for calculating standard errors.\n", "\n", "## Fitting the model in stata\n", "\n", "In stata, we would run this:\n", "\n", "\n", ". reg ln_wage educ pexp pexp2 broken_home, robust\n", "\n", "Linear regression Number of obs = 1034\n", " F( 4, 1029) = 64.82\n", " Prob > F = 0.0000\n", " R-squared = 0.1664\n", " Root MSE = .42654\n", "\n", "------------------------------------------------------------------------------\n", " | Robust\n", " ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]\n", "-------------+----------------------------------------------------------------\n", " educ | .0852725 .0099294 8.59 0.000 .0657882 .1047568\n", " pexp | .2035214 .0226835 8.97 0.000 .1590101 .2480326\n", " pexp2 | -.0124126 .0022447 -5.53 0.000 -.0168173 -.0080079\n", " broken_home | -.0087254 .0312381 -0.28 0.780 -.070023 .0525723\n", " _cons | .4603326 .1315416 3.50 0.000 .2022121 .718453\n", "------------------------------------------------------------------------------\n", "\n", "\n", "## Fitting the model in Ipython\n", "\n", "In Ipython, we don't need to rerun the model. The default OLS command already includes a number of different types of robust standard errors (one of which using the method outlined above). All we need to do is create a new results instance that calls the covariance type we want:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: ln_wage R-squared: 0.166\n", "Model: OLS Adj. R-squared: 0.163\n", "Method: Least Squares F-statistic: 64.82\n", "Date: Thu, 05 Mar 2015 Prob (F-statistic): 6.41e-49\n", "Time: 09:18:39 Log-Likelihood: -583.66\n", "No. Observations: 1034 AIC: 1177.\n", "Df Residuals: 1029 BIC: 1202.\n", "Df Model: 4 \n", "Covariance Type: HC1 \n", "===============================================================================\n", " coef std err t P>|t| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------\n", "Intercept 0.4603 0.132 3.500 0.000 0.202 0.718\n", "educ 0.0853 0.010 8.588 0.000 0.066 0.105\n", "pexp 0.2035 0.023 8.972 0.000 0.159 0.248\n", "pexp2 -0.0124 0.002 -5.530 0.000 -0.017 -0.008\n", "broken_home -0.0087 0.031 -0.279 0.780 -0.070 0.053\n", "==============================================================================\n", "Omnibus: 55.892 Durbin-Watson: 1.761\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 112.050\n", "Skew: -0.355 Prob(JB): 4.66e-25\n", "Kurtosis: 4.448 Cond. No. 391.\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors are heteroscedasticity robust (HC1)\n" ] } ], "source": [ "results_robust = results.get_robustcov_results(cov_type='HC1')\n", "print(results_robust.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bootstrapped standard errors\n", "\n", "## Fitting the model in stata\n", "\n", "\n", "If instead, we want to bootstrap our standard errors, in stata we would do this:\n", "\n", "\n", ". bstrap: reg ln_wage educ pexp pexp2 broken_home\n", "(running regress on estimation sample)\n", "\n", "Bootstrap replications (50)\n", "----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 \n", ".................................................. 50\n", "\n", "Linear regression Number of obs = 1034\n", " Replications = 50\n", " Wald chi2(4) = 290.46\n", " Prob > chi2 = 0.0000\n", " R-squared = 0.1664\n", " Adj R-squared = 0.1632\n", " Root MSE = 0.4265\n", "\n", "------------------------------------------------------------------------------\n", " | Observed Bootstrap Normal-based\n", " ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]\n", "-------------+----------------------------------------------------------------\n", " educ | .0852725 .010622 8.03 0.000 .0644539 .1060912\n", " pexp | .2035214 .0217148 9.37 0.000 .1609611 .2460816\n", " pexp2 | -.0124126 .0021076 -5.89 0.000 -.0165433 -.0082818\n", " broken_home | -.0087254 .0330727 -0.26 0.792 -.0735466 .0560958\n", " _cons | .4603326 .1405716 3.27 0.001 .1848173 .7358479\n", "------------------------------------------------------------------------------\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to note that the above set of results implicitly assumes normality, since confidence intervals are symmetric about the mean (and since standard errors are reported). \n", "\n", "## Fitting the model in Ipython\n", "\n", "This isn't as straightforward in Ipython. We need to do a bit of wrangling to get the bootstrap working in Ipython. Write our own routine." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "R = 50\n", "\n", "results_boot = np.zeros((R,results.params.shape[0]))\n", "\n", "row_id = range(0,tobias_koop.shape[0])\n", "\n", "for r in range(R):\n", " this_sample = np.random.choice(row_id, size=tobias_koop.shape[0], replace=True) # gives sampled row numbers\n", " # Define data for this replicate: \n", " tobias_koop_r = tobias_koop.iloc[this_sample] \n", " # Estimate model\n", " results_r = smf.ols(formula,tobias_koop_r).fit(disp=0).params \n", " # Store in row r of results_boot:\n", " results_boot[r,:] = np.asarray(results_r) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array results_boot has all of our data. Put in pandas dataframe and summarize:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_Interceptb_educb_pexppexp2b_broken_home
count 50.000000 50.000000 50.000000 50.000000 50.000000
mean 0.462494 0.084486 0.208471 -0.012965 -0.017659
std 0.160763 0.011046 0.024721 0.002473 0.036356
min 0.004603 0.056518 0.148656 -0.019397 -0.102527
2.5% 0.126781 0.064798 0.157471 -0.018137 -0.100246
50% 0.462583 0.084843 0.209224 -0.013365 -0.010556
97.5% 0.783207 0.105074 0.260848 -0.008159 0.054402
max 0.801684 0.118317 0.273543 -0.007170 0.066333
\n", "
" ], "text/plain": [ " b_Intercept b_educ b_pexp pexp2 b_broken_home\n", "count 50.000000 50.000000 50.000000 50.000000 50.000000\n", "mean 0.462494 0.084486 0.208471 -0.012965 -0.017659\n", "std 0.160763 0.011046 0.024721 0.002473 0.036356\n", "min 0.004603 0.056518 0.148656 -0.019397 -0.102527\n", "2.5% 0.126781 0.064798 0.157471 -0.018137 -0.100246\n", "50% 0.462583 0.084843 0.209224 -0.013365 -0.010556\n", "97.5% 0.783207 0.105074 0.260848 -0.008159 0.054402\n", "max 0.801684 0.118317 0.273543 -0.007170 0.066333" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert results to pandas dataframe for easier analysis:\n", "results_boot = pd.DataFrame(results_boot,columns=['b_Intercept','b_educ','b_pexp','pexp2','b_broken_home'])\n", "results_boot.describe(percentiles=[.025,.975])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above set of results, the standard deviation (std) can be interpreted as the parameter standard error if we are willing to impose normality. Alternatively, we could construct 95% confidence intervals by looking at the 2.5% and 97.5% percentiles." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 0 }