{ "metadata": { "name": "", "signature": "sha256:41aa6133b672823c09644e4a2767026bc1b628a64403f6be3e1051d07c264c28" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Workshop on Bootstrapping and Bayesian Econometrics\n", "--------------------------------------------------\n", "\n", "The goal of this workshop is to\n", "1. Explore bootstrapping\n", " * See the implementation steps\n", " * Show subtleties of reported results\n", "2. Introduce Bayesian Econometrics\n", " * Brief discussion of Bayes' Rule\n", " * Markov Chains and the Metropolis-Hastings Sampler\n", " * Simple example estimating mean and standard deviation\n", " * Application to Tobias and Koop\n", " * Hierarchical Models\n", " * Show you three different samplers" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#load python libraries for this ipython notebook:\n", "import numpy as np # numpy linear algebra library\n", "import pandas as pd # pandas data manipulation library\n", "import scipy.stats as scipy # access to stats libraries \n", "import emcee as emcee # a bayesian library\n", "#load pymc3 libraries (another Bayesian Library)\n", "import imp\n", "pm = imp.load_module(\"pymc3\", *imp.find_module(\"pymc\", [\"/home/robhicks/Dropbox/notebooks/pymc3/pymc\"]))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Estimation methods in Econometrics\n", "\n", "In class, we looked at these estimation methods:\n", "1. Linear Model \n", "2. Maximum Likelihood\n", "3. Generalized Method of Moments\n", "\n", "Leaving these as major estimation paradigms you haven't seen:\n", "4. Bayesian Estimation\n", "5. 
~~Non-parametric Estimation~~\n", "\n", "In what follows, we will replicate our Bootstrapping work in IPython (so we can compare to Bayesian Methods) and then move on to Bayesian Estimation.\n", "\n", "Note: the appeal of Bayesian modeling lies in tackling computationally challenging models. Implementing an OLS model is fairly trivial, but it gives us a point of comparison since we know the OLS model very well. At the end of these notes, we show how Bayesian Econometrics opens the door to very rich models that are difficult to replicate using frequentist methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Tobias and Koop data for time==4" ] }, { "cell_type": "code", "collapsed": false, "input": [ "tobias_koop=pd.read_csv('http://rlhick.people.wm.edu/econ407/data/tobias_koop_t_4.csv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "tobias_koop.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
| | id | educ | ln_wage | pexp | time | ability | meduc | feduc | broken_home | siblings | pexp2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 12 | 2.14 | 2 | 4 | 0.26 | 12 | 10 | 1 | 4 | 4 |
| 1 | 6 | 15 | 1.91 | 4 | 4 | 0.44 | 12 | 16 | 0 | 2 | 16 |
| 2 | 8 | 13 | 2.32 | 8 | 4 | 0.51 | 12 | 15 | 1 | 2 | 64 |
| 3 | 11 | 14 | 1.64 | 1 | 4 | 1.82 | 16 | 17 | 1 | 2 | 1 |
| 4 | 12 | 13 | 2.16 | 6 | 4 | -1.30 | 13 | 12 | 0 | 5 | 36 |


| | id | educ | ln_wage | pexp | time | ability | meduc | feduc | broken_home | siblings | pexp2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1034.000000 | 1034.000000 | 1034.000000 | 1034.000000 | 1034 | 1034.000000 | 1034.000000 | 1034.000000 | 1034.000000 | 1034.000000 | 1034.000000 |
| mean | 1090.951644 | 12.274662 | 2.138259 | 4.815280 | 4 | 0.016596 | 11.403288 | 11.585106 | 0.169246 | 3.200193 | 27.979691 |
| std | 634.891728 | 1.566838 | 0.466280 | 2.190298 | 0 | 0.920963 | 3.027277 | 3.735833 | 0.375150 | 2.126575 | 22.598790 |
| min | 4.000000 | 9.000000 | 0.420000 | 0.000000 | 4 | -3.140000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 537.250000 | 12.000000 | 1.820000 | 3.000000 | 4 | -0.550000 | 11.000000 | 10.000000 | 0.000000 | 2.000000 | 9.000000 |
| 50% | 1081.500000 | 12.000000 | 2.120000 | 5.000000 | 4 | 0.170000 | 12.000000 | 12.000000 | 0.000000 | 3.000000 | 25.000000 |
| 75% | 1666.500000 | 13.000000 | 2.450000 | 6.000000 | 4 | 0.720000 | 12.000000 | 14.000000 | 0.000000 | 4.000000 | 36.000000 |
| max | 2177.000000 | 19.000000 | 3.590000 | 12.000000 | 4 | 1.890000 | 20.000000 | 20.000000 | 1.000000 | 15.000000 | 144.000000 |


| | b_educ | b_pexp | pexp2 | b_broken_home | b_Intercept |
|---|---|---|---|---|---|
| 0 | 0.086845 | 0.209767 | -0.013187 | 0.042530 | 0.424621 |
| 1 | 0.055819 | 0.151736 | -0.007761 | -0.058262 | 0.931521 |
| 2 | 0.075879 | 0.137397 | -0.006732 | -0.010441 | 0.719790 |
| 3 | 0.073911 | 0.205536 | -0.013938 | -0.106334 | 0.662189 |
| 4 | 0.082670 | 0.209667 | -0.012387 | -0.071467 | 0.430717 |
| 5 | 0.079207 | 0.218031 | -0.014434 | 0.026968 | 0.548299 |
| 6 | 0.110818 | 0.183666 | -0.009843 | 0.012264 | 0.156372 |
| 7 | 0.098079 | 0.211655 | -0.013272 | 0.005513 | 0.276762 |
| 8 | 0.060890 | 0.220493 | -0.016071 | -0.115970 | 0.799638 |
| 9 | 0.102324 | 0.198275 | -0.011950 | 0.064837 | 0.212105 |


| | b_educ | b_pexp | pexp2 | b_broken_home | b_Intercept |
|---|---|---|---|---|---|
| count | 400.000000 | 400.000000 | 400.000000 | 400.000000 | 400.000000 |
| mean | 0.084648 | 0.205918 | -0.012596 | -0.014049 | 0.462193 |
| std | 0.015267 | 0.036862 | 0.003708 | 0.051261 | 0.209663 |
| min | 0.034552 | 0.045842 | -0.023124 | -0.156739 | -0.213324 |
| 0.5% | 0.044082 | 0.099568 | -0.021045 | -0.134710 | -0.071113 |
| 2.5% | 0.053761 | 0.129801 | -0.019293 | -0.117726 | 0.051967 |
| 5% | 0.059072 | 0.145110 | -0.018207 | -0.096306 | 0.111766 |
| 50% | 0.084729 | 0.207228 | -0.012590 | -0.014617 | 0.454001 |
| 95% | 0.110858 | 0.262188 | -0.006414 | 0.073874 | 0.824146 |
| 97.5% | 0.115970 | 0.276534 | -0.004944 | 0.094848 | 0.891660 |
| 99.5% | 0.123583 | 0.295535 | -0.001668 | 0.126296 | 0.962410 |
| max | 0.126530 | 0.308454 | 0.003527 | 0.132844 | 1.098337 |

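
The percentile rows in a summary like this are what a percentile bootstrap reads its confidence intervals from: the 2.5% and 97.5% rows bound a 95% interval for each coefficient. The sketch below illustrates that idea with a pairs bootstrap of an OLS regression. It runs on synthetic data, not the Tobias and Koop sample, and the names `ols`, `boot`, `summary`, and `ci_95` are hypothetical choices made for this illustration; only the 400 replications are taken from the table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption; NOT the Tobias and Koop sample)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.4, size=n)
X = np.column_stack([x, np.ones(n)])  # regressor plus intercept column


def ols(X, y):
    # Least-squares coefficient vector
    return np.linalg.lstsq(X, y, rcond=None)[0]


full_sample = ols(X, y)

reps = 400  # same number of replications as the count row in the table
draws = np.empty((reps, X.shape[1]))
for r in range(reps):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    draws[r] = ols(X[idx], y[idx])    # re-estimate on the resampled rows

boot = pd.DataFrame(draws, columns=['b_x', 'b_Intercept'])

# Same style of summary as the table: extra percentiles for the tails
summary = boot.describe(percentiles=[.005, .025, .05, .5, .95, .975, .995])

# Percentile 95% confidence interval for each coefficient
ci_95 = boot.quantile([0.025, 0.975])
print(summary)
print(ci_95)
```

Resampling whole rows keeps each observation's regressors and outcome together, which is the usual choice when the regressors are treated as random.
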

| | age |
|---|---|
| count | 100.000000 |
| mean | 41.784593 |
| std | 12.949731 |
| min | 4.870258 |
| 25% | 33.110703 |
| 50% | 40.404375 |
| 75% | 48.116807 |
| max | 74.027367 |


| | mean | std | prob |
|---|---|---|---|
| count | 9000.000000 | 9000.000000 | 9000.000000 |
| mean | 41.784150 | 12.886972 | -39750.865206 |
| std | 0.128627 | 0.089576 | 1.009329 |
| min | 41.354163 | 12.520079 | -39758.391179 |
| 0.5% | 41.444949 | 12.653458 | -39755.368671 |
| 2.5% | 41.529483 | 12.711870 | -39753.642675 |
| 5% | 41.571783 | 12.742053 | -39752.872637 |
| 50% | 41.784452 | 12.885293 | -39750.527944 |
| 95% | 41.993022 | 13.034135 | -39749.938687 |
| 97.5% | 42.031078 | 13.066558 | -39749.913259 |
| 99.5% | 42.121277 | 13.122512 | -39749.889520 |
| max | 42.236743 | 13.208737 | -39749.884224 |

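
A table of draws for a mean and a standard deviation, like the one above, is the kind of output a posterior sampler produces. To show the mechanics behind the Metropolis-Hastings item in the workshop outline, here is a minimal random-walk Metropolis sketch for the mean and standard deviation of normally distributed data. It is hand-written for illustration and is not the `emcee` sampler imported earlier. The synthetic data, the flat priors, the step size, the burn-in length, and the names `log_post` and `metropolis` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (an assumption; not the workshop's age sample)
data = rng.normal(loc=40.0, scale=12.0, size=200)


def log_post(theta, y):
    # Log posterior up to a constant: normal likelihood with flat priors,
    # restricted to sigma > 0
    mu, sigma = theta
    if sigma <= 0:
        return -np.inf
    return -y.size * np.log(sigma) - 0.5 * np.sum((y - mu) ** 2) / sigma ** 2


def metropolis(y, start, n_draws=5000, step=0.5):
    # Random-walk Metropolis with a symmetric normal proposal
    theta = np.asarray(start, dtype=float)
    lp = log_post(theta, y)
    chain = np.empty((n_draws, 2))
    accepted = 0
    for i in range(n_draws):
        proposal = theta + rng.normal(scale=step, size=2)
        lp_proposal = log_post(proposal, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_proposal - lp:
            theta, lp = proposal, lp_proposal
            accepted += 1
        chain[i] = theta  # a rejected proposal repeats the current value
    return chain, accepted / n_draws


chain, acc_rate = metropolis(data, start=[data.mean(), data.std()])
kept = chain[1000:]              # discard burn-in draws
post_mean = kept.mean(axis=0)    # posterior means of (mu, sigma)
interval_95 = np.percentile(kept, [2.5, 97.5], axis=0)
print(acc_rate, post_mean, interval_95)
```

Because the proposal is symmetric, the Hastings correction term cancels and only the ratio of posterior densities enters the acceptance step.
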

| | educ | pexp | pexp2 | broken_home | intercept | sigma |
|---|---|---|---|---|---|---|
| count | 400000.000000 | 400000.000000 | 400000.000000 | 400000.000000 | 400000.000000 | 400000.000000 |
| mean | 0.085315 | 0.204037 | -0.012465 | -0.008680 | 0.458503 | 0.426964 |
| std | 0.009377 | 0.023304 | 0.002262 | 0.035665 | 0.137912 | 0.009571 |
| min | 0.042974 | 0.096317 | -0.021915 | -0.153405 | -0.107116 | 0.389304 |
| 0.5% | 0.060932 | 0.143789 | -0.018341 | -0.100915 | 0.109621 | 0.403358 |
| 2.5% | 0.066856 | 0.158400 | -0.016927 | -0.078533 | 0.190340 | 0.408892 |
| 5% | 0.069787 | 0.165834 | -0.016207 | -0.067292 | 0.234387 | 0.411656 |
| 50% | 0.085413 | 0.203957 | -0.012448 | -0.008594 | 0.456475 | 0.426738 |
| 95% | 0.100624 | 0.242422 | -0.008772 | 0.050141 | 0.687206 | 0.443015 |
| 97.5% | 0.103675 | 0.249944 | -0.008073 | 0.061648 | 0.731061 | 0.446241 |
| 99.5% | 0.109362 | 0.264178 | -0.006653 | 0.084702 | 0.818035 | 0.452690 |
| max | 0.124412 | 0.308641 | -0.002592 | 0.155374 | 1.099135 | 0.474191 |


| | county | log_radon | floor |
|---|---|---|---|
| 0 | AITKIN | 0.832909 | 1 |
| 1 | AITKIN | 0.832909 | 0 |
| 2 | AITKIN | 1.098612 | 0 |
| 3 | AITKIN | 0.095310 | 0 |
| 4 | ANOKA | 1.163151 | 0 |