{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction to Bayesian Econometrics\n", "\n", "This course covers the application of Bayesian statistical methods to econometric inference. Broadly speaking, we will:\n", "\n", "1. Briefly discuss sampling methods for classical statistics\n", "2. Introduce Bayes' rule and provide an application\n", "3. Examine the use of Markov chain Monte Carlo (MCMC)\n", "    * Link to Bayes' rule\n", "    * Metropolis-Hastings and other samplers\n", "    * Chain \"convergence\" and diagnostics\n", "4. Applications: OLS, time-series econometrics, hierarchical models" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The **frequentist paradigm**, arguably the dominant statistical paradigm in the social sciences (and the one you have all studied), relies on the following notions:\n", "\n", "* $\\beta$ is not random but neither is it known. It is a fixed quantity.\n", "* To uncover information about $\\beta$, we observe part of some process (e.g. 
$\\mathbf{y=x\\beta+\\epsilon}$).\n", "* For statistical inference, we rely on **repeated trials** of $\\mathbf{y}$ and $\\mathbf{x}$, even if this repetition rarely (if ever) occurs in the social science context.\n", "* $\\mathbf{y}$ and $\\mathbf{x}$ are considered random.\n", "* The model typically attempts to uncover information about $\\mathbf{\\beta}$ by examining the likelihood function\n", "$$\n", "prob(\\mathbf{y}|\\mathbf{b},\\mathbf{x})\n", "$$\n", "where $\\mathbf{b}$ are our estimates of $\\beta$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The **Bayesian paradigm** tackles the issue of estimating $\\beta$ by\n", "\n", "* Treating $\\beta$ as random and unknown\n", "* Treating $\\mathbf{y}$ and $\\mathbf{x}$ as fixed and non-random (at least once they are recorded in your dataset)\n", "* Uncovering information about $\\mathbf{\\beta}$ by examining the posterior distribution\n", "$$\n", "prob(\\mathbf{b}|\\mathbf{y},\\mathbf{x})\n", "$$\n", "where $\\mathbf{b}$ are our estimates of $\\beta$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "In many instances, these two approaches give you the same estimate for $\\beta$.\n", "Until recently, Bayesian statistical modeling was rarely used because calculating the posterior distribution was computationally challenging. Recent advances in the theory and construction of Markov chain Monte Carlo (MCMC) methods and in computing power have opened the door to Bayesian analysis of problems that might not be estimable using the frequentist paradigm (i.e., maximum likelihood). There is an **ongoing holy war** between the two statistical camps. During the semester I will attempt to highlight the pros and cons of each paradigm without taking a position on which one is better. My philosophy is that if it gets the job done, use it while being aware of its limitations and advantages." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Repeated trials in frequentist statistics\n", "A good jumping-off point for this course is to understand the use of sampling techniques in the classical statistical paradigm. \n", "- Underlying all the statistical inference that you have learned in statistics and econometrics is the idea of **repeated trials**. Bootstrapping highlights this really well. \n", "- We will begin with an exploration of bootstrapping and walk through the implementation steps." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Load Python libraries for this notebook:\n", "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sbn\n", "import statsmodels.formula.api as smf\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "plt.style.use('ggplot')\n", "\n", "np.random.seed(12578)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Load Tobias and Koop data for time==4" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "tobias_koop = pd.read_csv('https://rlhick.people.wm.edu/econ407/data/tobias_koop_t_4.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>id</th><th>educ</th><th>ln_wage</th><th>pexp</th><th>time</th><th>ability</th><th>meduc</th><th>feduc</th><th>broken_home</th><th>siblings</th><th>pexp2</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>4</td><td>12</td><td>2.14</td><td>2</td><td>4</td><td>0.26</td><td>12</td><td>10</td><td>1</td><td>4</td><td>4</td></tr>\n",
 "    <tr><th>1</th><td>6</td><td>15</td><td>1.91</td><td>4</td><td>4</td><td>0.44</td><td>12</td><td>16</td><td>0</td><td>2</td><td>16</td></tr>\n",
 "    <tr><th>2</th><td>8</td><td>13</td><td>2.32</td><td>8</td><td>4</td><td>0.51</td><td>12</td><td>15</td><td>1</td><td>2</td><td>64</td></tr>\n",
 "    <tr><th>3</th><td>11</td><td>14</td><td>1.64</td><td>1</td><td>4</td><td>1.82</td><td>16</td><td>17</td><td>1</td><td>2</td><td>1</td></tr>\n",
 "    <tr><th>4</th><td>12</td><td>13</td><td>2.16</td><td>6</td><td>4</td><td>-1.30</td><td>13</td><td>12</td><td>0</td><td>5</td><td>36</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " id educ ln_wage pexp time ability meduc feduc broken_home \\\n", "0 4 12 2.14 2 4 0.26 12 10 1 \n", "1 6 15 1.91 4 4 0.44 12 16 0 \n", "2 8 13 2.32 8 4 0.51 12 15 1 \n", "3 11 14 1.64 1 4 1.82 16 17 1 \n", "4 12 13 2.16 6 4 -1.30 13 12 0 \n", "\n", " siblings pexp2 \n", "0 4 4 \n", "1 2 16 \n", "2 2 64 \n", "3 2 1 \n", "4 5 36 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tobias_koop.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>id</th><th>educ</th><th>ln_wage</th><th>pexp</th><th>time</th><th>ability</th><th>meduc</th><th>feduc</th><th>broken_home</th><th>siblings</th><th>pexp2</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>count</th><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td><td>1034.0</td><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td><td>1034.000000</td></tr>\n",
 "    <tr><th>mean</th><td>1090.951644</td><td>12.274662</td><td>2.138259</td><td>4.815280</td><td>4.0</td><td>0.016596</td><td>11.403288</td><td>11.585106</td><td>0.169246</td><td>3.200193</td><td>27.979691</td></tr>\n",
 "    <tr><th>std</th><td>634.891728</td><td>1.566838</td><td>0.466280</td><td>2.190298</td><td>0.0</td><td>0.920963</td><td>3.027277</td><td>3.735833</td><td>0.375150</td><td>2.126575</td><td>22.598790</td></tr>\n",
 "    <tr><th>min</th><td>4.000000</td><td>9.000000</td><td>0.420000</td><td>0.000000</td><td>4.0</td><td>-3.140000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000</td></tr>\n",
 "    <tr><th>25%</th><td>537.250000</td><td>12.000000</td><td>1.820000</td><td>3.000000</td><td>4.0</td><td>-0.550000</td><td>11.000000</td><td>10.000000</td><td>0.000000</td><td>2.000000</td><td>9.000000</td></tr>\n",
 "    <tr><th>50%</th><td>1081.500000</td><td>12.000000</td><td>2.120000</td><td>5.000000</td><td>4.0</td><td>0.170000</td><td>12.000000</td><td>12.000000</td><td>0.000000</td><td>3.000000</td><td>25.000000</td></tr>\n",
 "    <tr><th>75%</th><td>1666.500000</td><td>13.000000</td><td>2.450000</td><td>6.000000</td><td>4.0</td><td>0.720000</td><td>12.000000</td><td>14.000000</td><td>0.000000</td><td>4.000000</td><td>36.000000</td></tr>\n",
 "    <tr><th>max</th><td>2177.000000</td><td>19.000000</td><td>3.590000</td><td>12.000000</td><td>4.0</td><td>1.890000</td><td>20.000000</td><td>20.000000</td><td>1.000000</td><td>15.000000</td><td>144.000000</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " id educ ln_wage pexp time \\\n", "count 1034.000000 1034.000000 1034.000000 1034.000000 1034.0 \n", "mean 1090.951644 12.274662 2.138259 4.815280 4.0 \n", "std 634.891728 1.566838 0.466280 2.190298 0.0 \n", "min 4.000000 9.000000 0.420000 0.000000 4.0 \n", "25% 537.250000 12.000000 1.820000 3.000000 4.0 \n", "50% 1081.500000 12.000000 2.120000 5.000000 4.0 \n", "75% 1666.500000 13.000000 2.450000 6.000000 4.0 \n", "max 2177.000000 19.000000 3.590000 12.000000 4.0 \n", "\n", " ability meduc feduc broken_home siblings \\\n", "count 1034.000000 1034.000000 1034.000000 1034.000000 1034.000000 \n", "mean 0.016596 11.403288 11.585106 0.169246 3.200193 \n", "std 0.920963 3.027277 3.735833 0.375150 2.126575 \n", "min -3.140000 0.000000 0.000000 0.000000 0.000000 \n", "25% -0.550000 11.000000 10.000000 0.000000 2.000000 \n", "50% 0.170000 12.000000 12.000000 0.000000 3.000000 \n", "75% 0.720000 12.000000 14.000000 0.000000 4.000000 \n", "max 1.890000 20.000000 20.000000 1.000000 15.000000 \n", "\n", " pexp2 \n", "count 1034.000000 \n", "mean 27.979691 \n", "std 22.598790 \n", "min 0.000000 \n", "25% 9.000000 \n", "50% 25.000000 \n", "75% 36.000000 \n", "max 144.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tobias_koop.describe()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "