{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2016-03-02T07:49:55.087088", "start_time": "2016-03-02T07:49:55.073850" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "from scipy.stats import norm,uniform\n", "import matplotlib.pyplot as plt\n", "import seaborn as sbn\n", "from scipy.optimize import minimize\n", "import pandas as pd\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", " \n", "sbn.set_style('white')\n", "sbn.set_context('talk')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this workbook, I will briefly go over the theory of MCMC. We'll cover questions like why does MCMC work (why does it allow us to draw from the posterior)? and what conditions are necessary to get this result.\n", "\n", "## Asymptotics\n", "\n", "A decent starting point is to go back to our simple example using conjugate priors and analytical Bayes. Our data is distributed normal with mean $\\mu$ (unknown) and standard deviation equal to $\\sigma$ (known). We have priors describing our beliefs about the distribution of $\\mu$: distributed normal with mean $\\mu_0$ and standard deviation $\\sigma^2_0$. In this setting, our posterior can be solved for analytically as: \n", "\n", "$$\n", "N \\left (\\frac{\\sigma_0^2}{\\frac{\\sigma^2}{n} + \\sigma^2_0}\\frac{\\sum y_i}{n} + \\frac{\\frac{\\sigma^2}{n}}{\\frac{\\sigma^2}{n} + \\sigma^2_0} \\mu_0,\\left ( {\\frac{1}{\\sigma_0^2} + \\frac{n}{\\sigma^2}} \\right )^{-1} \\right ) \n", "$$\n", "\n", "How might we expect the posterior to behave as sample size get increasingly large? First examine the mean of the analytical posterior: \n", "\n", "$$\n", "\\lim_{n\\to \\infty} \\frac{\\sigma_0^2}{\\frac{\\sigma^2}{n} + \\sigma^2_0}\\frac{\\sum y_i}{n} + \\frac{\\frac{\\sigma^2}{n}}{\\frac{\\sigma^2}{n} + \\sigma^2_0} \\mu_0 \\to \\frac{\\sum y_i}{n}\n", "$$\n", "which means that the mean of the analytical posterior will converge on the Maximum Likelihood Estimate of the mean: the data completely dominates the prior.\n", "\n", "What about the variance? Doing a similar excercise shows that\n", "\n", "$$\n", "\\lim_{n\\to \\infty} \\left ( {\\frac{1}{\\sigma_0^2} + \\frac{n}{\\sigma^2}} \\right )^{-1} = \\lim_{n\\to \\infty} \\left ( \\frac{\\sigma^2_0 \\sigma^2}{(\\sigma^2 + n \\sigma^2_0)}\\right ) \\to 0\n", "$$\n", "\n", "So, the asymptotic properties of our Bayesian estimator, exactly parallels the frequentist worldview: our posterior collapses to the Maximum Likelihood Mean with zero variance for super huge sample sizes. \n", "\n", "We can also see this using numerical methods and revisit our example from 2 worksheets ago:\n", "\n", "1. Mean of data is unknown, variance is known ($\\sigma=3$)\n", "2. 
{ "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2016-03-02T07:49:59.646410", "start_time": "2016-03-02T07:49:59.641277" } }, "outputs": [], "source": [ "# compute the MLE and the MAP estimate at each of the sample\n", "# sizes of data given below\n", "sample_size = [1, 10, 10**2, 10**3, 10**6]\n", "\n", "sigma = 3\n", "mu_0 = 8\n", "sigma_0 = 2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2016-03-02T07:50:00.287112", "start_time": "2016-03-02T07:50:00.280967" } }, "outputs": [], "source": [ "def log_posterior_numerical(mu, data, sigma, mu_0, sigma_0):\n", "    # negative log posterior (up to a constant), suitable for minimize\n", "    log_prior = norm(mu_0, sigma_0).logpdf(mu)\n", "    log_like = norm(mu, sigma).logpdf(data).sum()\n", "    return -1*(log_prior + log_like)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2016-03-02T07:50:08.910751", "start_time": "2016-03-02T07:50:00.945060" } }, "outputs": [], "source": [ "# for each sample size, generate a dataset and calculate\n", "# 1. MAP\n", "# 2. MLE (just the mean of the data)\n", "# and store the results\n", "\n", "results = np.zeros((len(sample_size), 4))\n", "\n", "for count, n in enumerate(sample_size):\n", "    data = norm(10, 3).rvs(n)\n", "    # for this dataset, solve for the MAP numerically\n", "    res = minimize(log_posterior_numerical, np.mean(data),\n", "                   args=(data, sigma, mu_0, sigma_0), options={'disp': False},\n", "                   method='BFGS')\n", "\n", "    # res.x is a length-1 array, so store its scalar entry\n", "    results[count, :] = [n, mu_0, res.x[0], np.mean(data)]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2016-03-02T07:50:08.918755", "start_time": "2016-03-02T07:50:08.912329" } }, "outputs": [], "source": [ "# convert to a pandas dataframe for display\n", "results = pd.DataFrame(results, columns=['Sample Size', 'Prior', 'MAP', 'MLE'])" ] },
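{ "cell_type": "markdown", "metadata": {}, "source": [ "Because this conjugate posterior is Gaussian, its MAP coincides with the analytical posterior mean, which gives a quick way to validate the optimizer. A minimal sketch of that check (the sample size of 100 is arbitrary):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: the numerical MAP should agree with the closed-form posterior mean\n", "check_data = norm(10, sigma).rvs(100)\n", "res = minimize(log_posterior_numerical, np.mean(check_data),\n", "               args=(check_data, sigma, mu_0, sigma_0), method='BFGS')\n", "w = sigma_0**2/(sigma**2/float(len(check_data)) + sigma_0**2)\n", "print(res.x[0], w*np.mean(check_data) + (1 - w)*mu_0)" ] },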
\n", " | Sample Size | \n", "Prior | \n", "MAP | \n", "MLE | \n", "
---|---|---|---|---|
0 | \n", "1.0 | \n", "8.0 | \n", "9.019413 | \n", "11.313092 | \n", "
1 | \n", "10.0 | \n", "8.0 | \n", "8.071362 | \n", "8.087419 | \n", "
2 | \n", "100.0 | \n", "8.0 | \n", "9.999679 | \n", "10.044672 | \n", "
3 | \n", "1000.0 | \n", "8.0 | \n", "10.075359 | \n", "10.080028 | \n", "
4 | \n", "1000000.0 | \n", "8.0 | \n", "9.999080 | \n", "9.999085 | \n", "