Introduction to PyMC3

In [2]:
%matplotlib inline
from __future__ import division  # must precede the other imports in Python 2

import re
import numpy as np
import pandas as pd
import seaborn as sbn
from scipy.stats import norm
import matplotlib.pyplot as plt

sbn.set_style('white')
sbn.set_context('talk')

np.random.seed(12345678)

Note: these are my personal opinions and I'm sure that lots of people would disagree with my assessment of available software packages.

We have been working on getting under the hood of Metropolis-Hastings and Markov Chain Monte Carlo (MCMC). But there are some very accessible MCMC software packages out there. These are my personal views on what is available:

  1. Stata (bayesmh), available only as of Stata 14. To be honest, I have never heard of anyone using Stata for Bayesian sampling methods, although I haven't really looked.
  2. Matlab. There are numerous user-written and MathWorks-provided routines. For example, a version of emcee is implemented there (more on this later in the course). Matlab is for people who may want to tweak their own sampler code and who need the fastest possible computation.
  3. SAS. A decent set of packages, not used very much in the social sciences (maybe in the medical sciences??), but is well suited for big data.
  4. R. For a long time R was the de facto standard and probably still is, with packages like RStan, which interfaces with the Stan C++ library. JAGS, rbugs, and OpenBUGS have also been around for a long time. I almost elected to teach this course in R.
  5. Python. The libraries aren't as rich as R's, but you might view Python as sitting between Matlab and R. It is maybe (in my view probably) faster than R, and it is also easier to code your own samplers. Python is the de facto language for many applied physicists, who need speed and the latest samplers. Additionally, Python is stronger in the big-data environment than any of the other packages mentioned above. The major players in Python are:
    • PyMC and PyMC3 (in beta)
    • PyStan
    • emcee

Today, we are going to focus on PyMC3, which is a very easy-to-use package now that we have a solid understanding of how posteriors are constructed.

We will start with our very simple one parameter model and then move to slightly more complicated settings:

In [3]:
sigma = 3  # known std of the data-generating process
data = norm(10, sigma).rvs(100)
mu_prior = 8       # prior mean for mu
sigma_prior = 1.5  # std of the prior on mu (not a prior on sigma)

plt.hist(data,bins=20)
plt.show()

PyMC3

Let's use PyMC3 to model this:

In [4]:
import pymc3 as pm3

This is the model statement describing the priors and the likelihood. Here, mu is defined as a stochastic variable (we want a chain of sampled values for this variable), and we provide a prior distribution and hyperparameters for it. The likelihood is chosen to be Normal, with one parameter to be estimated (mu), and with $\sigma$ known (denoted sigma). Our "dependent variable" is given by observed=data, where data is generated above and shown in the histogram. In more formal terms, the code below sets up a basic_model of the following form:

\begin{align} Prob(\mu|\sigma,\mathbf{data}) \propto& Prob(\mathbf{data}|\mu,\sigma) \times Prob(\mu | \mu^0_\mu, \sigma^0_\mu) \\ Prob(\mu|\sigma,\mathbf{data}) \propto& \prod_{i=1}^{N} \phi \left( \frac{x_i - \mu}{\sigma} \right ) \times \phi \left(\frac{\mu - \mu^0_\mu}{\sigma^0_\mu} \right ) \\ Prob(\mu|\sigma,\mathbf{data}) \propto& \prod_{i=1}^{N} \phi \left( \frac{x_i - \mu}{3} \right ) \times \phi \left(\frac{\mu - 8}{1.5} \right ) \end{align}

where $\phi(\cdot)$ is the standard normal pdf and $x_i$ denotes an individual observation in $\mathbf{data}$.
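To connect the algebra to code, the unnormalized log posterior above can be evaluated directly with scipy before we hand anything to PyMC3. This is a standalone sketch; the function name and defaults are mine, not part of the PyMC3 model:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(0)
sigma = 3
data = norm(10, sigma).rvs(100)

def log_posterior(mu, data, sigma=3, mu0=8, sigma0=1.5):
    """Unnormalized log posterior: log-likelihood of the data plus log prior on mu."""
    log_lik = norm.logpdf(data, loc=mu, scale=sigma).sum()
    log_prior = norm.logpdf(mu, loc=mu0, scale=sigma0)
    return log_lik + log_prior

# The posterior should favor values of mu near the true mean of 10
print(log_posterior(10, data) > log_posterior(0, data))
```

A Metropolis sampler only ever needs this function up to the constant of proportionality, which is why the normalizing constant never appears in the math above.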

In [5]:
basic_model = pm3.Model()

with basic_model:

    # Priors for unknown model parameters
    mu = pm3.Normal('Mean of Data',mu_prior,sigma_prior)
    
    # Likelihood (sampling distribution) of observations
    data_in = pm3.Normal('Y_obs', mu=mu, sd=sigma, observed=data)

This defines how the chain will be constructed. We will set the starting values at the MAP estimate and use a Metropolis step method (the Random Walk Metropolis-Hastings we have studied).

In [6]:
chain_length = 10000

with basic_model:
    # obtain starting values via MAP
    startvals = pm3.find_MAP(model=basic_model)
    print(startvals)
    # instantiate sampler
    step = pm3.Metropolis()

    # draw 10,000 posterior samples
    trace = pm3.sample(chain_length, step=step, start=startvals)
{'Mean of Data': array(10.470294094378682)}
 [-----------------100%-----------------] 10000 of 10000 complete in 1.2 sec
In [8]:
pm3.traceplot(trace,figsize=(20,5));
In [9]:
pm3.summary(trace)
Mean of Data:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  10.469           0.272            0.006            [9.933, 10.984]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  9.939          10.284         10.465         10.654         10.998

Understanding the PyMC3 Results Object

All the results are contained in the trace variable. This is a PyMC3 results object, and it contains some information that we might want to extract at times. The varnames attribute tells us all the variable names set up in our model.

In [7]:
trace.varnames
Out[7]:
['Mean of Data']

With the variable names, we can extract chain values for each variable:

In [10]:
trace['Mean of Data']
Out[10]:
array([ 10.47029409,  10.47029409,  10.47029409, ...,  10.41718711,
        10.41718711,  10.41718711])

So for each variable, we have an array of values, which is our chain.
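Because the chain is just a NumPy array, the numbers that pm3.summary reports can be reproduced (approximately) by hand. A sketch using a simulated stand-in chain, since the array here is only illustrative:

```python
import numpy as np

np.random.seed(1)
# stand-in for trace['Mean of Data']: draws centered near the summary values above
chain = np.random.normal(10.47, 0.27, size=10000)

post_mean = chain.mean()                                    # posterior mean
post_sd = chain.std(ddof=1)                                 # posterior std
quantiles = np.percentile(chain, [2.5, 25, 50, 75, 97.5])   # posterior quantiles

print(post_mean, post_sd)
print(quantiles)
```

The HPD interval requires a little more work (it is the narrowest interval containing 95% of the draws), but for a roughly symmetric posterior it is close to the 2.5/97.5 percentile interval.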

Diagnostics

Autocorrelation Plots

In [11]:
pm3.plots.autocorrplot(trace,figsize=(17,5));
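Under the hood, an autocorrelation plot is just the lag-$k$ correlation of the chain with itself. A minimal NumPy sketch (the chain is simulated here so the snippet is self-contained; real Metropolis output is similarly persistent):

```python
import numpy as np

np.random.seed(2)
# simulate a persistent AR(1)-style chain, mimicking typical Metropolis output
n = 5000
chain = np.zeros(n)
for t in range(1, n):
    chain[t] = 0.9 * chain[t - 1] + np.random.normal()

def autocorr(x, lag):
    """Sample autocorrelation of x at a given lag."""
    if lag == 0:
        return 1.0
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in [1, 5, 20]:
    print(lag, autocorr(chain, lag))
```

High autocorrelation at large lags means the chain contains fewer effectively independent draws, which is exactly what the MC Error column in the summary is accounting for.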

Acceptance Rate

In [12]:
accept = np.sum(trace['Mean of Data'][1:] != trace['Mean of Data'][:-1])
print("Acceptance Rate:", accept / trace['Mean of Data'].shape[0])
Acceptance Rate:  0.318

Geweke Score

In [13]:
score=pm3.geweke(trace, first=0.1, last=0.5, intervals=20)
score
Out[13]:
{'Mean of Data': array([[  0.00000000e+00,  -2.57960458e-03],
        [  2.63000000e+02,   5.12516935e-02],
        [  5.26000000e+02,   9.03038438e-02],
        [  7.89000000e+02,   1.07273235e-01],
        [  1.05200000e+03,   1.20009676e-01],
        [  1.31500000e+03,   1.04571747e-01],
        [  1.57800000e+03,   4.95667775e-02],
        [  1.84100000e+03,   9.00828212e-02],
        [  2.10400000e+03,   4.62817192e-02],
        [  2.36700000e+03,   3.71138809e-02],
        [  2.63000000e+03,  -1.98595818e-02],
        [  2.89300000e+03,   3.62818489e-02],
        [  3.15600000e+03,   4.95909067e-02],
        [  3.41900000e+03,  -4.74356616e-03],
        [  3.68200000e+03,  -4.35861742e-02],
        [  3.94500000e+03,   1.21927551e-02],
        [  4.20800000e+03,   5.70852509e-02],
        [  4.47100000e+03,   2.71367174e-02],
        [  4.73400000e+03,   1.01657182e-01],
        [  4.99700000e+03,   2.47369405e-02]])}
In [14]:
score=pm3.geweke(trace, first=0.1, last=0.5, intervals=20)
plt.scatter(score['Mean of Data'][:,0],score['Mean of Data'][:,1], marker = 'o', s=100)
plt.axhline(-1.98, c='r')
plt.axhline(1.98, c='r')
plt.ylim(-2.5,2.5)
plt.xlim(-10, trace['Mean of Data'].shape[0]/2 + 10)
plt.title('Geweke Plot Comparing first 10% and Slices of the Last 50% of Chain\nDifference in Mean Z score')
plt.show()
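The Geweke diagnostic compares the mean of an early segment of the chain to the mean of a late segment via a z-score. The sketch below is deliberately naive: it uses plain sample variances where the real diagnostic uses spectral density estimates (to account for autocorrelation), so treat it as illustrative only:

```python
import numpy as np

np.random.seed(3)
# stand-in for a converged chain
chain = np.random.normal(10.47, 0.27, size=10000)

def geweke_z(chain, first=0.1, last=0.5):
    """Naive Geweke-style z-score: difference in segment means over its std error.
    Uses sample variances, not the spectral estimates of the real diagnostic."""
    a = chain[: int(first * len(chain))]
    b = chain[-int(last * len(chain)):]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

print(geweke_z(chain))  # typically lies well within (-1.98, 1.98) for a converged chain
```

This is why the plot above draws horizontal lines at ±1.98: under convergence the z-scores should behave like standard normal draws.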

Gelman-Rubin

For GR, we need multiple chains in order to test whether each chain collapses to the same limiting distribution, which is our posterior. PyMC3 makes it very easy to generate these.
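The statistic itself compares between-chain and within-chain variance; values near 1 indicate convergence. A self-contained sketch of the classic formula, with simulated chains standing in for PyMC3 output:

```python
import numpy as np

np.random.seed(4)
# m chains of n draws each, all from the same distribution (the converged case)
m, n = 10, 5000
chains = np.random.normal(10.47, 0.27, size=(m, n))

def gelman_rubin(chains):
    """Classic R-hat: sqrt of pooled posterior variance over within-chain variance."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance of chain means
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)

print(gelman_rubin(chains))  # close to 1 when the chains agree
```

If the chains had converged to different regions, B would dominate W and the statistic would exceed 1 noticeably.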

In [15]:
chain_length = 100000

with basic_model:
    # obtain starting values via MAP
    startvals = pm3.find_MAP(model=basic_model)
    print(startvals)
    # instantiate sampler
    step = pm3.Metropolis()

    # draw 100,000 posterior samples in each of 10 parallel chains
    trace = pm3.sample(chain_length, step=step, start=startvals, njobs=10)
{'Mean of Data': array(10.470294094378682)}
 [-----------------100%-----------------] 100000 of 100000 complete in 49.0 sec
In [16]:
pm3.traceplot(trace,figsize=(20,5));
In [17]:
pm3.gelman_rubin(trace)
Out[17]:
{'Mean of Data': 1.0000400190297385}

Based on Gelman-Rubin and Geweke, we can be confident we have a chain that has converged to the limiting distribution (although perhaps for Gelman-Rubin, we'd want to try different starting values rather than MAP).

In-Class Exercise

Team up and modify the code below to estimate sigma as well as the mean:

In [ ]:
basic_model = pm3.Model()

with basic_model:

    # Priors for unknown model parameters
    mu = pm3.Normal('Mean of Data',mu_prior,sigma_prior)
    
    # Likelihood (sampling distribution) of observations
    data_in = pm3.Normal('Y_obs', mu=mu, sd=sigma, observed=data)
In [ ]:
chain_length = 10000

with basic_model:
    # obtain starting values via MAP
    startvals = pm3.find_MAP(model=basic_model)
    print(startvals)
    # instantiate sampler
    step = pm3.Metropolis()

    # draw 10,000 posterior samples
    trace = pm3.sample(chain_length, step=step, start=startvals)
In [31]:
score=pm3.geweke(trace, first=0.1, last=0.5, intervals=20)
plt.scatter(score['Mean of Data'][:,0],score['Mean of Data'][:,1], marker = 'o', s=100)
plt.axhline(-1.98, c='r')
plt.axhline(1.98, c='r')
plt.ylim(-2.5,2.5)
plt.xlim(-10, trace['Mean of Data'].shape[0]/2 + 10)
plt.title('Geweke Plot Comparing first 10% and Slices of the Last 50% of Chain\nDifference in Mean Z score')
plt.show()

My Answer

In [27]:
basic_model_2 = pm3.Model()

with basic_model_2:

    # Priors for unknown model parameters (mu and sigma)
    mu = pm3.Normal('Mean of Data',mu_prior,sigma_prior)
    sigmad = pm3.Uniform('Sigma of Data', .01, 20)  # alternative: pm3.Normal('Sigma of Data', 3, 1)
    
    # Likelihood (sampling distribution) of observations
    data_in = pm3.Normal('Y_obs', mu=mu, sd=sigmad, observed=data)
Applied interval-transform to Sigma of Data and added transformed Sigma of Data_interval to model.
In [28]:
chain_length = 10000

with basic_model_2:
    # obtain starting values via MAP
    startvals = pm3.find_MAP(model=basic_model_2)
    print(startvals)
    # instantiate sampler
    step = pm3.Metropolis()

    # draw 10,000 posterior samples
    trace = pm3.sample(chain_length, step=step, start=startvals)
{'Mean of Data': array(10.501236172863148), 'Sigma of Data_interval': array(-1.8024240203043396)}
 [-----------------100%-----------------] 10000 of 10000 complete in 1.4 sec
In [29]:
pm3.traceplot(trace,figsize=(18,5));
In [30]:
pm3.summary(trace)
Mean of Data:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  10.488           0.272            0.006            [9.974, 11.022]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  9.934          10.304         10.493         10.671         11.003


Sigma of Data_interval:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  -1.788           0.088            0.002            [-1.953, -1.620]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  -1.948         -1.852         -1.791         -1.726         -1.610


Sigma of Data:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  2.881            0.218            0.005            [2.482, 3.296]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  2.505          2.721          2.867          3.030          3.340

In [32]:
score=pm3.geweke(trace, first=0.1, last=0.5, intervals=20)
plt.scatter(score['Sigma of Data'][:,0],score['Sigma of Data'][:,1], marker = 'o', s=100)
plt.axhline(-1.98, c='r')
plt.axhline(1.98, c='r')
plt.ylim(-2.5,2.5)
plt.xlim(-10, trace['Sigma of Data'].shape[0]/2 + 10)
plt.title('Geweke Plot Comparing first 10% and Slices of the Last 50% of Chain\nDifference in Mean Z score')
plt.show()