{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import urllib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Pandas\n", "\n", "Pandas works hand-in-hand with other python libraries (e.g. matplotlib and numpy) to make manipulating data (what the Pandas team calls \"Data Munging\") easy. With pandas it is easy to \n", "\n", "1. Easily access data using variable names, but have full linear algebra capabilities of numpy\n", "2. Group data by values (for example, calculate the mean income by state)\n", "3. Plot and summarize values\n", "4. Join (combine) different sources of data\n", "5. Powerful time series and panel data capabilities (beyond scope of course)\n", "\n", "Note: If you want to reinforce some of these concepts on your own, I recommend this superb youtube video (https://www.youtube.com/watch?v=5JnMutdy6Fw) and accompanying coursework (https://github.com/brandon-rhodes/pycon-pandas-tutorial) by Brandon Rhodes.\n", "\n", "There are numerous ways to get data into Pandas:\n", "\n", "* Import excel and comma delimited data\n", "* Import stata, sas, matlab, and other datasets\n", "* Import data from an SQL server\n", "* Import data scraped from the web\n", "* Manually building (by typing in values)\n", "\n", "In this tutorial we will focus on the first two methods for reading data, but just know that there is probably a way to get your data into pandas irrespective of what format it is in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading and Cleaning Data\n", "\n", "We will be loading 2010 \"Trip\" data from the NOAA Fisheries Service Recreational Fisheries Statistics Survey (called MRIP). More detail here: https://www.fisheries.noaa.gov/topic/recreational-fishing-data \n", "\n", "We will also load the Census Bureau's county and state fips file, obtained here: https://www.census.gov/2010census/xls/fips_codes_website.xls\n", "\n", "The MRIPs data needs to be downloaded from my website (you only need to do this once)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('mrips_2010.pydata', )" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# run this once and then comment out with #\n", "# urllib.urlretrieve (\"https://rlhick.people.wm.edu/pubp622/data/mrips_2010.pydata\", \"mrips_2010.pydata\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "trips_2010 = pd.io.pickle.read_pickle('mrips_2010.pydata')\n", "fips = pd.io.excel.read_excel('https://rlhick.people.wm.edu/pubp622/data/fips_codes_website.xls')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexadd_hrsadd_phageareaarea_ncarea_xart_reefasg_codeboat_hrs...timetourntsn1tsn2turtlewavewp_intyearzipintercept_date
00NaN1NaNC5NaN11710201011140...1330NaN167680NaN63411.8910632010064752010-11-14
11NaN2NaNC5NaN11710201011140...1332NaN167680NaN63411.8910632010064752010-11-14
2201NaNC5NaN11710201011160...1130NaN168559167680NaN617548.8856332010061102010-11-16
33NaN1NaNC5NaN11710201011230...1200NaN167680168559NaN64489.9235792010064162010-11-23
44NaN1NaNC5NaN11710201011230...1215NaN167680168559NaN64489.9235792010060672010-11-23
\n", "

5 rows × 80 columns

\n", "
" ], "text/plain": [ " index add_hrs add_ph age area area_nc area_x art_reef asg_code \\\n", "0 0 NaN 1 NaN C 5 NaN 1171020101114 \n", "1 1 NaN 2 NaN C 5 NaN 1171020101114 \n", "2 2 0 1 NaN C 5 NaN 1171020101116 \n", "3 3 NaN 1 NaN C 5 NaN 1171020101123 \n", "4 4 NaN 1 NaN C 5 NaN 1171020101123 \n", "\n", " boat_hrs ... time tourn tsn1 tsn2 turtle wave \\\n", "0 0 ... 1330 NaN 167680 NaN 6 \n", "1 0 ... 1332 NaN 167680 NaN 6 \n", "2 0 ... 1130 NaN 168559 167680 NaN 6 \n", "3 0 ... 1200 NaN 167680 168559 NaN 6 \n", "4 0 ... 1215 NaN 167680 168559 NaN 6 \n", "\n", " wp_int year zip intercept_date \n", "0 3411.891063 2010 06475 2010-11-14 \n", "1 3411.891063 2010 06475 2010-11-14 \n", "2 17548.885633 2010 06110 2010-11-16 \n", "3 4489.923579 2010 06416 2010-11-23 \n", "4 4489.923579 2010 06067 2010-11-23 \n", "\n", "[5 rows x 80 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_2010.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trips is a very big dataset with lots of columns we'll never use. Let's trim it down:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index([u'index', u'add_hrs', u'add_ph', u'age', u'area', u'area_nc', u'area_x',\n", " u'art_reef', u'asg_code', u'boat_hrs', u'catch', u'celltype',\n", " u'cntrbtrs', u'cnty', u'cnty_res', u'coastal', u'compflag', u'county',\n", " u'date1', u'dist', u'distkeys', u'f_by_p', u'ffdays12', u'ffdays2',\n", " u'first', u'fshinsp_a', u'gear', u'gender', u'hrs_dtd', u'hrsf',\n", " u'id_code', u'intsite', u'kod', u'leader', u'license', u'mode2001',\n", " u'mode_asg', u'mode_f', u'mode_fx', u'monitor', u'month', u'muni_res',\n", " u'muni_trp', u'new_list', u'num_fish_a', u'num_typ2', u'num_typ3',\n", " u'num_typ4', u'num_typ6', u'num_typ9', u'on_list', u'party', u'prim1',\n", " u'prim1_common', u'prim2', u'prim2_common', u'prt_code', u'psu_id',\n", " u'pvt_res', u'reefcode', u'reg_res', u'region', u'rig', u'sep_fish',\n", " u'st', u'st_res', u'strat_id', u'strat_interval', u'sub_reg',\n", " u'telefon', u'time', u'tourn', u'tsn1', u'tsn2', u'turtle', u'wave',\n", " u'wp_int', u'year', u'zip', u'intercept_date'],\n", " dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_2010.columns" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "trips = trips_2010[['id_code','year','wave','intercept_date','st','prim1',\n", " 'prim1_common','prim2','prim2_common','cnty','ffdays12',\n", " 'ffdays2']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is summary statistics for numeric data columns:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearwavestcntyffdays12ffdays2
count104709104709.000000104709.000000104519.000000104519.000000104519.000000
mean20103.79487923.97461578.79817140.7714396.191936
std01.34758613.160875130.751209122.45600413.037239
min20101.0000001.0000001.0000000.0000000.000000
25%20103.00000012.00000019.0000001.0000000.000000
50%20104.00000022.00000055.00000010.0000002.000000
75%20105.00000037.00000099.00000035.0000007.000000
max20106.00000051.000000810.000000999.00000099.000000
\n", "
" ], "text/plain": [ " year wave st cnty ffdays12 \\\n", "count 104709 104709.000000 104709.000000 104519.000000 104519.000000 \n", "mean 2010 3.794879 23.974615 78.798171 40.771439 \n", "std 0 1.347586 13.160875 130.751209 122.456004 \n", "min 2010 1.000000 1.000000 1.000000 0.000000 \n", "25% 2010 3.000000 12.000000 19.000000 1.000000 \n", "50% 2010 4.000000 22.000000 55.000000 10.000000 \n", "75% 2010 5.000000 37.000000 99.000000 35.000000 \n", "max 2010 6.000000 51.000000 810.000000 999.000000 \n", "\n", " ffdays2 \n", "count 104519.000000 \n", "mean 6.191936 \n", "std 13.037239 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 2.000000 \n", "75% 7.000000 \n", "max 99.000000 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Pandas DataFrames\n", "\n", "##### Referencing columns in data:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 25\n", "1 10\n", "2 12\n", "3 90\n", "4 75\n", "Name: ffdays12, dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# copy data frame into t (note, this is different than assignment [t=trips])\n", "t=trips.copy()\n", "\n", "t.ffdays12.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 25\n", "1 10\n", "2 12\n", "3 90\n", "4 75\n", "Name: ffdays12, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t['ffdays12'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Referencing rows in data:\n", "\n", "We can use numpy-like slicing:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
511171020101111005201062010-11-1198839010101TAUTOG73510
521171020101111006201062010-11-1198839010101TAUTOG73010
531171020101113001201062010-11-1398839010101TAUTOG11108
541171020101113002201062010-11-1398839010101TAUTOG11108
551171020101113003201062010-11-1398839010101TAUTOG1132
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "51 1171020101111005 2010 6 2010-11-11 9 8839010101 TAUTOG \n", "52 1171020101111006 2010 6 2010-11-11 9 8839010101 TAUTOG \n", "53 1171020101113001 2010 6 2010-11-13 9 8839010101 TAUTOG \n", "54 1171020101113002 2010 6 2010-11-13 9 8839010101 TAUTOG \n", "55 1171020101113003 2010 6 2010-11-13 9 8839010101 TAUTOG \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 \n", "51 7 35 10 \n", "52 7 30 10 \n", "53 11 10 8 \n", "54 11 10 8 \n", "55 11 3 2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 50-54\n", "t.loc[51:55]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datest
01171020101114001201062010-11-149
11171020101114002201062010-11-149
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st\n", "0 1171020101114001 2010 6 2010-11-14 9\n", "1 1171020101114002 2010 6 2010-11-14 9" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 1 and 2 for the first 5 columns\n", "t.iloc[0:2,0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can select rows of data based on column values. Let's select all the rows where st is 9 (Connecticut)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
01171020101114001201062010-11-1498835020102STRIPED BASS7255
11171020101114002201062010-11-1498835020102STRIPED BASS7102
21171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS11126
31171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH79025
41171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH77520
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010 6 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 \n", "0 7 25 5 \n", "1 7 10 2 \n", "2 8835020102 STRIPED BASS 11 12 6 \n", "3 8835250101 BLUEFISH 7 90 25 \n", "4 8835250101 BLUEFISH 7 75 20 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[t.st == 9].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all rows where days spent fishing during past 12 months exceeds 10:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
01171020101114001201062010-11-1498835020102STRIPED BASS7255
21171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS11126
31171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH79025
41171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH77520
51171020101123003201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH77010
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010 6 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "5 1171020101123003 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 \n", "0 7 25 5 \n", "2 8835020102 STRIPED BASS 11 12 6 \n", "3 8835250101 BLUEFISH 7 90 25 \n", "4 8835250101 BLUEFISH 7 75 20 \n", "5 8835250101 BLUEFISH 7 70 10 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[t.ffdays12>10].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Math on columns: " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2temp
01171020101114001201062010-11-1498835020102STRIPED BASS725525.321303
11171020101114002201062010-11-1498835020102STRIPED BASS710210.281614
21171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS1112612.161692
31171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH7902589.333737
41171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH7752075.887685
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010 6 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 temp \n", "0 7 25 5 25.321303 \n", "1 7 10 2 10.281614 \n", "2 8835020102 STRIPED BASS 11 12 6 12.161692 \n", "3 8835250101 BLUEFISH 7 90 25 89.333737 \n", "4 8835250101 BLUEFISH 7 75 20 75.887685 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can create a new variable and use numpy commands:\n", "t['temp'] = t.ffdays12 + .5*np.random.randn(t.shape[0])\n", "t.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40.768852202237284" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.temp.mean()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40.77045249565499" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or we could do this all in one step if we didn't want to create a new column in t:\n", "(t.ffdays12 + .5*np.random.randn(t.shape[0])).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: You can combine pandas with numpy. This is the standard deviation of the column `ffdays12`: " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "122.4554184530213" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(t.ffdays12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can perform matrix/vector operations on pandas data.\n", "\n", "Here, we can transpose a slice of the data: " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789
ffdays1225101290757010151250
ffdays252625201020050
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9\n", "ffdays12 25 10 12 90 75 70 10 15 1 250\n", "ffdays2 5 2 6 25 20 10 2 0 0 50" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[['ffdays12','ffdays2']].head(10).T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do matrix multiplication:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 82320., 17187.],\n", " [ 17187., 3694.]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(t[['ffdays12','ffdays2']].head(10).T,t[['ffdays12','ffdays2']].head(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Summary Statistics\n", "\n", "Find mean of ffdays12 and ffdays2" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ffdays12ffdays2
count104519.000000104519.000000
mean40.7714396.191936
std122.45600413.037239
min0.0000000.000000
25%1.0000000.000000
50%10.0000002.000000
75%35.0000007.000000
max999.00000099.000000
\n", "
" ], "text/plain": [ " ffdays12 ffdays2\n", "count 104519.000000 104519.000000\n", "mean 40.771439 6.191936\n", "std 122.456004 13.037239\n", "min 0.000000 0.000000\n", "25% 1.000000 0.000000\n", "50% 10.000000 2.000000\n", "75% 35.000000 7.000000\n", "max 999.000000 99.000000" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[['ffdays12','ffdays2']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find mean of ffdays12 and ffdays2 by state:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "'st'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mt\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'ffdays12'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'ffdays2'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'st'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mdescribe\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc\u001b[0m in \u001b[0;36mgroupby\u001b[1;34m(self, by, axis, level, as_index, sort, group_keys, squeeze)\u001b[0m\n\u001b[0;32m 3157\u001b[0m \u001b[0maxis\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_get_axis_number\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0maxis\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3158\u001b[0m return groupby(self, by=by, axis=axis, level=level, as_index=as_index,\n\u001b[1;32m-> 3159\u001b[1;33m sort=sort, group_keys=group_keys, squeeze=squeeze)\n\u001b[0m\u001b[0;32m 3160\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3161\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0masfreq\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfreq\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc\u001b[0m in \u001b[0;36mgroupby\u001b[1;34m(obj, by, **kwds)\u001b[0m\n\u001b[0;32m 1197\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'invalid type: %s'\u001b[0m \u001b[1;33m%\u001b[0m \u001b[0mtype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1198\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1199\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mklass\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mby\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1200\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1201\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze)\u001b[0m\n\u001b[0;32m 386\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mgrouper\u001b[0m \u001b[1;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 387\u001b[0m grouper, exclusions, obj = _get_grouper(obj, keys, axis=axis,\n\u001b[1;32m--> 388\u001b[1;33m level=level, sort=sort)\n\u001b[0m\u001b[0;32m 389\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 390\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mobj\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mobj\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc\u001b[0m in \u001b[0;36m_get_grouper\u001b[1;34m(obj, key, axis, level, sort)\u001b[0m\n\u001b[0;32m 2146\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2147\u001b[0m \u001b[1;32melif\u001b[0m \u001b[0mis_in_axis\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgpr\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;31m# df.groupby('name')\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2148\u001b[1;33m \u001b[0min_axis\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgpr\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mTrue\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgpr\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mobj\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mgpr\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2149\u001b[0m \u001b[0mexclusions\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2150\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc\u001b[0m in \u001b[0;36m__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 1795\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_getitem_multilevel\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1796\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1797\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_getitem_column\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1798\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1799\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0m_getitem_column\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc\u001b[0m in \u001b[0;36m_getitem_column\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 1802\u001b[0m \u001b[1;31m# get column\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1803\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mis_unique\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1804\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_get_item_cache\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1805\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1806\u001b[0m \u001b[1;31m# duplicate columns & possible reduce dimensionaility\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc\u001b[0m in \u001b[0;36m_get_item_cache\u001b[1;34m(self, item)\u001b[0m\n\u001b[0;32m 1082\u001b[0m \u001b[0mres\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mcache\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1083\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mres\u001b[0m \u001b[1;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1084\u001b[1;33m \u001b[0mvalues\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_data\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1085\u001b[0m \u001b[0mres\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_box_item_values\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalues\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1086\u001b[0m \u001b[0mcache\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mres\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc\u001b[0m in \u001b[0;36mget\u001b[1;34m(self, item, fastpath)\u001b[0m\n\u001b[0;32m 2849\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2850\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[0misnull\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2851\u001b[1;33m \u001b[0mloc\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2852\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2853\u001b[0m \u001b[0mindexer\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0marange\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0misnull\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc\u001b[0m in \u001b[0;36mget_loc\u001b[1;34m(self, key, method)\u001b[0m\n\u001b[0;32m 1570\u001b[0m \"\"\"\n\u001b[0;32m 1571\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mmethod\u001b[0m \u001b[1;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1572\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0m_values_from_object\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1573\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1574\u001b[0m \u001b[0mindexer\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_indexer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32mpandas/index.pyx\u001b[0m in \u001b[0;36mpandas.index.IndexEngine.get_loc (pandas/index.c:3824)\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas/index.pyx\u001b[0m in \u001b[0;36mpandas.index.IndexEngine.get_loc (pandas/index.c:3704)\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas/hashtable.pyx\u001b[0m in \u001b[0;36mpandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas/hashtable.pyx\u001b[0m in \u001b[0;36mpandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)\u001b[1;34m()\u001b[0m\n", "\u001b[1;31mKeyError\u001b[0m: 'st'" ] } ], "source": [ "t[['ffdays12','ffdays2']].groupby('st').describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened? We trimmed down the columns of t with the statement `t[['ffdays12','ffdays2']]` and pandas couldn't find the columns `st`. Let's try this again:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ffdays12ffdays2
st
1count2319.0000002319.000000
mean54.4700305.535576
std173.29516611.381926
min0.0000000.000000
25%1.0000000.000000
50%8.0000002.000000
75%30.0000006.000000
max998.00000098.000000
9count2068.0000002068.000000
mean19.0507744.711315
std45.7997248.832107
min0.0000000.000000
25%0.7500000.000000
50%5.0000002.000000
75%20.0000006.000000
max998.00000098.000000
10count3691.0000003691.000000
mean33.4153355.167163
std126.39932713.028735
min0.0000000.000000
25%1.0000000.000000
50%6.0000001.000000
75%20.0000005.000000
max999.00000099.000000
12count37354.00000037354.000000
mean59.6325437.921401
std154.87566415.925836
min0.0000000.000000
25%2.0000000.000000
50%20.0000003.000000
............
37std86.51845210.735914
min0.0000000.000000
25%1.0000000.000000
50%6.0000002.000000
75%25.0000006.000000
max999.00000099.000000
44count1488.0000001488.000000
mean20.7157264.751344
std39.3498509.322511
min0.0000000.000000
25%1.0000000.000000
50%5.0000001.000000
75%20.2500005.000000
max320.00000060.000000
45count2867.0000002867.000000
mean50.6763177.040112
std167.73733317.099136
min0.0000000.000000
25%0.0000000.000000
50%7.0000002.000000
75%30.0000006.000000
max999.00000099.000000
51count4822.0000004822.000000
mean24.9622564.214641
std81.2596198.909940
min0.0000000.000000
25%1.0000000.000000
50%6.0000001.000000
75%25.0000005.000000
max999.00000098.000000
\n", "

136 rows × 2 columns

\n", "
" ], "text/plain": [ " ffdays12 ffdays2\n", "st \n", "1 count 2319.000000 2319.000000\n", " mean 54.470030 5.535576\n", " std 173.295166 11.381926\n", " min 0.000000 0.000000\n", " 25% 1.000000 0.000000\n", " 50% 8.000000 2.000000\n", " 75% 30.000000 6.000000\n", " max 998.000000 98.000000\n", "9 count 2068.000000 2068.000000\n", " mean 19.050774 4.711315\n", " std 45.799724 8.832107\n", " min 0.000000 0.000000\n", " 25% 0.750000 0.000000\n", " 50% 5.000000 2.000000\n", " 75% 20.000000 6.000000\n", " max 998.000000 98.000000\n", "10 count 3691.000000 3691.000000\n", " mean 33.415335 5.167163\n", " std 126.399327 13.028735\n", " min 0.000000 0.000000\n", " 25% 1.000000 0.000000\n", " 50% 6.000000 1.000000\n", " 75% 20.000000 5.000000\n", " max 999.000000 99.000000\n", "12 count 37354.000000 37354.000000\n", " mean 59.632543 7.921401\n", " std 154.875664 15.925836\n", " min 0.000000 0.000000\n", " 25% 2.000000 0.000000\n", " 50% 20.000000 3.000000\n", "... ... ...\n", "37 std 86.518452 10.735914\n", " min 0.000000 0.000000\n", " 25% 1.000000 0.000000\n", " 50% 6.000000 2.000000\n", " 75% 25.000000 6.000000\n", " max 999.000000 99.000000\n", "44 count 1488.000000 1488.000000\n", " mean 20.715726 4.751344\n", " std 39.349850 9.322511\n", " min 0.000000 0.000000\n", " 25% 1.000000 0.000000\n", " 50% 5.000000 1.000000\n", " 75% 20.250000 5.000000\n", " max 320.000000 60.000000\n", "45 count 2867.000000 2867.000000\n", " mean 50.676317 7.040112\n", " std 167.737333 17.099136\n", " min 0.000000 0.000000\n", " 25% 0.000000 0.000000\n", " 50% 7.000000 2.000000\n", " 75% 30.000000 6.000000\n", " max 999.000000 99.000000\n", "51 count 4822.000000 4822.000000\n", " mean 24.962256 4.214641\n", " std 81.259619 8.909940\n", " min 0.000000 0.000000\n", " 25% 1.000000 0.000000\n", " 50% 6.000000 1.000000\n", " 75% 25.000000 5.000000\n", " max 999.000000 98.000000\n", "\n", "[136 rows x 2 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.groupby('st')[['ffdays12','ffdays2']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining DataFrames\n", "\n", "The state fips codes are hardly intuitive and even worse for the county fips codes. The goal of this section is to \n", "\n", "1. Add text-based labels\n", "2. Be able to use these labels for fast data selection\n", "\n", "To do this, we will attempt to match text based state and county labels to our trips dataframe. Pandas calls this a merge." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
State AbbreviationState FIPS CodeCounty FIPS CodeFIPS Entity CodeANSI CodeGU NameEntity Description
0AL1671242403054Abbevillecity
1AL1734602403063Adamsvillecity
2AL11178202403069Alabastercity
3AL1959882403074Albertvillecity
4AL112311322403077Alexander Citycity
\n", "
" ], "text/plain": [ " State Abbreviation State FIPS Code County FIPS Code FIPS Entity Code \\\n", "0 AL 1 67 124 \n", "1 AL 1 73 460 \n", "2 AL 1 117 820 \n", "3 AL 1 95 988 \n", "4 AL 1 123 1132 \n", "\n", " ANSI Code GU Name Entity Description \n", "0 2403054 Abbeville city \n", "1 2403063 Adamsville city \n", "2 2403069 Alabaster city \n", "3 2403074 Albertville city \n", "4 2403077 Alexander City city " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fips codes (numeric) uniquely identify counties in our trips file. So let's look at the fips file (that we grabbed from Census) above and combine it with our trips data. These column names are messy: let's clean it up.\n", "\n", "For me, column names should always:\n", "1. Be lower case\n", "2. Have no spaces or funky characters" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statestate_fipscounty_fipsfipsansi_codecounty_nameentity_type
0AL1671242403054Abbevillecity
1AL1734602403063Adamsvillecity
2AL11178202403069Alabastercity
3AL1959882403074Albertvillecity
4AL112311322403077Alexander Citycity
\n", "
" ], "text/plain": [ " state state_fips county_fips fips ansi_code county_name entity_type\n", "0 AL 1 67 124 2403054 Abbeville city\n", "1 AL 1 73 460 2403063 Adamsville city\n", "2 AL 1 117 820 2403069 Alabaster city\n", "3 AL 1 95 988 2403074 Albertville city\n", "4 AL 1 123 1132 2403077 Alexander City city" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names = ['state','state_fips','county_fips','fips','ansi_code',\n", " 'county_name','entity_type']\n", "fips.columns=names\n", "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The merge statement takes two pandas dataframes and combines them. Notice in the merge command below we have trips,fips. The dataframe trips is known as the \"left\" dataframe and fips, the \"right\". I want to combine these columns based on the numeric fips values (state and county). These fields have different names in each of the dataframes, so I need to specify the names (in the same order) for the left (trips) and right (fips) dataframes. The option `how='left'` tells pandas to always keep rows from trips whether a match is made with fips or not. Note that if a row in the fips table doesn't match our trips data, it is discarded." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# join clean fips codes to data_trip\n", "trips_fips = pd.merge(trips,fips, left_on=['st','cnty'], \n", " right_on=['state_fips','county_fips'],\n", " how='left', sort=False)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(104709, 12)\n", "(1561000, 19)\n" ] } ], "source": [ "print trips.shape\n", "print trips_fips.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result we get is disconcerting. What we wanted to achieve with the merge was to add text names to the trips table, and our merged dataset has 1.5 million rows rather than 104,709 (what we want). This probably occured because there were duplicate (or redundant information on fips codes in the fips table). To check that use the `duplicate` method to find rows with duplicate state and county fips codes." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "township 12641\n", "city 8355\n", "town 8136\n", "village 3974\n", "County 2555\n", "borough 1136\n", "Reservation 438\n", "ANVSA 221\n", "CDP 147\n", "comunidad 124\n", "OTSA 101\n", "zona urbana 79\n", "charter township 78\n", "Hawaiian Home Land 75\n", "Municipio 58\n", "SDTSA 53\n", "Rancheria 47\n", "Parish 38\n", "Indian Reservation 36\n", "Township 35\n", "dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips[fips.duplicated(['state_fips','county_fips'])]['entity_type'].value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, this file is (apparently) for census workers in the field who may encounter lots of different place names, and this file helps their data-entry program map into the correct state and county fips codes. This isn't ideal for our purposes. Since our trips table already has the correct numeric fips codes, we need a fips table that has the correct and **unique** county/city names.\n", "\n", "Grab another fips table from here: https://www.census.gov/geo/reference/codes/cou.html and choose United States.\n", "\n", "For convenience, I'll save to my website." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3234, 5)\n" ] } ], "source": [ "fips = pd.io.parsers.read_csv('https://rlhick.people.wm.edu/pubp622/data/national_county.txt',\n", " header=False,names=['state','state_fips','county_fips','county','fips_class_code'])\n", "print fips.shape" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statestate_fipscounty_fipscountyfips_class_code
0AL13Baldwin CountyH1
1AL15Barbour CountyH1
2AL17Bibb CountyH1
3AL19Blount CountyH1
4AL111Bullock CountyH1
\n", "
" ], "text/plain": [ " state state_fips county_fips county fips_class_code\n", "0 AL 1 3 Baldwin County H1\n", "1 AL 1 5 Barbour County H1\n", "2 AL 1 7 Bibb County H1\n", "3 AL 1 9 Blount County H1\n", "4 AL 1 11 Bullock County H1" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having worked with the data before, I know that counties/entities like Virginia Beach City are labeled \"Virginia Beach city\". Let's fix that:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "fips[\"county\"] = fips.county.str.replace(\"city\",\"City\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check for duplicates:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicate rows: 0\n" ] } ], "source": [ "fips.duplicated(['state_fips','county_fips'])\n", "print \"Number of duplicate rows:\", fips[fips.duplicated(['state_fips','county_fips'])].size" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(104709, 12)\n", "(104709, 17)\n" ] } ], "source": [ "trips_fips = pd.merge(trips,fips, left_on=['st','cnty'], \n", " right_on=['state_fips','county_fips'],\n", " how='left', sort=False)\n", "print trips.shape\n", "print trips_fips.shape" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2statestate_fipscounty_fipscountyfips_class_code
01171020101114001201062010-11-1498835020102STRIPED BASS7255CT97Middlesex CountyH4
11171020101114002201062010-11-1498835020102STRIPED BASS7102CT97Middlesex CountyH4
21171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS11126CT911New London CountyH4
31171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH79025CT97Middlesex CountyH4
41171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH77520CT97Middlesex CountyH4
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010 6 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 state state_fips \\\n", "0 7 25 5 CT 9 \n", "1 7 10 2 CT 9 \n", "2 8835020102 STRIPED BASS 11 12 6 CT 9 \n", "3 8835250101 BLUEFISH 7 90 25 CT 9 \n", "4 8835250101 BLUEFISH 7 75 20 CT 9 \n", "\n", " county_fips county fips_class_code \n", "0 7 Middlesex County H4 \n", "1 7 Middlesex County H4 \n", "2 11 New London County H4 \n", "3 7 Middlesex County H4 \n", "4 7 Middlesex County H4 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's rename our state and intercept column to make it more intuitive." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2state_interceptstate_fipscounty_fipscounty_interceptfips_class_code
01171020101114001201062010-11-1498835020102STRIPED BASS7255CT97Middlesex CountyH4
11171020101114002201062010-11-1498835020102STRIPED BASS7102CT97Middlesex CountyH4
21171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS11126CT911New London CountyH4
31171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH79025CT97Middlesex CountyH4
41171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH77520CT97Middlesex CountyH4
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010 6 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010 6 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010 6 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 state_intercept \\\n", "0 7 25 5 CT \n", "1 7 10 2 CT \n", "2 8835020102 STRIPED BASS 11 12 6 CT \n", "3 8835250101 BLUEFISH 7 90 25 CT \n", "4 8835250101 BLUEFISH 7 75 20 CT \n", "\n", " state_fips county_fips county_intercept fips_class_code \n", "0 9 7 Middlesex County H4 \n", "1 9 7 Middlesex County H4 \n", "2 9 11 New London County H4 \n", "3 9 7 Middlesex County H4 \n", "4 9 7 Middlesex County H4 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.rename(columns = {'county': 'county_intercept','state':'state_intercept'},inplace=True)\n", "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Indices\n", "\n", "Let's create an index on these new state and county fields:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_intercept
CTMiddlesex County1171020101114001201062010-11-1498835020102STRIPED BASS725597H4
Middlesex County1171020101114002201062010-11-1498835020102STRIPED BASS710297H4
New London County1171020101116001201062010-11-1698835250101BLUEFISH8835020102STRIPED BASS11126911H4
Middlesex County1171020101123001201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH7902597H4
Middlesex County1171020101123002201062010-11-2398835020102STRIPED BASS8835250101BLUEFISH7752097H4
\n", "
" ], "text/plain": [ " id_code year wave \\\n", "state_intercept county_intercept \n", "CT Middlesex County 1171020101114001 2010 6 \n", " Middlesex County 1171020101114002 2010 6 \n", " New London County 1171020101116001 2010 6 \n", " Middlesex County 1171020101123001 2010 6 \n", " Middlesex County 1171020101123002 2010 6 \n", "\n", " intercept_date st prim1 \\\n", "state_intercept county_intercept \n", "CT Middlesex County 2010-11-14 9 8835020102 \n", " Middlesex County 2010-11-14 9 8835020102 \n", " New London County 2010-11-16 9 8835250101 \n", " Middlesex County 2010-11-23 9 8835020102 \n", " Middlesex County 2010-11-23 9 8835020102 \n", "\n", " prim1_common prim2 prim2_common \\\n", "state_intercept county_intercept \n", "CT Middlesex County STRIPED BASS \n", " Middlesex County STRIPED BASS \n", " New London County BLUEFISH 8835020102 STRIPED BASS \n", " Middlesex County STRIPED BASS 8835250101 BLUEFISH \n", " Middlesex County STRIPED BASS 8835250101 BLUEFISH \n", "\n", " cnty ffdays12 ffdays2 state_fips \\\n", "state_intercept county_intercept \n", "CT Middlesex County 7 25 5 9 \n", " Middlesex County 7 10 2 9 \n", " New London County 11 12 6 9 \n", " Middlesex County 7 90 25 9 \n", " Middlesex County 7 75 20 9 \n", "\n", " county_fips fips_class_code \n", "state_intercept county_intercept \n", "CT Middlesex County 7 H4 \n", " Middlesex County 7 H4 \n", " New London County 11 H4 \n", " Middlesex County 7 H4 \n", " Middlesex County 7 H4 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.set_index(['state_intercept','county_intercept'],inplace=True)\n", "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why use an index for these state and county codes? The biggest reason is that we can easily access values for looking at our data. Here we have 2 month avidity for anglers sampled in James City County:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/index.py:5091: PerformanceWarning: indexing past lexsort depth may impact performance.\n", " PerformanceWarning)\n" ] }, { "data": { "text/plain": [ "state_intercept county_intercept \n", "VA James City County 2\n", " James City County 1\n", " James City County 0\n", " James City County 0\n", " James City County 4\n", " James City County 6\n", " James City County 6\n", " James City County 6\n", " James City County 6\n", " James City County 3\n", " James City County 3\n", " James City County 8\n", " James City County 20\n", " James City County 1\n", " James City County 12\n", " James City County 2\n", " James City County 1\n", " James City County 8\n", " James City County 0\n", " James City County 4\n", " James City County 2\n", " James City County 2\n", " James City County 0\n", " James City County 0\n", " James City County 0\n", " James City County 0\n", " James City County 15\n", " James City County 15\n", " James City County 3\n", "Name: ffdays2, dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips['ffdays2'].loc['VA','James City County']" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# we want to create a richer index that includes the prim1_common species name:\n", "# 1. put state_intercept and county_intercept back in data\n", "trips_fips.reset_index(inplace=True)\n", "trips_fips.head()\n", "# 2. create new multiindex\n", "trips_fips.set_index(['state_intercept','county_intercept','prim1_common'],inplace=True)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_interceptprim1_common
CTMiddlesex CountySTRIPED BASS1171020101114001201062010-11-1498835020102725597H4
STRIPED BASS1171020101114002201062010-11-1498835020102710297H4
New London CountyBLUEFISH1171020101116001201062010-11-16988352501018835020102STRIPED BASS11126911H4
Middlesex CountySTRIPED BASS1171020101123001201062010-11-23988350201028835250101BLUEFISH7902597H4
STRIPED BASS1171020101123002201062010-11-23988350201028835250101BLUEFISH7752097H4
\n", "
" ], "text/plain": [ " id_code year wave \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 1171020101114001 2010 6 \n", " STRIPED BASS 1171020101114002 2010 6 \n", " New London County BLUEFISH 1171020101116001 2010 6 \n", " Middlesex County STRIPED BASS 1171020101123001 2010 6 \n", " STRIPED BASS 1171020101123002 2010 6 \n", "\n", " intercept_date st prim1 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 2010-11-14 9 8835020102 \n", " STRIPED BASS 2010-11-14 9 8835020102 \n", " New London County BLUEFISH 2010-11-16 9 8835250101 \n", " Middlesex County STRIPED BASS 2010-11-23 9 8835020102 \n", " STRIPED BASS 2010-11-23 9 8835020102 \n", "\n", " prim2 prim2_common \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS \n", " STRIPED BASS \n", " New London County BLUEFISH 8835020102 STRIPED BASS \n", " Middlesex County STRIPED BASS 8835250101 BLUEFISH \n", " STRIPED BASS 8835250101 BLUEFISH \n", "\n", " cnty ffdays12 ffdays2 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7 25 5 \n", " STRIPED BASS 7 10 2 \n", " New London County BLUEFISH 11 12 6 \n", " Middlesex County STRIPED BASS 7 90 25 \n", " STRIPED BASS 7 75 20 \n", "\n", " state_fips county_fips \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 9 7 \n", " STRIPED BASS 9 7 \n", " New London County BLUEFISH 9 11 \n", " Middlesex County STRIPED BASS 9 7 \n", " STRIPED BASS 9 7 \n", "\n", " fips_class_code \n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS H4 \n", " STRIPED BASS H4 \n", " New London County BLUEFISH H4 \n", " Middlesex County STRIPED BASS H4 \n", " STRIPED BASS H4 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_interceptprim1_common
VAVirginia Beach CityRED DRUM1681420101127002201062010-11-275188354409018835440102SPOTTED SEATROUT810401051810C7
RED DRUM1681420101127004201062010-11-275188354409018835440102SPOTTED SEATROUT8101502451810C7
RED DRUM1681420101127005201062010-11-275188354409018835250101BLUEFISH810502051810C7
RED DRUM1681420101126003201062010-11-265188354409018835440102SPOTTED SEATROUT8108451810C7
RED DRUM1681420101126004201062010-11-265188354409018101503551810C7
\n", "
" ], "text/plain": [ " id_code year \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 1681420101127002 2010 \n", " RED DRUM 1681420101127004 2010 \n", " RED DRUM 1681420101127005 2010 \n", " RED DRUM 1681420101126003 2010 \n", " RED DRUM 1681420101126004 2010 \n", "\n", " wave intercept_date st \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 6 2010-11-27 51 \n", " RED DRUM 6 2010-11-27 51 \n", " RED DRUM 6 2010-11-27 51 \n", " RED DRUM 6 2010-11-26 51 \n", " RED DRUM 6 2010-11-26 51 \n", "\n", " prim1 prim2 \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 8835250101 \n", " RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 \n", "\n", " prim2_common cnty \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM BLUEFISH 810 \n", " RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM 810 \n", "\n", " ffdays12 ffdays2 \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 40 10 \n", " RED DRUM 150 24 \n", " RED DRUM 50 20 \n", " RED DRUM 8 4 \n", " RED DRUM 150 35 \n", "\n", " state_fips county_fips \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 51 810 \n", " RED DRUM 51 810 \n", " RED DRUM 51 810 \n", " RED DRUM 51 810 \n", " RED DRUM 51 810 \n", "\n", " fips_class_code \n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA','Virginia Beach City','RED DRUM'].head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
county_intercept
Virginia Beach City1662020101110001201062010-11-1051883502010281014551810C7
Virginia Beach City1662020101110002201062010-11-1051883502010281010551810C7
Hampton City1680820101122001201062010-11-225188350201028835440901RED DRUM650802451650C7
Virginia Beach City1681420101121005201062010-11-215188350201028104451810C7
Virginia Beach City1681420101121012201062010-11-215188350201028102051810C7
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st \\\n", "county_intercept \n", "Virginia Beach City 1662020101110001 2010 6 2010-11-10 51 \n", "Virginia Beach City 1662020101110002 2010 6 2010-11-10 51 \n", "Hampton City 1680820101122001 2010 6 2010-11-22 51 \n", "Virginia Beach City 1681420101121005 2010 6 2010-11-21 51 \n", "Virginia Beach City 1681420101121012 2010 6 2010-11-21 51 \n", "\n", " prim1 prim2 prim2_common cnty ffdays12 \\\n", "county_intercept \n", "Virginia Beach City 8835020102 810 14 \n", "Virginia Beach City 8835020102 810 10 \n", "Hampton City 8835020102 8835440901 RED DRUM 650 80 \n", "Virginia Beach City 8835020102 810 4 \n", "Virginia Beach City 8835020102 810 2 \n", "\n", " ffdays2 state_fips county_fips fips_class_code \n", "county_intercept \n", "Virginia Beach City 5 51 810 C7 \n", "Virginia Beach City 5 51 810 C7 \n", "Hampton City 24 51 650 C7 \n", "Virginia Beach City 4 51 810 C7 \n", "Virginia Beach City 0 51 810 C7 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA',:,'STRIPED BASS'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, indexes are pretty cool, but what if we need the county data for doing summary statistics? Example, suppose we want to get a frequency count of trips by county in VA targeting Striped Bass? Notice, we can't just access index values like you would a column of data:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'DataFrame' object has no attribute 'county_intercept'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mtrips_fips\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcounty_intercept\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m/home/robhicks/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(self, name)\u001b[0m\n\u001b[0;32m 2148\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2149\u001b[0m raise AttributeError(\"'%s' object has no attribute '%s'\" %\n\u001b[1;32m-> 2150\u001b[1;33m (type(self).__name__, name))\n\u001b[0m\u001b[0;32m 2151\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2152\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0m__setattr__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mAttributeError\u001b[0m: 'DataFrame' object has no attribute 'county_intercept'" ] } ], "source": [ "trips_fips.county_intercept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the strategy will be to copy the index values back into our table so we can use it for other purposes:\n", "\n", "(Note: you can also keep these columns in your data by including the option `drop=False` when you `set_index()`)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index([u'CT', u'CT', u'CT', u'CT', u'CT', u'CT', u'CT', u'CT', u'CT', u'CT', \n", " ...\n", " u'MS', u'MS', u'MS', u'MS', u'MS', u'MS', u'MS', u'MS', u'MS', u'MS'],\n", " dtype='object', name=u'state_intercept', length=104709)\n" ] } ], "source": [ "print trips_fips.index.get_level_values(0)\n", "trips_fips['state_intercept'] = trips_fips.index.get_level_values(0)\n", "trips_fips['county_intercept'] = trips_fips.index.get_level_values(1)\n", "trips_fips['species_name'] = trips_fips.index.get_level_values(2)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_codestate_interceptcounty_interceptspecies_name
state_interceptcounty_interceptprim1_common
CTMiddlesex CountySTRIPED BASS1171020101114001201062010-11-1498835020102725597H4CTMiddlesex CountySTRIPED BASS
STRIPED BASS1171020101114002201062010-11-1498835020102710297H4CTMiddlesex CountySTRIPED BASS
New London CountyBLUEFISH1171020101116001201062010-11-16988352501018835020102STRIPED BASS11126911H4CTNew London CountyBLUEFISH
Middlesex CountySTRIPED BASS1171020101123001201062010-11-23988350201028835250101BLUEFISH7902597H4CTMiddlesex CountySTRIPED BASS
STRIPED BASS1171020101123002201062010-11-23988350201028835250101BLUEFISH7752097H4CTMiddlesex CountySTRIPED BASS
\n", "
" ], "text/plain": [ " id_code year wave \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 1171020101114001 2010 6 \n", " STRIPED BASS 1171020101114002 2010 6 \n", " New London County BLUEFISH 1171020101116001 2010 6 \n", " Middlesex County STRIPED BASS 1171020101123001 2010 6 \n", " STRIPED BASS 1171020101123002 2010 6 \n", "\n", " intercept_date st prim1 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 2010-11-14 9 8835020102 \n", " STRIPED BASS 2010-11-14 9 8835020102 \n", " New London County BLUEFISH 2010-11-16 9 8835250101 \n", " Middlesex County STRIPED BASS 2010-11-23 9 8835020102 \n", " STRIPED BASS 2010-11-23 9 8835020102 \n", "\n", " prim2 prim2_common \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS \n", " STRIPED BASS \n", " New London County BLUEFISH 8835020102 STRIPED BASS \n", " Middlesex County STRIPED BASS 8835250101 BLUEFISH \n", " STRIPED BASS 8835250101 BLUEFISH \n", "\n", " cnty ffdays12 ffdays2 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7 25 5 \n", " STRIPED BASS 7 10 2 \n", " New London County BLUEFISH 11 12 6 \n", " Middlesex County STRIPED BASS 7 90 25 \n", " STRIPED BASS 7 75 20 \n", "\n", " state_fips county_fips \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 9 7 \n", " STRIPED BASS 9 7 \n", " New London County BLUEFISH 9 11 \n", " Middlesex County STRIPED BASS 9 7 \n", " STRIPED BASS 9 7 \n", "\n", " fips_class_code \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS H4 \n", " STRIPED BASS H4 \n", " New London County BLUEFISH H4 \n", " Middlesex County STRIPED BASS H4 \n", " STRIPED BASS H4 \n", "\n", " state_intercept \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS CT \n", " STRIPED BASS CT \n", " New London County BLUEFISH CT \n", " Middlesex County STRIPED BASS CT \n", " STRIPED BASS CT \n", "\n", " county_intercept \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS Middlesex County \n", " STRIPED BASS Middlesex County \n", " New London County BLUEFISH New London County \n", " Middlesex County STRIPED BASS Middlesex County \n", " STRIPED BASS Middlesex County \n", "\n", " species_name \n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS STRIPED BASS \n", " STRIPED BASS STRIPED BASS \n", " New London County BLUEFISH BLUEFISH \n", " Middlesex County STRIPED BASS STRIPED BASS \n", " STRIPED BASS STRIPED BASS " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This allows us to use index lookups and also the index values for calculating interesting summary data. The following calculates the top 10 targeted species in VA:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 1794\n", "SUMMER FLOUNDER 972\n", "STRIPED BASS 710\n", "ATLANTIC CROAKER 448\n", "SPOTTED SEATROUT 154\n", "TAUTOG 153\n", "SPOT 143\n", "COBIA 117\n", "RED DRUM 66\n", "BLUEFISH 50\n", "dtype: int64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA'].species_name.value_counts().head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This calculates the top targeted species in Gloucester County, VA." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 37\n", "STRIPED BASS 23\n", "ATLANTIC CROAKER 20\n", "SPOT 18\n", "SUMMER FLOUNDER 15\n", "COBIA 7\n", "SPANISH MACKEREL 2\n", "BLUEFISH 1\n", "dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA','Gloucester County',:].species_name.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about top 10 counties in VA with respect to numbers of intercepted trips?" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Virginia Beach City 1876\n", "Hampton City 553\n", "Accomack County 505\n", "Norfolk City 440\n", "Newport News City 306\n", "Northampton County 297\n", "Middlesex County 215\n", "Gloucester County 123\n", "Mathews County 67\n", "Portsmouth City 64\n", "dtype: int64\n" ] } ], "source": [ "print trips_fips.loc['VA'].county_intercept.value_counts().head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tells us when summer flounder is being targeted in VA:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6 190\n", "7 176\n", "5 167\n", "8 163\n", "4 137\n", "9 95\n", "10 31\n", "3 12\n", "11 1\n", "dtype: int64" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA',:,'SUMMER FLOUNDER'].intercept_date.dt.month.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note we used the data/time field `intercept_date` to extract the month using the `dt.month` method. Pandas has immensely rich datetime capabilities that we won't demonstrate here.\n", "\n", "##Groupby\n", "\n", "We can compute interesting data by grouping over column values and using our index for fast selection. Suppose we want to know total number of trips in VA by county of intercept for each wave (wave 1 = Jan,Feb; 2 = Mar,Apr; etc). Groupby lets us do this:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "county_intercept wave\n", "Accomack County 2 101\n", " 3 172\n", " 4 181\n", " 5 50\n", " 6 1\n", "Chesapeake City 2 10\n", " 3 8\n", " 4 5\n", " 5 3\n", " 6 33\n", "Essex County 3 19\n", " 4 3\n", " 6 15\n", "Gloucester County 3 40\n", " 4 32\n", " 5 35\n", " 6 16\n", "Hampton City 2 59\n", " 3 108\n", " 4 173\n", " 5 141\n", " 6 72\n", "Isle of Wight County 2 17\n", " 5 9\n", "James City County 3 13\n", "Name: id_code, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA'].groupby(['county_intercept','wave'])['id_code'].count().head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stack and Unstack\n", "\n", "We can manipulate the result above further to make it more amenable for comparisons and graphing (e.g. we want all of the wave 2 records on the same row):" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
wave2.03.04.05.06.0
county_intercept
Accomack County101172181501
Chesapeake City1085333
Essex CountyNaN193NaN15
Gloucester CountyNaN40323516
Hampton City5910817314172
Isle of Wight County17NaNNaN9NaN
James City CountyNaN13115NaN
King William CountyNaNNaN9NaNNaN
Mathews CountyNaN2030215
Middlesex CountyNaN64606328
Newport News City2563797465
Norfolk City181251599840
Northampton County7731084960
Northumberland CountyNaN11854
Poquoson CityNaN23111214
Portsmouth CityNaN3113NaN20
Richmond CountyNaN2144NaN
Suffolk CityNaN11139NaN
Surry CountyNaN222NaN
Virginia Beach City382427412341314
Westmoreland County554NaN3
York CountyNaN199510
\n", "
" ], "text/plain": [ "wave 2 3 4 5 6\n", "county_intercept \n", "Accomack County 101 172 181 50 1\n", "Chesapeake City 10 8 5 3 33\n", "Essex County NaN 19 3 NaN 15\n", "Gloucester County NaN 40 32 35 16\n", "Hampton City 59 108 173 141 72\n", "Isle of Wight County 17 NaN NaN 9 NaN\n", "James City County NaN 13 11 5 NaN\n", "King William County NaN NaN 9 NaN NaN\n", "Mathews County NaN 20 30 2 15\n", "Middlesex County NaN 64 60 63 28\n", "Newport News City 25 63 79 74 65\n", "Norfolk City 18 125 159 98 40\n", "Northampton County 7 73 108 49 60\n", "Northumberland County NaN 11 8 5 4\n", "Poquoson City NaN 23 11 12 14\n", "Portsmouth City NaN 31 13 NaN 20\n", "Richmond County NaN 21 4 4 NaN\n", "Suffolk City NaN 11 13 9 NaN\n", "Surry County NaN 2 2 2 NaN\n", "Virginia Beach City 382 427 412 341 314\n", "Westmoreland County 5 5 4 NaN 3\n", "York County NaN 19 9 5 10" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = trips_fips.loc['VA'].groupby(['county_intercept','wave'])['id_code'].count()\n", "t.unstack().head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we could put waves on the same row:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
county_interceptAccomack CountyChesapeake CityEssex CountyGloucester CountyHampton CityIsle of Wight CountyJames City CountyKing William CountyMathews CountyMiddlesex County...Northampton CountyNorthumberland CountyPoquoson CityPortsmouth CityRichmond CountySuffolk CitySurry CountyVirginia Beach CityWestmoreland CountyYork County
wave
210110NaNNaN5917NaNNaNNaNNaN...7NaNNaNNaNNaNNaNNaN3825NaN
317281940108NaN13NaN2064...7311233121112427519
41815332173NaN1193060...10881113413241249
5503NaN3514195NaN263...49512NaN492341NaN5
6133151672NaNNaNNaN1528...6041420NaNNaNNaN314310
\n", "

5 rows × 22 columns

\n", "
" ], "text/plain": [ "county_intercept Accomack County Chesapeake City Essex County \\\n", "wave \n", "2 101 10 NaN \n", "3 172 8 19 \n", "4 181 5 3 \n", "5 50 3 NaN \n", "6 1 33 15 \n", "\n", "county_intercept Gloucester County Hampton City Isle of Wight County \\\n", "wave \n", "2 NaN 59 17 \n", "3 40 108 NaN \n", "4 32 173 NaN \n", "5 35 141 9 \n", "6 16 72 NaN \n", "\n", "county_intercept James City County King William County Mathews County \\\n", "wave \n", "2 NaN NaN NaN \n", "3 13 NaN 20 \n", "4 11 9 30 \n", "5 5 NaN 2 \n", "6 NaN NaN 15 \n", "\n", "county_intercept Middlesex County ... Northampton County \\\n", "wave ... \n", "2 NaN ... 7 \n", "3 64 ... 73 \n", "4 60 ... 108 \n", "5 63 ... 49 \n", "6 28 ... 60 \n", "\n", "county_intercept Northumberland County Poquoson City Portsmouth City \\\n", "wave \n", "2 NaN NaN NaN \n", "3 11 23 31 \n", "4 8 11 13 \n", "5 5 12 NaN \n", "6 4 14 20 \n", "\n", "county_intercept Richmond County Suffolk City Surry County \\\n", "wave \n", "2 NaN NaN NaN \n", "3 21 11 2 \n", "4 4 13 2 \n", "5 4 9 2 \n", "6 NaN NaN NaN \n", "\n", "county_intercept Virginia Beach City Westmoreland County York County \n", "wave \n", "2 382 5 NaN \n", "3 427 5 19 \n", "4 412 4 9 \n", "5 341 NaN 5 \n", "6 314 3 10 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.unstack(0).head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can combine these with plot to quickly make cool charts. Let's focus on the peninsula:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAHzCAYAAAD8Yx6dAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VdW9///3JyCCIQkQMIwxWEEFEVAEBZSD4FAFRKso\nyqRUb3+OrXpVapWAtQ7XqXjrrVoFIUziBE78pGiAIoq1iCJSkMoogsyDIMF8vn8kHBM4CRlZIXk9\nH4/zcO+1p3WWh7zPWnufvc3dBQAAwogLXQEAAKoyghgAgIAIYgAAAiKIAQAIiCAGACAgghgAgIAK\nDWIze9HM1pvZF3nKOprZfDNbYGafmNkZeZYNM7NlZrbEzM4vz4oDAFAZHKpHPFrShQeUPSrpPndv\nL+n+3HmZWStJV0pqlbvNM2ZGjxsAgEIUGpTuPkfSlgOK10lKyp2uI2lt7vQlkia6e5a7r5D0taSO\nZVdVAAAqn+ol2OYeSf8ws8eUE+Rn5ZY3lvRRnvXWSGpSuuoBAFC5lWTo+AVJt7p7qqTfSXqxkHW5\nfyYAAIUoSY+4o7v3zJ1+RdLfcqfXSmqWZ72m+nnYOsrMCGcAQJXj7harvCQ94q/NrFvu9LmSluZO\nT5N0lZnVMLPmklpIml9AZY6o1/Dhw4PXobK/aGPaubK8aGPaONarMIX2iM1soqRukuqb2WrlXCV9\ng6S/mNnRknbnzsvdF5vZy5IWS9on6UY/1NEBAKjiCg1id+9fwKJOBaz/J0l/Km2lAACoKvidbxFE\nIpHQVaj0aOPDg3Yuf7Rx+atsbWyHe/TYzBixBgBUKWYmL+BirZJcNV0uzGLWDyh3fDEEEFKFCWKJ\nP4g4/PgCCCA0zhEDABAQQQwAQEAEMQAAARHEAAAERBAH9tRTT2n37t0l2rZLly7luv/S2rZtm/7v\n//4vyLEB4EhRYX5HnPsbq8Nal4qgefPm+uc//6nk5OQKs//s7GzFxZX+O9qKFSvUu3dvffHFF6Xe\nV3mpqp87AIdXYb8jpkdcBGPHjlXbtm3Vrl07DR48WCtXrtS5556rtm3bqmfPnlq9erUkaciQIXr1\n1Vej29WuXVuSlJmZqUgkoiuuuEInn3yyBgwYIEkaNWqUvv32W3Xv3l3nnnuuRo8erd/97nfR7Z9/\n/nndfvvtBdarOPvv0aOHJOm9995T586ddfrpp6tfv37atWuXJCktLU333HOPTj/9dE2ZMkXTp0/X\n6aefrnbt2qlnz5yHbe3atUvXXXedOnXqpNNOO03Tpk2TJI0ZM0aXXHKJunfvrpYtW2rkyJGSpHvu\nuUfLly9X+/btdffdd5fy/wIAVFIBnkDhsRRUHtqiRYu8ZcuWvmnTJnd337x5s/fq1cvHjh3r7u4v\nvvii9+3b193dhwwZ4q+88kp029q1a7u7+wcffOBJSUm+du1az87O9rPOOsvnzp3r7u5paWnRfe/c\nudN/8Ytf+L59+9zdvXPnzr5o0aIC61bc/X///fd+zjnn+A8//ODu7g8//LCPHDkyut7//M//uLv7\nhg0bvFmzZr5ixQp3d9+yZYu7uw8bNswzMjKiZS1btvRdu3b56NGjvVGjRr5582bfvXu3n3LKKf7P\nf/7TV6xY4aecckpJmv2wqaifOwCVS+7fmpi5SI/4EN5//33169dP9erVkyTVrVtXH330ka6++mpJ\n0oABA/SPf/zjkPvp2LGjGjduLDNTu3bttGLFioPWiY+P17nnnqs333xTS5YsUVZWllq3bl2kehZl\n/x999JEWL16szp07q3379ho7dqxWrVoVXX7llVdG1+vWrZuOO+44SVKdOnUk5fSmH374YbVv317d\nu3fXjz/+qFWrVsnMdP7556tu3bqqWbOmLrvssiK1CQCggt1ZqyIq6BxirLLq1asrOztbUs551r17\n90aXHX300dHpatWqad++fTGP9+tf/1oPPvigTj75ZF133XVFrmdR93/eeedpwoQJMZfFx8dLKvy8\n6WuvvaYWLVrkK/v444/zzbt7mZxjRtkr7p3ECvocACg7/LU8hHPPPVdTpkzR5s2bJUmbN29W586d\nNWnSJEnS+PHjdc4550jKOc/66aefSpKmTZumrKysQ+4/ISFB27dvj8537NhRa9as0YQJE9S/f0FP\noSy6vPvv1KmT5s6dq+XLl0vKOee7bNmyg7bp1KmTZs+eHe1V73/vF1xwgUaNGhVdb8GCBZJy/ljP\nmDFDW7Zs0e7duzV16lR16dJFtWvX1o4dO0r9HlC2vIgvAIcHQXwIrVq10r333qtu3bqpXbt2uvPO\nO/X0009r9OjRatu2rcaPH68///nPkqTrr79es2bNUrt27fTRRx9FL6aSCu6J3HDDDbrwwgujF1NJ\nUr9+/dS1a1clJSUVWre8+yzK/hs0aKAxY8aof//+atu2rTp37qx///vfB23ToEEDPffcc7rsssvU\nrl276BeC++67T1lZWTr11FN1yimnaPjw4dFjd+zYUb/61a/Utm1bXX755TrttNOUnJysLl26qE2b\nNlysBQAF4OdLFVDv3r11++23q3v37qGrUiRjxozRp59+qqeffjp0VYqtqn3uzKzIvV0TQ9NAWeHn\nS0eIrVu36sQTT9QxxxxzxISwlPMB4ylGAFAy9IgruE2bNkV/x5vXzJkzo1dyo+Sq2ueOHjEQRmE9\nYoIYVVpV+9wRxEAYDE0DAFBBEcQAAAREEAMAEBBBDABAQATxYZaWlqaZM2eGrgYAoIKo0EG8//ep\n5fkqigEDBhx03+dZs2apfv36Wr9+fYneU1G4u0aNGqU2bdqodu3aatasmfr166dFixYV65jFtWLF\nCsXFxUXvmw0AKD8VOohzFPXOuCV5Fc2oUaP07rvv6u9//7skac+ePbr++uv1xBNPKCUlpUj7KOgh\nDIW57bbbNGrUKD399NPasmWLli5dqr59++rtt98u9r5Kgp+uAED5OwKCOLx69erp6aef1g033KAf\nfvhBI0aMUIsWLTRo0CBNmzZNrVu3Vt26ddW9e3ctWbIkul1aWpoeffRRnXrqqUpISNBPP/2Ub79f\nffWVjj/+eE2ePPmgYy5btkzPPPOMJk2apEgkoqOOOkq1atXS1VdfHb1v87Zt2zRo0CAde+yxSktL\n04MPPhgNz/T0dA0cODC6vwN7uZFIRPfff7+6du2qxMREXXDBBdq0aZMkRR9iUadOHSUmJmr27NlK\nTk7O1xPfsGGD4uPjo9sAAEqGIC6i/Q8yuOqqq/T888/rueee09KlS3X11Vdr1KhR2rhxoy666CL1\n7t07X+930qRJevfdd7V161ZVq1YtWv6vf/1LF154of73f/83+hzgvGbOnKlmzZqpQ4cOBdbplltu\n0Y4dO/TNN99o1qxZGjt2rEaPHi2paI+7mzhxosaMGaMNGzZo7969euyxxyRJc+bMkZQT9Nu3b9c5\n55yjq666ShkZGfm27dmzp5KTkw95HABAwQjiYnjmmWf0wQcfaPjw4WrSpIkmT56sXr16qUePHqpW\nrZruvPNO7d69Wx9++KGknDC89dZb1aRJk3zPC541a5YuueQSjRs3ThdddFHMY23atEkNGzYssC4/\n/fSTJk+erIceekjx8fE67rjjdMcdd2jcuHGSDj2sbGa69tprdcIJJ6hmzZrq16+fPvvsswK3HTRo\nkCZOnBidHzduXL4eNwCgZKqHrsCR5Nhjj1X9+vXVunVrSdK6deuUmpoaXW5matasmdauXRsta9as\nWb59uLueffZZRSKR6BBwLMnJyVq3bl2Byzdu3KisrCwdd9xx0bLU1NR8xz6UvEFfq1Yt7dy5s8B1\nO3XqpFq1aikzM1MNGzbU8uXL1adPnyIfCwAQGz3iUmjcuLFWrlwZnXd3rV69Wk2aNImWHThEbGZ6\n9tlntXLlSt1+++0F7rtHjx5as2aNPv3005jL69evr6OOOkorVqyIlq1atUpNmzaVJMXHx+uHH36I\nLvvuu++K/L4KGtYePHiwMjIyNG7cOF1xxRWqUaNGkfcJAIiNIC6Ffv366e2339b777+vrKwsPf74\n46pZs6Y6d+5c6HYJCQmaPn26Zs+erWHDhsVcp0WLFrrxxhvVv39/zZo1S3v37tWePXs0adIkPfLI\nI6pWrZr69eune++9Vzt37tTKlSv15JNPasCAAZKk9u3ba/bs2Vq9erW2bdumhx566KBjFDR83aBB\nA8XFxWn58uX5ygcMGKDXXntN48eP16BBg4rSRACAQzgCgtjK8VU6LVu2VEZGhm655RY1aNBAb7/9\ntt58801Vr37oEf+kpCTNmDFD7777roYPHx5znVGjRunmm2/WTTfdpLp16+qEE07Q1KlTo0PCTz/9\ntOLj43X88cfr7LPP1jXXXKNrr71WktSzZ09deeWVOvXUU3XGGWeod+/eMXvneaf3zx9zzDG69957\n1aVLF9WtW1fz58+XlDPMftpppykuLk5du3YtfoMBAA7CYxBRLEOHDlWTJk00cuTI0FUpE1Xtc8dj\nEIEwCnsMIhdrochWrFih1157LXp1NQCg9I6AoWlUBPfdd5/atGmju+66K9+V2gCA0mFoGlVaVfvc\nMTQNhFHY0DQ9YgAAAiKIAQAIiCAGACCgQoPYzF40s/Vm9sUB5beY2VdmtsjMHslTPszMlpnZEjM7\nv7wqDQBAZXGony+NlvS0pLH7C8ysu6Q+kk519ywza5Bb3krSlZJaSWoi6e9m1tLdebo8AAAFKLRH\n7O5zJG05oPj/k/SQu2flrvN9bvklkia6e5a7r5D0taSOZVtdQHrooYd0/fXXh64GAJSJkpwjbiHp\nHDP7yMwyzWz/A3MbS1qTZ701yukZl9j+2y6W56uo0tLSlJKSku9BCn/729/UvXv30rzFcpeWlqb3\n33+/wOWZmZmKi4vTTTfdlK+8a9eueumll8q7egWaMGGCOnTooISEBDVu3FgXXXSR5s6dK0kaNmyY\nnn/+eUk5NxmJi4tTdjYDLwCOTCW5s1Z1SXXd/UwzO0PSy5KOL2DdmD9CTE9Pj05HIhFFIpGCj5Ze\n8KJSK+a+s7Oz9ec//7nABzVUJPv27VP16tWL9DvZ+Ph4ZWRk5LtZR3G/qJSlJ554Qo888oieffZZ\nXXDBBapRo4amT5+uadOmqUuXLjG34feuACqSzMxMZWZmFmndkvSI10h6TZLc/RNJ2WZWX9JaSXkf\nvts0t+wg6enp0VehIVyBmJnuvPNOPfbYY9q2bVvMdZYsWaLzzjtPycnJOumkkzRlyhRJ0jfffKO6\ndetG17v++uuVkpISnR84cKD+/Oc/S8r5YjJs2DB16tRJSUlJ6tu3r7Zs+fnswLRp09S6dWvVrVtX\n3bt315IlS6LL0tLS9Oijj6pt27aqXbu2rr76aq1atUq9e/dWQkKCHnvssZj1rlOnjoYMGaIRI0YU\n+P5ffPFFtWrVSvXq1dOFF16oVatWSZKGDx+uW2+9VZKUlZWl+Ph43XXXXZKk3bt3q2bNmtq6dav2\n7NmjAQMGqH79+qpbt646duyoDRs2HHScbdu2afjw4XrmmWfUt29f1apVS9WqVdPFF1+sRx7JuS4w\nPT1dAwcOlKToM53r1KmjxMREzZ49W8nJyVq0aFF0nxs2bFB8fLw2bdpU4PsDgLIUiUTyZV1hShLE\nb0g6V5LMrKWkGu6+UdI0SVeZWQ0za66cIez5Jdh/hdWhQwdFIpGYgbZr1y6dd955GjBggL7//ntN\nmjRJN954o5YsWaLmzZsrMTFRCxYskCTNnj1bCQkJ0RCdPXt2vi8k48aN0+jRo7Vu3TpVr149GnRL\nly7V1VdfrVGjRmnjxo266KKL1Lt3b+3bty+67aRJk/TOO+9o27ZtmjBhglJTU/XWW29px44duvPO\nOwt8b7///e/16quvaunSpQctmzp1qh566CG9/vrr2rhxo84++2z1799fUs6Hbf+3vk8++USNGjXS\n7NmzJUnz5s3TySefrDp16uill17S9u3btWbNGm3evFnPPvusatWqddCx5s2bpz179ujSSy8tsK55\ne+pz5syRlBPg27dv1znnnKOrrrpKGRkZ0XUmTpyonj17Kjk5ucB9AkAoh/r50kRJH0pqaWarzexa\nSS9KOj73J00TJQ2SJHdfrJxh6sWS3pV0Y8x7WR7BzEwjR47U008/rY0bN+Zb9tZbb6l58+YaPHiw\n4uLi1K5dO1122WV6+eWXJUndunVTZmamvvvuO5mZLr/8cs2aNUvffPONtm/frrZt20aPMWjQILVq\n1UrHHHOMHnjgAb388svKzs7W5MmT1atXL/Xo0UPVqlXTnXfeqd27d+vDDz+MbnvrrbeqSZMmOvro\no4v13lJSUvSb3/xG999//0HL/vrXv2rYsGE68cQTFRcXp2HDhumzzz7T6tWrdeaZZ2rZsmXavHmz\n5syZo6FDh2rt2rXatWuXZs2apW7dukmSatSooU2bNmnZsmUyM7Vv314JCQkHHWvTpk2qX7++4uIK\n/mjm/VjF+ogNGjRIEydOjM6PGzcu2oMGgIqm0HPE7t6/gEUx/6q5+58k/am0larIWrdurV69eunh\nhx/WySefHC1fuXKlPv7443xD0Pv27dOgQYMk5QTxtGnT1LRpU51zzjnq1q2bxo0bp5o1a+rss8/O\nd4xmzX4e4U9NTVVWVpY2btyodevWKTU1NbrMzNSsWTOtXbs25rbFddddd+mEE07Q559/nq985cqV\nuu2223THHXfkK1+7dq2aNWumDh06aNasWZo9e7buvfdeffbZZ5o7d65mz54d7c0PHDhQq1ev1lVX\nXaWtW7dqwIABevDBBw96dnNycrI2btyo7OzsQsO4MJ06dVKtWrWUmZmphg0bavny5dFnOANARcOd\ntUpgxIgRev755/MFYGpqqrp166YtW7ZEXzt27NBf/vIXSTlBPGfOHGVmZioSiahr166aO3euZs2a\nddB58v3nX/dPH3XUUWrQoIEaN26slStXRpe5u1avXq0mTX6+OP3AC6yKc8FVcnKyfvvb3+oPf/hD\nvvLU1FQ999xz+d7brl27dOaZZ0bf28yZM7VgwQKdccYZ6tatm6ZPn6758+dHz+FWr15d999/v778\n8kt9+OGHeuuttzR27NiD6nDWWWfp6KOP1uuvv15gPfO+p4Le3+DBg5WRkaFx48bpiiuuUI0aNYrc\nDgBwOBHEJfCLX/xCV155ZfQCK0m6+OKLtXTpUmVkZCgrK0tZWVn65JNPoueBTzjhBNWsWVMZGRnq\n1q2bEhISdOyxx+rVV1+NDt9KOeGakZGhr776Sj/88IPuv/9+XXHFFTIzXXHFFXr77bf1/vvvKysr\nS48//rhq1qypzp07F1jXlJQULV++vMjv7fbbb9e8efP01VdfRct+85vf6E9/+pMWL14sKed87P4L\n0aScIB47dqxat26to446SpFIRH/72990/PHHR8/LZmZm6osvvtBPP/2khIQEHXXUUapWrdpBx09K\nStLIkSN10003aerUqfrhhx+UlZWld999V3fffXe0jfZr0KCB4uLiDnqPAwYM0Guvvabx48dHRyUA\noEJy98P6yjnkwWKVK+fnT+X6Kqq0tDSfOXNmdH716tVes2ZN7969e7Ts3//+t1988cXeoEEDT05O\n9h49evjChQujy/v37+/HH398dP7OO+/0xMREz87OjpZFIhEfNmyYd+zY0RMTE71Pnz6+adOm6PLX\nX3/dW7Vq5UlJSR6JRHzx4sUF1tHdferUqZ6amup16tTxxx9//KD39cEHH3izZs3ylT366KMeFxfn\nL730UrRs3Lhx3qZNG09MTPRmzZr50KFDo8t27NjhRx11lI8cOdLd3bOzs/3YY4/1G2+8MbrOxIkT\n/cQTT/T4+HhPSUnx2267zX/66adYTe3u7uPHj/cOHTp4fHy8N2zY0Hv16uXz5s1zd/f09HQfOHBg\ndN3777/fGzRo4HXq1PGPP/44Wt6jRw9v3rx5gcdwj/25q8wkuRfxVdXaBihPuf+eYuYizyOuYLp3\n766BAwfquuuuC12VI97QoUPVpEkTjRw5ssB1qtrnjucRA2EU9jziktzQA+WMP36lt2LFCr322mv6\n7LPPQlcFAArFOeIKKNQdrSqL++67T23atMl3pzAAqKgYmkaVVtU+dwxNA2EUNjRNjxgAgIAIYgAA\nAiKIAQAIiCAGACAgghgAgIAIYhTbqlWrlJCQwBW1AFAGKnQQm1m5v4oqLS1NM2fOzFc2ZsyYg56c\ndLjExcXpP//5T7ntf+nSpbriiivUoEED1alTR23bttWTTz6p7OxspaamaseOHdH2i0QieuGFF8qt\nLgBQmVXoIJbK90bTxVHc4D4cyqtHunz5cnXq1EnHHXecFi1apK1bt2rKlCn69NNPtWPHjoPWr2jt\nAgBHkgofxBXZgQH08MMP64QTTlBiYqJat26tN954I7pszJgx6tKli26//XbVrVtXJ5xwgj788EON\nHj1aqampSklJyfdYwCFDhug3v/mNzj//fCUmJioSiUQfj7j/0YJt27ZVQkJC9ElIzz//vFq0aKHk\n5GRdcsklWrduXXR/cXFxevbZZ9WyZUvVrVtXN998c4Hva/jw4eratasee+wxpaSkSJJatmypjIwM\nJSUlacWKFYqLi9NPP/2ke++9V3PmzNHNN9+shIQE3XLLLbr55pt155135ttnnz599NRTT5WkmQGg\ncivoaRDl9VIxn75U1CfFlORVUF1iSUtL87///e/5ykaPHu1du3aNzk+ZMsXXrVvn7u6TJ0/2+Ph4\n/+6776LrVq9e3ceMGePZ2dn+hz/8wZs0aeI333yz792719977z1PSEjwXbt2ubv74MGDPSEhwefM\nmeM//vij33bbbfmOZWa+fPny6PzMmTO9fv36vmDBAv/xxx/9lltu8XPOOSff+r179/Zt27b5qlWr\nvEGDBj59+vSY77Vhw4Y+ZsyYAtvim2++cTOLPj0pEon4Cy+8EF0+f/58b9y4cfSpUt9//70fc8wx\nvmHDhkJaOIzifAYqg+L8m6pqbQOUJxXy9CV6xEXk7urbt6/q1q0bfd100035esWXX365GjZsKEnq\n16+fWrRooY8//ji6vHnz5ho8eLDMTP369dO3336r+++/X0cddZTOO+881ahRQ19//XV0/V69eqlr\n166qUaOGHnzwQc2bN09r166NWb/x48dr6NChateunWrUqKGHHnpI8+bNi/aiJemee+5RYmKimjVr\npu7duxf4QIRNmzapUaNGxW6f/c444wwlJSVFz6lPmjRJ3bt3V4MGDYq1TwCoCgjiIjIzTZ06VVu2\nbIm+nnnmmXwBNHbsWLVv3z4a1IsWLdKmTZuiy/cP80pSrVq1JClfONWqVUs7d+6MHq9p06bRZfHx\n8apXr56+/fbbmPVbt25dvgccxMfHKzk5OV9w7/+SIEnHHHNM9FgHSk5OLvA4BTlwmH7QoEHKyMiQ\nJGVkZGjgwIHF2h8AVBUEcSnkDeGVK1fqhhtu0F/+8hdt3rxZW7Zs0SmnnFLiC6rcXatXr47O79y5\nU5s3b1bjxo1jrt+4cWOtWLEiOr9r1y5t2rRJTZo0Kfaxe/bsqVdffbXI68e6WGvAgAGaOnWqFi5c\nqCVLlqhv377FrgcAVAUEcRnZtWuXzEz169dXdna2Ro8erUWLFpVqn++8847mzp2rvXv36r777tNZ\nZ50VDdaUlBQtX748um7//v01evRoLVy4UD/++KN+//vf68wzz1RqamrMfRf2BWHEiBH68MMPdddd\nd2n9+vWSpK+//loDBw7U9u3bD1r/wLpIUtOmTdWhQwcNGjRIl19+uY4++uhiv38AqAoqfBBbOb5K\nXbc8P2lq1aqV7rjjDp111llq2LChFi1apK5du8ZcN29ZYfu++uqrNWLECCUnJ2vBggXRoV5JSk9P\n1+DBg1W3bl298sor6tGjhx544AH96le/UuPGjfXNN99o0qRJBR6rsJ9jHX/88Zo3b55WrFih1q1b\nq06dOrr88st1xhlnqHbt2gft77bbbtMrr7yievXq6be//W20fPDgwfriiy8YlgaAQvA84grq2muv\nVdOmTfXAAw+ErkqJzZkzRwMGDNDKlStDV6VAVe1zx/OIgTB4HvER6Ej/A5iVlaWnnnpK119/feiq\nAECFRhBXUBXxTl5F9dVXX6lu3bpav359vqFqAMDBGJpGlVbVPncMTQNhMDQNAEAFRRADABAQQQwA\nQEAEMQAAARHEAAAERBBXIQkJCfnuRw0ACK9CB/H+39KW56s4IpGIXnjhhXJ6t6W3bt06DR06VI0b\nN1ZiYqJOPvlkpaen64cffpAk7dixQ2lpaZKkIUOG6L777ivV8SZMmKAOHTooISFBjRs31kUXXaS5\nc+eW9m0cUlxcnP7zn/+U+3EA4HCoHroCh/TBB+W37+7di7V6Rb7JxubNm3XWWWepa9eu+uijj5Sa\nmqo1a9bo8ccf1/Lly9WmTZsyPd4TTzyhRx55RM8++6wuuOAC1ahRQ9OnT9e0adPUpUuXMj1WLPy+\nFUBlUaF7xBXV1q1b1atXLx177LGqV6+eevfune+5v5FIRPfdd5+6dOmihIQE9enTRxs3btQ111yj\npKQkdezYMd/9l5csWaLzzjtPycnJOumkkzRlypTosnfeeUetW7dWYmKimjZtqscffzxmnZ544gkl\nJSUpIyMj+sSlpk2b6sknn4yGcFxcnJYvX67nnntOEyZM0KOPPhqt32OPPabLL7883z5vvfXWmHfG\n2rZtm4YPH65nnnlGffv2Va1atVStWjVdfPHFeuSRRyRJP/74o37729+qSZMmatKkiX73u99p7969\nkqQxY8bo7LPPzrfPvL3cIUOG6KabblKvXr2UmJioM888M7rsnHPOkSS1bdtWiYmJevnll9WmTRu9\n9dZb0X1lZWWpfv36WrhwYYH/DwGgoiCISyA7O1tDhw7VqlWrtGrVKtWqVUs333xzvnUmT56sjIwM\nrV27VsuXL9dZZ52loUOHavPmzTr55JM1YsQISTmPTzzvvPM0YMAAff/995o0aZJuvPFGLVmyRJI0\ndOhQPfdXB7HbAAAb1UlEQVTcc9q+fbu+/PJLnXvuuTHr9Pe//12XXXbZIetuZrrhhht0zTXX6O67\n79aOHTs0bdo0DRgwQNOnT9e2bdskSfv27dPkyZM1ePDgg/Yxb9487dmzR5deemmBx3nwwQc1f/58\nLVy4UAsXLtT8+fP1xz/+8ZD122/y5MlKT0/Xli1bdMIJJ+jee++VJM2ePVuS9Pnnn2v79u3q16+f\nBg0alO/JVO+8846aNGmitm3bFvl4ABAKQVwC9erV06WXXqqaNWuqdu3a+v3vf69Zs2ZFl5uZrr32\nWjVv3lyJiYn65S9/qZYtW+rcc89VtWrVdMUVV2jBggWSpLfeekvNmzfX4MGDFRcXp3bt2umyyy7T\nyy+/LEmqUaOGvvzyS23fvl1JSUlq3759zDpt3rxZjRo1Ktb7yDu827BhQ5199tnR3vj06dPVoEGD\nmMfbtGmT6tevr7i4gj8+EyZM0P3336/69eurfv36Gj58uMaNG1ekepmZLrvsMnXo0EHVqlXTNddc\no88++6zA9a+55hq9/fbb2rlzpyRp3LhxPHoRwBGDIC6B3bt367/+67+UlpampKQkdevWTdu2bcsX\nbCkpKdHpmjVr6thjj803vz80Vq5cqY8//lh169aNviZMmKD169dLkl599VW98847SktLUyQS0Ucf\nfRSzTsnJyfr2229L9b4GDx4c7VlmZGQUGGbJycnauHGjsrOzC9zXt99+q+OOOy46n5qaWqz65W2/\nWrVqRdsrlsaNG6tLly565ZVXtHXrVk2fPl3XXHNNkY8FACERxMXk7nrssce0dOlSzZ8/X9u2bdOs\nWbPk7gVeQFTYBV6pqanq1q2btmzZEn3t2LFDf/nLXyRJHTp00BtvvKHvv/9effv2Vb9+/WLup2fP\nnnr99deLfBFTrDpdcskl+vzzz7Vo0SK9/fbbBYbZWWedpaOPPlqvv/56gftv3Lhxvp9KrVq1So0b\nN5YkxcfHR6/klqTvvvuuSHUuzP4vEVOmTFHnzp2LPToAAKEQxCWwc+dO1apVS0lJSdq8eXP0fG9e\neQOxsHC8+OKLtXTpUmVkZCgrK0tZWVn65JNPtGTJEmVlZWn8+PHatm2bqlWrpoSEBFWrVi3mfm6/\n/XZt375dgwcP1qpVqyRJa9eu1R133KFFixYdtH5KSspBPwGqVauWfvWrX+nqq69Wp06d1LRp05jH\nSkpK0siRI3XTTTdp6tSp+uGHH5SVlaV3331Xd999tySpf//++uMf/6iNGzdq48aNGjlyZLSH3bZt\nW3355ZdauHCh9uzZo/T09ALbLpaUlBQtX748X9mll16qf/3rXxo1apQGDRpU6PYAUKHs78nFekl6\nUdJ6SV/EWHaHpGxJ9fKUDZO0TNISSecXsE+PJVa5pHJ/FUckEvEXX3zRv/32W49EIl67dm0/8cQT\n/dlnn/W4uDj/6aefouu98MIL0e3+8Ic/+LXXXhudnzFjhrdo0SI6/+9//9svvvhib9CggScnJ3uP\nHj184cKFvnfvXr/wwgu9bt26npiY6B07dvS5c+cWWL9vv/3Wr7vuOm/YsKEnJCT4SSed5CNHjvTd\nu3e7u3tcXJwvX77c3d2XLVvm7dq18zp16vill14a3cecOXPczHzMmDGHbI/x48d7hw4dPD4+3hs2\nbOi9evXyefPmubv7nj17/NZbb/VGjRp5o0aN/LbbbvMff/wxuu2DDz7o9evX99TUVM/IyMhXtyFD\nhvh9990XXfeDDz7wZs2aRef/+te/eqNGjbxOnTo+ZcqUaPnQoUO9du3avmvXrkPWfb/ifgaOdJLc\ni/iqam0DlKfcf08xs7bQ5xGb2dmSdkoa6+5t8pQ3k/S8pBMlne7um82slaQJks6Q1ETS3yW1dPfs\nA/bpsY55JDwX9vTTT9fw4cPVp0+f0FUpN6tXr9ZJJ52k9evXq3bt2qGrUywPPPCAli1bprFjxxZ5\nmyPhc1eWeB4xEEaJn0fs7nMkbYmx6AlJdx1Qdomkie6e5e4rJH0tqWPxq1sxffnll/rqq68KvGq5\nMsjOztbjjz+u/v37H3EhvHnzZr344ou64YYbQlcFVVx53t0PlVOxzxGb2SWS1rj75wcsaixpTZ75\nNcrpGR/x7r77bl1wwQV69NFH1axZs9DVKRe7du1SYmKiZs6cGfOcd0X2/PPPKzU1Vb/85S/VtWvX\n0NUBinxuDJBU+NC0JJlZmqQ33b2NmR0j6QNJ57n7djP7RlIHd99kZk9L+sjdx+du9zdJ77j7awfs\n74gdmkblU9U+dwxNlz/aGLEUNjRd3HtN/0JSmqSFuUMqTSV9amadJK2VlLe72DS37CB5r5KNRCKK\nRCLFrAYAABVXZmamMjMzi7RusXrEMZZ9o4Mv1uqony/WOuHA7i89YlQkVe1zR2+t/NHGiKXEF2uZ\n2URJH0pqaWarzezaA1aJfoLcfbGklyUtlvSupBtjJi4AAIg6ZI+4zA9YSI8YCKEqfV+kt1b+aGPE\nUpbniMsNH0YAQFXELS4BAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogB\nAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhi\nAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCC\nGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACKjSIzexF\nM1tvZl/kKfsfM/vKzBaa2WtmlpRn2TAzW2ZmS8zs/PKsOAAAlcGhesSjJV14QNl7klq7e1tJSyUN\nkyQzayXpSkmtcrd5xszocQMAUIhCg9Ld50jackDZDHfPzp39WFLT3OlLJE109yx3XyHpa0kdy7a6\nAABULqXtsV4n6Z3c6caS1uRZtkZSk1LuHwCASq16STc0s3sl7XX3CYWs5rEK09PTo9ORSESRSKSk\n1QAAoMLJzMxUZmZmkdY195hZ+fMKZmmS3nT3NnnKhki6XlIPd9+TW3aPJLn7w7nz0yUNd/ePD9if\nH+qYAMqHmcX+dhxrXUn8Wy0+2hixmJnc3WItK/bQtJldKOm/JV2yP4RzTZN0lZnVMLPmklpIml+S\nCgMAUFUUOjRtZhMldZNU38xWSxqunKuka0iaYWaSNM/db3T3xWb2sqTFkvZJupGuLwAAhTvk0HSZ\nH5ChaSAYhk3LH22MWMp0aBoAAJQdghgAgIAIYgAAAiKIAQAIiCAGACAgghgAgIAIYgAAAiKIAQAI\niCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACAgghgAgIAIYgAA\nAiKIAQAIiCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACCg6qErAEiSmRV7G3cvh5oAwOFFEKPCKE6s\nFj+2AaBiYmgaAICACGIAAAIiiAEACIggBgAgIIIYAICACGIAAAIiiAEACIggBgAgIIIYAICACGIA\nAAIiiAEACIggBgAgIIIYAICACGIAAAIqNIjN7EUzW29mX+Qpq2dmM8xsqZm9Z2Z18iwbZmbLzGyJ\nmZ1fnhUHAKAyOFSPeLSkCw8ou0fSDHdvKWlm7rzMrJWkKyW1yt3mGTOjxw0AQCEKDUp3nyNpywHF\nfSS9lDv9kqS+udOXSJro7lnuvkLS15I6ll1VAQCofErSY01x9/W50+slpeRON5a0Js96ayQ1KUXd\nAACo9KqXZmN3dzPzwlaJVZienh6djkQiikQipakGAAAVSmZmpjIzM4u0rrkXlqOSmaVJetPd2+TO\nL5EUcffvzKyRpA/c/SQzu0eS3P3h3PWmSxru7h8fsD8/1DFR9ZhZ7G9tBa0vic9R8RWnnWnjkqGN\nEYuZyd0t1rKSDE1PkzQ4d3qwpDfylF9lZjXMrLmkFpLml2D/AABUGYUOTZvZREndJNU3s9WS7pf0\nsKSXzWyopBWS+kmSuy82s5clLZa0T9KNdH0BACjcIYemy/yADE0jBoamDw+GTcsfbYxYynpoGgAA\nlBGCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogB\nAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhi\nAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCC\nGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAioxEFsZsPM7Esz+8LMJpjZ0WZWz8xm\nmNlSM3vPzOqUZWUBAKhsShTEZpYm6XpJp7l7G0nVJF0l6R5JM9y9paSZufMAAKAAJe0Rb5eUJekY\nM6su6RhJ30rqI+ml3HVektS31DUEAKASK1EQu/tmSY9LWqWcAN7q7jMkpbj7+tzV1ktKKZNaAgBQ\nSZV0aPoXkn4rKU1SY0m1zWxA3nXc3SV5aSsIAEBlVr2E23WQ9KG7b5IkM3tN0lmSvjOzhu7+nZk1\nkrQh1sbp6enR6UgkokgkUsJqAABQ8WRmZiozM7NI61pOx7V4zKytpPGSzpC0R9IYSfMlHSdpk7s/\nYmb3SKrj7vccsK2X5Jio3MysWMMnJonPUfEVp51p45KhjRGLmcndLeaykn4IzOwuSYMlZUv6l6Rf\nS0qQ9LKkVEkrJPVz960HbEcQ4yAEccmYxfx3XShConwRxIilXIK4FJUhiHEQgrhkzExKL8YG6QRx\neSOIEUthQcydtQAACIggBgAgIIIYAICACGIAAAIiiAEACIggBgAgIIIYAICACGIAAAIiiAEACIgg\nBgAgoJI+fQkAqoyS3NMbKCqCGACKIr2M1wNyMTQNAEBABDEAAAERxAAABMQ5YgDAEaW4F89V9Gc+\nE8QAgCNOUaP1SLjenaFpAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhi\nAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgKqHrgAAAGYWugrBEMQAgIoh\nvYzXO0IwNA0AQEAEMQAAARHEAAAExDlilJuqfPEFABQVQYzylV7G6wFAJcPQNAAAAZU4iM2sjpm9\nYmZfmdliM+tkZvXMbIaZLTWz98ysTllWFgCAyqY0PeI/S3rH3U+WdKqkJZLukTTD3VtKmpk7DwAA\nClCiIDazJElnu/uLkuTu+9x9m6Q+kl7KXe0lSX3LpJYAAFRSJe0RN5f0vZmNNrN/mdnzZhYvKcXd\n1+eus15SSpnUEgCASqqkV01Xl3SapJvd/RMze0oHDEO7u5uZx9o4PT09Oh2JRBSJREpYDQAAKp7M\nzExlZmYWad2SBvEaSWvc/ZPc+VckDZP0nZk1dPfvzKyRpA2xNs4bxAAAVDYHdjJHjBhR4LolGpp2\n9+8krTazlrlFPSV9KelNSYNzywZLeqMk+wcAoKoozQ09bpE03sxqSFou6VpJ1SS9bGZDJa2Q1K/U\nNawAinuHKPeYI/IAABykxEHs7gslnRFjUc+SV6fiKmq0clNHAEBxcGctAAACIogBAAiIIAYAICCC\nGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiI\nIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAAC\nIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACIogBAAiIIAYAICCCGACA\ngAhiAAACIogBAAiIIAYAICCCGACAgEoVxGZWzcwWmNmbufP1zGyGmS01s/fMrE7ZVBMAgMqptD3i\n2yQtluS58/dImuHuLSXNzJ0HAAAFKHEQm1lTSRdJ+pskyy3uI+ml3OmXJPUtVe0AAKjkStMjflLS\nf0vKzlOW4u7rc6fXS0opxf4BAKj0ShTEZtZL0gZ3X6Cfe8P5uLvr5yFrAAAQQ/USbtdZUh8zu0hS\nTUmJZjZO0noza+ju35lZI0kbYm2cnp4enY5EIopEIiWsBgAAFU9mZqYyMzOLtK7ldFxLzsy6SbrT\n3Xub2aOSNrn7I2Z2j6Q67n7PAet7aY95uJlZkbv2JulIe3/lxcyk9CKunF684RPaOUex2lgqVjvT\nxj8rr88ybfyzyt7GZiZ3jzmCXFa/I97/Lh+WdJ6ZLZV0bu48AAAoQEmHpqPcfZakWbnTmyX1LO0+\nAQCoKrizFgAAARHEAAAERBADABAQQQwAQEAEMQAAARHEAAAERBADABAQQQwAQEAEMQAAARHEAAAE\nRBADABAQQQwAQEAEMQAAARHEAAAERBADABAQQQwAQEAEMQAAARHEAAAERBADABAQQQwAQEAEMQAA\nARHEAAAERBADABAQQQwAQEAEMQAAARHEAAAERBADABAQQQwAQEAEMQAAARHEAAAERBADABAQQQwA\nQEAEMQAAAVUPXYFQzCx0FQAAqLpBLElKL+P1AAAoJoamAQAIiCAGACAgghgAgIAIYgAAAiKIAQAI\niCAGACCgEgWxmTUzsw/M7EszW2Rmt+aW1zOzGWa21MzeM7M6ZVtdAAAql5L2iLMk/c7dW0s6U9JN\nZnaypHskzXD3lpJm5s4DAIAClCiI3f07d/8sd3qnpK8kNZHUR9JLuau9JKlvWVQSAIDKqtTniM0s\nTVJ7SR9LSnH39bmL1ktKKe3+AQCozEp1i0szqy3pVUm3ufuOvPdvdnc3M4+1XXp6enQ6EokoEomU\nphoAAFQomZmZyszMLNK6JQ5iMztKOSE8zt3fyC1eb2YN3f07M2skaUOsbfMGMQAAlc2BncwRI0YU\nuG5Jr5o2SS9IWuzuT+VZNE3S4NzpwZLeOHBbAADws5L2iLtIGiDpczNbkFs2TNLDkl42s6GSVkjq\nV+oaokLh8ZEAULZKFMTu/g8V3JvuWfLq4MgQ89R/DIQ2ABwKd9YCACAgghgAgIAIYgAAAiKIAQAI\niCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACCgkj4GEQiuOI9k\ndC/qE6MA4PAiiHHk+uCDoq3XvXv51gMASoGhaQAAAiKIAQAIiCAGACAgzhEDKBAXxAHljyAGUDAu\niAPKHUEMACgXxRlRqcoIYgBAOSrqKYuqG9pcrAUAQED0iIEKhuE8oGohiIEKieE8oKpgaBoAgIAI\nYgAAAiKIAQAIiCAGACAgghgAgIAIYgAAAiKIAQAIiCAGACAgbugBoEqqKHcw41GTIIgBVGEV4A5m\nPGqyymNoGgCAgAhiAAACIogBAAiIIAYAICCCGACAgAhiAAACKvMgNrMLzWyJmS0zs7vLev8AAFQm\nZRrEZlZN0v9KulBSK0n9zezksjwGAACVSVn3iDtK+trdV7h7lqRJki4p42MAAFBplHUQN5G0Os/8\nmtwyAAAQg5XlvUvN7FeSLnT363PnB0jq5O635FmHm6UCAKocd495r9Syvtf0WknN8sw3U06v+JAV\nAQCgKirroel/SmphZmlmVkPSlZKmlfExAACoNMq0R+zu+8zsZkn/v6Rqkl5w96/K8hgAAFQmZXqO\nGAAAFA/PI47BzFIkNVXOw0rXuvv6wFWqdGjjw4N2Ln+0cfmr7G1MjzgPM2sv6f8k1dHPF5k1lbRV\n0o3u/q9QdassaOPDg3Yuf7Rx+asqbUwQ52FmCyXd4O4fH1B+pqRn3b1tmJpVHrTx4UE7lz/auPxV\nlTbmoQ/5HXPg/3BJcvePJMUHqE9lRBsfHrRz+aONy1+VaGPOEef3rpm9I+kl5dwhzJTzW+hBkqaH\nrFglQhsfHrRz+aONy1+VaGOGpg9gZhdJ6qOfb825VtI0d38nXK0qF9r48KCdyx9tXP6qQhsTxAAA\nBMQ54iIys/8KXYfKjjY+PGjn8kcbl7/K1MYEMQAAATE0fQAzO1lSY0kfu/vOPOW/dPd3w9Ws8jCz\nrpI2u/tiM4tI6iBpgbvPDFuzys3Mxrr7oND1qKzM7GzlPJP9C3d/L3R9KoPcnyl95e7bzOwYSfdI\nOk3Sl5IecvetQStYRgjiPMzsVkk3SfpKUntJt7n7G7nLFrh7+5D1qwzM7CFJ3ZVzL/IPJJ0j6W1J\n50l6093/J2D1Kg0ze1M5dyHK+7SzcyW9L8ndvU+QilUiZjbf3TvmTl+vnL8dr0s6X9Jb7v5QyPpV\nBma2WNKpuc8xeF7SLkmvSOqZW35Z0AqWEYI4DzNbJOlMd99pZmnK+R+e4e5PEcRlY/8/LEk1JK2X\n1DT3224t5YxCnBq0gpWEmS2QtFjS3yRlKyeQJ0q6SpLcfVa42lUOef8mmNk/Jf3S3b83s3jlfJZP\nCVvDI5+ZfeXuJ+dO/8vdT8uzbCE39KicbP9wtLuvkBSR9Esze1L5exYoub3uvs/df5C03N23SZK7\n71ZOYKBsdJD0qaR7JW1390xJe9x9FiFcZqqZWT0zS5ZUzd2/lyR33yVpX9iqVRpfmtl1udMLzewM\nSTKzlpL2hqtW2SKI89tgZu32z+SGci9JycrpxaH0fsw91yPlnOuRJJlZHRHEZcbdf3L3JyQNkfR7\nM/uLuIFPWUtUzpedf0pKMrNGkmRmCUFrVbn8WlI3M/uPpFaSPjSzb5Qz0vProDUrQwxN52FmzSRl\nuft3B5SbpC7u/o8wNas8zKymu++JUV5fUiN3/yJAtSo9M+slqbO7/z50XSq73C+aKe7+Tei6VBZm\nliSpuXK+TK458G/0kY4gBgAgIIamAQAIiCAGACAgghgAgIAIYgAAAiKIAQAIiCAGjlBm9t9mdkvu\n9JNmNjN3+lwzyzCzZ8zsEzNbZGbpucsuNLOX8+wjkns7TJnZ+Wb2oZl9amYv594hCkA5I4iBI9ds\nSWfnTneQFG9m1XPLZkm6193PkNRWOTdFOEXSDEmdcm8pKklXSpqY+zvueyX1cPfTlXOjitsP31sB\nqi6CGDhy/UvS6bl3ctojaZ5yArmrpH9IutLMPs1dr7WkVu7+k6TpkvrkhvZFkqZKOlM/37logaRB\nklIP8/sBqiRueQccodw9K/d2f0MkfSjpc+U8YekESbsl3SGpQ+5DNUZLqpm76SRJN0vaLOkTd9+V\nc/M4zXD3qw/vuwBAjxg4ss2RdKdyhqLnSPqNcnrAicp5ZNx2M0uR9Ms828xWzn2+r1dOKEvSx5K6\nmNkvJMnM4s2sxWF5B0AVRxADR7Y5khpKmufuG5TTE57j7p9LWiBpiaTxyhmqlpTzQAhJb0m6MPe/\nyn1y0BDlnC9eqJwe9omH720AVRf3mgYAICB6xAAABEQQAwAQEEEMAEBABDEAAAERxAAABEQQAwAQ\nEEEMAEBABDEAAAH9P2ELfz1DuS6SAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "t.unstack(0)[['York County','Newport News City','Hampton City','James City County']].plot(kind='bar',figsize=(8, 8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also stack data we have unstacked:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "wave county_intercept \n", "2 Accomack County 101\n", " Chesapeake City 10\n", " Hampton City 59\n", " Isle of Wight County 17\n", " Newport News City 25\n", " Norfolk City 18\n", " Northampton County 7\n", " Virginia Beach City 382\n", " Westmoreland County 5\n", "3 Accomack County 172\n", " Chesapeake City 8\n", " Essex County 19\n", " Gloucester County 40\n", " Hampton City 108\n", " James City County 13\n", " Mathews County 20\n", " Middlesex County 64\n", " Newport News City 63\n", " Norfolk City 125\n", " Northampton County 73\n", " Northumberland County 11\n", " Poquoson City 23\n", " Portsmouth City 31\n", " Richmond County 21\n", " Suffolk City 11\n", " Surry County 2\n", " Virginia Beach City 427\n", " Westmoreland County 5\n", " York County 19\n", "4 Accomack County 181\n", " ... \n", "5 Isle of Wight County 9\n", " James City County 5\n", " Mathews County 2\n", " Middlesex County 63\n", " Newport News City 74\n", " Norfolk City 98\n", " Northampton County 49\n", " Northumberland County 5\n", " Poquoson City 12\n", " Richmond County 4\n", " Suffolk City 9\n", " Surry County 2\n", " Virginia Beach City 341\n", " York County 5\n", "6 Accomack County 1\n", " Chesapeake City 33\n", " Essex County 15\n", " Gloucester County 16\n", " Hampton City 72\n", " Mathews County 15\n", " Middlesex County 28\n", " Newport News City 65\n", " Norfolk City 40\n", " Northampton County 60\n", " Northumberland County 4\n", " Poquoson City 14\n", " Portsmouth City 20\n", " Virginia Beach City 314\n", " Westmoreland County 3\n", " York County 10\n", "dtype: float64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t_unstacked = t.unstack(0)\n", "t_unstacked.stack()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Pivot Table\n", "\n", "Borrowing from excel, pandas can create pivot tables of summary statistics. Suppose we want counties as rows and waves as columns, with the average and standard deviation of 2 and 12 month avidity. " ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanstd
ffdays2ffdays12ffdays2ffdays12
wave23456234562345623456
county_intercept
Accomack County2.7722772.2906983.5248624.421021.564369.66279111.1160216.7507.3059994.6950676.1305878.094064-35.4799717.5116419.149237.3195-
Chesapeake City3.25.7584.66666710.0606113.926.6253623.3333345.03034.1041987.75978601.1547019.45032916.2921234.5912413.416415.77350338.89448
Essex County-1.57894712-2.733333-25.4210560.66667-12.06667-2.65237415.6205-3.788454-38.4134259.00282-11.46714
Gloucester County-4.0589.7428576.5-23.57544.62549.4857139.0625-5.50500911.356417.400715.680376-32.0519563.03852166.512448.03536
Hampton City2.7457633.6111114.9075143.4255325.45833338.0169522.9629627.2890221.9007140.4583312.7814410.1951412.379789.48851516.67835131.303797.6488108.988489.36716163.9646
Isle of Wight County3--6.111111-24.47059--26.77778-7.656696--11.02774-42.89102--32.86631-
James City County-3.8461546.6363641.4--23.4615431.818184.2--5.4443575.3902271.67332--34.8965544.424813.420526-
King William County--5----9.666667----6.383573----9.526279--
Mathews County-7.111.833333.57.2-66.4582.333332121.33333-21.6087224.086172.121328.32552-220.9033249.337412.7279226.90636
Middlesex County-1.9843755.254.4603174.857143-10.6093824.9666717.761924.96429-2.7515787.0605735.8718175.133148-15.1548837.8807130.0562531.11208
Newport News City8.441.7142863.1139244.4864864.55384696.8840.8730212.5189916.4729729.1076926.975423.2943193.2659373.7388865.500087272.1627175.236720.6663717.7703939.94376
Norfolk City1.7777782.4083.0440256.6734694.87525.7222223.47214.3584925.244926.0752.510119.6303424.4697948.9267074.83145430.9948994.8705932.0538132.6038624.63413
Northampton County3.4285713.2739734.9166676.551022.53333337.2857115.9041114.3981528.4081618.416674.5039674.2923076.7299376.9643773.4712753.3376524.7460219.5127337.2110621.58962
Northumberland County-0.18181821.3754.673.5-2.1818188.12523.6749.25-0.60302272.7222634.21900549-5.92989310.7229724.47039498.1669
Poquoson City-7.60869610.181827.8333332.5-53.5217434.727272211.78571-6.4437364.9561725.7498352.710379-65.9344524.7551616.7549210.45267
Portsmouth City-3.4516135.461538-10.45-12.9677416.61538-47.15-5.7841094.313069-13.93018-14.086613.46315-54.04703
Richmond County-421.75--14.8571426--4.6260132.3094011.707825--23.55482.3094013.265986-
Suffolk City-2.7272733.8461544.222222--7.63636412.615389--3.6080724.6876823.666667--6.32886615.724396.800735-
Surry County-0522--0687.5--04.24264119.79899--05.65685488.38835-
Virginia Beach City2.5445033.6018743.7645634.9442825.53821728.0863929.4566718.8907820.0791831.767526.5897296.7410739.0494949.2964168.54943242.5247991.2588874.7388761.24855100.9227
Westmoreland County82.80.25-025.420.61.25-0.333333310.954452.1679480.5-032.4160514.028542.5-0.5773503
York County-4.7894747.7777783.86.6-29.9473736.222221141.5-3.85254510.109135.019967.834397-33.1553137.8740612.5698140.6072
\n", "
" ], "text/plain": [ " mean \\\n", " ffdays2 \n", "wave 2 3 4 5 6 \n", "county_intercept \n", "Accomack County 2.772277 2.290698 3.524862 4.42 10 \n", "Chesapeake City 3.2 5.75 8 4.666667 10.06061 \n", "Essex County - 1.578947 12 - 2.733333 \n", "Gloucester County - 4.05 8 9.742857 6.5 \n", "Hampton City 2.745763 3.611111 4.907514 3.425532 5.458333 \n", "Isle of Wight County 3 - - 6.111111 - \n", "James City County - 3.846154 6.636364 1.4 - \n", "King William County - - 5 - - \n", "Mathews County - 7.1 11.83333 3.5 7.2 \n", "Middlesex County - 1.984375 5.25 4.460317 4.857143 \n", "Newport News City 8.44 1.714286 3.113924 4.486486 4.553846 \n", "Norfolk City 1.777778 2.408 3.044025 6.673469 4.875 \n", "Northampton County 3.428571 3.273973 4.916667 6.55102 2.533333 \n", "Northumberland County - 0.1818182 1.375 4.6 73.5 \n", "Poquoson City - 7.608696 10.18182 7.833333 2.5 \n", "Portsmouth City - 3.451613 5.461538 - 10.45 \n", "Richmond County - 4 2 1.75 - \n", "Suffolk City - 2.727273 3.846154 4.222222 - \n", "Surry County - 0 5 22 - \n", "Virginia Beach City 2.544503 3.601874 3.764563 4.944282 5.538217 \n", "Westmoreland County 8 2.8 0.25 - 0 \n", "York County - 4.789474 7.777778 3.8 6.6 \n", "\n", " \\\n", " ffdays12 \n", "wave 2 3 4 5 6 \n", "county_intercept \n", "Accomack County 21.56436 9.662791 11.11602 16.7 50 \n", "Chesapeake City 13.9 26.625 36 23.33333 45.0303 \n", "Essex County - 25.42105 60.66667 - 12.06667 \n", "Gloucester County - 23.575 44.625 49.48571 39.0625 \n", "Hampton City 38.01695 22.96296 27.28902 21.90071 40.45833 \n", "Isle of Wight County 24.47059 - - 26.77778 - \n", "James City County - 23.46154 31.81818 4.2 - \n", "King William County - - 9.666667 - - \n", "Mathews County - 66.45 82.33333 21 21.33333 \n", "Middlesex County - 10.60938 24.96667 17.7619 24.96429 \n", "Newport News City 96.88 40.87302 12.51899 16.47297 29.10769 \n", "Norfolk City 25.72222 23.472 14.35849 25.2449 26.075 \n", "Northampton County 37.28571 15.90411 14.39815 28.40816 18.41667 \n", "Northumberland County - 2.181818 8.125 23.6 749.25 \n", "Poquoson City - 53.52174 34.72727 22 11.78571 \n", "Portsmouth City - 12.96774 16.61538 - 47.15 \n", "Richmond County - 14.85714 2 6 - \n", "Suffolk City - 7.636364 12.61538 9 - \n", "Surry County - 0 6 87.5 - \n", "Virginia Beach City 28.08639 29.45667 18.89078 20.07918 31.76752 \n", "Westmoreland County 25.4 20.6 1.25 - 0.3333333 \n", "York County - 29.94737 36.22222 11 41.5 \n", "\n", " std \\\n", " ffdays2 \n", "wave 2 3 4 5 6 \n", "county_intercept \n", "Accomack County 7.305999 4.695067 6.130587 8.094064 - \n", "Chesapeake City 4.104198 7.759786 0 1.154701 9.450329 \n", "Essex County - 2.652374 15.6205 - 3.788454 \n", "Gloucester County - 5.505009 11.3564 17.40071 5.680376 \n", "Hampton City 12.78144 10.19514 12.37978 9.488515 16.67835 \n", "Isle of Wight County 7.656696 - - 11.02774 - \n", "James City County - 5.444357 5.390227 1.67332 - \n", "King William County - - 6.383573 - - \n", "Mathews County - 21.60872 24.08617 2.12132 8.32552 \n", "Middlesex County - 2.751578 7.060573 5.871817 5.133148 \n", "Newport News City 26.97542 3.294319 3.265937 3.738886 5.500087 \n", "Norfolk City 2.51011 9.630342 4.469794 8.926707 4.831454 \n", "Northampton County 4.503967 4.292307 6.729937 6.964377 3.47127 \n", "Northumberland County - 0.6030227 2.722263 4.219005 49 \n", "Poquoson City - 6.443736 4.956172 5.749835 2.710379 \n", "Portsmouth City - 5.784109 4.313069 - 13.93018 \n", "Richmond County - 4.626013 2.309401 1.707825 - \n", "Suffolk City - 3.608072 4.687682 3.666667 - \n", "Surry County - 0 4.242641 19.79899 - \n", "Virginia Beach City 6.589729 6.741073 9.049494 9.296416 8.549432 \n", "Westmoreland County 10.95445 2.167948 0.5 - 0 \n", "York County - 3.852545 10.10913 5.01996 7.834397 \n", "\n", " \n", " ffdays12 \n", "wave 2 3 4 5 6 \n", "county_intercept \n", "Accomack County 35.47997 17.51164 19.1492 37.3195 - \n", "Chesapeake City 16.29212 34.59124 13.41641 5.773503 38.89448 \n", "Essex County - 38.41342 59.00282 - 11.46714 \n", "Gloucester County - 32.05195 63.03852 166.5124 48.03536 \n", "Hampton City 131.3037 97.6488 108.9884 89.36716 163.9646 \n", "Isle of Wight County 42.89102 - - 32.86631 - \n", "James City County - 34.89655 44.42481 3.420526 - \n", "King William County - - 9.526279 - - \n", "Mathews County - 220.9033 249.3374 12.72792 26.90636 \n", "Middlesex County - 15.15488 37.88071 30.05625 31.11208 \n", "Newport News City 272.1627 175.2367 20.66637 17.77039 39.94376 \n", "Norfolk City 30.99489 94.87059 32.05381 32.60386 24.63413 \n", "Northampton County 53.33765 24.74602 19.51273 37.21106 21.58962 \n", "Northumberland County - 5.929893 10.72297 24.47039 498.1669 \n", "Poquoson City - 65.93445 24.75516 16.75492 10.45267 \n", "Portsmouth City - 14.0866 13.46315 - 54.04703 \n", "Richmond County - 23.5548 2.309401 3.265986 - \n", "Suffolk City - 6.328866 15.72439 6.800735 - \n", "Surry County - 0 5.656854 88.38835 - \n", "Virginia Beach City 42.52479 91.25888 74.73887 61.24855 100.9227 \n", "Westmoreland County 32.41605 14.02854 2.5 - 0.5773503 \n", "York County - 33.15531 37.87406 12.56981 40.6072 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(trips_fips.loc['VA'],index='county_intercept',columns='wave',\n", " values=['ffdays2','ffdays12'],aggfunc=[np.mean, np.std],fill_value=\"-\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting Data from Pandas\n", "\n", "You may want to export dataframes and other information to other packages (e.g. latex, excel, or stata). Pandas has a lot of `to_` methods on pandas dataframes and series. For example we can export some interesting information to latex table code: " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "(t.unstack().head(25)).to_latex('excel_example.tex',na_rep='-')" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Or, we can save the same information to excel:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": true }, "outputs": [], "source": [ "(t.unstack().head(25)).to_excel('excel_example.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Econometrics in Python" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ideducln_wagepexptimeabilitymeducfeducbroken_homesiblingspexp2
04122.14240.261210144
16151.91440.4412160216
28132.32840.5112151264
311141.64141.821617121
412132.1664-1.3013120536
\n", "
" ], "text/plain": [ " id educ ln_wage pexp time ability meduc feduc broken_home \\\n", "0 4 12 2.14 2 4 0.26 12 10 1 \n", "1 6 15 1.91 4 4 0.44 12 16 0 \n", "2 8 13 2.32 8 4 0.51 12 15 1 \n", "3 11 14 1.64 1 4 1.82 16 17 1 \n", "4 12 13 2.16 6 4 -1.30 13 12 0 \n", "\n", " siblings pexp2 \n", "0 4 4 \n", "1 2 16 \n", "2 2 64 \n", "3 2 1 \n", "4 5 36 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.formula.api as smf \n", "\n", "# load data and create dataframe\n", "tobias_koop=pd.read_csv('https://rlhick.people.wm.edu/econ407/data/tobias_koop_t_4.csv')\n", "\n", "tobias_koop.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: ln_wage R-squared: 0.166\n", "Model: OLS Adj. R-squared: 0.163\n", "Method: Least Squares F-statistic: 51.36\n", "Date: Thu, 17 Sep 2015 Prob (F-statistic): 1.83e-39\n", "Time: 14:52:15 Log-Likelihood: -583.66\n", "No. Observations: 1034 AIC: 1177.\n", "Df Residuals: 1029 BIC: 1202.\n", "Df Model: 4 \n", "Covariance Type: nonrobust \n", "===============================================================================\n", " coef std err t P>|t| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------\n", "Intercept 0.4603 0.137 3.353 0.001 0.191 0.730\n", "educ 0.0853 0.009 9.179 0.000 0.067 0.104\n", "pexp 0.2035 0.024 8.629 0.000 0.157 0.250\n", "pexp2 -0.0124 0.002 -5.438 0.000 -0.017 -0.008\n", "broken_home -0.0087 0.036 -0.244 0.807 -0.079 0.061\n", "==============================================================================\n", "Omnibus: 55.892 Durbin-Watson: 1.761\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 112.050\n", "Skew: -0.355 Prob(JB): 4.66e-25\n", "Kurtosis: 4.448 Cond. No. 391.\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "formula = \"ln_wage ~ educ + pexp + pexp2 + broken_home\"\n", "results = smf.ols(formula,tobias_koop).fit()\n", "print(results.summary())" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 1 }