{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This page has been moved to [https://econ.pages.code.wm.edu/414/syllabus/docs/index.html](https://econ.pages.code.wm.edu/407/notes/docs/index.html) and is no longer being maintained here." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.simplefilter(action='ignore', category=FutureWarning)\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import urllib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Pandas\n", "\n", "Pandas works hand-in-hand with other python libraries (e.g. matplotlib and numpy) to make manipulating data (what the Pandas team calls \"Data Munging\") easy. With pandas it is easy to \n", "\n", "1. Easily access data using variable names, but have full linear algebra capabilities of numpy\n", "2. Group data by values (for example, calculate the mean income by state)\n", "3. Plot and summarize values\n", "4. Join (combine) different sources of data\n", "5. Powerful time series and panel data capabilities (beyond scope of course)\n", "\n", "Note: If you want to reinforce some of these concepts on your own, I recommend this superb youtube video (https://www.youtube.com/watch?v=5JnMutdy6Fw) and accompanying coursework (https://github.com/brandon-rhodes/pycon-pandas-tutorial) by Brandon Rhodes.\n", "\n", "There are numerous ways to get data into Pandas:\n", "\n", "* Import excel and comma delimited data\n", "* Import stata, sas, matlab, and other datasets\n", "* Import data from an SQL server\n", "* Import data scraped from the web\n", "* Manually building (by typing in values)\n", "\n", "In this tutorial we will focus on the first two methods for reading data, but just know that there is probably a way to get your data into pandas irrespective of what format it is in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading and Cleaning Data\n", "\n", "We will be loading 2010 \"Trip\" data from the NOAA Fisheries Service Recreational Fisheries Statistics Survey (called MRIP). More detail here: https://www.fisheries.noaa.gov/topic/recreational-fishing-data\n", "\n", "We will also load the Census Bureau's county and state fips file, obtained here: https://www.census.gov/2010census/xls/fips_codes_website.xls\n", "\n", "The MRIPs data needs to be downloaded from my website (you only need to do this once)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# run this once and then comment out with #\n", "# urllib.request.urlretrieve(\"https://rlhick.people.wm.edu/pubp622/data/mrips_2010.pydata\", \"mrips_2010.pydata\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "trips_2010 = pd.io.pickle.read_pickle('mrips_2010.pydata')\n", "fips = pd.io.excel.read_excel('https://rlhick.people.wm.edu/pubp622/data/fips_codes_website.xls')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexadd_hrsadd_phageareaarea_ncarea_xart_reefasg_codeboat_hrs...timetourntsn1tsn2turtlewavewp_intyearzipintercept_date
00NaN1NaNC5NaN11710201011140.0...1330.0NaN167680NaN6.03411.8910632010.0064752010-11-14
11NaN2NaNC5NaN11710201011140.0...1332.0NaN167680NaN6.03411.8910632010.0064752010-11-14
220.01NaNC5NaN11710201011160.0...1130.0NaN168559167680NaN6.017548.8856332010.0061102010-11-16
33NaN1NaNC5NaN11710201011230.0...1200.0NaN167680168559NaN6.04489.9235792010.0064162010-11-23
44NaN1NaNC5NaN11710201011230.0...1215.0NaN167680168559NaN6.04489.9235792010.0060672010-11-23
\n", "

5 rows × 80 columns

\n", "
" ], "text/plain": [ " index add_hrs add_ph age area area_nc area_x art_reef asg_code \\\n", "0 0 NaN 1 NaN C 5 NaN 1171020101114 \n", "1 1 NaN 2 NaN C 5 NaN 1171020101114 \n", "2 2 0.0 1 NaN C 5 NaN 1171020101116 \n", "3 3 NaN 1 NaN C 5 NaN 1171020101123 \n", "4 4 NaN 1 NaN C 5 NaN 1171020101123 \n", "\n", " boat_hrs ... time tourn tsn1 tsn2 turtle wave \\\n", "0 0.0 ... 1330.0 NaN 167680 NaN 6.0 \n", "1 0.0 ... 1332.0 NaN 167680 NaN 6.0 \n", "2 0.0 ... 1130.0 NaN 168559 167680 NaN 6.0 \n", "3 0.0 ... 1200.0 NaN 167680 168559 NaN 6.0 \n", "4 0.0 ... 1215.0 NaN 167680 168559 NaN 6.0 \n", "\n", " wp_int year zip intercept_date \n", "0 3411.891063 2010.0 06475 2010-11-14 \n", "1 3411.891063 2010.0 06475 2010-11-14 \n", "2 17548.885633 2010.0 06110 2010-11-16 \n", "3 4489.923579 2010.0 06416 2010-11-23 \n", "4 4489.923579 2010.0 06067 2010-11-23 \n", "\n", "[5 rows x 80 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_2010.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trips is a very big dataset with lots of columns we'll never use. Let's trim it down:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['index', 'add_hrs', 'add_ph', 'age', 'area', 'area_nc', 'area_x',\n", " 'art_reef', 'asg_code', 'boat_hrs', 'catch', 'celltype', 'cntrbtrs',\n", " 'cnty', 'cnty_res', 'coastal', 'compflag', 'county', 'date1', 'dist',\n", " 'distkeys', 'f_by_p', 'ffdays12', 'ffdays2', 'first', 'fshinsp_a',\n", " 'gear', 'gender', 'hrs_dtd', 'hrsf', 'id_code', 'intsite', 'kod',\n", " 'leader', 'license', 'mode2001', 'mode_asg', 'mode_f', 'mode_fx',\n", " 'monitor', 'month', 'muni_res', 'muni_trp', 'new_list', 'num_fish_a',\n", " 'num_typ2', 'num_typ3', 'num_typ4', 'num_typ6', 'num_typ9', 'on_list',\n", " 'party', 'prim1', 'prim1_common', 'prim2', 'prim2_common', 'prt_code',\n", " 'psu_id', 'pvt_res', 'reefcode', 'reg_res', 'region', 'rig', 'sep_fish',\n", " 'st', 'st_res', 'strat_id', 'strat_interval', 'sub_reg', 'telefon',\n", " 'time', 'tourn', 'tsn1', 'tsn2', 'turtle', 'wave', 'wp_int', 'year',\n", " 'zip', 'intercept_date'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_2010.columns" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "trips = trips_2010[['id_code','year','wave','intercept_date','st','prim1',\n", " 'prim1_common','prim2','prim2_common','cnty','ffdays12',\n", " 'ffdays2']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is summary statistics for numeric data columns:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearwavestcntyffdays12ffdays2
count104709.0104709.000000104709.000000104519.000000104519.000000104519.000000
mean2010.03.79487923.97461578.79817140.7714396.191936
std0.01.34758613.160875130.751209122.45600413.037239
min2010.01.0000001.0000001.0000000.0000000.000000
25%2010.03.00000012.00000019.0000001.0000000.000000
50%2010.04.00000022.00000055.00000010.0000002.000000
75%2010.05.00000037.00000099.00000035.0000007.000000
max2010.06.00000051.000000810.000000999.00000099.000000
\n", "
" ], "text/plain": [ " year wave st cnty ffdays12 \\\n", "count 104709.0 104709.000000 104709.000000 104519.000000 104519.000000 \n", "mean 2010.0 3.794879 23.974615 78.798171 40.771439 \n", "std 0.0 1.347586 13.160875 130.751209 122.456004 \n", "min 2010.0 1.000000 1.000000 1.000000 0.000000 \n", "25% 2010.0 3.000000 12.000000 19.000000 1.000000 \n", "50% 2010.0 4.000000 22.000000 55.000000 10.000000 \n", "75% 2010.0 5.000000 37.000000 99.000000 35.000000 \n", "max 2010.0 6.000000 51.000000 810.000000 999.000000 \n", "\n", " ffdays2 \n", "count 104519.000000 \n", "mean 6.191936 \n", "std 13.037239 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 2.000000 \n", "75% 7.000000 \n", "max 99.000000 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Pandas DataFrames\n", "\n", "##### Referencing columns in data:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 25.0\n", "1 10.0\n", "2 12.0\n", "3 90.0\n", "4 75.0\n", "Name: ffdays12, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# copy data frame into t (note, this is different than assignment [t=trips])\n", "t=trips.copy()\n", "\n", "t.ffdays12.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 25.0\n", "1 10.0\n", "2 12.0\n", "3 90.0\n", "4 75.0\n", "Name: ffdays12, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t['ffdays12'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Referencing rows in data:\n", "\n", "We can use numpy-like slicing:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
5111710201011110052010.06.02010-11-119.08839010101TAUTOG7.035.010.0
5211710201011110062010.06.02010-11-119.08839010101TAUTOG7.030.010.0
5311710201011130012010.06.02010-11-139.08839010101TAUTOG11.010.08.0
5411710201011130022010.06.02010-11-139.08839010101TAUTOG11.010.08.0
5511710201011130032010.06.02010-11-139.08839010101TAUTOG11.03.02.0
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 \\\n", "51 1171020101111005 2010.0 6.0 2010-11-11 9.0 8839010101 \n", "52 1171020101111006 2010.0 6.0 2010-11-11 9.0 8839010101 \n", "53 1171020101113001 2010.0 6.0 2010-11-13 9.0 8839010101 \n", "54 1171020101113002 2010.0 6.0 2010-11-13 9.0 8839010101 \n", "55 1171020101113003 2010.0 6.0 2010-11-13 9.0 8839010101 \n", "\n", " prim1_common prim2 prim2_common cnty ffdays12 ffdays2 \n", "51 TAUTOG 7.0 35.0 10.0 \n", "52 TAUTOG 7.0 30.0 10.0 \n", "53 TAUTOG 11.0 10.0 8.0 \n", "54 TAUTOG 11.0 10.0 8.0 \n", "55 TAUTOG 11.0 3.0 2.0 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 50-54\n", "t.loc[51:55]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datest
011710201011140012010.06.02010-11-149.0
111710201011140022010.06.02010-11-149.0
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9.0\n", "1 1171020101114002 2010.0 6.0 2010-11-14 9.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows 1 and 2 for the first 5 columns\n", "t.iloc[0:2,0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can select rows of data based on column values. Let's select all the rows where st is 9 (Connecticut)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
011710201011140012010.06.02010-11-149.08835020102STRIPED BASS7.025.05.0
111710201011140022010.06.02010-11-149.08835020102STRIPED BASS7.010.02.0
211710201011160012010.06.02010-11-169.08835250101BLUEFISH8835020102STRIPED BASS11.012.06.0
311710201011230012010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.090.025.0
411710201011230022010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.075.020.0
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "1 1171020101114002 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9.0 8835250101 \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "\n", " prim1_common prim2 prim2_common cnty ffdays12 ffdays2 \n", "0 STRIPED BASS 7.0 25.0 5.0 \n", "1 STRIPED BASS 7.0 10.0 2.0 \n", "2 BLUEFISH 8835020102 STRIPED BASS 11.0 12.0 6.0 \n", "3 STRIPED BASS 8835250101 BLUEFISH 7.0 90.0 25.0 \n", "4 STRIPED BASS 8835250101 BLUEFISH 7.0 75.0 20.0 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[t.st == 9].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all rows where days spent fishing during past 12 months exceeds 10:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2
011710201011140012010.06.02010-11-149.08835020102STRIPED BASS7.025.05.0
211710201011160012010.06.02010-11-169.08835250101BLUEFISH8835020102STRIPED BASS11.012.06.0
311710201011230012010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.090.025.0
411710201011230022010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.075.020.0
511710201011230032010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.070.010.0
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9.0 8835250101 \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "5 1171020101123003 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "\n", " prim1_common prim2 prim2_common cnty ffdays12 ffdays2 \n", "0 STRIPED BASS 7.0 25.0 5.0 \n", "2 BLUEFISH 8835020102 STRIPED BASS 11.0 12.0 6.0 \n", "3 STRIPED BASS 8835250101 BLUEFISH 7.0 90.0 25.0 \n", "4 STRIPED BASS 8835250101 BLUEFISH 7.0 75.0 20.0 \n", "5 STRIPED BASS 8835250101 BLUEFISH 7.0 70.0 10.0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[t.ffdays12>10].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Math on columns: " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2temp
011710201011140012010.06.02010-11-149.08835020102STRIPED BASS7.025.05.024.981628
111710201011140022010.06.02010-11-149.08835020102STRIPED BASS7.010.02.09.281947
211710201011160012010.06.02010-11-169.08835250101BLUEFISH8835020102STRIPED BASS11.012.06.011.958769
311710201011230012010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.090.025.090.170897
411710201011230022010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.075.020.075.427539
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "1 1171020101114002 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9.0 8835250101 \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "\n", " prim1_common prim2 prim2_common cnty ffdays12 ffdays2 temp \n", "0 STRIPED BASS 7.0 25.0 5.0 24.981628 \n", "1 STRIPED BASS 7.0 10.0 2.0 9.281947 \n", "2 BLUEFISH 8835020102 STRIPED BASS 11.0 12.0 6.0 11.958769 \n", "3 STRIPED BASS 8835250101 BLUEFISH 7.0 90.0 25.0 90.170897 \n", "4 STRIPED BASS 8835250101 BLUEFISH 7.0 75.0 20.0 75.427539 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can create a new variable and use numpy commands:\n", "t['temp'] = t.ffdays12 + .5*np.random.randn(t.shape[0])\n", "t.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40.76974409766304" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.temp.mean()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40.77151062295116" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or we could do this all in one step if we didn't want to create a new column in t:\n", "(t.ffdays12 + .5*np.random.randn(t.shape[0])).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: You can combine pandas with numpy. This is the standard deviation of the column `ffdays12`: " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "122.45541845299698" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.std(t.ffdays12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can perform matrix/vector operations on pandas data.\n", "\n", "Here, we can transpose a slice of the data: " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789
ffdays1225.010.012.090.075.070.010.015.01.0250.0
ffdays25.02.06.025.020.010.02.00.00.050.0
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9\n", "ffdays12 25.0 10.0 12.0 90.0 75.0 70.0 10.0 15.0 1.0 250.0\n", "ffdays2 5.0 2.0 6.0 25.0 20.0 10.0 2.0 0.0 0.0 50.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[['ffdays12','ffdays2']].head(10).T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do matrix multiplication:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[82320., 17187.],\n", " [17187., 3694.]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(t[['ffdays12','ffdays2']].head(10).T,t[['ffdays12','ffdays2']].head(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Note: the remainder of this notebook is not particularly relevant for Bayesian Econometrics\n", "\n", "While we won't use it in our course, you may find the following very useful for data munging.\n", "\n", "\n", "#### Summary Statistics\n", "\n", "Find mean of ffdays12 and ffdays2" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ffdays12ffdays2
count104519.000000104519.000000
mean40.7714396.191936
std122.45600413.037239
min0.0000000.000000
25%1.0000000.000000
50%10.0000002.000000
75%35.0000007.000000
max999.00000099.000000
\n", "
" ], "text/plain": [ " ffdays12 ffdays2\n", "count 104519.000000 104519.000000\n", "mean 40.771439 6.191936\n", "std 122.456004 13.037239\n", "min 0.000000 0.000000\n", "25% 1.000000 0.000000\n", "50% 10.000000 2.000000\n", "75% 35.000000 7.000000\n", "max 999.000000 99.000000" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t[['ffdays12','ffdays2']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find mean of ffdays12 and ffdays2 by state:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2temp
011710201011140012010.06.02010-11-149.08835020102STRIPED BASS7.025.05.024.981628
111710201011140022010.06.02010-11-149.08835020102STRIPED BASS7.010.02.09.281947
211710201011160012010.06.02010-11-169.08835250101BLUEFISH8835020102STRIPED BASS11.012.06.011.958769
311710201011230012010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.090.025.090.170897
411710201011230022010.06.02010-11-239.08835020102STRIPED BASS8835250101BLUEFISH7.075.020.075.427539
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "1 1171020101114002 2010.0 6.0 2010-11-14 9.0 8835020102 \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9.0 8835250101 \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9.0 8835020102 \n", "\n", " prim1_common prim2 prim2_common cnty ffdays12 ffdays2 temp \n", "0 STRIPED BASS 7.0 25.0 5.0 24.981628 \n", "1 STRIPED BASS 7.0 10.0 2.0 9.281947 \n", "2 BLUEFISH 8835020102 STRIPED BASS 11.0 12.0 6.0 11.958769 \n", "3 STRIPED BASS 8835250101 BLUEFISH 7.0 90.0 25.0 90.170897 \n", "4 STRIPED BASS 8835250101 BLUEFISH 7.0 75.0 20.0 75.427539 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "'st'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'ffdays12'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'ffdays2'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'st'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdescribe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mgroupby\u001b[0;34m(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)\u001b[0m\n\u001b[1;32m 5160\u001b[0m return groupby(self, by=by, axis=axis, level=level, as_index=as_index,\n\u001b[1;32m 5161\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup_keys\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mgroup_keys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msqueeze\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5162\u001b[0;31m **kwargs)\n\u001b[0m\u001b[1;32m 5163\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5164\u001b[0m def asfreq(self, freq, method=None, how=None, normalize=False,\n", "\u001b[0;32m~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36mgroupby\u001b[0;34m(obj, by, **kwds)\u001b[0m\n\u001b[1;32m 1846\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'invalid type: %s'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1847\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1848\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mby\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1849\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1850\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)\u001b[0m\n\u001b[1;32m 514\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 515\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 516\u001b[0;31m mutated=self.mutated)\n\u001b[0m\u001b[1;32m 517\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 518\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36m_get_grouper\u001b[0;34m(obj, key, axis, level, sort, mutated, validate)\u001b[0m\n\u001b[1;32m 2932\u001b[0m \u001b[0min_axis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgpr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgpr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2933\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2934\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2935\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgpr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mGrouper\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mgpr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkey\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2936\u001b[0m \u001b[0;31m# Add key to exclusions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 'st'" ] } ], "source": [ "t[['ffdays12','ffdays2']].groupby('st').describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened? We trimmed down the columns of t with the statement `t[['ffdays12','ffdays2']]` and pandas couldn't find the columns `st`. Let's try this again:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ffdays12ffdays2
countmeanstdmin25%50%75%maxcountmeanstdmin25%50%75%max
st
1.02319.054.470030173.2951660.01.008.030.00998.02319.05.53557611.3819260.00.02.06.098.0
9.02068.019.05077445.7997240.00.755.020.00998.02068.04.7113158.8321070.00.02.06.098.0
10.03691.033.415335126.3993270.01.006.020.00999.03691.05.16716313.0287350.00.01.05.099.0
12.037354.059.632543154.8756640.02.0020.050.00999.037354.07.92140115.9258360.00.03.08.099.0
13.01725.017.68289934.6274970.00.003.024.00360.01725.03.3553626.0875840.00.01.04.060.0
22.05212.023.14620153.7645610.03.0010.025.00998.05212.03.5825026.5582660.00.02.04.098.0
23.01440.025.009028112.3460570.00.002.013.00998.01440.04.97152812.4793040.00.00.04.098.0
24.04163.022.87581191.2029390.01.004.016.00999.04163.04.35791510.7964960.00.01.04.099.0
25.04181.029.579048106.8790880.01.006.024.00999.04181.05.56087112.2208320.00.01.06.099.0
28.01784.050.39517997.0102030.05.0025.050.00999.01784.06.43105411.7312760.00.03.08.099.0
33.02375.014.13136867.8480610.00.002.010.00999.02375.03.1246328.2313610.00.00.03.099.0
34.04760.035.825000111.3515860.03.0010.030.00998.04760.06.00504211.2232300.00.02.07.099.0
36.03981.036.994223101.6570530.04.0020.040.00999.03981.07.94951012.5626360.01.04.010.099.0
37.020289.029.62299886.5184520.01.006.025.00999.020289.05.41741810.7359140.00.02.06.099.0
44.01488.020.71572639.3498500.01.005.020.25320.01488.04.7513449.3225110.00.01.05.060.0
45.02867.050.676317167.7373330.00.007.030.00999.02867.07.04011217.0991360.00.02.06.099.0
51.04822.024.96225681.2596190.01.006.025.00999.04822.04.2146418.9099400.00.01.05.098.0
\n", "
" ], "text/plain": [ " ffdays12 ffdays2 \\\n", " count mean std min 25% 50% 75% max count \n", "st \n", "1.0 2319.0 54.470030 173.295166 0.0 1.00 8.0 30.00 998.0 2319.0 \n", "9.0 2068.0 19.050774 45.799724 0.0 0.75 5.0 20.00 998.0 2068.0 \n", "10.0 3691.0 33.415335 126.399327 0.0 1.00 6.0 20.00 999.0 3691.0 \n", "12.0 37354.0 59.632543 154.875664 0.0 2.00 20.0 50.00 999.0 37354.0 \n", "13.0 1725.0 17.682899 34.627497 0.0 0.00 3.0 24.00 360.0 1725.0 \n", "22.0 5212.0 23.146201 53.764561 0.0 3.00 10.0 25.00 998.0 5212.0 \n", "23.0 1440.0 25.009028 112.346057 0.0 0.00 2.0 13.00 998.0 1440.0 \n", "24.0 4163.0 22.875811 91.202939 0.0 1.00 4.0 16.00 999.0 4163.0 \n", "25.0 4181.0 29.579048 106.879088 0.0 1.00 6.0 24.00 999.0 4181.0 \n", "28.0 1784.0 50.395179 97.010203 0.0 5.00 25.0 50.00 999.0 1784.0 \n", "33.0 2375.0 14.131368 67.848061 0.0 0.00 2.0 10.00 999.0 2375.0 \n", "34.0 4760.0 35.825000 111.351586 0.0 3.00 10.0 30.00 998.0 4760.0 \n", "36.0 3981.0 36.994223 101.657053 0.0 4.00 20.0 40.00 999.0 3981.0 \n", "37.0 20289.0 29.622998 86.518452 0.0 1.00 6.0 25.00 999.0 20289.0 \n", "44.0 1488.0 20.715726 39.349850 0.0 1.00 5.0 20.25 320.0 1488.0 \n", "45.0 2867.0 50.676317 167.737333 0.0 0.00 7.0 30.00 999.0 2867.0 \n", "51.0 4822.0 24.962256 81.259619 0.0 1.00 6.0 25.00 999.0 4822.0 \n", "\n", " \n", " mean std min 25% 50% 75% max \n", "st \n", "1.0 5.535576 11.381926 0.0 0.0 2.0 6.0 98.0 \n", "9.0 4.711315 8.832107 0.0 0.0 2.0 6.0 98.0 \n", "10.0 5.167163 13.028735 0.0 0.0 1.0 5.0 99.0 \n", "12.0 7.921401 15.925836 0.0 0.0 3.0 8.0 99.0 \n", "13.0 3.355362 6.087584 0.0 0.0 1.0 4.0 60.0 \n", "22.0 3.582502 6.558266 0.0 0.0 2.0 4.0 98.0 \n", "23.0 4.971528 12.479304 0.0 0.0 0.0 4.0 98.0 \n", "24.0 4.357915 10.796496 0.0 0.0 1.0 4.0 99.0 \n", "25.0 5.560871 12.220832 0.0 0.0 1.0 6.0 99.0 \n", "28.0 6.431054 11.731276 0.0 0.0 3.0 8.0 99.0 \n", "33.0 3.124632 8.231361 0.0 0.0 0.0 3.0 99.0 \n", "34.0 6.005042 11.223230 0.0 0.0 2.0 7.0 99.0 \n", "36.0 7.949510 12.562636 0.0 1.0 4.0 10.0 99.0 \n", "37.0 5.417418 10.735914 0.0 0.0 2.0 6.0 99.0 \n", "44.0 4.751344 9.322511 0.0 0.0 1.0 5.0 60.0 \n", "45.0 7.040112 17.099136 0.0 0.0 2.0 6.0 99.0 \n", "51.0 4.214641 8.909940 0.0 0.0 1.0 5.0 98.0 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.groupby('st')[['ffdays12','ffdays2']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining DataFrames\n", "\n", "The state fips codes are hardly intuitive and even worse for the county fips codes. The goal of this section is to \n", "\n", "1. Add text-based labels\n", "2. Be able to use these labels for fast data selection\n", "\n", "To do this, we will attempt to match text based state and county labels to our trips dataframe. Pandas calls this a merge." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
State AbbreviationState FIPS CodeCounty FIPS CodeFIPS Entity CodeANSI CodeGU NameEntity Description
0AL1671242403054Abbevillecity
1AL1734602403063Adamsvillecity
2AL11178202403069Alabastercity
3AL1959882403074Albertvillecity
4AL112311322403077Alexander Citycity
\n", "
" ], "text/plain": [ " State Abbreviation State FIPS Code County FIPS Code FIPS Entity Code \\\n", "0 AL 1 67 124 \n", "1 AL 1 73 460 \n", "2 AL 1 117 820 \n", "3 AL 1 95 988 \n", "4 AL 1 123 1132 \n", "\n", " ANSI Code GU Name Entity Description \n", "0 2403054 Abbeville city \n", "1 2403063 Adamsville city \n", "2 2403069 Alabaster city \n", "3 2403074 Albertville city \n", "4 2403077 Alexander City city " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fips codes (numeric) uniquely identify counties in our trips file. So let's look at the fips file (that we grabbed from Census) above and combine it with our trips data. These column names are messy: let's clean it up.\n", "\n", "For me, column names should always:\n", "1. Be lower case\n", "2. Have no spaces or funky characters" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statestate_fipscounty_fipsfipsansi_codecounty_nameentity_type
0AL1671242403054Abbevillecity
1AL1734602403063Adamsvillecity
2AL11178202403069Alabastercity
3AL1959882403074Albertvillecity
4AL112311322403077Alexander Citycity
\n", "
" ], "text/plain": [ " state state_fips county_fips fips ansi_code county_name entity_type\n", "0 AL 1 67 124 2403054 Abbeville city\n", "1 AL 1 73 460 2403063 Adamsville city\n", "2 AL 1 117 820 2403069 Alabaster city\n", "3 AL 1 95 988 2403074 Albertville city\n", "4 AL 1 123 1132 2403077 Alexander City city" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names = ['state','state_fips','county_fips','fips','ansi_code',\n", " 'county_name','entity_type']\n", "fips.columns=names\n", "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The merge statement takes two pandas dataframes and combines them. Notice in the merge command below we have trips,fips. The dataframe trips is known as the \"left\" dataframe and fips, the \"right\". I want to combine these columns based on the numeric fips values (state and county). These fields have different names in each of the dataframes, so I need to specify the names (in the same order) for the left (trips) and right (fips) dataframes. The option `how='left'` tells pandas to always keep rows from trips whether a match is made with fips or not. Note that if a row in the fips table doesn't match our trips data, it is discarded." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# join clean fips codes to data_trip\n", "trips_fips = pd.merge(trips,fips, left_on=['st','cnty'], \n", " right_on=['state_fips','county_fips'],\n", " how='left', sort=False)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(104709, 12)\n", "(1561000, 19)\n" ] } ], "source": [ "print(trips.shape)\n", "print(trips_fips.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result we get is disconcerting. What we wanted to achieve with the merge was to add text names to the trips table, and our merged dataset has 1.5 million rows rather than 104,709 (what we want). This probably occured because there were duplicate (or redundant information on fips codes in the fips table). To check that use the `duplicate` method to find rows with duplicate state and county fips codes." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "township 12641\n", "city 8355\n", "town 8136\n", "village 3974\n", "County 2555\n", "borough 1136\n", "Reservation 438\n", "ANVSA 221\n", "CDP 147\n", "comunidad 124\n", "OTSA 101\n", "zona urbana 79\n", "charter township 78\n", "Hawaiian Home Land 75\n", "Municipio 58\n", "SDTSA 53\n", "Rancheria 47\n", "Parish 38\n", "Indian Reservation 36\n", "Township 35\n", "Name: entity_type, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips[fips.duplicated(['state_fips','county_fips'])]['entity_type'].value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, this file is (apparently) for census workers in the field who may encounter lots of different place names, and this file helps their data-entry program map into the correct state and county fips codes. This isn't ideal for our purposes. Since our trips table already has the correct numeric fips codes, we need a fips table that has the correct and **unique** county/city names.\n", "\n", "Grab another fips table from here: https://www.census.gov/geo/reference/codes/cou.html and choose United States.\n", "\n", "For convenience, I'll save to my website." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3235, 5)\n" ] } ], "source": [ "fips = pd.io.parsers.read_csv('https://rlhick.people.wm.edu/pubp622/data/national_county.txt',\n", " header=None, names=['state','state_fips','county_fips','county','fips_class_code'])\n", "print(fips.shape)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statestate_fipscounty_fipscountyfips_class_code
0AL11Autauga CountyH1
1AL13Baldwin CountyH1
2AL15Barbour CountyH1
3AL17Bibb CountyH1
4AL19Blount CountyH1
\n", "
" ], "text/plain": [ " state state_fips county_fips county fips_class_code\n", "0 AL 1 1 Autauga County H1\n", "1 AL 1 3 Baldwin County H1\n", "2 AL 1 5 Barbour County H1\n", "3 AL 1 7 Bibb County H1\n", "4 AL 1 9 Blount County H1" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having worked with the data before, I know that counties/entities like Virginia Beach City are labeled \"Virginia Beach city\". Let's fix that:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "fips[\"county\"] = fips.county.str.replace(\"city\",\"City\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check for duplicates:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicate rows: 0\n" ] } ], "source": [ "fips.duplicated(['state_fips','county_fips'])\n", "print(\"Number of duplicate rows:\", fips[fips.duplicated(['state_fips','county_fips'])].size)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(104709, 12)\n", "(104709, 17)\n" ] } ], "source": [ "trips_fips = pd.merge(trips,fips, left_on=['st','cnty'], \n", " right_on=['state_fips','county_fips'],\n", " how='left', sort=False)\n", "print(trips.shape)\n", "print(trips_fips.shape)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2statestate_fipscounty_fipscountyfips_class_code
011710201011140012010.06.02010-11-1498835020102STRIPED BASS725.05.0CT9.07.0Middlesex CountyH4
111710201011140022010.06.02010-11-1498835020102STRIPED BASS710.02.0CT9.07.0Middlesex CountyH4
211710201011160012010.06.02010-11-1698835250101BLUEFISH8835020102STRIPED BASS1112.06.0CT9.011.0New London CountyH4
311710201011230012010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH790.025.0CT9.07.0Middlesex CountyH4
411710201011230022010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH775.020.0CT9.07.0Middlesex CountyH4
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010.0 6.0 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 state state_fips \\\n", "0 7 25.0 5.0 CT 9.0 \n", "1 7 10.0 2.0 CT 9.0 \n", "2 8835020102 STRIPED BASS 11 12.0 6.0 CT 9.0 \n", "3 8835250101 BLUEFISH 7 90.0 25.0 CT 9.0 \n", "4 8835250101 BLUEFISH 7 75.0 20.0 CT 9.0 \n", "\n", " county_fips county fips_class_code \n", "0 7.0 Middlesex County H4 \n", "1 7.0 Middlesex County H4 \n", "2 11.0 New London County H4 \n", "3 7.0 Middlesex County H4 \n", "4 7.0 Middlesex County H4 " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's rename our state and intercept column to make it more intuitive." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2state_interceptstate_fipscounty_fipscounty_interceptfips_class_code
011710201011140012010.06.02010-11-1498835020102STRIPED BASS725.05.0CT9.07.0Middlesex CountyH4
111710201011140022010.06.02010-11-1498835020102STRIPED BASS710.02.0CT9.07.0Middlesex CountyH4
211710201011160012010.06.02010-11-1698835250101BLUEFISH8835020102STRIPED BASS1112.06.0CT9.011.0New London CountyH4
311710201011230012010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH790.025.0CT9.07.0Middlesex CountyH4
411710201011230022010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH775.020.0CT9.07.0Middlesex CountyH4
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st prim1 prim1_common \\\n", "0 1171020101114001 2010.0 6.0 2010-11-14 9 8835020102 STRIPED BASS \n", "1 1171020101114002 2010.0 6.0 2010-11-14 9 8835020102 STRIPED BASS \n", "2 1171020101116001 2010.0 6.0 2010-11-16 9 8835250101 BLUEFISH \n", "3 1171020101123001 2010.0 6.0 2010-11-23 9 8835020102 STRIPED BASS \n", "4 1171020101123002 2010.0 6.0 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 ffdays2 state_intercept \\\n", "0 7 25.0 5.0 CT \n", "1 7 10.0 2.0 CT \n", "2 8835020102 STRIPED BASS 11 12.0 6.0 CT \n", "3 8835250101 BLUEFISH 7 90.0 25.0 CT \n", "4 8835250101 BLUEFISH 7 75.0 20.0 CT \n", "\n", " state_fips county_fips county_intercept fips_class_code \n", "0 9.0 7.0 Middlesex County H4 \n", "1 9.0 7.0 Middlesex County H4 \n", "2 9.0 11.0 New London County H4 \n", "3 9.0 7.0 Middlesex County H4 \n", "4 9.0 7.0 Middlesex County H4 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.rename(columns = {'county': 'county_intercept','state':'state_intercept'},inplace=True)\n", "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Indices\n", "\n", "Let's create an index on these new state and county fields:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim1_commonprim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_intercept
CTMiddlesex County11710201011140012010.06.02010-11-1498835020102STRIPED BASS725.05.09.07.0H4
Middlesex County11710201011140022010.06.02010-11-1498835020102STRIPED BASS710.02.09.07.0H4
New London County11710201011160012010.06.02010-11-1698835250101BLUEFISH8835020102STRIPED BASS1112.06.09.011.0H4
Middlesex County11710201011230012010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH790.025.09.07.0H4
Middlesex County11710201011230022010.06.02010-11-2398835020102STRIPED BASS8835250101BLUEFISH775.020.09.07.0H4
\n", "
" ], "text/plain": [ " id_code year wave \\\n", "state_intercept county_intercept \n", "CT Middlesex County 1171020101114001 2010.0 6.0 \n", " Middlesex County 1171020101114002 2010.0 6.0 \n", " New London County 1171020101116001 2010.0 6.0 \n", " Middlesex County 1171020101123001 2010.0 6.0 \n", " Middlesex County 1171020101123002 2010.0 6.0 \n", "\n", " intercept_date st prim1 prim1_common \\\n", "state_intercept county_intercept \n", "CT Middlesex County 2010-11-14 9 8835020102 STRIPED BASS \n", " Middlesex County 2010-11-14 9 8835020102 STRIPED BASS \n", " New London County 2010-11-16 9 8835250101 BLUEFISH \n", " Middlesex County 2010-11-23 9 8835020102 STRIPED BASS \n", " Middlesex County 2010-11-23 9 8835020102 STRIPED BASS \n", "\n", " prim2 prim2_common cnty ffdays12 \\\n", "state_intercept county_intercept \n", "CT Middlesex County 7 25.0 \n", " Middlesex County 7 10.0 \n", " New London County 8835020102 STRIPED BASS 11 12.0 \n", " Middlesex County 8835250101 BLUEFISH 7 90.0 \n", " Middlesex County 8835250101 BLUEFISH 7 75.0 \n", "\n", " ffdays2 state_fips county_fips \\\n", "state_intercept county_intercept \n", "CT Middlesex County 5.0 9.0 7.0 \n", " Middlesex County 2.0 9.0 7.0 \n", " New London County 6.0 9.0 11.0 \n", " Middlesex County 25.0 9.0 7.0 \n", " Middlesex County 20.0 9.0 7.0 \n", "\n", " fips_class_code \n", "state_intercept county_intercept \n", "CT Middlesex County H4 \n", " Middlesex County H4 \n", " New London County H4 \n", " Middlesex County H4 \n", " Middlesex County H4 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.set_index(['state_intercept','county_intercept'],inplace=True)\n", "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why use an index for these state and county codes? The biggest reason is that we can easily access values for looking at our data. Here we have 2 month avidity for anglers sampled in James City County:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/robhicks/anaconda/envs/python36/lib/python3.6/site-packages/ipykernel_launcher.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] }, { "data": { "text/plain": [ "state_intercept county_intercept \n", "VA James City County 2.0\n", " James City County 1.0\n", " James City County 0.0\n", " James City County 0.0\n", " James City County 4.0\n", " James City County 6.0\n", " James City County 6.0\n", " James City County 6.0\n", " James City County 6.0\n", " James City County 3.0\n", " James City County 3.0\n", " James City County 8.0\n", " James City County 20.0\n", " James City County 1.0\n", " James City County 12.0\n", " James City County 2.0\n", " James City County 1.0\n", " James City County 8.0\n", " James City County 0.0\n", " James City County 4.0\n", " James City County 2.0\n", " James City County 2.0\n", " James City County 0.0\n", " James City County 0.0\n", " James City County 0.0\n", " James City County 0.0\n", " James City County 15.0\n", " James City County 15.0\n", " James City County 3.0\n", "Name: ffdays2, dtype: float64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips['ffdays2'].loc['VA','James City County']" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# we want to create a richer index that includes the prim1_common species name:\n", "# 1. put state_intercept and county_intercept back in data\n", "trips_fips.reset_index(inplace=True)\n", "trips_fips.head()\n", "# 2. create new multiindex\n", "trips_fips.set_index(['state_intercept','county_intercept','prim1_common'],inplace=True)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_interceptprim1_common
CTMiddlesex CountySTRIPED BASS11710201011140012010.06.02010-11-1498835020102725.05.09.07.0H4
STRIPED BASS11710201011140022010.06.02010-11-1498835020102710.02.09.07.0H4
New London CountyBLUEFISH11710201011160012010.06.02010-11-16988352501018835020102STRIPED BASS1112.06.09.011.0H4
Middlesex CountySTRIPED BASS11710201011230012010.06.02010-11-23988350201028835250101BLUEFISH790.025.09.07.0H4
STRIPED BASS11710201011230022010.06.02010-11-23988350201028835250101BLUEFISH775.020.09.07.0H4
\n", "
" ], "text/plain": [ " id_code year \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 1171020101114001 2010.0 \n", " STRIPED BASS 1171020101114002 2010.0 \n", " New London County BLUEFISH 1171020101116001 2010.0 \n", " Middlesex County STRIPED BASS 1171020101123001 2010.0 \n", " STRIPED BASS 1171020101123002 2010.0 \n", "\n", " wave intercept_date st \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 6.0 2010-11-14 9 \n", " STRIPED BASS 6.0 2010-11-14 9 \n", " New London County BLUEFISH 6.0 2010-11-16 9 \n", " Middlesex County STRIPED BASS 6.0 2010-11-23 9 \n", " STRIPED BASS 6.0 2010-11-23 9 \n", "\n", " prim1 prim2 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 8835020102 \n", " STRIPED BASS 8835020102 \n", " New London County BLUEFISH 8835250101 8835020102 \n", " Middlesex County STRIPED BASS 8835020102 8835250101 \n", " STRIPED BASS 8835020102 8835250101 \n", "\n", " prim2_common cnty ffdays12 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7 25.0 \n", " STRIPED BASS 7 10.0 \n", " New London County BLUEFISH STRIPED BASS 11 12.0 \n", " Middlesex County STRIPED BASS BLUEFISH 7 90.0 \n", " STRIPED BASS BLUEFISH 7 75.0 \n", "\n", " ffdays2 state_fips \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 5.0 9.0 \n", " STRIPED BASS 2.0 9.0 \n", " New London County BLUEFISH 6.0 9.0 \n", " Middlesex County STRIPED BASS 25.0 9.0 \n", " STRIPED BASS 20.0 9.0 \n", "\n", " county_fips fips_class_code \n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7.0 H4 \n", " STRIPED BASS 7.0 H4 \n", " New London County BLUEFISH 11.0 H4 \n", " Middlesex County STRIPED BASS 7.0 H4 \n", " STRIPED BASS 7.0 H4 " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/robhicks/anaconda/envs/python36/lib/python3.6/site-packages/ipykernel_launcher.py:1: PerformanceWarning: indexing past lexsort depth may impact performance.\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
state_interceptcounty_interceptprim1_common
VAVirginia Beach CityRED DRUM16814201011270022010.06.02010-11-275188354409018835440102SPOTTED SEATROUT81040.010.051.0810.0C7
RED DRUM16814201011270042010.06.02010-11-275188354409018835440102SPOTTED SEATROUT810150.024.051.0810.0C7
RED DRUM16814201011270052010.06.02010-11-275188354409018835250101BLUEFISH81050.020.051.0810.0C7
RED DRUM16814201011260032010.06.02010-11-265188354409018835440102SPOTTED SEATROUT8108.04.051.0810.0C7
RED DRUM16814201011260042010.06.02010-11-26518835440901810150.035.051.0810.0C7
\n", "
" ], "text/plain": [ " id_code year \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 1681420101127002 2010.0 \n", " RED DRUM 1681420101127004 2010.0 \n", " RED DRUM 1681420101127005 2010.0 \n", " RED DRUM 1681420101126003 2010.0 \n", " RED DRUM 1681420101126004 2010.0 \n", "\n", " wave intercept_date st \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 6.0 2010-11-27 51 \n", " RED DRUM 6.0 2010-11-27 51 \n", " RED DRUM 6.0 2010-11-27 51 \n", " RED DRUM 6.0 2010-11-26 51 \n", " RED DRUM 6.0 2010-11-26 51 \n", "\n", " prim1 prim2 \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 8835250101 \n", " RED DRUM 8835440901 8835440102 \n", " RED DRUM 8835440901 \n", "\n", " prim2_common cnty \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM BLUEFISH 810 \n", " RED DRUM SPOTTED SEATROUT 810 \n", " RED DRUM 810 \n", "\n", " ffdays12 ffdays2 \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 40.0 10.0 \n", " RED DRUM 150.0 24.0 \n", " RED DRUM 50.0 20.0 \n", " RED DRUM 8.0 4.0 \n", " RED DRUM 150.0 35.0 \n", "\n", " state_fips county_fips \\\n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM 51.0 810.0 \n", " RED DRUM 51.0 810.0 \n", " RED DRUM 51.0 810.0 \n", " RED DRUM 51.0 810.0 \n", " RED DRUM 51.0 810.0 \n", "\n", " fips_class_code \n", "state_intercept county_intercept prim1_common \n", "VA Virginia Beach City RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 \n", " RED DRUM C7 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA','Virginia Beach City','RED DRUM'].head()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_code
county_intercept
Virginia Beach City16620201011100012010.06.02010-11-1051883502010281014.05.051.0810.0C7
Virginia Beach City16620201011100022010.06.02010-11-1051883502010281010.05.051.0810.0C7
Hampton City16808201011220012010.06.02010-11-225188350201028835440901RED DRUM65080.024.051.0650.0C7
Virginia Beach City16814201011210052010.06.02010-11-215188350201028104.04.051.0810.0C7
Virginia Beach City16814201011210122010.06.02010-11-215188350201028102.00.051.0810.0C7
\n", "
" ], "text/plain": [ " id_code year wave intercept_date st \\\n", "county_intercept \n", "Virginia Beach City 1662020101110001 2010.0 6.0 2010-11-10 51 \n", "Virginia Beach City 1662020101110002 2010.0 6.0 2010-11-10 51 \n", "Hampton City 1680820101122001 2010.0 6.0 2010-11-22 51 \n", "Virginia Beach City 1681420101121005 2010.0 6.0 2010-11-21 51 \n", "Virginia Beach City 1681420101121012 2010.0 6.0 2010-11-21 51 \n", "\n", " prim1 prim2 prim2_common cnty ffdays12 \\\n", "county_intercept \n", "Virginia Beach City 8835020102 810 14.0 \n", "Virginia Beach City 8835020102 810 10.0 \n", "Hampton City 8835020102 8835440901 RED DRUM 650 80.0 \n", "Virginia Beach City 8835020102 810 4.0 \n", "Virginia Beach City 8835020102 810 2.0 \n", "\n", " ffdays2 state_fips county_fips fips_class_code \n", "county_intercept \n", "Virginia Beach City 5.0 51.0 810.0 C7 \n", "Virginia Beach City 5.0 51.0 810.0 C7 \n", "Hampton City 24.0 51.0 650.0 C7 \n", "Virginia Beach City 4.0 51.0 810.0 C7 \n", "Virginia Beach City 0.0 51.0 810.0 C7 " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA',:,'STRIPED BASS'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, indexes are pretty cool, but what if we need the county data for doing summary statistics? Example, suppose we want to get a frequency count of trips by county in VA targeting Striped Bass? Notice, we can't just access index values like you would a column of data:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'DataFrame' object has no attribute 'county_intercept'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtrips_fips\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcounty_intercept\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 3612\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3613\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3614\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3615\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3616\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__setattr__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAttributeError\u001b[0m: 'DataFrame' object has no attribute 'county_intercept'" ] } ], "source": [ "trips_fips.county_intercept" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the strategy will be to copy the index values back into our table so we can use it for other purposes:\n", "\n", "(Note: you can also keep these columns in your data by including the option `drop=False` when you `set_index()`)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['CT', 'CT', 'CT', 'CT', 'CT', 'CT', 'CT', 'CT', 'CT', 'CT',\n", " ...\n", " 'MS', 'MS', 'MS', 'MS', 'MS', 'MS', 'MS', 'MS', 'MS', 'MS'],\n", " dtype='object', name='state_intercept', length=104709)\n" ] } ], "source": [ "print(trips_fips.index.get_level_values(0))\n", "trips_fips['state_intercept'] = trips_fips.index.get_level_values(0)\n", "trips_fips['county_intercept'] = trips_fips.index.get_level_values(1)\n", "trips_fips['species_name'] = trips_fips.index.get_level_values(2)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id_codeyearwaveintercept_datestprim1prim2prim2_commoncntyffdays12ffdays2state_fipscounty_fipsfips_class_codestate_interceptcounty_interceptspecies_name
state_interceptcounty_interceptprim1_common
CTMiddlesex CountySTRIPED BASS11710201011140012010.06.02010-11-1498835020102725.05.09.07.0H4CTMiddlesex CountySTRIPED BASS
STRIPED BASS11710201011140022010.06.02010-11-1498835020102710.02.09.07.0H4CTMiddlesex CountySTRIPED BASS
New London CountyBLUEFISH11710201011160012010.06.02010-11-16988352501018835020102STRIPED BASS1112.06.09.011.0H4CTNew London CountyBLUEFISH
Middlesex CountySTRIPED BASS11710201011230012010.06.02010-11-23988350201028835250101BLUEFISH790.025.09.07.0H4CTMiddlesex CountySTRIPED BASS
STRIPED BASS11710201011230022010.06.02010-11-23988350201028835250101BLUEFISH775.020.09.07.0H4CTMiddlesex CountySTRIPED BASS
\n", "
" ], "text/plain": [ " id_code year \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 1171020101114001 2010.0 \n", " STRIPED BASS 1171020101114002 2010.0 \n", " New London County BLUEFISH 1171020101116001 2010.0 \n", " Middlesex County STRIPED BASS 1171020101123001 2010.0 \n", " STRIPED BASS 1171020101123002 2010.0 \n", "\n", " wave intercept_date st \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 6.0 2010-11-14 9 \n", " STRIPED BASS 6.0 2010-11-14 9 \n", " New London County BLUEFISH 6.0 2010-11-16 9 \n", " Middlesex County STRIPED BASS 6.0 2010-11-23 9 \n", " STRIPED BASS 6.0 2010-11-23 9 \n", "\n", " prim1 prim2 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 8835020102 \n", " STRIPED BASS 8835020102 \n", " New London County BLUEFISH 8835250101 8835020102 \n", " Middlesex County STRIPED BASS 8835020102 8835250101 \n", " STRIPED BASS 8835020102 8835250101 \n", "\n", " prim2_common cnty ffdays12 \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7 25.0 \n", " STRIPED BASS 7 10.0 \n", " New London County BLUEFISH STRIPED BASS 11 12.0 \n", " Middlesex County STRIPED BASS BLUEFISH 7 90.0 \n", " STRIPED BASS BLUEFISH 7 75.0 \n", "\n", " ffdays2 state_fips \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 5.0 9.0 \n", " STRIPED BASS 2.0 9.0 \n", " New London County BLUEFISH 6.0 9.0 \n", " Middlesex County STRIPED BASS 25.0 9.0 \n", " STRIPED BASS 20.0 9.0 \n", "\n", " county_fips fips_class_code \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS 7.0 H4 \n", " STRIPED BASS 7.0 H4 \n", " New London County BLUEFISH 11.0 H4 \n", " Middlesex County STRIPED BASS 7.0 H4 \n", " STRIPED BASS 7.0 H4 \n", "\n", " state_intercept \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS CT \n", " STRIPED BASS CT \n", " New London County BLUEFISH CT \n", " Middlesex County STRIPED BASS CT \n", " STRIPED BASS CT \n", "\n", " county_intercept \\\n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS Middlesex County \n", " STRIPED BASS Middlesex County \n", " New London County BLUEFISH New London County \n", " Middlesex County STRIPED BASS Middlesex County \n", " STRIPED BASS Middlesex County \n", "\n", " species_name \n", "state_intercept county_intercept prim1_common \n", "CT Middlesex County STRIPED BASS STRIPED BASS \n", " STRIPED BASS STRIPED BASS \n", " New London County BLUEFISH BLUEFISH \n", " Middlesex County STRIPED BASS STRIPED BASS \n", " STRIPED BASS STRIPED BASS " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This allows us to use index lookups and also the index values for calculating interesting summary data. The following calculates the top 10 targeted species in VA:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 1794\n", "SUMMER FLOUNDER 972\n", "STRIPED BASS 710\n", "ATLANTIC CROAKER 448\n", "SPOTTED SEATROUT 154\n", "TAUTOG 153\n", "SPOT 143\n", "COBIA 117\n", "RED DRUM 66\n", "BLUEFISH 50\n", "Name: species_name, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA'].species_name.value_counts().head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This calculates the top targeted species in Gloucester County, VA." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 37\n", "STRIPED BASS 23\n", "ATLANTIC CROAKER 20\n", "SPOT 18\n", "SUMMER FLOUNDER 15\n", "COBIA 7\n", "SPANISH MACKEREL 2\n", "BLUEFISH 1\n", "Name: species_name, dtype: int64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA','Gloucester County',:].species_name.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about top 10 counties in VA with respect to numbers of intercepted trips?" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Virginia Beach City 1876\n", "Hampton City 553\n", "Accomack County 505\n", "Norfolk City 440\n", "Newport News City 306\n", "Northampton County 297\n", "Middlesex County 215\n", "Gloucester County 123\n", "Mathews County 67\n", "Portsmouth City 64\n", "Name: county_intercept, dtype: int64\n" ] } ], "source": [ "print(trips_fips.loc['VA'].county_intercept.value_counts().head(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tells us when summer flounder is being targeted in VA:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6 190\n", "7 176\n", "5 167\n", "8 163\n", "4 137\n", "9 95\n", "10 31\n", "3 12\n", "11 1\n", "Name: intercept_date, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA',:,'SUMMER FLOUNDER'].intercept_date.dt.month.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note we used the data/time field `intercept_date` to extract the month using the `dt.month` method. Pandas has immensely rich datetime capabilities that we won't demonstrate here.\n", "\n", "## Groupby\n", "\n", "We can compute interesting data by grouping over column values and using our index for fast selection. Suppose we want to know total number of trips in VA by county of intercept for each wave (wave 1 = Jan,Feb; 2 = Mar,Apr; etc). Groupby lets us do this:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "county_intercept wave\n", "Accomack County 2.0 101\n", " 3.0 172\n", " 4.0 181\n", " 5.0 50\n", " 6.0 1\n", "Chesapeake City 2.0 10\n", " 3.0 8\n", " 4.0 5\n", " 5.0 3\n", " 6.0 33\n", "Essex County 3.0 19\n", " 4.0 3\n", " 6.0 15\n", "Gloucester County 3.0 40\n", " 4.0 32\n", " 5.0 35\n", " 6.0 16\n", "Hampton City 2.0 59\n", " 3.0 108\n", " 4.0 173\n", " 5.0 141\n", " 6.0 72\n", "Isle of Wight County 2.0 17\n", " 5.0 9\n", "James City County 3.0 13\n", "Name: id_code, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trips_fips.loc['VA'].groupby(['county_intercept','wave'])['id_code'].count().head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stack and Unstack\n", "\n", "We can manipulate the result above further to make it more amenable for comparisons and graphing (e.g. we want all of the wave 2 records on the same row):" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
wave2.03.04.05.06.0
county_intercept
Accomack County101.0172.0181.050.01.0
Chesapeake City10.08.05.03.033.0
Essex CountyNaN19.03.0NaN15.0
Gloucester CountyNaN40.032.035.016.0
Hampton City59.0108.0173.0141.072.0
Isle of Wight County17.0NaNNaN9.0NaN
James City CountyNaN13.011.05.0NaN
King William CountyNaNNaN9.0NaNNaN
Mathews CountyNaN20.030.02.015.0
Middlesex CountyNaN64.060.063.028.0
Newport News City25.063.079.074.065.0
Norfolk City18.0125.0159.098.040.0
Northampton County7.073.0108.049.060.0
Northumberland CountyNaN11.08.05.04.0
Poquoson CityNaN23.011.012.014.0
Portsmouth CityNaN31.013.0NaN20.0
Richmond CountyNaN21.04.04.0NaN
Suffolk CityNaN11.013.09.0NaN
Surry CountyNaN2.02.02.0NaN
Virginia Beach City382.0427.0412.0341.0314.0
Westmoreland County5.05.04.0NaN3.0
York CountyNaN19.09.05.010.0
\n", "
" ], "text/plain": [ "wave 2.0 3.0 4.0 5.0 6.0\n", "county_intercept \n", "Accomack County 101.0 172.0 181.0 50.0 1.0\n", "Chesapeake City 10.0 8.0 5.0 3.0 33.0\n", "Essex County NaN 19.0 3.0 NaN 15.0\n", "Gloucester County NaN 40.0 32.0 35.0 16.0\n", "Hampton City 59.0 108.0 173.0 141.0 72.0\n", "Isle of Wight County 17.0 NaN NaN 9.0 NaN\n", "James City County NaN 13.0 11.0 5.0 NaN\n", "King William County NaN NaN 9.0 NaN NaN\n", "Mathews County NaN 20.0 30.0 2.0 15.0\n", "Middlesex County NaN 64.0 60.0 63.0 28.0\n", "Newport News City 25.0 63.0 79.0 74.0 65.0\n", "Norfolk City 18.0 125.0 159.0 98.0 40.0\n", "Northampton County 7.0 73.0 108.0 49.0 60.0\n", "Northumberland County NaN 11.0 8.0 5.0 4.0\n", "Poquoson City NaN 23.0 11.0 12.0 14.0\n", "Portsmouth City NaN 31.0 13.0 NaN 20.0\n", "Richmond County NaN 21.0 4.0 4.0 NaN\n", "Suffolk City NaN 11.0 13.0 9.0 NaN\n", "Surry County NaN 2.0 2.0 2.0 NaN\n", "Virginia Beach City 382.0 427.0 412.0 341.0 314.0\n", "Westmoreland County 5.0 5.0 4.0 NaN 3.0\n", "York County NaN 19.0 9.0 5.0 10.0" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = trips_fips.loc['VA'].groupby(['county_intercept','wave'])['id_code'].count()\n", "t.unstack().head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we could put waves on the same row:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
county_interceptAccomack CountyChesapeake CityEssex CountyGloucester CountyHampton CityIsle of Wight CountyJames City CountyKing William CountyMathews CountyMiddlesex County...Northampton CountyNorthumberland CountyPoquoson CityPortsmouth CityRichmond CountySuffolk CitySurry CountyVirginia Beach CityWestmoreland CountyYork County
wave
2.0101.010.0NaNNaN59.017.0NaNNaNNaNNaN...7.0NaNNaNNaNNaNNaNNaN382.05.0NaN
3.0172.08.019.040.0108.0NaN13.0NaN20.064.0...73.011.023.031.021.011.02.0427.05.019.0
4.0181.05.03.032.0173.0NaN11.09.030.060.0...108.08.011.013.04.013.02.0412.04.09.0
5.050.03.0NaN35.0141.09.05.0NaN2.063.0...49.05.012.0NaN4.09.02.0341.0NaN5.0
6.01.033.015.016.072.0NaNNaNNaN15.028.0...60.04.014.020.0NaNNaNNaN314.03.010.0
\n", "

5 rows × 22 columns

\n", "
" ], "text/plain": [ "county_intercept Accomack County Chesapeake City Essex County \\\n", "wave \n", "2.0 101.0 10.0 NaN \n", "3.0 172.0 8.0 19.0 \n", "4.0 181.0 5.0 3.0 \n", "5.0 50.0 3.0 NaN \n", "6.0 1.0 33.0 15.0 \n", "\n", "county_intercept Gloucester County Hampton City Isle of Wight County \\\n", "wave \n", "2.0 NaN 59.0 17.0 \n", "3.0 40.0 108.0 NaN \n", "4.0 32.0 173.0 NaN \n", "5.0 35.0 141.0 9.0 \n", "6.0 16.0 72.0 NaN \n", "\n", "county_intercept James City County King William County Mathews County \\\n", "wave \n", "2.0 NaN NaN NaN \n", "3.0 13.0 NaN 20.0 \n", "4.0 11.0 9.0 30.0 \n", "5.0 5.0 NaN 2.0 \n", "6.0 NaN NaN 15.0 \n", "\n", "county_intercept Middlesex County ... Northampton County \\\n", "wave ... \n", "2.0 NaN ... 7.0 \n", "3.0 64.0 ... 73.0 \n", "4.0 60.0 ... 108.0 \n", "5.0 63.0 ... 49.0 \n", "6.0 28.0 ... 60.0 \n", "\n", "county_intercept Northumberland County Poquoson City Portsmouth City \\\n", "wave \n", "2.0 NaN NaN NaN \n", "3.0 11.0 23.0 31.0 \n", "4.0 8.0 11.0 13.0 \n", "5.0 5.0 12.0 NaN \n", "6.0 4.0 14.0 20.0 \n", "\n", "county_intercept Richmond County Suffolk City Surry County \\\n", "wave \n", "2.0 NaN NaN NaN \n", "3.0 21.0 11.0 2.0 \n", "4.0 4.0 13.0 2.0 \n", "5.0 4.0 9.0 2.0 \n", "6.0 NaN NaN NaN \n", "\n", "county_intercept Virginia Beach City Westmoreland County York County \n", "wave \n", "2.0 382.0 5.0 NaN \n", "3.0 427.0 5.0 19.0 \n", "4.0 412.0 4.0 9.0 \n", "5.0 341.0 NaN 5.0 \n", "6.0 314.0 3.0 10.0 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.unstack(0).head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can combine these with plot to quickly make cool charts. Let's focus on the peninsula:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeoAAAH2CAYAAAC/ec/aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xl4VeW59/HvzaCIihPoqaKCI0iIYUYZxKHOY1UcKIKKVFurta2tra9H7LFWq1WL57TUCRxQiyNWrUXrhLMBI4JaBYuIIiIIKqACPu8fe2c3wQCBJGQRvp/rysXea3jWvVc2+7eftVbWEyklJElSNjWq7wIkSdKKGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEGtehMRoyLi0nradkTEyIj4NCJero8a1gUR0S8iZtbTtodFxO1rYTu/jogb63o70poyqFUQEdMjYnZEbFxh2pCIeKoey6orvYHvAq1TSt2XnxkRgyPi2eo0VJ9htipr48tQ/kvPuxHxRl1up66klC5LKQ1Zk3Xz+/friPg8/zM5In4XEZutRhvTI+KANdn+6lhb21HtM6i1vCbAufVdxOqKiMarucqOwPSU0sK6qGd1REST+q6hhvoCWwM7RUS3+i4m/8VhbX62/T6ltCnQCjgV6Ak8V/ELr1QTBrWWdyXw84jYfPkZEdEmIlLFYImIpyJiSP7x4Ih4LiKuiYj5+V7W3vnp70fExxExaLlmW0bEY/neyNMRsWOFttvl582LiH9FRP8K80ZFxJ8j4pGIWAjsW0W920bEg/n1p0bEGfnppwM3AntFxBcRccmqdkq+N/LziJgUEQsi4q8R0Sz/Yfx3YNt8W1/kt9soIi6IiGkRMTcixkTElsvtx9MjYgbwRH5674h4Pr/v3o+IwfnpG0bEVRExI3/EY0REbJSf1y8iZuYP336Sr3NAft5QYADwi3xdf6uwX+6NiDkR8e+IOKfC69wov28/zfeQqxO8g4CxwCP5xxX321MR8T/598XnETEuIlpWmH9KRLyX30cXrazXFxE9K+yf1yKi33Lb+W1EPAcsIvelYXD+Pfh5/nUOWEG7hUPsFX43g/L7+5OIuLAa+4CU0pcppVeAI4GtyIU2EbFzRDyRf42fRMTo8v9fEXEbsAPwt/zv6Bf56XdHxEf599ozEdGhQr2HRsQb+df1QUT8vMK8wyOiLL+Pno+I4pVtR+uIlJI//pBSApgOHADcB1yanzYEeCr/uA2QgCYV1nkKGJJ/PBhYSu4DqjFwKTAD+D9gQ+BA4HNgk/zyo/LP++bn/xF4Nj9vY+D9fFtNgM7AJ0CHCusuAHqR+8LZrIrX8zTwJ6AZUALMAfavUOuzK9kXlebn983LwLbAlsCbwJn5ef2Amcut/xPgRaB1/rX9Bbhzuf14a/51bkTuQ/Rz4CSgKbkP+pL88tcCD+a3uynwN+B3Fba9FLg6v519gIXA7hX206UV6moETAD+G9gA2Al4FzgoP/9yYHx+W9sDk5d/bcu9zubAZ8ChwLH539EGy70/pgG75V/nU8Dl+Xl7AF+QOw2xAXAVsAQ4ID9/GHB7/vF2wNz8dhqRO20xF2hVYTszgA7k3i+b5esq3w/fIf/eqeI1VNxO+e/mhny9ewJfAe1XsG6l/Vth+q3AX/OPd8nXuyG5XvczwLXL/79bbv3T8r/rDfO//7IK82YBffKPtwA65x93Bj4GepD7/zco3/aGK9qOP+vGjz1qVeW/gR9HRKs1WPffKaWRKaVlwF/Jfdj/JqX0VUppHPA1uQ+ucg+nlJ5JKX0FXEiul7s9cDi5Q9MjU0pLU0oTgXuB4yqsOzal9FxK6ZuU0pcVi8i30Rv4Zcr1dMrI9aIHrsFrKjc8pfRhSmkeubAsWcmyPwAuTCnNzL+2YcBxUfkw97CU0sKU0mJyPd/HU0p3ppSWpJTmppTKIiKAM4DzUkrzUkqfA5cBJy63vYvy+/hp4GGgP1XrRi7cfpNS+jql9C65UCpvrz/w2/y23geGr2KffI9ckI0DHiIXkoctt8zIlNLb+dc5hv/st+OAv6WUnk0pfU3ufbeiwQe+DzySUnok//t+DCglF9zlRqWUpqSUlpL78vINUBQRG6WUZqWUpqzitVR0SUppcUrpNeA1coG9Oj4k92WHlNLUlNJj+d/PHHJfqvZZ2coppZtTSp9XeO/sGf85770E2CMiWqSUPs3/34Dc++QvKaWXUkrLUkq3kPvd9FzN2pUxBrW+JaU0mdyH7gVrsPrsCo8X59tbftomFZ6/X2G7XwDzyPVadwR65A/hzY+I+eTC7L+qWrcK2wLlwVbuPXI9szX1UYXHi6j8Opa3I3B/hdrfBJYB21RYpmL925PreS6vFble64QKbT2an17u01T5XPt75F7/iuradrn9+usKdW27XF3vreQ1Qq7XNib/Zeorckdjlj+9saL9VmlbKaVF5HrJK6r7+OXq7k2up1yuYlsLgROAM4FZEfFwRLRbxWupTs3VtR259zIRsXVE3JU/TP0ZcDvQckUrRkTjiLg8f9rkM3I9YSqscyy5LyjvRe500V756TsCP1tuH23Pit8LWkcY1FqRi8l9Q68YbOVh0LzCtIrBuSa2L38QEZuQ64V8SO5D9+mU0uYVfjZJKZ1VYd2VDf32IbBlRGxaYdoOwAc1rLcqVdXxPnDIcvU3Syl9sIL13gd2rqKdT8h9uelQoZ3NUkoVg2OLqHzh0g7kXn9Vtb1P7qhHxbo2TSmV90xnUeF3km+rShHRGtgP+H7+fOpH5HrJh1Y8D70Ss8idGihvbyNyh/yr8j5w23J1b5xSurzCMpVea0rpHyml75IL87fIHTmoc/n38QHkTiEA/C5fW3FKqQW5owNRsdTlmjgZOCrfxmbkDsdTvk5K6ZWU0lHkLuB7gNxRCsjto98ut4+ap5TuXMF2tI4wqFWllNJUcoeuz6kwbQ65oPt+/lv/aVQdLqvj0MhdRLUB8D/AS/lDrg8Bu0XEwIhomv/pFhHtq1n/+8DzwO8id9FXMXA6MLqG9VZlNrBVVP6TnBHAbyN/cVxEtIqIo1bSxmjggIjoHxFNImKriChJKX1DLmCuiYit821tFxEHLbf+JRGxQUT0IXfa4O4Kte1UYbmXgc8i4peRu3CscUQUxX+u1h4D/CoitsgH8Y9XUvNA4G1gd3KHs0vInYueSe5c+6rcAxwRuQsONwAuoXKAVXR7ftmD8jU3i9yFdK2rWjgitomII/NfYL4idy58WTVqWmORu+ivC7nw/BQYmZ+1aX778yNiO+D85VZd/ne0ab7mueS+FF9WYRsbRMSAiNgspbSE3Hn48td1A3BmRPSInI0j4rAKX1aX347WEQa1VuY35C52qugMch80c8lduPN8DbdxB7ne+zygC7nD2+QPWR9I7tzph+QORV5B7uKa6jqJXG/kQ+B+4OL8uc1alVJ6C7gTeDd/yHFbchfGPQiMi4jPyV1Y1mMlbcwgdzjzZ+T2RRn/OS/6S2Aq8GL+UOjj5MKx3EfkguFDcoF/Zr4mgJvInc+cHxEP5K8dOIJcqP6bXI/9RnI9N8iF5Xv5eeOA21by0gcBf0opfVTxh9yXlOUPf1f1mqeQ+yJwF7ne9efkLob6qopl3yfXy/w1uYsC3yf3PlzRZ1gjcvvyQ3L7cx/gh6uqaQ39Iv87nkfuIrIJwN4VTkdcQu5CrwXkrh+4b7n1fwf8v/zv6Of5Nt4j96X4DXLvnYoGAtPz74UzyfXQSSmVkvv/+b/k3g9TyV0UuaLtaB0RKXk0RFpXRe5PlG5PKVXZs1yX5A8Zzwd2TSn9u77rkbLCHrWkehMRR0RE8/wh6quA1/nPxVOSMKgl1a+jyB2e/hDYFTgxeZhPqsRD35IkZZg9akmSMsygliQpwzIxak/Lli1TmzZt6rsMSZLWmgkTJnySUlrlrZozEdRt2rShtLS0vsuQJGmtiYhV3aIX8NC3JEmZZlBLkpRhBrUkSRmWiXPUktQQLVmyhJkzZ/Lll1+uemE1WM2aNaN169Y0bdp0jdY3qCWpjsycOZNNN92UNm3aELGigcHUkKWUmDt3LjNnzqRt27Zr1IaHviWpjnz55ZdstdVWhvR6LCLYaqutanRUxaCWpDpkSKum7wGDWpKkDDOoJUlVuvbaa1m0aNEarbv33nvXafu1ob63X10GtSSpSjUJsueff75O2l+2bNka1VNb268PqwzqiLg5Ij6OiMkVpv01IsryP9Mjoiw/vU1ELK4wb0RdFi9J67tbb72V4uJi9txzTwYOHMh7773H/vvvT3FxMfvvvz8zZswAYPDgwdxzzz2F9TbZZBMAnnrqKfr168dxxx1Hu3btGDBgACklhg8fzocffsi+++7Lvvvuy0033cR5551XWP+GG27gpz/96QrrWt32AcaNG8dee+1F586dOf744/niiy+A3G2mf/Ob39C7d2/uvvtupk6dygEHHMCee+5J586dmTZtGgBXXnkl3bp1o7i4mIsvvhiA6dOn065dOwYNGkRxcTHHHXccixYtqnL7mZVSWukP0BfoDExewfw/AP+df9xmRcut7KdLly5JkhqaN954o07bnzx5ctptt93SnDlzUkopzZ07Nx1++OFp1KhRKaWUbrrppnTUUUellFIaNGhQuvvuuwvrbrzxximllJ588snUokWL9P7776dly5alnj17pvHjx6eUUtpxxx0LbX/xxRdpp512Sl9//XVKKaW99torTZo0aYW1rW77c+bMSX369ElffPFFSimlyy+/PF1yySWF5a644opC2927d0/33XdfSimlxYsXp4ULF6Z//OMf6YwzzkjffPNNWrZsWTrssMPS008/nf79738nID377LMppZROPfXUdOWVV35r+3WtqvcCUJqqkZGr7FGnlJ4B5lU1L3KXsvUH7qz5VwZJ0up44oknOO6442jZsiUAW265JS+88AInn3wyAAMHDuTZZ59dZTvdu3endevWNGrUiJKSEqZPn/6tZTbeeGP2228/HnroId566y2WLFlCx44dq1Vnddp/8cUXeeONN+jVqxclJSXccsstvPfef8asOOGEEwD4/PPP+eCDDzjmmGOA3M1Emjdvzrhx4xg3bhydOnWic+fOvPXWW7zzzjsAbL/99vTq1QuA73//+9XaJ1lS0xue9AFmp5TeqTCtbUS8CnwG/L+U0vgabkOSVIWU0ir/9Kd8fpMmTfjmm28K63399deFZTbccMPC48aNG7N06dIq2xoyZAiXXXYZ7dq149RTT612ndVpP6XEd7/7Xe68s+p+38Ybb1xYriopJX71q1/xgx/8oNL06dOnf2sfrWt/MlfTi8lOonJvehawQ0qpE/BT4I6IaFHVihExNCJKI6J0zpw5NSxDktY/+++/P2PGjGHu3LkAzJs3j7333pu77roLgNGjR9O7d28gd553woQJAIwdO5YlS5assv1NN92Uzz//vPC8R48evP/++9xxxx2cdNJJNa6/Yvs9e/bkueeeY+rUqQAsWrSIt99++1vrtGjRgtatW/PAAw8A8NVXX7Fo0SIOOuggbr755sJ57Q8++ICPP/4YgBkzZvDCCy8AcOeddxb2yfKvL6vWOKgjognwPeCv5dNSSl+llObmH08ApgG7VbV+Sun6lFLXlFLXVq1WOW62JGk5HTp04MILL2SfffZhzz335Kc//SnDhw9n5MiRFBcXc9ttt/HHP/4RgDPOOIOnn36a7t2789JLLxV6qCszdOhQDjnkkEoXW/Xv359evXqxxRZb1Lj+iu23atWKUaNGcdJJJ1FcXEzPnj156623qlzvtttuY/jw4RQXF7P33nvz0UcfceCBB3LyySez11570bFjR4477rhCCLdv355bbrmF4uJi5s2bx1lnnbXC15dFsaLDCJUWimgDPJRSKqow7WDgVymlfSpMawXMSykti4idgPFAx5RSlee4y3Xt2jWVlpau2SuQpIx68803ad++fX2XUasOP/xwzjvvPPbff//6LqVapk+fzuGHH87kyZNXvXAdquq9EBETUkpdV7Vudf48607gBWD3iJgZEafnZ53Ity8i6wtMiojXgHuAM1cV0pKk7Js/fz677bYbG2200ToT0g3FKi8mSylVeSIipTS4imn3AvfWvCxJUpZsvvnm3zpnPHfu3CpD+5///CdbbbXV2iptpdq0aVPvvemacphLaT3X8Zbq/YkNwOuDXq/DSrSu2WqrrSgrK6vvMho8byEqSVKGGdSSJGWYQS1JUoYZ1JLUQKWU6N27N3//+98L08aMGcPBBx9crfWXLl3K5ptvvsrlPvzwQ/r3788uu+zCHnvswWGHHVa4cUlteeKJJ3jxxRdrtc11hReTSdJa0uaCh2u1vemXH7bS+RHBiBEjOP7449l3331ZtmwZF154IY8++ugq204pFW45uqrljj76aIYOHcqYMWMAmDhxIrNnz2aXXXap3guphieeeIKWLVvSs2fPWmtzXWGPWpIasKKiIo444giuuOIKLrnkEk455RR23nlnfv/731NUVERRURHXXXcdAFOnTqWoqIgzzzyTzp07M2vWrEI7c+bMoUePHt8K+ccee4xNNtmEIUOGFKZ17tyZXr168c033/DTn/6UoqIiOnbsWBhm8/HHH+foo48uLH/mmWdy++23A9C6dWuGDRtGp06dKC4u5u2332batGnceOONXHnllZSUlPD888+z0047Fe4ZPn/+fNq2bVurY1VniT1qSWrgLr74Yjp37swGG2xAaWkpL7/8MqNHj+bll19m2bJldO/enX322YfmzZvzxhtvMHLkSEaMGFEIwlmzZnHkkUdyxRVXsN9++1Vqe/LkyXTp0qXK7d5999288cYbvPbaa8yZM4du3brRt2/fVda7zTbb8OqrrzJ8+HCuvvpqRowYwZAhQ2jZsiU/+clPAOjVqxePPvoohx9+OHfccQf9+/encePGNdxT2WSPWpIauI033pgTTjiBgQMHsuGGGzJ+/HiOPfZYmjdvzqabbsrRRx9dGPpx5513plu3boV1v/76aw444ACuueaab4X0qjz77LOcfPLJNG7cmP/6r/+id+/eVOd20d/73vcA6NKlS5VDYkJuJK+RI0cCMHLkyNUazWtdY1BL0nqgUaNGNGqU+8hf2RgPyw/W0bRpUzp16sS4ceOqXL5Dhw6FUbmWt6LtVBxyE+DLL7+sNL98WMyVDbm5zz778Pbbb/Pkk0/StGlT2rVrV/ULagAMaklaz/Tt25f777+fxYsX88UXXzB27Fj69OlT5bIRwahRo3jttde46qqrvjX/wAMP5LPPPuPmm28uTHvppZcYP348ffv25a677mLZsmXMnj2b5557jq5du7LjjjsyZcoUvv76az799FOeeOKJVdZc1ZCU3//+9xkwYECD7k2DQS1J653u3btz0kkn0a1bN3r27MlZZ51Fx44rvpVskyZNGDNmDI8++ih/+ctfKs2LCMaOHcsjjzzCzjvvTFFREZdeeinbbrstxx13HO3atWPPPffkgAMO4Oqrr2brrbembdu2HH300XTs2JFTTjmFzp07r7Lmo446ijFjxtCpUyeef/55AAYMGMCCBQs44YQTarZDMq5aw1zWNYe5lOqP9/quOw1xmMssueuuu/jHP/5ROFedZTUZ5tKrviVJ65yzzjqLxx9/vFp/E76uM6glSeucP//5z/VdwlrjOWpJkjLMoJYkKcMMakmSMsygliQpwwxqSWrAIoKf/exnhedXXXUVw4YNq7+C8i677LIVzmvTpg3HHnts4fk999zD4MGD67ymJUuWcMEFF7DrrrtSVFRE9+7dC0OEHnroocyfP5/58+fzpz/9qc5rqcirviVpbRm2WS23t2CVi2y44Ybcd999/OpXv6Jly5a1u/01kFIipcRll13Gr3/96xUuV1paypQpU+jQocNaq+2iiy5i1qxZTJ48mQ033JDZs2fz9NNPA/DII48AMH36dP70pz/xwx/+cK3VZY9akhqwJk2aMHToUK655ppvzZszZw7HHnss3bp1o1u3bjz33HMAdOzYkfnz55NSYquttuLWW28FYODAgTz++OOMGjWKo446ioMPPpjdd9+dSy65pNDm1VdfXRg+89prrwVy4da+fXt++MMf0rlzZ04//XQWL15MSUkJAwYMqLLun//851X2uhcuXMhpp51Gt27d6NSpE2PHjgVyPd5JkyYB0KlTJ37zm98AufC98cYbmTVrFn379qWkpISioiLGjx9fqd1FixZxww03cN111xXuNb7NNtvQv39/INfL/+STT7jggguYNm0aJSUlnH/++QwcOLBQA+Tulvbggw+u6teyWgxqSWrgfvSjHzF69GgWLKjcAz/33HM577zzeOWVV7j33nsLY0r36tWL5557jilTprDTTjsVQu3FF1+kZ8+eAIWhMsvKyrj77rspLS1lwoQJjBw5kpdeeokXX3yRG264gVdffRWAf/3rX5xyyim8+uqrjBw5ko022oiysjJGjx5dZc39+/dn4sSJTJ06tdL03/72t+y333688sorPPnkk5x//vksXLiQvn37Mn78eD777DOaNGlS+NLx7LPP0qdPH+644w4OOuggysrKeO211ygpKanU7tSpU9lhhx1o0aLFSvfl5Zdfzs4770xZWRlXXnllpVG8FixYwPPPP8+hhx66yt/J6vDQtyQ1cC1atOCUU05h+PDhbLTRRoXpjz/+OG+88Ubh+Weffcbnn39Onz59eOaZZ9hxxx0566yzuP766/nggw/Ycsst2WSTTQD47ne/y1ZbbQXkhqV89tlniQiOOeaYwghc3/ve9xg/fjxHHnkkO+64YyHkq6Nx48acf/75/O53v+OQQw4pTB83bhwPPvhgYYCQL7/8khkzZtCnTx+GDx9O27ZtOeyww3jsscdYtGgR06dPZ/fdd2f27NmcdtppLFmyhKOPPvpbQb2m9tlnH370ox/x8ccfc99993HsscfSpEntRqs9aklaD/zkJz/hpptuYuHChYVp33zzDS+88AJlZWWUlZXxwQcfsOmmmxZ6p+PHj6dfv360atWKe+65p9IIWxFRqf2IWK3hM6tj4MCBPPPMM8yYMaMwLaXEvffeW6h5xowZtG/fnm7dulFaWloYtatTp07ccMMNdOnSBciNGPbMM8+w3XbbMXDgwMLh/HK77LILM2bM+NYIXdWtc/To0XU2LrZBLUnrgS233JL+/ftz0003FaYdeOCB/O///m/heVlZGQDbb789n3zyCe+88w477bQTvXv35qqrrqoU1I899hjz5s1j8eLFPPDAA/Tq1Yu+ffvywAMPsGjRIhYuXMj999+/wuEzmzZtypIlS1Zac9OmTTnvvPMK57oBDjroIK677rrCl4LyQ+sbbLAB22+/PWPGjKFnz5706dOnUs3vvfceW2+9NWeccQann346EydOrLSt5s2bc/rpp3POOefw9ddfAzBr1ixuv/32SstVNdzm4MGDCzXWxcVvBrUkrSd+9rOf8cknnxSeDx8+nNLSUoqLi9ljjz0YMWJEYV6PHj3YbbfdAOjTpw8ffPABvXv3Lszv3bs3AwcOpKSkhGOPPZauXbvSuXNnBg8eTPfu3enRowdDhgyhU6dOVdYydOhQiouLV3gxWbnTTz+dpUuXFp5fdNFFLFmyhOLiYoqKirjooosK8/r06cM222xD8+bN6dOnDzNnziwE9VNPPUVJSQmdOnXi3nvv5dxzz/3Wti699FJatWrFHnvsQVFREUcffTStWrWqtMxWW21Fr169KCoq4vzzzwdyF521b9++zsbFdphLaT3nMJd1p6EOczlq1ChKS0sr9cbXZ4sWLaJjx45MnDiRzTar+k/wajLMpT1qSZLW0OOPP067du348Y9/vMKQrimv+pYkrZbBgwevlTuFrQsOOOCAShe71QV71JIkZZhBLUlShhnUkiRlmEEtSVKGGdSS1ICV3/Kz3KhRozj77LPrdJsPPPBApVuT1tTbb7/NoYceyi677EL79u3p378/s2fPprS0lHPOOQfI/Z30888/X2vbzBKv+paktWR1/ma9OrL6d+0PPPAAhx9+OHvssUeN2/ryyy857LDDuPrqqzniiCMAePLJJ5kzZw5du3ala9fcnyE/9dRTbLLJJuy999413mbW2KOWpPXU3/72N3r06EGnTp044IADmD17NgDDhg1j0KBBHHjggbRp04b77ruPX/ziF3Ts2JGDDz64cOvPNm3a8Mtf/pLu3bvTvXt3pk6dyvPPP8+DDz7I+eefT0lJCdOmTaOsrIyePXtSXFzMMcccw6effgpAv379Cuvvtttu3xp6EuCOO+5gr732KoQ0wL777ktRURFPPfUUhx9+ONOnT2fEiBFcc801lJSUMH78eNq2bVuo87PPPqNNmzarvGVpVhnUktSAlY/7XP7z3//934V5vXv35sUXX+TVV1/lxBNP5Pe//31h3rRp03j44YcZO3Ys3//+99l33315/fXX2WijjXj44YcLy7Vo0YKXX36Zs88+m5/85CfsvffeHHnkkVx55ZWUlZWx8847c8opp3DFFVcwadIkOnbsWGn86qVLl/Lyyy9z7bXXVppebvLkyYWBNVakTZs2nHnmmZx33nmUlZXRp08f+vXrV6jzrrvu4thjj6Vp06ZrvB/rk4e+JakBKx/3uVz57T8BZs6cyQknnMCsWbP4+uuvadu2bWG5Qw45hKZNm9KxY0eWLVvGwQcfDEDHjh2ZPn16YbmTTjqp8O955533re0vWLCA+fPns88++wAwaNAgjj/++ML8733vewB06dKlUrs1NWTIEH7/+99z9NFHM3LkSG644YZaa3tts0ctSeupH//4x5x99tm8/vrr/OUvf+HLL78szNtwww0BaNSoEU2bNi0Ma9moUaNKg2RUHO5y+aEvq6N8O40bN67UbrkOHTowYcKE1W63V69eTJ8+naeffpply5ZRVFS02m1khUEtSeupBQsWsN122wFwyy23rFEbf/3rXwv/7rXXXkDloSA322wztthii8L559tuu63Qu66Ok08+meeff77S4fZHH32U11+vfCFdVcNPnnLKKZx00kmLdZ+cAAAb6klEQVR1NqrV2mJQS9J6atiwYRx//PH06dOHli1brlEbX331FT169OCPf/wj11xzDQAnnngiV155JZ06dWLatGnccsstnH/++RQXF1NWVlbpPPmqbLTRRjz00ENcd9117Lrrruyxxx6MGjWKrbfeutJyRxxxBPfff3/hYjKAAQMG8OmnnxYOz6+rHOZSWs85zGXdaajDXJZr06YNpaWlaxzyde2ee+5h7Nix3HbbbfVdSo2GufRiMklSg/PjH/+Yv//97zzyyCP1XUqNGdSSpDVSm1dp17brrruuvkuoNZ6jliQpwwxqSZIyzKCWJCnDDGpJkjLMoJakBmz5YS7Xti+++IIf/OAH7LzzznTo0IG+ffvy0ksvARRGupo+fTp33HHHarf98ssv07dvX3bffXfatWvHkCFDWLRoUa3WP2rUKD788MNabXN1edW3JK0lb7ar3b+pbv/Wm7XaXl0YMmQIbdu25Z133qFRo0a8++67vPlmru7y8aPLg/rkk0+udruzZ8/m+OOP56677mKvvfYipcS9997L559/TvPmzWut/lGjRlFUVMS2225ba22urlX2qCPi5oj4OCImV5g2LCI+iIiy/M+hFeb9KiKmRsS/IuKguipcklQ9X3zxBfvvvz+dO3emY8eOjB07FsgFZHlPtKioiAEDBvD444/Tq1cvdt11V15++WUAFi5cyGmnnUa3bt3o1KlTYf0pU6bQvXt3SkpKKC4u5p133qm03WnTpvHSSy9x6aWX0qhRLm522mknDjvsMOA/vf0LLriA8ePHU1JSwjXXXEOfPn0qDSTSq1cvJk2aVKnt//u//2PQoEGF25ZGBMcddxzbbLMN8+bN4+ijj6a4uJiePXsW1h02bBhXXXVVoY2ioiKmT5/O9OnTad++PWeccQYdOnTgwAMPZPHixdxzzz2UlpYyYMAASkpKePjhhznmmGMK6z/22GOFQUXqUnUOfY8CDq5i+jUppZL8zyMAEbEHcCLQIb/OnyKicW0VK0lafc2aNeP+++9n4sSJPPnkk/zsZz+j/K6UU6dO5dxzz2XSpEm89dZb3HHHHTz77LNcddVVXHbZZQD89re/Zb/99uOVV17hySef5Pzzz2fhwoWMGDGCc889l7KyMkpLS2ndunWl7U6ZMoWSkhIaN155DFx++eWFcD7vvPMYMmQIo0aNAuDtt9/mq6++ori4uNI6Kxv+8uKLL6ZTp05MmjSJyy67jFNOOWWV++idd97hRz/6EVOmTGHzzTfn3nvv5bjjjqNr166MHj2asrIyDj30UN58803mzJkDwMiRI9fKfcRXGdQppWeAedVs7yjgrpTSVymlfwNTge41qE+SVEMpJX79619TXFzMAQccwAcffMDs2bMBaNu2LR07dqRRo0Z06NCB/fffn4ioNJzluHHjuPzyyykpKaFfv358+eWXzJgxg7322ovLLruMK664gvfee4+NNtqoVuo9/vjjeeihh1iyZAk333wzgwcPXq31n332WQYOHAjAfvvtx9y5c1mwYMFK12nbti0lJSXAiofcjAgGDhzI7bffzvz583nhhRc45JBDVqu2NVGTc9RnR8QpQCnws5TSp8B2wIsVlpmZnyZJqiejR49mzpw5TJgwgaZNm9KmTZvCkJblw0xCbgjLisNblg87WX7+d/fdd6/Ubvv27enRowcPP/wwBx10EDfeeCP77bdfYX6HDh147bXX+OabbwqHvqujefPmfPe732Xs2LGMGTOGqsaCKB/+8qijjvrWvKrGsIgImjRpwjfffFOYVtWwnpAbcnPx4sVV1nbqqadyxBFH0KxZM44//niaNKn7S73W9KrvPwM7AyXALOAP+elVDUZa5agfETE0IkojorT8MIIkqfYtWLCArbfemqZNm/Lkk0/y3nvvrdb6Bx10ENddd10hAF999VUA3n33XXbaaSfOOeccjjzyyG+dR955553p2rUrF198cWHdd955p3COu1xVQ1QOGTKEc845h27durHlllt+q6azzz6bW265pXAFOcDtt9/ORx99RN++fRk9ejQATz31FC1btqRFixa0adOGiRMnAjBx4kT+/e9/r/K1L1/btttuy7bbbsull1662j39NbVGQZ1Smp1SWpZS+ga4gf8c3p4JbF9h0dZAlde1p5SuTyl1TSl1bdWq1ZqUIUlaiaVLl7LhhhsyYMAASktLC+db27Vrt1rtXHTRRSxZsoTi4mKKioq46KKLgNwY1EVFRZSUlPDWW29VeS74xhtv5KOPPmKXXXahY8eOnHHGGd+6grq4uJgmTZqw5557FobK7NKlCy1atFjhOeBtttmGu+66i5///OfsvvvutG/fnvHjx9OiRQuGDRtGaWkpxcXFXHDBBYWxto899ljmzZtHSUkJf/7zn9ltt91W+doHDx7MmWeeSUlJSaGXPWDAALbffnv22GOP6u/EGqjWMJcR0QZ4KKVUlH/+nZTSrPzj84AeKaUTI6IDcAe54N4W+Cewa0pp2crad5hLqf44zGXdqe9hLl977TXOOOOMwtXb65IPP/yQfv368dZbb63WYfO14eyzz6ZTp06cfvrp1V6nToe5jIg7gX5Ay4iYCVwM9IuIEnKHtacDPwBIKU2JiDHAG8BS4EerCmlJUu0bMWIEw4cP59prr63vUlbbrbfeyoUXXsjVV1+duZDu0qULG2+8MX/4wx9WvXAtqVaPuq7Zo5bqjz3qulPfPWplR0161Nn6qiJJkioxqCWpDmXhqKXqV03fAwa1JNWRZs2aMXfuXMN6PZZSYu7cuTRr1myN23BQDkmqI61bt2bmzJl4r4j1W7Nmzb51e9XVYVBLUh1p2rQpbdu2re8ytI7z0LckSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZ5r2+JamOdbylY7WXfX3Q63VYidZF9qglScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDFtlUEfEzRHxcURMrjDtyoh4KyImRcT9EbF5fnqbiFgcEWX5nxF1WbwkSQ1ddXrUo4CDl5v2GFCUUioG3gZ+VWHetJRSSf7nzNopU5Kk9dMqgzql9Awwb7lp41JKS/NPXwRa10FtkiSt92rjHPVpwN8rPG8bEa9GxNMR0acW2pckab3VpCYrR8SFwFJgdH7SLGCHlNLciOgCPBARHVJKn1Wx7lBgKMAOO+xQkzIkSWqw1rhHHRGDgMOBASmlBJBS+iqlNDf/eAIwDditqvVTStenlLqmlLq2atVqTcuQJKlBW6OgjoiDgV8CR6aUFlWY3ioiGucf7wTsCrxbG4VKkrQ+WuWh74i4E+gHtIyImcDF5K7y3hB4LCIAXsxf4d0X+E1ELAWWAWemlOZV2bAkSVqlVQZ1SumkKibftIJl7wXurWlRkiQpxzuTSZKUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGFN6rsAaUU63tKx2su+Puj1OqxEkuqPPWpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMqxaQR0RN0fExxExucK0LSPisYh4J//vFvnpERHDI2JqREyKiM51VbwkSQ1ddXvUo4CDl5t2AfDPlNKuwD/zzwEOAXbN/wwF/lzzMiVJWj9VK6hTSs8A85abfBRwS/7xLcDRFabfmnJeBDaPiO/URrGSJK1vanKOepuU0iyA/L9b56dvB7xfYbmZ+WmSJGk11cXFZFHFtPSthSKGRkRpRJTOmTOnDsqQJGndV5Ognl1+SDv/78f56TOB7Sss1xr4cPmVU0rXp5S6ppS6tmrVqgZlSJLUcNUkqB8EBuUfDwLGVph+Sv7q757AgvJD5JIkafU0qc5CEXEn0A9oGREzgYuBy4ExEXE6MAM4Pr/4I8ChwFRgEXBqLdcsSdJ6o1pBnVI6aQWz9q9i2QT8qCZFSZKkHO9MJklShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGNanvAiTVsmGbrd7ybXeomzok1Qp71JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGXYGl/1HRG7A3+tMGkn4L+BzYEzgDn56b9OKT2yxhVKkrQeW+OgTin9CygBiIjGwAfA/cCpwDUppatqpUJJktZjtXXoe39gWkrpvVpqT5IkUXtBfSJwZ4XnZ0fEpIi4OSK2qGqFiBgaEaURUTpnzpyqFpEkab1X46COiA2AI4G785P+DOxM7rD4LOAPVa2XUro+pdQ1pdS1VatWNS1DkqQGqTZ61IcAE1NKswFSSrNTSstSSt8ANwDda2EbkiStl2rjXt8nUeGwd0R8J6U0K//0GGByLWxDkrJlde6p7v3UVQM1CuqIaA58F/hBhcm/j4gSIAHTl5snSZJWQ42COqW0CNhquWkDa1SRJEkq8M5kkiRlmEEtSVKG1cbFZJIk1auOt3Ss9rKvD3q9DiupffaoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowg1qSpAwzqCVJyjCDWpKkDDOoJUnKMINakqQMM6glScowx6OWJGXTsM2qv2zbHequjnpmj1qSpAwzqCVJyjCDWpKkDDOoJUnKMC8m09rlxSGStFrsUUuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYd6ZbA11vKVjtZd9fdDrdViJJKkhs0ctSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZViTmjYQEdOBz4FlwNKUUteI2BL4K9AGmA70Tyl9WtNtSZK0vqmtHvW+KaWSlFLX/PMLgH+mlHYF/pl/LkmSVlNdHfo+Crgl//gW4Og62o4kSQ1abQR1AsZFxISIGJqftk1KaRZA/t+ta2E7kiStd2p8jhrolVL6MCK2Bh6LiLeqs1I+1IcC7LDDDrVQhiRJDU+Ne9QppQ/z/34M3A90B2ZHxHcA8v9+XMV616eUuqaUurZq1aqmZUiS1CDVKKgjYuOI2LT8MXAgMBl4EBiUX2wQMLYm25EkaX1V00Pf2wD3R0R5W3eklB6NiFeAMRFxOjADOL6G25Ekab1Uo6BOKb0L7FnF9LnA/jVpW5IkeWcySZIyzaCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjLMoJYkKcMMakmSMsygliQpwwxqSZIyzKCWJCnDDGpJkjJsjYM6IraPiCcj4s2ImBIR5+anD4uIDyKiLP9zaO2VK0nS+qVJDdZdCvwspTQxIjYFJkTEY/l516SUrqp5eZIkrd/WOKhTSrOAWfnHn0fEm8B2tVWYJEmqpXPUEdEG6AS8lJ90dkRMioibI2KL2tiGJEnro5oc+gYgIjYB7gV+klL6LCL+DPwPkPL//gE4rYr1hgJDAXbYYYeallE7hm1W/WXbZqRmSVKDVqMedUQ0JRfSo1NK9wGklGanlJallL4BbgC6V7VuSun6lFLXlFLXVq1a1aQMSZIarJpc9R3ATcCbKaWrK0z/ToXFjgEmr3l5kiSt32py6LsXMBB4PSLK8tN+DZwUESXkDn1PB35QowolSVqP1eSq72eBqGLWI2tejiRJqsg7k0mSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGEGtSRJGWZQS5KUYTUePUtqc8HD1V52erM6LESSGiB71JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShjnMpRqEN9u1X63l27/1Zh1VIkm1yx61JEkZZlBLkpRhBrUkSRlmUEuSlGFeTCap2lbnoj0v2JNqhz1qSZIyzB61JGmtaXPBw9VednqzOixkHWKPWpKkDLNHLa0D7IVI6y971JIkZZhBLUlShhnUkiRlmEEtSVKGGdSSJGWYQS1JUoYZ1JIkZZhBLUlShhnUkiRlmHcmkyRW7+5vUHd3gHOEMi3PHrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlWZ0EdEQdHxL8iYmpEXFBX25EkqSGrk6COiMbA/wGHAHsAJ0XEHnWxLUmSGrK66lF3B6amlN5NKX0N3AUcVUfbkiSpwaqroN4OeL/C85n5aZIkaTXU1b2+o4ppqdICEUOBofmnX0TEv+qoljoyuSXwSXWWjMFV7Y710+rtierv49U+rxIN93ey+q+sjvZzA97HkJH3svu4gnXyM3nH6ixUV0E9E9i+wvPWwIcVF0gpXQ9cX0fbr3MRUZpS6lrfdTRk7uO1w/1c99zHda8h7+O6OvT9CrBrRLSNiA2AE4EH62hbkiQ1WHXSo04pLY2Is4F/AI2Bm1NKU+piW5IkNWR1Nh51SukR4JG6aj8D1tnD9usQ9/Ha4X6ue+7jutdg93GklFa9lCRJqhfeQlSSpAwzqCVJyjCDWpKkDKuzi8kaoojYhtwd1hLwYUppdj2X1CBFxJZASil9Wt+1NFTu47rn50XdW1/2sReTVUNElAAjgM2AD/KTWwPzgR+mlCbWV20NRUTsAPwe2J/cfg2gBfAEcEFKaXr9VdcwuI/XDj8v6t76to8N6mqIiDLgBymll5ab3hP4S0ppz/qprOGIiBeAa4F7UkrL8tMaA8cDP0kp9azP+hoC9/Ha4edF3Vvf9rFBXQ0R8U5KadcVzJuaUtplbdfU0KxiH69wnqrPfbx2+HlR99a3few56ur5e0Q8DNzKf0YF2x44BXi03qpqWCZExJ+AW6i8jwcBr9ZbVQ2L+3jt8POi7q1X+9gedTVFxCHkxtTejty5vZnAg/k7sKmG8veEP53K+/h94G/ATSmlr+qxvAbBfbz2+HlR99anfWxQS5KUYf4ddQ3lx9VWHYqIw+u7hobOfbx2+HlR9xriPjaoay4zI5A3YN3qu4D1gPt47fDzou41uH3soe9qioh25M6FvJRS+qLC9INTSg3u4oX6EBHdyd2E45WI2AM4GHirIZ5zyoqIuDWldEp919GQRURvoDswOaU0rr7raQgiogfwZkrps4jYCLgA6Ay8AVyWUlpQrwXWMoO6GiLiHOBHwJtACXBuSmlsft7ElFLn+qyvIYiIi4FDyP0lwmNAD+Ap4ADgHyml39ZfdQ1DRDy4/CRgX3I3PCGldORaL6oBioiXU0rd84/PIPfZcT9wIPC3lNLl9VlfQxARU4A9U0pLI+J6YBFwD7mb+eyZUvpevRZYywzqaoiI14G9UkpfREQbcm+I21JKf4yIV1NKneq1wAYgv49LgA2Bj4DWFb4tv5RSKq7XAhuAiJhIrsdxI7lbLgZwJ3AiQErp6fqrruGo+JkQEa8Ah6aU5kTExsCLKaWO9Vvhui8i3kwptc8/rtRZioiylFJJ/VVX+zxHXT2Nyw9352+z2A84JCKupgGeD6knS1NKy1JKi4BpKaXPAFJKi4Fv6re0BqMrMAG4EFiQUnoKWJxSetqQrlWNImKLiNiKXGdoDkBKaSGwtH5LazAmR8Sp+cevRURXgIjYDVhSf2XVDYO6ej7K31sWgHxoHw60BPx2XDu+jojm+cddyidGxGYY1LUipfRNSuka4FTgwoj4X7zpUV3YjNwXolJgy4j4L4CI2AS/2NeWIcA+ETEN2AN4ISLeBW7Iz2tQPPRdDRHRmlyP76Mq5vVKKT1XD2U1KBGxYVU33IiIlsB3Ukqv10NZDVpEHAb0Sin9ur5rWR/kv4huk1L6d33X0lBExKbATuS+cM509CxJkrTWeehbkqQMM6glScowg1qSpAwzqCVJyjCDWmpAIuIX+TvpERHXRMQT+cf7R8TtEfHniCiNiCkRcUl+3iERMaZCG/0i4m/5xwdGxAsRMTEi7s7/iZGktciglhqWZ4A++cddgU0ioinQGxgPXJhS6goUk/s71GJyt2ztmb9zFsAJwF/zfxr3/4AD8nd+KgV+uvZeiiQwqKWGZgLQJf/3pV8BL5AL7D7kgrp//lairwIdgD1SSkuBR4EjIqIJcBgwFuhJ7mYSz0VEGTAI2HEtvx5pveddiaQGJKW0JCKmk7v72PPAJHIDb+wMLAZ+DnRLKX0aEaOAZvlV/0pu8Ih5wCsppc8jIoDHUkonrd1XIakie9RSw/MMuUB+hlwv+kygDGgBLAQWRMQ25EYrK/cUuWECzyAX2gAvAr0iYhfI3Vkrfy9lSWuRQS01POOB7wAv5G+p+CUwPqX0GrlD3lOAm4HCrW9TSsuAh8iF90P5aXOAwcCdETGJXHC3W3svQxJ4C1FJkjLNHrUkSRlmUEuSlGEGtSRJGWZQS5KUYQa1JEkZZlBLkpRhBrUkSRlmUEuSlGH/H9SeMf7C20gQAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "t.unstack(0)[['York County','Newport News City','Hampton City','James City County']].plot(kind='bar',figsize=(8, 8))\n", "plt.title(\"Number of Intercepted Anglers in Dataset\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also stack data we have unstacked:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "wave county_intercept \n", "2.0 Accomack County 101.0\n", " Chesapeake City 10.0\n", " Hampton City 59.0\n", " Isle of Wight County 17.0\n", " Newport News City 25.0\n", " Norfolk City 18.0\n", " Northampton County 7.0\n", " Virginia Beach City 382.0\n", " Westmoreland County 5.0\n", "3.0 Accomack County 172.0\n", " Chesapeake City 8.0\n", " Essex County 19.0\n", " Gloucester County 40.0\n", " Hampton City 108.0\n", " James City County 13.0\n", " Mathews County 20.0\n", " Middlesex County 64.0\n", " Newport News City 63.0\n", " Norfolk City 125.0\n", " Northampton County 73.0\n", " Northumberland County 11.0\n", " Poquoson City 23.0\n", " Portsmouth City 31.0\n", " Richmond County 21.0\n", " Suffolk City 11.0\n", " Surry County 2.0\n", " Virginia Beach City 427.0\n", " Westmoreland County 5.0\n", " York County 19.0\n", "4.0 Accomack County 181.0\n", " ... \n", "5.0 Isle of Wight County 9.0\n", " James City County 5.0\n", " Mathews County 2.0\n", " Middlesex County 63.0\n", " Newport News City 74.0\n", " Norfolk City 98.0\n", " Northampton County 49.0\n", " Northumberland County 5.0\n", " Poquoson City 12.0\n", " Richmond County 4.0\n", " Suffolk City 9.0\n", " Surry County 2.0\n", " Virginia Beach City 341.0\n", " York County 5.0\n", "6.0 Accomack County 1.0\n", " Chesapeake City 33.0\n", " Essex County 15.0\n", " Gloucester County 16.0\n", " Hampton City 72.0\n", " Mathews County 15.0\n", " Middlesex County 28.0\n", " Newport News City 65.0\n", " Norfolk City 40.0\n", " Northampton County 60.0\n", " Northumberland County 4.0\n", " Poquoson City 14.0\n", " Portsmouth City 20.0\n", " Virginia Beach City 314.0\n", " Westmoreland County 3.0\n", " York County 10.0\n", "Length: 84, dtype: float64" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t_unstacked = t.unstack(0)\n", "t_unstacked.stack()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pivot Table\n", "\n", "Borrowing from excel, pandas can create pivot tables of summary statistics. Suppose we want counties as rows and waves as columns, with the average and standard deviation of 2 and 12 month avidity. " ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanstd
ffdays12ffdays2ffdays12ffdays2
wave2.03.04.05.06.02.03.04.05.06.02.03.04.05.06.02.03.04.05.06.0
county_intercept
Accomack County21.56449.6627911.11616.7502.772282.29073.524864.421035.4817.511619.149237.3195-7.3064.695076.130598.09406-
Chesapeake City13.926.6253623.333345.03033.25.7584.6666710.060616.292134.591213.41645.773538.89454.10427.7597901.15479.45033
Essex County-25.421160.6667-12.0667-1.5789512-2.73333-38.413459.0028-11.4671-2.6523715.6205-3.78845
Gloucester County-23.57544.62549.485739.0625-4.0589.742866.5-32.05263.0385166.51248.0354-5.5050111.356417.40075.68038
Hampton City38.016922.96327.28921.900740.45832.745763.611114.907513.425535.45833131.30497.6488108.98889.3672163.96512.781410.195112.37989.4885116.6783
Isle of Wight County24.4706--26.7778-3--6.11111-42.891--32.8663-7.6567--11.0277-
James City County-23.461531.81824.2--3.846156.636361.4--34.896644.42483.42053--5.444365.390231.67332-
King William County--9.66667----5----9.52628----6.38357--
Mathews County-66.4582.33332121.3333-7.111.83333.57.2-220.903249.33712.727926.9064-21.608724.08622.121328.32552
Middlesex County-10.609424.966717.761924.9643-1.984385.254.460324.85714-15.154937.880730.056231.1121-2.751587.060575.871825.13315
Newport News City96.8840.87312.51916.47329.10778.441.714293.113924.486494.55385272.163175.23720.666417.770439.943826.97543.294323.265943.738895.50009
Norfolk City25.722223.47214.358525.244926.0751.777782.4083.044036.673474.87530.994994.870632.053832.603924.63412.510119.630344.469798.926714.83145
Northampton County37.285715.904114.398128.408218.41673.428573.273974.916676.551022.5333353.337624.74619.512737.211121.58964.503974.292316.729946.964383.47127
Northumberland County-2.181828.12523.6749.25-0.1818181.3754.673.5-5.9298910.72324.4704498.167-0.6030232.722264.21949
Poquoson City-53.521734.72732211.7857-7.608710.18187.833332.5-65.934524.755216.754910.4527-6.443744.956175.749842.71038
Portsmouth City-12.967716.6154-47.15-3.451615.46154-10.45-14.086613.4632-54.047-5.784114.31307-13.9302
Richmond County-14.857126--421.75--23.55482.30943.26599--4.626012.30941.70783-
Suffolk City-7.6363612.61549--2.727273.846154.22222--6.3288715.72446.80074--3.608074.687683.66667-
Surry County-0687.5--0522--05.6568588.3883--04.2426419.799-
Virginia Beach City28.086429.456718.890820.079231.76752.54453.601873.764564.944285.5382242.524891.258974.738961.2486100.9236.589736.741079.049499.296428.54943
Westmoreland County25.420.61.25-0.33333382.80.25-032.41614.02852.5-0.5773510.95452.167950.5-0
York County-29.947436.22221141.5-4.789477.777783.86.6-33.155337.874112.569840.6072-3.8525510.10915.019967.8344
\n", "
" ], "text/plain": [ " mean \\\n", " ffdays12 ffdays2 \n", "wave 2.0 3.0 4.0 5.0 6.0 2.0 \n", "county_intercept \n", "Accomack County 21.5644 9.66279 11.116 16.7 50 2.77228 \n", "Chesapeake City 13.9 26.625 36 23.3333 45.0303 3.2 \n", "Essex County - 25.4211 60.6667 - 12.0667 - \n", "Gloucester County - 23.575 44.625 49.4857 39.0625 - \n", "Hampton City 38.0169 22.963 27.289 21.9007 40.4583 2.74576 \n", "Isle of Wight County 24.4706 - - 26.7778 - 3 \n", "James City County - 23.4615 31.8182 4.2 - - \n", "King William County - - 9.66667 - - - \n", "Mathews County - 66.45 82.3333 21 21.3333 - \n", "Middlesex County - 10.6094 24.9667 17.7619 24.9643 - \n", "Newport News City 96.88 40.873 12.519 16.473 29.1077 8.44 \n", "Norfolk City 25.7222 23.472 14.3585 25.2449 26.075 1.77778 \n", "Northampton County 37.2857 15.9041 14.3981 28.4082 18.4167 3.42857 \n", "Northumberland County - 2.18182 8.125 23.6 749.25 - \n", "Poquoson City - 53.5217 34.7273 22 11.7857 - \n", "Portsmouth City - 12.9677 16.6154 - 47.15 - \n", "Richmond County - 14.8571 2 6 - - \n", "Suffolk City - 7.63636 12.6154 9 - - \n", "Surry County - 0 6 87.5 - - \n", "Virginia Beach City 28.0864 29.4567 18.8908 20.0792 31.7675 2.5445 \n", "Westmoreland County 25.4 20.6 1.25 - 0.333333 8 \n", "York County - 29.9474 36.2222 11 41.5 - \n", "\n", " std \\\n", " ffdays12 \n", "wave 3.0 4.0 5.0 6.0 2.0 3.0 \n", "county_intercept \n", "Accomack County 2.2907 3.52486 4.42 10 35.48 17.5116 \n", "Chesapeake City 5.75 8 4.66667 10.0606 16.2921 34.5912 \n", "Essex County 1.57895 12 - 2.73333 - 38.4134 \n", "Gloucester County 4.05 8 9.74286 6.5 - 32.052 \n", "Hampton City 3.61111 4.90751 3.42553 5.45833 131.304 97.6488 \n", "Isle of Wight County - - 6.11111 - 42.891 - \n", "James City County 3.84615 6.63636 1.4 - - 34.8966 \n", "King William County - 5 - - - - \n", "Mathews County 7.1 11.8333 3.5 7.2 - 220.903 \n", "Middlesex County 1.98438 5.25 4.46032 4.85714 - 15.1549 \n", "Newport News City 1.71429 3.11392 4.48649 4.55385 272.163 175.237 \n", "Norfolk City 2.408 3.04403 6.67347 4.875 30.9949 94.8706 \n", "Northampton County 3.27397 4.91667 6.55102 2.53333 53.3376 24.746 \n", "Northumberland County 0.181818 1.375 4.6 73.5 - 5.92989 \n", "Poquoson City 7.6087 10.1818 7.83333 2.5 - 65.9345 \n", "Portsmouth City 3.45161 5.46154 - 10.45 - 14.0866 \n", "Richmond County 4 2 1.75 - - 23.5548 \n", "Suffolk City 2.72727 3.84615 4.22222 - - 6.32887 \n", "Surry County 0 5 22 - - 0 \n", "Virginia Beach City 3.60187 3.76456 4.94428 5.53822 42.5248 91.2589 \n", "Westmoreland County 2.8 0.25 - 0 32.416 14.0285 \n", "York County 4.78947 7.77778 3.8 6.6 - 33.1553 \n", "\n", " \\\n", " ffdays2 \n", "wave 4.0 5.0 6.0 2.0 3.0 4.0 \n", "county_intercept \n", "Accomack County 19.1492 37.3195 - 7.306 4.69507 6.13059 \n", "Chesapeake City 13.4164 5.7735 38.8945 4.1042 7.75979 0 \n", "Essex County 59.0028 - 11.4671 - 2.65237 15.6205 \n", "Gloucester County 63.0385 166.512 48.0354 - 5.50501 11.3564 \n", "Hampton City 108.988 89.3672 163.965 12.7814 10.1951 12.3798 \n", "Isle of Wight County - 32.8663 - 7.6567 - - \n", "James City County 44.4248 3.42053 - - 5.44436 5.39023 \n", "King William County 9.52628 - - - - 6.38357 \n", "Mathews County 249.337 12.7279 26.9064 - 21.6087 24.0862 \n", "Middlesex County 37.8807 30.0562 31.1121 - 2.75158 7.06057 \n", "Newport News City 20.6664 17.7704 39.9438 26.9754 3.29432 3.26594 \n", "Norfolk City 32.0538 32.6039 24.6341 2.51011 9.63034 4.46979 \n", "Northampton County 19.5127 37.2111 21.5896 4.50397 4.29231 6.72994 \n", "Northumberland County 10.723 24.4704 498.167 - 0.603023 2.72226 \n", "Poquoson City 24.7552 16.7549 10.4527 - 6.44374 4.95617 \n", "Portsmouth City 13.4632 - 54.047 - 5.78411 4.31307 \n", "Richmond County 2.3094 3.26599 - - 4.62601 2.3094 \n", "Suffolk City 15.7244 6.80074 - - 3.60807 4.68768 \n", "Surry County 5.65685 88.3883 - - 0 4.24264 \n", "Virginia Beach City 74.7389 61.2486 100.923 6.58973 6.74107 9.04949 \n", "Westmoreland County 2.5 - 0.57735 10.9545 2.16795 0.5 \n", "York County 37.8741 12.5698 40.6072 - 3.85255 10.1091 \n", "\n", " \n", " \n", "wave 5.0 6.0 \n", "county_intercept \n", "Accomack County 8.09406 - \n", "Chesapeake City 1.1547 9.45033 \n", "Essex County - 3.78845 \n", "Gloucester County 17.4007 5.68038 \n", "Hampton City 9.48851 16.6783 \n", "Isle of Wight County 11.0277 - \n", "James City County 1.67332 - \n", "King William County - - \n", "Mathews County 2.12132 8.32552 \n", "Middlesex County 5.87182 5.13315 \n", "Newport News City 3.73889 5.50009 \n", "Norfolk City 8.92671 4.83145 \n", "Northampton County 6.96438 3.47127 \n", "Northumberland County 4.219 49 \n", "Poquoson City 5.74984 2.71038 \n", "Portsmouth City - 13.9302 \n", "Richmond County 1.70783 - \n", "Suffolk City 3.66667 - \n", "Surry County 19.799 - \n", "Virginia Beach City 9.29642 8.54943 \n", "Westmoreland County - 0 \n", "York County 5.01996 7.8344 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(trips_fips.loc['VA'],index='county_intercept',columns='wave',\n", " values=['ffdays2','ffdays12'],aggfunc=[np.mean, np.std],fill_value=\"-\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting Data from Pandas\n", "\n", "You may want to export dataframes and other information to other packages (e.g. latex, excel, or stata). Pandas has a lot of `to_` methods on pandas dataframes and series. For example we can export some interesting information to latex table code: " ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "(t.unstack().head(25)).to_latex('excel_example.tex',na_rep='-')" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Or, we can save the same information to excel:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "(t.unstack().head(25)).to_excel('excel_example.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Econometrics in Python" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ideducln_wagepexptimeabilitymeducfeducbroken_homesiblingspexp2
04122.14240.261210144
16151.91440.4412160216
28132.32840.5112151264
311141.64141.821617121
412132.1664-1.3013120536
\n", "
" ], "text/plain": [ " id educ ln_wage pexp time ability meduc feduc broken_home \\\n", "0 4 12 2.14 2 4 0.26 12 10 1 \n", "1 6 15 1.91 4 4 0.44 12 16 0 \n", "2 8 13 2.32 8 4 0.51 12 15 1 \n", "3 11 14 1.64 1 4 1.82 16 17 1 \n", "4 12 13 2.16 6 4 -1.30 13 12 0 \n", "\n", " siblings pexp2 \n", "0 4 4 \n", "1 2 16 \n", "2 2 64 \n", "3 2 1 \n", "4 5 36 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.formula.api as smf \n", "\n", "# load data and create dataframe\n", "tobias_koop=pd.read_csv('https://rlhick.people.wm.edu/econ407/data/tobias_koop_t_4.csv')\n", "\n", "tobias_koop.head()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: ln_wage R-squared: 0.166\n", "Model: OLS Adj. R-squared: 0.163\n", "Method: Least Squares F-statistic: 51.36\n", "Date: Mon, 12 Mar 2018 Prob (F-statistic): 1.83e-39\n", "Time: 15:04:02 Log-Likelihood: -583.66\n", "No. Observations: 1034 AIC: 1177.\n", "Df Residuals: 1029 BIC: 1202.\n", "Df Model: 4 \n", "Covariance Type: nonrobust \n", "===============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "-------------------------------------------------------------------------------\n", "Intercept 0.4603 0.137 3.353 0.001 0.191 0.730\n", "educ 0.0853 0.009 9.179 0.000 0.067 0.104\n", "pexp 0.2035 0.024 8.629 0.000 0.157 0.250\n", "pexp2 -0.0124 0.002 -5.438 0.000 -0.017 -0.008\n", "broken_home -0.0087 0.036 -0.244 0.807 -0.079 0.061\n", "==============================================================================\n", "Omnibus: 55.892 Durbin-Watson: 1.761\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 112.050\n", "Skew: -0.355 Prob(JB): 4.66e-25\n", "Kurtosis: 4.448 Cond. No. 391.\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "formula = \"ln_wage ~ educ + pexp + pexp2 + broken_home\"\n", "results = smf.ols(formula,tobias_koop).fit()\n", "print(results.summary())" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }