Tapping MariaDB / MySQL data from Ipython
In this short post, I will outline how one can access data stored in a database like MariaDB or MySQL for analysis inside an Ipython Notebook. There are many reasons why you might want to store your data in a proper database. For me the most important are:
-
All of my data resides in a password protected and more secure place than having a multitude of csv, mat, and dta files scattered all over my file system.
-
If you access the same data for multiple projects, any changes to the underlying data will be propagated to your analysis, without having to update copies of project data.
-
Having data in a central repository makes backup and recover significantly easier.
-
This allows for two-way interaction with your database. You can read and write tables from/to your database. Rather than use SQL, you can create database tables using pandas/ipython.
In what follows, I focus on techniques specifically for MariaDB and MySQL, although similar steps would work with other database products out there.
Instructions for doing this in Stata
First, load the python libraries we'll need:
%matplotlib inline
import MySQLdb as mariadb
import getpass # so we don't need to store passwords on disk
import pandas as pd
The following code uses the getpass package to record password information, and sets up the database connection to a mariadb/mysql database on localhost. Once the connection is established, it runs the query
select * from ipython_out
for copying the table ipython_out
to the pandas dataframe data
. Note, you need to supply your own username (myusername
) and database name (dbname
).
# importing via mariadb directly into pandas dataframe:
password = getpass.getpass() # asks for password in the console window so we don't store it here
conn = mariadb.connect('localhost','myusername',password,'dbname');
data = pd.io.sql.read_sql('select * from ipython_out', conn)
conn.close()
data.shape
As can be seen above, this table (now a dataframe in ipython) has over 200k rows and 27 columns. Let's grab a list of columns:
data.columns.values
data.head()
Of course, with the data in ipython, we can do lots of things like plotting:
data.total_score.plot(kind='hist')