Using stata_kernel for reproducible research goodness

This post is hopefully the last in a series of posts outlining how to use Stata in a proper dynamic document/reproducible research setting. As of the summer of 2020, I am only using stata_kernel for my own work and no longer recommend using my customized ob-stata.el for reasons described here. This method allows users to create dynamic documents using either

  1. Jupyter Lab (or Notebook)
  2. Emacs Org-mode

This post shows the installation steps to get this working and some usability recommendations if using Org-mode.

Python Preliminaries

The stata_kernel needs a proper Python installation with Jupyter installed. My preferred method of doing this is to use Anaconda Python along with several Python libraries. The stata_kernel install page provides full instructions for installing Anaconda Python and all packages necessary for getting this running (note: the mechanics of doing this are outlined below).

This part goes into a bit more detail on how to actually perform the installation steps after you've installed Anaconda Python. You should be able to run the "Anaconda Navigator" and once launched you should see something like what is pictured below:

anaconda_navigator.png

Launch "Jupyter Lab" (you may need to install it first using the install button on the icon), and you will see this:

jupyter_lab_1.png

except that your machine probably won't have "Stata" listed under "Notebooks" or "Consoles". To add that, click on "Terminal" and you will see something like this:

jupyter_lab_2.png

Run the following commands to complete the installation:

  1. pip install stata_kernel
  2. python -m stata_kernel.install
  3. conda install nodejs -y, or if this fails, run conda install -c conda-forge nodejs -y
  4. jupyter labextension install jupyterlab-stata-highlight
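
If you would rather confirm the install from the terminal than by looking for the icon, here is a minimal sketch that asks Jupyter which kernels it knows about. It assumes the `jupyter` command is on your PATH and that the kernel is registered under the name "stata" (the same name used in the `:kernel stata` header later in this post); the function names `installed_kernels` and `has_stata_kernel` are my own, not part of stata_kernel.

```python
import json
import subprocess


def installed_kernels():
    """Return the kernel names reported by `jupyter kernelspec list --json`."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The JSON has a top-level "kernelspecs" object keyed by kernel name.
    return sorted(json.loads(result.stdout)["kernelspecs"])


def has_stata_kernel(kernel_names):
    """True if a kernel called "stata" is registered (assumed name)."""
    return "stata" in kernel_names
```

Calling `has_stata_kernel(installed_kernels())` from the same environment should return True once steps 1 and 2 have succeeded.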

Now you should see "Stata" listed as I do below:

jupyter_lab_1.png

Running Stata Commands in Jupyter Notebook

Clicking on "Stata" in the "Notebook" section will give you this:

jupyter_lab_3.png

You can then enter Stata code directly in the notebook and use "Ctrl-Enter" to execute a cell. Here is code and output:

jupyter_lab_4.png

You can add "Markdown" math and other notation by toggling between code and markdown in new cells. See this video for a demo.

Running Stata Commands in Emacs

Note: this section is necessary only if you wish to run Stata commands inside the Emacs text editor using Org-mode.

Once you have set up the Python environment following the steps above, do this in Emacs:

  1. Install and load ob-ipython.el
  2. Ensure that you have activated the python environment where stata_kernel is available

Stata code blocks need to look like this:

#+BEGIN_SRC ipython :session :kernel stata
webuse auto
sum
#+END_SRC

Note the header arguments "ipython :session :kernel stata".

Running this code yields both code with syntax highlighting and output:

webuse auto
sum
# Out [68]: 

(1978 Automobile Data)


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

Display the first 5 observations:

%head 5
# Out [72]: 
: 
:      +-----------------------------------------------------------------+
:   1. | make          | price | mpg | rep78 | headroom | trunk | weight |
:      | AMC Concord   | 4,099 |  22 |     3 |      2.5 |    11 |  2,930 |
:      |-----------------------------------------------------------------|
:      |  length   |  turn   |  displa~t   |   gear_r~o   |    foreign   |
:      |     186   |    40   |       121   |       3.58   |   Domestic   |
:      +-----------------------------------------------------------------+
: 
:      +-----------------------------------------------------------------+
:   2. | make          | price | mpg | rep78 | headroom | trunk | weight |
:      | AMC Pacer     | 4,749 |  17 |     3 |      3.0 |    11 |  3,350 |
:      |-----------------------------------------------------------------|
:      |  length   |  turn   |  displa~t   |   gear_r~o   |    foreign   |
:      |     173   |    40   |       258   |       2.53   |   Domestic   |
:      +-----------------------------------------------------------------+
: 
:      +-----------------------------------------------------------------+
:   3. | make          | price | mpg | rep78 | headroom | trunk | weight |
:      | AMC Spirit    | 3,799 |  22 |     . |      3.0 |    12 |  2,640 |
:      |-----------------------------------------------------------------|
:      |  length   |  turn   |  displa~t   |   gear_r~o   |    foreign   |
:      |     168   |    35   |       121   |       3.08   |   Domestic   |
:      +-----------------------------------------------------------------+
: 
:      +-----------------------------------------------------------------+
:   4. | make          | price | mpg | rep78 | headroom | trunk | weight |
:      | Buick Century | 4,816 |  20 |     3 |      4.5 |    16 |  3,250 |
:      |-----------------------------------------------------------------|
:      |  length   |  turn   |  displa~t   |   gear_r~o   |    foreign   |
:      |     196   |    40   |       196   |       2.93   |   Domestic   |
:      +-----------------------------------------------------------------+
: 
:      +-----------------------------------------------------------------+
:   5. | make          | price | mpg | rep78 | headroom | trunk | weight |
:      | Buick Electra | 7,827 |  15 |     4 |      4.0 |    20 |  4,080 |
:      |-----------------------------------------------------------------|
:      |  length   |  turn   |  displa~t   |   gear_r~o   |    foreign   |
:      |     222   |    43   |       350   |       2.41   |   Domestic   |
:      +-----------------------------------------------------------------+
: 
: 
bstrap: regress price mpg headroom

Displaying and Exporting Graphics

One notable "gotcha" that continues to be a bit bothersome is that stata_kernel uses the console version of Stata (on Linux), which is fully functional with one exception: Stata cannot output png files when displaying or exporting graphics. stata_kernel sidesteps this by producing svg graphics along with pdf graphics files for each figure displayed in the notebook. This causes some difficulties that vary depending on what we are looking for (showing figures inline in Emacs, exporting to html, or exporting to pdf). I gather that these issues aren't relevant for Windows (not sure about macOS).

To demonstrate these points, suppose we wish to display a histogram of price in the Emacs buffer. Using the working codeblocks shown above, we do this:

#+BEGIN_SRC ipython :session :kernel stata
hist price
#+END_SRC

This fails to display the image and in the Messages buffer we see this error:

cdr: Wrong type argument: listp, "JVBERi0xLjMKJbe <lots more gibberish here>"
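
That "gibberish" is not random, by the way. As a small illustration, decoding just the first 12 characters of the string as base64 shows that it is the start of a PDF file, which is what we would expect since Jupyter messages and notebooks carry binary outputs as base64 text. The variable names here are my own.

```python
import base64

# First 12 characters of the string from the error message above.
payload_prefix = "JVBERi0xLjMK"

# Decode the base64 text back into raw bytes.
header = base64.b64decode(payload_prefix)
print(header)  # b'%PDF-1.3\n'
```

So the argument Emacs is choking on is the pdf version of the figure, not a corrupted result.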

If we look into a Jupyter notebook running this code, we have

"outputs": [
    {
     "data": {
      "application/pdf": "JVBERi0xLjMKJbe <lots more gibberish here identical to above>",
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!-- This is a Stata 15.1 generated SVG file (http://www.stata.com) -->\n",
       "\n",
       "<svg version=\"1.1\" width=\"600px\" height=\"436px\" viewBox=\"0 0 3960 2880\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "\t<desc>Stata Graph - Graph</desc>\n" <snip>

Two things to note: 1) image/png is not a returned output, and 2) the mimetype application/pdf is causing issues for ob-ipython. However, ob-ipython can handle image/svg+xml images. If we alter the header values as follows, images will be displayed in the Emacs buffer:

#+BEGIN_SRC ipython :session :kernel stata :results output drawer :display image/svg+xml
hist price
#+END_SRC

So we have directed ob-ipython to use the svg file rather than having to parse through the outputs from stata_kernel which are not completely understood, and we have suitable results. The figure will usually export fine if exporting to html, but will completely fail if exporting to pdf.
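
To make the idea of "use the svg and ignore the pdf" concrete outside of Emacs, here is a rough sketch that pulls only the image/svg+xml entries out of a notebook's outputs. It assumes the usual notebook JSON layout (a "cells" list whose code cells have "outputs", each with a "data" mapping keyed by mimetype, as in the excerpt above); the function name `svg_outputs` and the tiny `example` notebook are made up for illustration and are not what ob-ipython does internally.

```python
def svg_outputs(notebook):
    """Return the SVG text of every output that carries image/svg+xml."""
    svgs = []
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            svg = output.get("data", {}).get("image/svg+xml")
            if svg is None:
                continue  # no svg here, e.g. plain text output
            # Notebook files may store text as one string or a list of lines.
            svgs.append("".join(svg) if isinstance(svg, list) else svg)
    return svgs


# A trimmed, hypothetical notebook with one figure output.
example = {
    "cells": [
        {
            "cell_type": "code",
            "source": ["hist price"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {
                        "application/pdf": "JVBERi0xLjMK",
                        "image/svg+xml": ['<svg width="600px">\n', "</svg>\n"],
                    },
                }
            ],
        }
    ]
}

svgs = svg_outputs(example)
print(len(svgs))  # 1
```

With a real notebook you would load the file with `json.load` first and pass the resulting dictionary to `svg_outputs`.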

More robust viewing and exporting

The strategy here is to continue to use the results from codeblock execution to view figures inside Emacs, but also save them to disk and then reference them manually in orgmode for more robust exporting.

#+BEGIN_SRC ipython :session :kernel stata :exports code :results output drawer :display image/svg+xml
hist price
graph export "/tmp/hist.svg"
#+END_SRC

Then we can manually add a link to this file in our Org-mode document via [[/tmp/hist.svg]], to include the histogram in a way that should be robust to whatever document type we wish to export to. It is worth noting that in the Emacs buffer, you will likely see the image twice (once in the results object that we aren't exporting, and once for the manual link you've created). You can turn off the second of these by toggling org-toggle-inline-images.

This method has the added benefit of better/customized placement for figures.

Conclusion

This post shows how to use stata_kernel with Emacs. The method outlined here is superior to the one I wrote about over a year ago, which used an updated version of ob-stata.el and Emacs Speaks Statistics (ESS): Stata support there has been deprecated in current releases, and my modified script no longer works (i.e., after summer 2020). Even the somewhat inconvenient way of dealing with graphical output is no worse than what was required before.