Using stata_kernel for reproducible research goodness
This post is hopefully the last in a series of posts outlining how to use Stata in a proper dynamic document/reproducible research setting. As of the summer of 2020, I am only using stata_kernel for my own work and no longer recommend using my customized ob-ipython.el, for reasons described here. This method allows users to create dynamic documents using either
- Jupyter Lab (or Notebook)
- Emacs Org-mode
This post shows the installation steps to get this working and some usability recommendations if using Org-mode.
Python Preliminaries
The stata_kernel needs a proper Python installation with Jupyter installed. My preferred way of doing this is to use Anaconda Python along with several Python libraries. The stata_kernel install page provides full instructions for installing Anaconda Python and all the packages needed to get this running (note: the mechanics of doing this are outlined below).
This part goes into a bit more detail on how to actually perform the installation steps after you've installed Anaconda Python. You should be able to run the "Anaconda Navigator", and once it launches you should see something like what is pictured below:
Launch "Jupyter Lab" (you may need to install it first using the install button on the icon), and you will see this:
except that your machine probably won't have "Stata" listed under "Notebooks" or "Consoles". To add that, click on "Terminal" and you will see something like this:
Run the following commands to complete the installation:
pip install stata_kernel
python -m stata_kernel.install
conda install nodejs -y
(or, if this fails, run: conda install -c conda-forge nodejs -y)
jupyter labextension install jupyterlab-stata-highlight
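One way to confirm that `python -m stata_kernel.install` worked, before opening Jupyter Lab, is the standard `jupyter kernelspec list` command. The Python sketch below runs it with `--json` and parses the result. It assumes the kernel is registered under the name `stata` (the name that the Org-mode header `:kernel stata` relies on later in this post); the helper function names are hypothetical.

```python
import json
import shutil
import subprocess


def installed_kernels(kernelspec_json):
    """Return the sorted kernel names in `jupyter kernelspec list --json` output."""
    data = json.loads(kernelspec_json)
    return sorted(data.get("kernelspecs", {}))


def stata_kernel_registered(kernelspec_json, name="stata"):
    """True if a kernel called `name` (assumed to be "stata") is registered."""
    return name in installed_kernels(kernelspec_json)


if __name__ == "__main__":
    # Run this from the same (activated) environment used for the install above.
    if shutil.which("jupyter") is None:
        print("jupyter not found on PATH; activate the Anaconda environment first")
    else:
        result = subprocess.run(
            ["jupyter", "kernelspec", "list", "--json"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            print("registered kernels:", installed_kernels(result.stdout))
            print("stata kernel found:", stata_kernel_registered(result.stdout))
        else:
            print("jupyter kernelspec list failed:", result.stderr.strip())
```

Running `jupyter kernelspec list` directly in the terminal gives the same information as plain text.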
Now you should see "Stata" listed as I do below:
Running Stata Commands in Jupyter Notebook
Clicking on "Stata" in the "Notebook" section will give you this:
You can then enter Stata code directly in the notebook and use "Ctrl-Enter" to execute a cell. Here is code and output:
You can add math and other notation in "Markdown" by switching new cells between code and markdown. See this video for a demo.
Running Stata Commands in Emacs
Note: this section is needed only if you wish to run Stata commands inside the Emacs text editor using Org-mode.
Once you have set up the Python environment following the steps above, do this in Emacs:
- Install and load ob-ipython.el
- Ensure that you have activated the Python environment where stata_kernel is available
Stata code blocks need to look like this:
#+BEGIN_SRC ipython :session :kernel stata
webuse auto
sum
#+END_SRC
Note the language and header arguments: "ipython :session :kernel stata".
Running this code yields both code with syntax highlighting and output:
webuse auto
sum
# Out [68]:
(1978 Automobile Data)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1
Display the first 5 observations:
%head 5
# Out [72]:

     +-----------------------------------------------------------------+
  1. | make          | price | mpg | rep78 | headroom | trunk | weight |
     | AMC Concord   | 4,099 |  22 |     3 |      2.5 |    11 |  2,930 |
     |-----------------------------------------------------------------|
     | length | turn | displa~t | gear_r~o | foreign                   |
     |    186 |   40 |      121 |     3.58 | Domestic                  |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  2. | make          | price | mpg | rep78 | headroom | trunk | weight |
     | AMC Pacer     | 4,749 |  17 |     3 |      3.0 |    11 |  3,350 |
     |-----------------------------------------------------------------|
     | length | turn | displa~t | gear_r~o | foreign                   |
     |    173 |   40 |      258 |     2.53 | Domestic                  |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  3. | make          | price | mpg | rep78 | headroom | trunk | weight |
     | AMC Spirit    | 3,799 |  22 |     . |      3.0 |    12 |  2,640 |
     |-----------------------------------------------------------------|
     | length | turn | displa~t | gear_r~o | foreign                   |
     |    168 |   35 |      121 |     3.08 | Domestic                  |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  4. | make          | price | mpg | rep78 | headroom | trunk | weight |
     | Buick Century | 4,816 |  20 |     3 |      4.5 |    16 |  3,250 |
     |-----------------------------------------------------------------|
     | length | turn | displa~t | gear_r~o | foreign                   |
     |    196 |   40 |      196 |     2.93 | Domestic                  |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  5. | make          | price | mpg | rep78 | headroom | trunk | weight |
     | Buick Electra | 7,827 |  15 |     4 |      4.0 |    20 |  4,080 |
     |-----------------------------------------------------------------|
     | length | turn | displa~t | gear_r~o | foreign                   |
     |    222 |   43 |      350 |     2.41 | Domestic                  |
     +-----------------------------------------------------------------+
bstrap: regress price mpg headroom
Displaying and Exporting Graphics
One notable "gotcha" that continues to be a bit bothersome is that stata_kernel uses the console version of Stata (on Linux), which is fully functional with one exception: it cannot output PNG files when displaying or exporting graphics. stata_kernel sidesteps this by producing SVG and PDF graphics files for each figure displayed in the notebook. This causes some difficulties that vary depending on what we are trying to do (showing figures inline in Emacs, exporting to HTML, or exporting to PDF). I gather that these issues aren't relevant on Windows (I am not sure about Mac).
To demonstrate these points, suppose we wish to display a histogram of price in the Emacs buffer. Following the working code blocks shown above, we do this:
#+BEGIN_SRC ipython :session :kernel stata
hist price
#+END_SRC
This fails to display the image, and in the *Messages* buffer we see this error:
cdr: Wrong type argument: listp, "JVBERi0xLjMKJbe <lots more gibberish here>"
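The long string in this error is base64-encoded data: its first characters, JVBERi0x, decode to %PDF-1, the standard signature at the start of a PDF file. Because a saved .ipynb notebook is plain JSON, a few lines of standard-library Python are enough to confirm this and to list which mimetypes each output carries. This is a minimal sketch; the function names and the command-line usage are hypothetical.

```python
import base64
import json
import sys


def looks_like_pdf(b64_text):
    """True if a base64 string decodes to bytes that begin with the PDF signature."""
    # The first 8 base64 characters decode to 6 bytes, so a truncated string still works.
    return base64.b64decode(b64_text[:8]).startswith(b"%PDF-")


def output_mimetypes(notebook):
    """For each code-cell output in a parsed .ipynb dict, return its sorted mimetypes."""
    found = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # Stream outputs (plain printed text) have no "data" key and give [].
            found.append(sorted(output.get("data", {})))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_outputs.py my_notebook.ipynb
    with open(sys.argv[1], encoding="utf-8") as fh:
        for mimetypes in output_mimetypes(json.load(fh)):
            print(mimetypes)
```

For a histogram cell, the listing would be expected to show application/pdf and image/svg+xml but no image/png.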
If we look inside a Jupyter notebook that runs this code, we see:
"outputs": [
 {
  "data": {
   "application/pdf": "JVBERi0xLjMKJbe <lots more gibberish here identical to above>",
   "image/svg+xml": [
    "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
    "<!-- This is a Stata 15.1 generated SVG file (http://www.stata.com) -->\n",
    "\n",
    "<svg version=\"1.1\" width=\"600px\" height=\"436px\" viewBox=\"0 0 3960 2880\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
    "\t<desc>Stata Graph - Graph</desc>\n"
    <snip>
Two things to note: 1) image/png is not among the returned outputs, and 2) the application/pdf mimetype is what causes problems for ob-ipython. However, ob-ipython can handle image/svg+xml images. If we alter the header values as follows, images will be displayed in the Emacs buffer:
#+BEGIN_SRC ipython :session :kernel stata :results output drawer :display image/svg+xml
hist price
#+END_SRC
So we have directed ob-ipython to use the SVG output rather than parse through all of the outputs from stata_kernel, some of which it does not handle, and we get suitable results. The figure will usually export fine to HTML, but export to PDF will fail completely.
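Conceptually, `:display image/svg+xml` amounts to choosing one mimetype from each output bundle and ignoring the rest. The Python sketch below is a hypothetical illustration of that selection idea only; it is not ob-ipython's code, and `pick_output` and the sample `bundle` are made up for the example.

```python
def pick_output(data, allowed):
    """Return (mimetype, payload) for the first mimetype in `allowed` present in `data`.

    `data` is one output's "data" dict from a notebook (mimetype -> payload);
    mimetypes not listed in `allowed`, such as application/pdf, are never read.
    Returns None when none of the allowed mimetypes is present.
    """
    for mimetype in allowed:
        if mimetype in data:
            return mimetype, data[mimetype]
    return None


# Hypothetical bundle shaped like the stata_kernel graph output shown earlier.
bundle = {
    "application/pdf": "JVBERi0xLjMKJbe",
    "image/svg+xml": ['<svg version="1.1">', "</svg>"],
}

print(pick_output(bundle, ["image/png"]))  # None: there is no PNG to show
mimetype, payload = pick_output(bundle, ["image/svg+xml"])
print(mimetype)  # image/svg+xml
# Notebook JSON may store text payloads as a list of lines; join them back up.
svg_text = "".join(payload)
```

Listing the allowed mimetypes explicitly means a payload the consumer cannot handle is never decoded at all.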
More robust viewing and exporting
The strategy here is to continue using the results of code block execution to view figures inside Emacs, but also to save the figures to disk and reference them manually in Org-mode for more robust exporting.
#+BEGIN_SRC ipython :session :kernel stata :exports code :results output drawer :display image/svg+xml
hist price
graph export "/tmp/hist.svg"
#+END_SRC
Then we can manually add a link to this file in our Org-mode document via [[/tmp/hist.svg]] to include the histogram in a way that should be robust to whatever document type we wish to export to. Note that in the Emacs buffer you will likely see the image twice (once in the results drawer, which we aren't exporting, and once for the manual link you've created). You can turn off the second of these with org-toggle-inline-images.
This method has the added benefit of giving you more control over where figures are placed.
Conclusion
This post shows how to use stata_kernel with Emacs. The method outlined here is superior to the one I wrote about over a year ago, which used an updated version of ob-stata.el and Emacs Speaks Statistics (ESS): Stata support in ESS has been deprecated in current releases, and my modified script no longer works (as of summer 2020). Even the somewhat inconvenient way of dealing with graphical output is no worse than what was required before.