Stata and Literate Programming in Emacs Org-Mode
Important Note: The following post is outdated and is no longer the recommended approach for running Stata in Org-mode. Please see this post on using Emacs with Jupyter and the stata_kernel for a method that works and that is more robust moving forward.
Stata is a statistical package that lots of people use, and Emacs Org-mode is a great platform for organizing, publishing, and blogging your research. In one of my older posts, I outlined the relative benefits of Org-mode compared to other packages for literate programming. At that time, I argued it was the best way to write literate programming documents with Stata (if you are willing to pay the fixed costs of learning Emacs). I still believe that, and I use it a lot for writing course notes, emailing students with code and results, and even for drafting manuscripts for publishing.
Despite how good Emacs Org-mode is for research involving Stata, Stata is still something of a second-class citizen compared to packages like R or Python. While it is functional, it can be a little rough around the edges, and since not many people use Stata with Emacs, finding answers can be tough. This post does three things:
- Demonstrates some issues with using Stata in Org-mode.
- Introduces an updated version of ob-stata.el. With only minor modifications, this version avoids some issues with the current version of ob-stata found here. My version of ob-stata.el can be downloaded from gitlab.
- Provides full setup instructions that enable code highlighting in HTML and LaTeX export.
Issues with Stata and Org-Mode
Occasional garbled output
For most of the usual commands (regress, sum, probit, etc.) output is fine. But for some commands the output can be truncated. In this code block, we bootstrap the probit command and then ask for model fit diagnostics. Both bstrap and estat classification fail to render properly using the current version of ob-stata.el.
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
webuse auto
(1978 Automobile Data)
50
Probit regression                               Number of obs     =         74
                                                Replications      =         50
                                                Wald chi2(3)      =      19.82
                                                Prob > chi2       =     0.0002
Log likelihood = -35.296645                     Pseudo R2         =     0.2162
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1218917     .03376     3.61   0.000     .0557233    .1880601
       price |   .0001563   .0000879     1.78   0.075    -.0000159    .0003286
    headroom |  -.3379361   .1974509    -1.71   0.087    -.7249327    .0490606
       _cons |  -3.232319    1.32055    -2.45   0.014    -5.820551    -.644088
------------------------------------------------------------------------------
           |         8             6  |         14
     -     |        14            46  |         60
-----------+--------------------------+-----------
   Total   |        22            52  |         74
Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   36.36%
Specificity                     Pr( -|~D)   88.46%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   76.67%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.54%
False - rate for true D         Pr( -| D)   63.64%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   23.33%
--------------------------------------------------
Correctly classified                        72.97%
--------------------------------------------------
My modification of ob-stata.el renders this output correctly:
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
(1978 Automobile Data)
(running probit on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Probit regression                               Number of obs     =         74
                                                Replications      =         50
                                                Wald chi2(3)      =      14.01
                                                Prob > chi2       =     0.0029
Log likelihood = -35.296645                     Pseudo R2         =     0.2162

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1218917   .0555982     2.19   0.028     .0129212    .2308622
       price |   .0001563   .0000843     1.85   0.064    -8.94e-06    .0003216
    headroom |  -.3379361   .2451438    -1.38   0.168    -.8184091     .142537
       _cons |  -3.232319   2.100886    -1.54   0.124     -7.34998    .8853416
------------------------------------------------------------------------------

Probit model for foreign

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         8             6  |         14
     -     |        14            46  |         60
-----------+--------------------------+-----------
   Total   |        22            52  |         74

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   36.36%
Specificity                     Pr( -|~D)   88.46%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   76.67%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.54%
False - rate for true D         Pr( -| D)   63.64%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   23.33%
--------------------------------------------------
Correctly classified                        72.97%
--------------------------------------------------
Output Contains Commands and Results
For me, one of the major annoyances with using Stata in Org-mode is that Stata output includes both commands and results. If you want to produce output with code highlighting of the Stata commands, you will necessarily have duplicate commands in your HTML or PDF document. This example illustrates the problem:
webuse auto
reg price mpg
webuse auto
(1978 Automobile Data)
reg price mpg
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------
Notice that in the exported HTML (what you are viewing), you see duplicate versions of the commands that produced the output. The first has font highlighting (and is what we really want), while the second is interspersed in the plain-text results. I should note that no other language I have used in Org-mode behaves like this. For example, in R or Python, the commands stay in the source code block (and are highlighted) while the results contain only results.
To make Stata behave more like R or Python, I have modified ob-stata.el to purge the results of any and all commands (for :results output and Stata invoked by :session). For the same Stata code, this modification produces:
webuse auto
reg price mpg
(1978 Automobile Data)
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------
Note that even with my modified code, some form of your command might still appear in the output when you use line continuation or when a command sits on a line longer than 77 characters. This occurs infrequently enough that I haven't bothered to patch ob-stata.el further.
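To make the idea of purging commands concrete, here is a minimal sketch of one way it could be done. It is not the code in my ob-stata.el, and the function name my/stata-drop-echoed-commands is hypothetical. It assumes the echoed commands come back as lines identical to the lines in the source block, as in the example above.

```elisp
(require 'cl-lib)   ; cl-remove-if
(require 'subr-x)   ; string-trim

;; Hypothetical helper for illustration, not taken from ob-stata.el.
(defun my/stata-drop-echoed-commands (body output)
  "Return OUTPUT without lines that exactly repeat a command line in BODY.
BODY is the text of the source block; OUTPUT is the raw text Stata returned."
  ;; Split BODY into trimmed, non-empty command lines.
  (let ((commands (split-string body "\n" t "[ \t]+")))
    (mapconcat #'identity
               ;; Drop every output line whose trimmed text is one of the commands.
               (cl-remove-if (lambda (line)
                               (member (string-trim line) commands))
                             (split-string output "\n"))
               "\n")))

;; Example: the two echoed commands disappear, the results stay.
(my/stata-drop-echoed-commands
 "webuse auto\nreg price mpg"
 "webuse auto\n(1978 Automobile Data)\nreg price mpg\n      Source |       SS")
;; => "(1978 Automobile Data)\n      Source |       SS"
```

An exact-match filter like this would miss a command that Stata wraps or reflows before echoing it, which is in line with the caveat about long lines and line continuation.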
No line continuation support
Stata allows long commands to be split across lines using ///. This isn't currently supported in ob-stata.el. My modifications support line continuation:
reg price mpg ///
weight
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.74
       Model |   186321280         2  93160639.9   Prob > F        =    0.0000
    Residual |   448744116        71  6320339.67   R-squared       =    0.2934
-------------+----------------------------------   Adj R-squared   =    0.2735
       Total |   635065396        73  8699525.97   Root MSE        =      2514
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
      weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
       _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
------------------------------------------------------------------------------
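One simple way to support /// is to fold each continued command back onto a single line before sending it to Stata. The sketch below illustrates that idea only; it is not necessarily how my ob-stata.el handles it, and my/stata-join-continuations is a hypothetical name. It assumes /// is preceded by at least one blank and that everything after it on the line can be discarded as a comment.

```elisp
;; Hypothetical helper for illustration, not taken from ob-stata.el.
(defun my/stata-join-continuations (body)
  "Return BODY with each `///' line continuation folded into a single line."
  ;; Match: blanks, ///, anything up to the newline, the newline itself,
  ;; and the indentation of the next line; replace all of it with one space.
  (replace-regexp-in-string "[ \t]+///[^\n]*\n[ \t]*" " " body))

;; Example with the command used above.
(my/stata-join-continuations "reg price mpg ///\n    weight")
;; => "reg price mpg weight"
```

Requiring a blank before /// keeps the pattern from firing on strings such as file:/// paths, though a more careful version would also need to respect quoted strings.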
Smaller annoyances
Stata has some more minor limitations in Org-mode that I have learned to live with and haven't bothered to fix, since my research isn't too Stata-centric. I'll list them here:
- Font highlighting in exported documents does work, but it is somewhat hit and miss for HTML output while pretty good for LaTeX output. I am not altogether clear why there is a difference, but I suspect that for HTML output it is using a Stata syntax dictionary from the Emacs Speaks Statistics package (and it isn't possible to modify highlighting settings for Stata since "Font Lock" is disabled). For LaTeX/PDF output, pygmentize is used and it works very well. For this blog, HTML output is produced using pygmentize'd output, so what you are seeing isn't representative of what you will get from a straight HTML export in Org-mode.
- Highlighting in Emacs Org-mode while editing the document requires you to place a space before the first command in src blocks.
- There can be limits on other types of output you can get out of your code blocks, such as values, tables, LaTeX code, etc. For this reason, I only use :results output.
- Graphics can only be included if you run code, save the result, and then manually include it in your Org file.
Setup
To get Stata execution blocks working in Org-mode, you need to:
- Ensure the command stata is executable and is in your path (note, not xstata). Emacs will execute commands using stata. If you want to use another version of Stata, you will need to soft-link it to a stata command in the path. For example, I would rather use stata-mp, so I soft-linked it to /usr/local/sbin/stata so Emacs will use the Multi-Processing version of Stata.
- In Emacs, install ESS: Emacs Speaks Statistics.
- If you want to try my version, download ob-stata.el from this gitlab repo and save it to ~/.emacs.d/lisp. If you prefer the original version of ob-stata.el, you can find it at this mirror of the emacs repository.
- For your version of Emacs and Org-mode, you might need to change this line in ob-stata.el (see this thread):
  (let ((vars (mapcar #'cdr (org-babel--get-vars params))))
  to/from (depending on what you downloaded above)
  (let ((vars (mapcar #'cdr (org-babel-get-header params :var))))
- Modify one of your Emacs initialization files to include:
  ;; load Emacs Speaks Statistics - for Stata support
  (require 'ess-site)
  ;; Tell Emacs the location of the directory containing
  ;; personal elisp (and ob-stata.el)
  (add-to-list 'load-path "~/.emacs.d/lisp/")
  ;; load ob-stata (require takes a symbol, not a string)
  (require 'ob-stata)
- Following the commands above, include Stata as a Babel language in your Emacs initialization files. Mine looks like this:
  (org-babel-do-load-languages
   'org-babel-load-languages
   '((python . t)
     (ipython . t)
     (R . t)
     (sh . t)
     (matlab . t)
     (stata . t)))
- Include Stata as a language to be fontified for LaTeX exports by including the following in your Emacs initialization files:
  (add-to-list 'org-latex-minted-langs '(stata "stata"))
  Make sure to include \usepackage{minted} in the header of your LaTeX export template and that your version of pygmentize is 2.2 or higher.
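For the minted step, a fuller init-file fragment might look like the sketch below. It assumes you export with pdflatex and would rather have Org add the minted package than edit an export template. org-latex-listings, org-latex-packages-alist, and org-latex-pdf-process are standard variables of the Org LaTeX exporter, but the exact pdflatex command line shown here is only an illustration. I believe recent Org releases call the first variable org-latex-src-block-backend and keep the old name as an alias.

```elisp
;; Sketch only: adjust to your own LaTeX toolchain.
(require 'ox-latex)

;; Export src blocks through minted instead of plain verbatim.
(setq org-latex-listings 'minted)

;; Let Org add \usepackage{minted} to exported documents.
(add-to-list 'org-latex-packages-alist '("" "minted"))

;; Map the stata Babel language to the pygments lexer name.
(add-to-list 'org-latex-minted-langs '(stata "stata"))

;; minted calls pygmentize, so LaTeX must be allowed to run external
;; programs; that is what -shell-escape does. Two runs resolve references.
(setq org-latex-pdf-process
      '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
        "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
```

Without -shell-escape, the export typically stops with a minted error about needing shell escape, so that flag is the first thing to check if PDF export fails after enabling minted.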