Stata and Literate Programming in Emacs Org-Mode
Important Note: This post is outdated, and the approach it describes is no longer recommended for running Stata in Org-mode. Please see this post on using Emacs with jupyter and the stata_kernel for a method that works and is more robust going forward.
Stata is a statistical package that lots of people use, and Emacs Org-mode is a great platform for organizing, publishing, and blogging your research. In one of my older posts, I outlined the relative benefits of Org-mode compared to other packages for literate programming. At that time, I argued it was the best way to write literate programming documents with Stata (if you are willing to pay the fixed costs of learning Emacs). I still believe that, and I use it a lot for writing course notes, emailing students with code and results, and even for drafting manuscripts for publication.
Despite how good Emacs Org-mode is for research involving Stata, Stata is still something of a second-class citizen compared to packages like R or Python. While it is functional, it can be a little rough around the edges, and since not many people use Stata with Emacs, finding answers can be tough. This post does three things:
 Demonstrates some issues with using Stata in Org-mode
 Introduces an updated version of ob-stata.el. With only minor modifications, this version avoids some issues with the current version of ob-stata found here. My version of ob-stata.el can be downloaded from gitlab.
 Provides full setup instructions that enable code highlighting in html and latex export.
Issues with Stata and Org-Mode
Occasional garbled output
For most of the usual commands (regress, sum, probit, etc.) output is fine. But for some commands, output can be truncated. In this code block, we bootstrap the probit command and then ask for model fit diagnostics. The output of both bstrap and estat classification fails to render properly using the current version of ob-stata.el.
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
webuse auto (1978 Automobile Data) 50 Probit regression Number of obs = 74 Replications = 50 Wald chi2(3) = 19.82 Prob > chi2 = 0.0002 Log likelihood = 35.296645 Pseudo R2 = 0.2162   Observed Bootstrap Normalbased foreign  Coef. Std. Err. z P>z [95% Conf. Interval] + mpg  .1218917 .03376 3.61 0.000 .0557233 .1880601 price  .0001563 .0000879 1.78 0.075 .0000159 .0003286 headroom  .3379361 .1974509 1.71 0.087 .7249327 .0490606 _cons  3.232319 1.32055 2.45 0.014 5.820551 .644088   8 6  14   14 46  60 ++ Total  22 52  74 Classified + if predicted Pr(D) >= .5 True D defined as foreign != 0  Sensitivity Pr( + D) 36.36% Specificity Pr( ~D) 88.46% Positive predictive value Pr( D +) 57.14% Negative predictive value Pr(~D ) 76.67%  False + rate for true ~D Pr( +~D) 11.54% False  rate for true D Pr(  D) 63.64% False + rate for classified + Pr(~D +) 42.86% False  rate for classified  Pr( D ) 23.33%  Correctly classified 72.97% 
My modification of ob-stata.el renders this output correctly:
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
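(1978 Automobile Data)
(running probit on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Probit regression                               Number of obs     =         74
                                                Replications      =         50
                                                Wald chi2(3)      =      14.01
                                                Prob > chi2       =     0.0029
Log likelihood = -35.296645                     Pseudo R2         =     0.2162

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1218917   .0555982     2.19   0.028     .0129212    .2308622
       price |   .0001563   .0000843     1.85   0.064    -8.94e-06    .0003216
    headroom |  -.3379361   .2451438    -1.38   0.168    -.8184091     .142537
       _cons |  -3.232319   2.100886    -1.54   0.124     -7.34998    .8853416
------------------------------------------------------------------------------

Probit model for foreign

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         8             6  |         14
     -     |        14            46  |         60
-----------+--------------------------+-----------
   Total   |        22            52  |         74

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   36.36%
Specificity                     Pr( -|~D)   88.46%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   76.67%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.54%
False - rate for true D         Pr( -| D)   63.64%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   23.33%
--------------------------------------------------
Correctly classified                        72.97%
--------------------------------------------------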
Output Contains Commands and Results
For me, one of the major annoyances with using Stata in Org-mode is that Stata output includes both commands and results. If you want exported output with code highlighting of the Stata commands, you will necessarily have duplicate commands in your html or pdf document. This example illustrates the problem:
webuse auto
reg price mpg
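webuse auto
(1978 Automobile Data)
reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------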
Notice that in the exported html (what you are viewing), you see duplicate versions of the commands that produced the output. The first has font highlighting (and is what we really want), while the second is interspersed in the plain text results. I should note that no other language I have used in Org-mode behaves like this. For example, in R or Python, the commands are left in the source code block (and are highlighted) while the results contain only results.
To make Stata behave more like R or Python, I have modified ob-stata.el to purge the results of any and all commands (for :results output and Stata invoked by :session). For the same Stata code, this modification produces:
webuse auto
reg price mpg
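(1978 Automobile Data)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------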
Note that even with my modified code, some form of your command might still appear in the output when you use line continuation or when a line of code is longer than 77 characters. This occurs infrequently enough that I haven't bothered to try to patch ob-stata.el further.
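To make the idea of purging commands from the results concrete, here is a minimal sketch in Emacs Lisp. It is not the code from the modified ob-stata.el; the function name my/stata-strip-echoed-commands is made up for illustration, and it assumes an echoed command appears on its own output line, optionally prefixed by Stata's ". " prompt.

```emacs-lisp
(require 'cl-lib)
(require 'subr-x)   ; string-trim, string-remove-prefix, string-empty-p

(defun my/stata-strip-echoed-commands (body output)
  "Return OUTPUT with every line that merely echoes a command in BODY removed.
BODY is the text of the source block; OUTPUT is the raw text Stata printed.
Hypothetical helper for illustration; not taken from ob-stata.el."
  (let ((commands (cl-remove-if #'string-empty-p
                                (mapcar #'string-trim
                                        (split-string body "\n")))))
    (mapconcat
     #'identity
     (cl-remove-if
      (lambda (line)
        ;; Drop an optional leading ". " prompt, then compare the rest
        ;; of the line against the commands in the source block.
        (member (string-trim
                 (string-remove-prefix ". " (string-trim-left line)))
                commands))
      (split-string output "\n"))
     "\n")))
```

A whole-line comparison like this also suggests why long commands can slip through: if Stata wraps a long echoed command across output lines, no single output line equals the source line anymore, so nothing is removed.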
No line continuation support
Stata allows long commands to be split across lines using ///. This isn't currently supported in ob-stata.el. My modifications support line continuation:
reg price mpg ///
weight
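      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.74
       Model |   186321280         2  93160639.9   Prob > F        =    0.0000
    Residual |   448744116        71  6320339.67   R-squared       =    0.2934
-------------+----------------------------------   Adj R-squared   =    0.2735
       Total |   635065396        73  8699525.97   Root MSE        =      2514

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
      weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
       _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
------------------------------------------------------------------------------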
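One way to picture line continuation support is as a preprocessing step that folds each /// continuation into a single line before the code is sent to Stata. The sketch below illustrates that idea only; the function name is hypothetical, and this is not necessarily how the modified ob-stata.el handles it.

```emacs-lisp
(defun my/stata-join-continuations (body)
  "Return BODY with each `///' line continuation folded into one line.
Hypothetical helper for illustration; not taken from ob-stata.el."
  ;; Match optional blanks, the /// marker, any comment text after it,
  ;; the newline, and the next line's indentation; replace all of that
  ;; with a single space.
  (replace-regexp-in-string "[ \t]*///[^\n]*\n[ \t]*" " " body))
```

With this helper, the two-line command above becomes "reg price mpg weight". The regular expression is deliberately naive: it would also fold a line that merely contains /// inside a string, such as a file:/// URL, so a real implementation would need to be more careful.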
Smaller annoyances
Stata has some more minor limitations in Org-mode that I have learned to live with and haven't bothered to fix, since my research isn't too Stata-centric. I'll list them here:
 Font highlighting in exported documents does work, but it is somewhat hit and miss for html output while pretty good for latex output. I am not altogether clear why there is a difference, but I suspect that html output uses a Stata syntax dictionary from the Emacs Speaks Statistics package (and it isn't possible to modify highlighting settings for Stata since "Font Lock" is disabled). For latex/pdf output, pygmentize is used and it works very well. For this blog, html output is produced using pygmentize'd output, so what you are seeing isn't representative of what you will get from a straight html export in Org-mode.
 Highlighting in Emacs Org-mode while editing the document requires you to place a space before the first command in src blocks.
 There can be limits on other types of output you can get out of your code blocks, such as values, tables, latex code, etc. For this reason, I only use :results output.
 Graphics can only be included if you run code, save the result, and then manually include it in your org file.
Setup
To get Stata execution blocks working in Org-mode, you need to
 Ensure the command stata is executable and is in your path (note, not xstata). Emacs will execute commands using stata. If you want to use another version of Stata, you will need to softlink it to a stata command in the path. For example, I would rather use stata-mp, so I softlinked it to /usr/local/sbin/stata so Emacs will use the MultiProcessing version of Stata.
 In Emacs, install ESS: Emacs Speaks Statistics
 If you want to try my version, download ob-stata.el from this gitlab repo and save it to ~/.emacs.d/lisp. If you prefer the original version of ob-stata.el, you can find it at this mirror of the emacs repository.
For your version of Emacs and Org-mode, you might need to change this line in ob-stata.el (see this thread):
(let ((vars (mapcar #'cdr (org-babel--get-vars params))))
to/from (depending on what you downloaded above)
(let ((vars (mapcar #'cdr (org-babel-get-header params :var))))

Modify one of your Emacs initialization files to include
;; load Emacs Speaks Statistics for Stata support
(require 'ess-site)
;; Tell Emacs the location of the directory containing
;; personal elisp (and ob-stata.el)
(add-to-list 'load-path "~/.emacs.d/lisp/")
;; load ob-stata
(require 'ob-stata)

Following the commands above, include Stata as a babel language in your Emacs initialization files. Mine looks like this:
(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)
   (ipython . t)
   (R . t)
   (sh . t)
   (matlab . t)
   (stata . t)))

Include Stata as a language to be fontified for latex exports by including the following in your Emacs initialization files:
(add-to-list 'org-latex-minted-langs '(stata "stata"))
Make sure to include \usepackage{minted} in the header of your latex export template and that your version of pygmentize is 2.2 or higher.
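The steps above leave a couple of things implicit, so here is a hedged sketch of extra initialization code that may help. None of it comes from the original setup: it assumes you want Org itself to load minted rather than editing a template by hand, and that your LaTeX toolchain is pdflatex. Adjust or drop any part that does not match your installation.

```emacs-lisp
(require 'ox-latex)

;; Sanity check: Emacs runs a program literally named "stata",
;; so warn early if nothing by that name is on the path.
(unless (executable-find "stata")
  (message "No `stata' executable found on exec-path"))

;; Ask the LaTeX exporter to typeset source blocks with minted.
;; (Org 9.6 renamed this variable to `org-latex-src-block-backend'.)
(setq org-latex-listings 'minted)

;; Assumption: load minted in every export via Org, as an alternative
;; to adding \usepackage{minted} to each export template.
(add-to-list 'org-latex-packages-alist '("" "minted"))

;; minted calls pygmentize, so LaTeX must be allowed to run external
;; programs; -shell-escape enables that for pdflatex.
(setq org-latex-pdf-process
      '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
        "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
```

Running pdflatex twice is only a common convention for resolving references; a document with a bibliography will likely need a different process list.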