# Stata and Literate Programming in Emacs Org-Mode

Stata is a widely used statistical package, and Emacs Org-mode is a great platform for organizing, publishing, and blogging your research. In an older post, I outlined the relative benefits of Org-mode compared to other packages for literate programming. At that time, I argued it was the best way to write literate programming documents with Stata (if you are willing to pay the fixed costs of learning Emacs). I still believe that, and I use it a lot for writing course notes, emailing students code and results, and even drafting manuscripts for publication.

Despite how good Emacs Org-mode is for research involving Stata, Stata is still something of a second-class citizen compared to packages like `R` or `Python`. While it is functional, it can be a little rough around the edges, and since not many people use Stata with Emacs, finding answers can be tough. This post attempts to provide a one-stop resource for getting Stata working in Org-mode. I also introduce an improvement in how Stata behaves with `src` code blocks in Org-mode documents.

## Issues with Stata and Org-Mode

### Output Contains Commands and Results

For me, one of the major annoyances with using Stata in Org-mode is that Stata output includes both commands and results. If you want exported output with syntax highlighting of the Stata commands, you will necessarily have duplicate commands in your `html` or `pdf` document. This example illustrates the problem:

```
webuse auto
reg price mpg
```

    webuse auto
    (1978 Automobile Data)
    reg price mpg
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------

Notice that in the exported html (what you are viewing), you see duplicate versions of the commands that produced the output. The first has font highlighting (and is what we really want), while the second is interspersed in the plain-text results. I should note that no other language I have used in Org-mode behaves like this. For example, in `R` or `Python`, the commands are left in the source code block (and are highlighted) while the results block contains only results.

To make Stata behave more like `R` or `Python`, I have modified `ob-stata.el` to purge the results of any and all commands (for `:results output` and Stata invoked by `:session`). For the same Stata code, this modification produces:

```
webuse auto
reg price mpg
```

    (1978 Automobile Data)
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
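To give a feel for what purging commands from the results involves, here is a minimal Emacs Lisp sketch. It is not the code in my modified `ob-stata.el`; the function name `my/stata-strip-echoed-commands` is hypothetical, and it assumes each echoed command appears in the output as a line identical (up to surrounding whitespace) to a line of the source block.

```elisp
(require 'seq)     ; seq-remove
(require 'subr-x)  ; string-trim, string-join

(defun my/stata-strip-echoed-commands (body output)
  "Return OUTPUT without the lines that merely echo a command from BODY.
Hypothetical helper for illustration only; not part of ob-stata.el."
  (let ((commands (mapcar #'string-trim (split-string body "\n" t))))
    (string-join
     ;; Drop every output line whose trimmed text equals a source line.
     (seq-remove (lambda (line)
                   (member (string-trim line) commands))
                 (split-string output "\n"))
     "\n")))

;; Usage: the two echoed commands disappear, the results stay.
(my/stata-strip-echoed-commands
 "webuse auto\nreg price mpg"
 "webuse auto\n(1978 Automobile Data)\nreg price mpg\nSource | SS")
;; => "(1978 Automobile Data)\nSource | SS"
```

One limitation of matching on exact lines: a genuine result line that happens to be identical to a command would be dropped as well.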

### Occasional garbled output

For most of the usual commands (`regress`, `sum`, `probit`, etc.), output is fine. But for some commands the output can be truncated. In this code block, we bootstrap the probit command and then ask for model fit diagnostics. Both `bstrap` and `estat classification` fail to render properly using the current version of `ob-stata.el`.

```
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
```

    webuse auto
    (1978 Automobile Data)
        50
    Probit regression                               Number of obs     =         74
                                                    Replications      =         50
                                                    Wald chi2(3)      =      19.82
                                                    Prob > chi2       =     0.0002
    Log likelihood = -35.296645                     Pseudo R2         =     0.2162
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   .1218917     .03376     3.61   0.000     .0557233    .1880601
           price |   .0001563   .0000879     1.78   0.075    -.0000159    .0003286
        headroom |  -.3379361   .1974509    -1.71   0.087    -.7249327    .0490606
           _cons |  -3.232319    1.32055    -2.45   0.014    -5.820551    -.644088
    ------------------------------------------------------------------------------
               |         8             6  |         14
         -     |        14            46  |         60
    -----------+--------------------------+-----------
       Total   |        22            52  |         74
    Classified + if predicted Pr(D) >= .5
    True D defined as foreign != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   36.36%
    Specificity                     Pr( -|~D)   88.46%
    Positive predictive value       Pr( D| +)   57.14%
    Negative predictive value       Pr(~D| -)   76.67%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   11.54%
    False - rate for true D         Pr( -| D)   63.64%
    False + rate for classified +   Pr(~D| +)   42.86%
    False - rate for classified -   Pr( D| -)   23.33%
    --------------------------------------------------
    Correctly classified                        72.97%
    --------------------------------------------------

My modification of `ob-stata.el` renders this output correctly:

```
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
```

    (1978 Automobile Data)
    (running probit on estimation sample)
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    Probit regression                               Number of obs     =         74
                                                    Replications      =         50
                                                    Wald chi2(3)      =      14.01
                                                    Prob > chi2       =     0.0029
    Log likelihood = -35.296645                     Pseudo R2         =     0.2162
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   .1218917   .0555982     2.19   0.028     .0129212    .2308622
           price |   .0001563   .0000843     1.85   0.064    -8.94e-06    .0003216
        headroom |  -.3379361   .2451438    -1.38   0.168    -.8184091     .142537
           _cons |  -3.232319   2.100886    -1.54   0.124     -7.34998    .8853416
    ------------------------------------------------------------------------------
    Probit model for foreign
                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |         8             6  |         14
         -     |        14            46  |         60
    -----------+--------------------------+-----------
       Total   |        22            52  |         74
    Classified + if predicted Pr(D) >= .5
    True D defined as foreign != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   36.36%
    Specificity                     Pr( -|~D)   88.46%
    Positive predictive value       Pr( D| +)   57.14%
    Negative predictive value       Pr(~D| -)   76.67%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   11.54%
    False - rate for true D         Pr( -| D)   63.64%
    False + rate for classified +   Pr(~D| +)   42.86%
    False - rate for classified -   Pr( D| -)   23.33%
    --------------------------------------------------
    Correctly classified                        72.97%
    --------------------------------------------------

### Smaller annoyances

Stata has some more minor limitations in Org-mode that I have learned to live with and haven't bothered to fix, since my research isn't too Stata-centric. I'll list them here:

- Font highlighting in exported documents does work, but it is somewhat hit and miss for html output while pretty good for latex output. I am not altogether clear why there is a difference, but I suspect that html output uses a Stata syntax dictionary from the Emacs Speaks Statistics package (and it isn't possible to modify highlighting settings for Stata since "Font Lock" is disabled). For latex/pdf output, `pygmentize` is used and it works very well. For this blog, the html is produced from `pygmentize`'d output, so what you are seeing isn't representative of what you will get from a straight html export in Org-mode.
- Highlighting in Emacs Org-mode while editing the document requires you to place a space before the first command in `src` blocks.
- There can be limits on other types of output you can get out of your code blocks, such as values, tables, latex code, etc. For this reason, I only use `:results output`.
- Graphics can only be included if you run code, save the result, and then manually include it in your org file.
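Since `:results output` is the only results type I rely on, one way to avoid repeating it on every block is to set it as a default header argument. This is a sketch rather than something from my own configuration: Org looks up a variable named `org-babel-default-header-args:<lang>` for each language, and the session name `*stata*` below is an arbitrary, hypothetical choice.

```elisp
;; Sketch: default header arguments for every Stata src block.
;; Individual blocks can still override these in their header line.
(setq org-babel-default-header-args:stata
      '((:results . "output")     ; capture printed output, not a value
        (:session . "*stata*")))  ; hypothetical shared session name
```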

## Setup

To get Stata execution blocks working in Org-mode, you need to:

- Ensure the command `stata` is executable and is in your path (note, **not xstata**). Emacs will execute commands using `stata`. If you want to use another version of Stata, you will need to soft-link it to a `stata` command in the path. For example, I would rather use `stata-mp`, so I soft-linked it to `/usr/local/sbin/stata` so Emacs will use the Multi-Processing version of Stata.
- In Emacs, install ESS: Emacs Speaks Statistics.
- Download `ob-stata.el` from here, and save it to `~/.emacs.d/lisp`. If you prefer my version that eliminates the two problems described above, download the ob-stata.el file here instead.
- For your version of Emacs and Org-mode, you might need to change this line in `ob-stata.el` (see this thread):

      (let ((vars (mapcar #'cdr (org-babel--get-vars params))))

  to/from (depending on what you downloaded above)

      (let ((vars (mapcar #'cdr (org-babel-get-header params :var))))

- Modify one of your Emacs initialization files to include

      ;; load Emacs Speaks Statistics - for Stata support
      (require 'ess-site)
      ;; Tell Emacs the location of the directory containing
      ;; personal elisp (and ob-stata.el)
      (add-to-list 'load-path "~/.emacs.d/lisp/")
      ;; load ob-stata (require takes a quoted symbol, not a string)
      (require 'ob-stata)

- Include Stata as a babel language in your Emacs initialization files. Mine looks like this:

      (org-babel-do-load-languages
       'org-babel-load-languages
       '((python . t)
         (ipython . t)
         (R . t)
         (sh . t)
         (matlab . t)
         (stata . t)))

- Include Stata as a language to be fontified for latex exports by including the following in your Emacs initialization files:

      (add-to-list 'org-latex-minted-langs '(stata "stata"))

  Make sure to include `\usepackage{minted}` in the header of your latex export template and that your version of `pygmentize` is 2.2 or higher.
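After restarting Emacs, a quick sanity check can confirm that the pieces above are in place. The following is a minimal sketch; the function name `my/stata-org-setup-report` and the report format are hypothetical, and it only uses the built-in `executable-find`, `featurep`, and `fboundp`. It assumes `ob-stata.el` defines `org-babel-execute:stata`, following the naming convention Babel language files use.

```elisp
(defun my/stata-org-setup-report ()
  "Return an alist describing which Stata/Org-mode pieces Emacs can see.
Hypothetical helper for illustration; not part of any package."
  (list
   ;; Full path of the `stata' command Emacs would run, or nil.
   (cons 'stata-executable (executable-find "stata"))
   ;; Non-nil once (require 'ess-site) has succeeded.
   (cons 'ess-loaded (featurep 'ess-site))
   ;; Non-nil once ob-stata.el has been loaded.
   (cons 'ob-stata-loaded (fboundp 'org-babel-execute:stata))
   ;; Needed by minted for latex/pdf highlighting.
   (cons 'pygmentize-executable (executable-find "pygmentize"))))

;; Evaluate in the *scratch* buffer and inspect the result:
(my/stata-org-setup-report)
```

Any entry whose value is `nil` points at the setup step to revisit. Note that this does not check the `pygmentize` version.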