More on Literate Programming with Stata in Emacs Org-Mode

In a recent post, I talked quite a bit about literate programming for econometrics. Given the tools I use and my requirements, I have settled on Emacs Org-Mode for the reasons I expounded on in the earlier post. During the past 6 months I have been using Stata with Org-Mode for literate programming quite a bit. Here I will share some tricks and tips for making it work.

Recognize Stata's weaknesses so you can work around them in Org-Mode

Stata by default echoes everything into the output window (and there is no way to stop it), which will nearly always clutter your results

clear
webuse auto 
gen x = 1
reg price mpg

Notice that all commands (even those that produce no output) are echoed in the output buffer along with any results (like the regression results). To appreciate why this may be a limitation, consider this Python code block:

%matplotlib inline
import pandas as pd
import numpy as np
from tabulate import tabulate
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(100, 2), columns=['a', 'b'])
print(tabulate(df.head(10), tablefmt="orgtbl", showindex=False, headers = df.columns))

Notice that in Python only the output of the commands appears in the results. Since in Org-Mode we can choose whether to export our source code blocks, there is no strong benefit to echoing commands, and it might even confuse a reader.

Furthermore, we can't silence commands in Stata using `cmdlog`, which would ordinarily suppress Stata commands in the output:

cmdlog 
sum

Long commands and comments are truncated in output in Org-Mode

* This is a long comment I have added to my stata file and it is probably not going to look good in the output since Org-Mode will only take the last bit of the command. 
twoway (scatter mpg weight if foreign, msymbol(O)) (scatter mpg weight if !foreign, msymbol(Oh)) if mpg>20
sum mpg weight

Graphics can only be included if you run the code, save the result, and then manually include it in your Org file

graph export "/tmp/scatter.eps", replace

To manually include your saved graph, use this code:

file:/tmp/scatter.eps
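
The manual include step could also be scripted. Below is a minimal Python sketch, not part of the workflow described in this post: `org_file_link` is a hypothetical helper that checks that the exported graph exists and returns an Org link for it. The double-bracket link form is an assumption about how you want the link written in your Org file.

```python
from pathlib import Path


def org_file_link(path):
    """Return an Org file link for an exported graph, failing early if it is missing."""
    graph = Path(path)
    if not graph.is_file():
        # A stale or mistyped path would otherwise only surface at export time
        raise FileNotFoundError(f"No exported graph at {graph}")
    return f"[[file:{graph}]]"
```

Failing early on a missing file is a design choice: it catches the case where the Stata block was edited but not re-run before the document was exported.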

Again, a comparison to Python using Scimax may make this limitation more evident:

plt.hist(df.a,bins = 15)

Note that the histogram is produced and exported automatically.

Stata and Org-Mode

Knowing about the issues outlined above, we can work around them with a few tricks:

Use :exports results to only export results and split code blocks

I use this technique for cases where I want Stata code in my exported document. This is usually when I am sharing working documents with colleagues or tutorials with students, where the code is an integral part of the document I am writing (rather than just the means of producing the results).

The code block

#+BEGIN_SRC stata :session
gen y = 1
sum price mpg 
#+END_SRC

produces

gen y = 1
sum price mpg

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
         mpg |        74     21.2973    5.785503         12         41

Split the SRC block into two parts:

  1. Auxiliary commands for which we don't want to see output:
    #+BEGIN_SRC stata :session :exports code
    gen y = 1 
    #+END_SRC
    

    generates the variable y and produces this output:

    gen y = 1
    
  2. A single-line Stata command generating results we want to display in our document. This code
    #+BEGIN_SRC stata :session :exports results
    sum price mpg
    #+END_SRC
    

    produces

       sum price mpg
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           price |        74    6165.257    2949.496       3291      15906
             mpg |        74     21.2973    5.785503         12         41
    

Note that I selectively choose :exports results or :exports code to include either Stata output or Stata code. This isn't perfect, as there is no syntax highlighting in Stata results blocks.
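
Where splitting a block is inconvenient, the echoed commands could instead be filtered out of the captured output after the fact. This is a different technique from the block splitting described above, sketched in Python under the assumption that each echoed line matches a submitted command exactly (optionally after Stata's `. ` prompt). `strip_echoed_commands` is a hypothetical helper, not part of Org-Mode or Stata.

```python
def strip_echoed_commands(output, commands):
    """Drop lines of Stata output that merely repeat a submitted command."""
    # Normalize the submitted commands so trailing spaces do not prevent a match
    echoed = {cmd.strip() for cmd in commands if cmd.strip()}
    kept = []
    for line in output.splitlines():
        candidate = line.strip()
        # Assumption: an echo may carry Stata's ". " prompt prefix
        if candidate.startswith(". "):
            candidate = candidate[2:].strip()
        if candidate in echoed:
            continue  # this line is an echo, not a result
        kept.append(line)
    # Trim the blank lines left behind at the start and end
    while kept and not kept[0].strip():
        kept.pop(0)
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept)
```

Applied to the output of the first block in this section, with `gen y = 1` and `sum price mpg` as the submitted commands, only the summary table would remain. Commands that are wrapped or truncated in the output would not match and would be left in place.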

Don't output any Stata results; instead include generated LaTeX or HTML code

When writing manuscripts, I want a literate programming document that is reproducible, so keeping it executable is important, but the manuscript itself doesn't need to show code. I use this workflow:

  1. Run Stata code but hide code and output
  2. In the Stata code, generate LaTeX or HTML results
  3. Include those in the document when exporting
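
Step 3 of this workflow can be sketched in Python. This is a minimal illustration rather than the method used in this post: `export_block` is a hypothetical helper that reads a generated fragment and wraps it in an Org export block, choosing the backend from the file extension. The extension-to-backend mapping is an assumption.

```python
from pathlib import Path

# Assumed mapping from file extension to Org export backend
BACKENDS = {".html": "html", ".tex": "latex"}


def export_block(path):
    """Wrap a generated LaTeX or HTML fragment in an Org export block."""
    fragment = Path(path)
    backend = BACKENDS.get(fragment.suffix.lower())
    if backend is None:
        raise ValueError(f"Unsupported fragment type: {fragment.suffix!r}")
    # Drop trailing newlines so the closing marker follows the fragment directly
    body = fragment.read_text(encoding="utf-8").rstrip("\n")
    return f"#+BEGIN_EXPORT {backend}\n{body}\n#+END_EXPORT"
```

Wrapping the fragment in a backend-specific export block means an HTML table is only emitted on HTML export and a LaTeX table only on LaTeX export.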

For example, these code blocks

#+BEGIN_SRC stata :session :exports none :results none 
eststo clear
eststo: qui reg price mpg
eststo: qui reg price mpg foreign 
esttab using "/tmp/tables.html", replace
#+END_SRC

#+BEGIN_SRC shell :exports results :results value html 
#!/bin/bash
# Bug in Org-Mode requires this for outputting HTML
# https://emacs.stackexchange.com/questions/10085/org-export-how-to-include-a-pregenerated-html-file-when-exporting-org-to-html
/bin/cat /tmp/tables.html
#+END_SRC

produces this table: