#+BEGIN_COMMENT
.. title: Stata and Literate Programming in Emacs Org-Mode
.. slug: stata-and-literate-programming-in-emacs-org-mode
.. date: 2018-02-28 11:34:32 UTC+01:00
.. tags: stata, emacs, orgmode, reproducible research
.. category:
.. link:
.. description: Describes some recent improvements in orgmode
.. type: text
#+END_COMMENT

**Important Note:** The following post is outdated and is no longer the recommended approach for running Stata in Org-mode. Please [[./stata_kernel_emacs.html][see this post on using =emacs= with =jupyter= and the =stata_kernel=]] for a method that works and that is more robust moving forward.

Stata is a statistical package that lots of people use, and Emacs Org-mode is a great platform for organizing, publishing, and blogging your research. In one of [[./reproducible-research.html][my older posts]], I outlined the relative benefits of Org-mode compared to other packages for literate programming. At that time, I argued it was the best way to write literate programming documents with Stata (if you are willing to pay the fixed costs of learning Emacs). I still believe that, and I use it a lot for writing course notes, emailing students with code and results, and even for drafting manuscripts for publishing.

Despite how good Emacs Org-mode is for research involving Stata, Stata is still something of a second-class citizen compared to packages like =R= or =Python=. While it is functional, it can be a little rough around the edges, and since not many people use Stata with Emacs, finding answers can be tough.

This post does three things:

1. Demonstrates some issues using Stata in Org-mode
2. Introduces an updated version of =ob-stata.el=. With only minor modifications, this version avoids some issues with the current version of =ob-stata= found [[https://github.com/aspiers/orgmode/blob/master/contrib/lisp/ob-stata.el][here]]. My version of =ob-stata.el= [[https://gitlab.com/robhicks/ob-stata.el][can be downloaded from gitlab]].
3. Provides full setup instructions that enable code highlighting in html and latex export.

#+HTML:

* Issues with Stata and Org-Mode
** Occasional garbled output

For most of the usual commands (=regress=, =sum=, =probit=, etc.) output is fine. But there are some commands for which output can be truncated. In this code block, we will bootstrap the probit command and then ask for model fit diagnostics. Both the commands =bstrap= and =estat classification= fail to render properly using the current version of =ob-stata.el=.

#+BEGIN_SRC stata :session :results output :eval never-export :exports both
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
#+END_SRC

#+RESULTS:
#+begin_example
webuse auto
(1978 Automobile Data)
50

Probit regression                               Number of obs     =         74
                                                Replications      =         50
                                                Wald chi2(3)      =      19.82
                                                Prob > chi2       =     0.0002
Log likelihood = -35.296645                     Pseudo R2         =     0.2162

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1218917     .03376     3.61   0.000     .0557233    .1880601
       price |   .0001563   .0000879     1.78   0.075    -.0000159    .0003286
    headroom |  -.3379361   .1974509    -1.71   0.087    -.7249327    .0490606
       _cons |  -3.232319    1.32055    -2.45   0.014    -5.820551    -.644088
------------------------------------------------------------------------------
           |         8             6  |         14
     -     |        14            46  |         60
-----------+--------------------------+-----------
   Total   |        22            52  |         74

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   36.36%
Specificity                     Pr( -|~D)   88.46%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   76.67%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.54%
False - rate for true D         Pr( -| D)   63.64%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   23.33%
--------------------------------------------------
Correctly classified                        72.97%
--------------------------------------------------
#+end_example

My modification of =ob-stata.el= renders this output correctly:

#+BEGIN_SRC stata :session :eval never-export :results output :exports both
webuse auto
bstrap: probit foreign mpg price headroom
estat classification
#+END_SRC

#+RESULTS:
#+begin_example
(1978 Automobile Data)
(running probit on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Probit regression                               Number of obs     =         74
                                                Replications      =         50
                                                Wald chi2(3)      =      14.01
                                                Prob > chi2       =     0.0029
Log likelihood = -35.296645                     Pseudo R2         =     0.2162

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1218917   .0555982     2.19   0.028     .0129212    .2308622
       price |   .0001563   .0000843     1.85   0.064    -8.94e-06    .0003216
    headroom |  -.3379361   .2451438    -1.38   0.168    -.8184091     .142537
       _cons |  -3.232319   2.100886    -1.54   0.124     -7.34998    .8853416
------------------------------------------------------------------------------

Probit model for foreign

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         8             6  |         14
     -     |        14            46  |         60
-----------+--------------------------+-----------
   Total   |        22            52  |         74

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   36.36%
Specificity                     Pr( -|~D)   88.46%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   76.67%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.54%
False - rate for true D         Pr( -| D)   63.64%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   23.33%
--------------------------------------------------
Correctly classified                        72.97%
--------------------------------------------------
#+end_example

** Output Contains Commands and Results

For me, one of the major annoyances with using Stata in Org-mode is that Stata output includes both commands and results. If you want to produce output that has code highlighting of the Stata commands, you will necessarily have duplicate commands in your =html= or =pdf= document.
This example illustrates the problem:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
webuse auto
reg price mpg
#+END_SRC

#+RESULTS:
#+begin_example
webuse auto
(1978 Automobile Data)

reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------
#+end_example

Notice that in the exported html (what you are viewing), you see duplicate versions of the commands that produced the output. The first has font highlighting (and is what we really want) while the second is interspersed in the plain text results. I should note that no other language I have used in Org-mode behaves like this. For example, in =R= or =Python=, the commands are left in the source code block (and are highlighted) while the results contain only results.

To make Stata behave more like =R= or =Python=, I have modified =ob-stata.el= to purge the results of any and all commands (for =:results output= and Stata invoked by =:session=).
For the same Stata code, this modification produces:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
webuse auto
reg price mpg
#+END_SRC

#+RESULTS:
#+begin_example
(1978 Automobile Data)

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------
#+end_example

Note that even with my modified code, some form of your command might still appear in the output when you use line continuation or when a line of code is longer than 77 characters. This occurs infrequently enough that I haven't bothered to try to patch =ob-stata.el= further.

** No line continuation support

Stata allows for long commands to be split across lines using =///=. This isn't supported in the current version of =ob-stata.el=.
My modifications support line continuation:

#+BEGIN_SRC stata :session :results output :exports both :eval never-export
reg price mpg ///
    weight
#+END_SRC

#+RESULTS:
#+begin_example
      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.74
       Model |   186321280         2  93160639.9   Prob > F        =    0.0000
    Residual |   448744116        71  6320339.67   R-squared       =    0.2934
-------------+----------------------------------   Adj R-squared   =    0.2735
       Total |   635065396        73  8699525.97   Root MSE        =      2514

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
      weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
       _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382
------------------------------------------------------------------------------
#+end_example

** Smaller annoyances

Stata has some more minor limitations in Org-mode that I have learned to live with and haven't bothered to try and fix, since my research isn't too Stata-centric. I'll list them here:

- Font highlighting in exported documents does work, but it is somewhat hit and miss for html output while pretty good for latex output. I am not altogether clear why there is a difference, but I suspect that for html output it is using a Stata syntax dictionary from the Emacs Speaks Statistics package (and it isn't possible to modify highlighting settings for Stata since "Font Lock" is disabled). For latex/pdf output, pygmentize is used and it works very well. For this blog, html output is produced using =pygmentize='d output, so what you are seeing isn't representative of what you will get from a straight html export in Org-mode.
- Highlighting in Emacs Org-mode while editing the document requires you to place a space before the first command in =src= blocks.
- There can be limits on other types of output you can get out of your code blocks, such as values, tables, latex code, etc. For this reason, I only use =:results output=.
- Graphics can only be included if you run code, save the result, and then manually include it in your org file.

* Setup

To get Stata execution blocks working in Org-mode, you need to

1. Ensure the command =stata= is executable and is in your path (note, *not xstata*). Emacs will execute commands using =stata=. If you want to use another version of Stata, you will need to soft-link it to a =stata= command in the path. For example, I would rather use =stata-mp=, so I soft-linked it to =/usr/local/sbin/stata= so Emacs will use the Multi-Processing version of Stata.
2. In Emacs, install ESS: Emacs Speaks Statistics
3. If you want to try my version, download =ob-stata.el= from [[https://gitlab.com/robhicks/ob-stata.el][this gitlab repo]] and save it to =~/.emacs.d/lisp=. If you prefer the original version of =ob-stata.el=, you can find it [[https://github.com/aspiers/orgmode/blob/master/contrib/lisp/ob-stata.el][at this mirror of the Org-mode repository]].
4. For your version of Emacs and Org-Mode, you might need to change this line in =ob-stata.el= (see [[http://emacs.stackexchange.com/questions/29885/error-when-stata-t-added-to-org-babel-do-load-languages-in-an-attempt-to-eva][this thread]]):
   #+BEGIN_SRC lisp :exports code :eval never-export
   (let ((vars (mapcar #'cdr (org-babel--get-vars params))))
   #+END_SRC
   to/from (depending on what you downloaded above)
   #+BEGIN_SRC lisp :exports code :eval never-export
   (let ((vars (mapcar #'cdr (org-babel-get-header params :var))))
   #+END_SRC
5. Modify one of your Emacs initialization files to include
   #+BEGIN_SRC lisp :results none :exports code :eval never-export
   ;; load Emacs Speaks Statistics - for Stata support
   (require 'ess-site)
   ;; Tell emacs location of the directory containing
   ;; personal elisp (and ob-stata.el)
   (add-to-list 'load-path "~/.emacs.d/lisp/")
   ;; load ob-stata (require takes a feature symbol, not a string)
   (require 'ob-stata)
   #+END_SRC
6. Following the commands above, include Stata as a babel language in your Emacs initialization files. Mine looks like this:
   #+BEGIN_SRC lisp :results none :exports code :eval never-export
   (org-babel-do-load-languages
    'org-babel-load-languages
    '((python . t)
      (ipython . t)
      (R . t)
      (sh . t)
      (matlab . t)
      (stata . t)))
   #+END_SRC
7. Include Stata as a language to be fontified for latex exports by including the following in your Emacs initialization files:
   #+BEGIN_SRC emacs-lisp :results none :exports code :eval never-export
   (add-to-list 'org-latex-minted-langs '(stata "stata"))
   #+END_SRC
   Make sure to include =\usepackage{minted}= in the header of your latex export template and that your version of =pygmentize= is 2.2 or higher.
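
Once the steps above are in place, a quick check along the following lines may help confirm that the pieces are wired together. This is only a sketch and is not part of =ob-stata.el= or ESS: the labels =binary=, =ob-stata-loaded=, and =babel-enabled= are names made up for this example, and it assumes that your copy of =ob-stata.el= provides the =ob-stata= feature. Evaluate it (for example with =M-:= or in the =*scratch*= buffer) after your initialization files have loaded.

#+BEGIN_SRC emacs-lisp :results none :exports code :eval never-export
;; Hypothetical sanity check; not part of ob-stata.el or ESS.
(require 'org)
(message "stata setup: %S"
         (list
          ;; Full path of the binary Emacs will run, or nil if
          ;; no executable named stata is found on exec-path
          (cons 'binary (executable-find "stata"))
          ;; t once (require 'ob-stata) has succeeded (assumes the
          ;; file provides the ob-stata feature)
          (cons 'ob-stata-loaded (featurep 'ob-stata))
          ;; t if stata was enabled via org-babel-do-load-languages
          (cons 'babel-enabled
                (cdr (assq 'stata org-babel-load-languages)))))
#+END_SRC

If =binary= comes back as =nil=, the soft link from step 1 is the first thing to revisit; if =ob-stata-loaded= is =nil=, check the =load-path= entry from step 5.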