% Stata Markdown and Reproducible Research
% Rob Hicks
**Note: course syllabus is here**:
[https://rlhick.people.wm.edu/stories/syllabus_econ407.html](https://rlhick.people.wm.edu/stories/syllabus_econ407.html)
## Introduction
From [Wikipedia](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research), reproducible research is defined as:
>The term reproducible research refers to the idea that the ultimate product of
academic research is the paper **along with** the full computational
environment used to produce the results in the paper such as the code, data,
etc. that can be used to reproduce the results and create new work based on the
research.
The reproducible research movement (especially for the statistical sciences)
takes this a step further by advocating for dynamic documents. The idea is
that a researcher should provide a file (the dynamic document) that executes
the statistical analysis, generates figures, and contains the accompanying
narrative text. This file can be executed to produce the **academic paper**. The
researcher shares this file with other researchers rather than only the
paper. It is my view that within 20 years nearly every scientific journal in
applied statistics will require this approach.
This document shows how to use [MarkStat](http://data.princeton.edu/stata/markdown/)
and markdown syntax for reproducible research and dynamic
documents in Stata. The idea behind MarkStat is that you share your research
by sharing your do file. This do file performs the full suite of statistical
analysis and can produce pdf (with extra configuration), MS Word, or html documents
describing your analysis. You will use this workflow to produce pdf or Word
documents for class assignments.
For every problem set, you will turn in

* The Stata `stmd` file (similar to a do file) containing all commands and written text that produce
your problem set responses.
* A hardcopy of the pdf or Word version produced by running your `stmd` file
(the hardcopy).

The only exception to this rule is for questions involving proofs or other
equation-heavy assignments, where handwritten responses can be attached to the
hardcopy problem set response.
## Installation Instructions
Install the Stata packages and pandoc as follows:

1. In Stata, run `ssc install markstat`
2. In Stata, run `ssc install whereis`
3. Install pandoc from `http://pandoc.org/installing`
4. Tell markstat where to find pandoc. The command you need to run in Stata is probably one of:
    * Windows: `whereis pandoc "C:\Users\username\AppData\Local\Pandoc\pandoc.exe"`
    * Mac: `whereis pandoc /usr/local/bin/pandoc`
    * Linux/Unix: `whereis pandoc /usr/bin/pandoc`

Windows users should substitute their own username for "username" in the `whereis` command above.
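After step 4, a quick check can confirm that the location was stored. The sketch below assumes that `whereis pandoc`, called without a path, reports the stored location and leaves it in `r(pandoc)`; that is my reading of the `whereis` help file, so check `help whereis` if it behaves differently on your system.

```stata
* Assumes steps 1-4 above have already been completed.
* Assumption: whereis without a path retrieves the stored location
* and returns it in r(pandoc).
whereis pandoc

* Stops with an error if no file exists at the stored path
confirm file "`r(pandoc)'"
```

If `confirm file` exits with an error, rerun the `whereis` command from step 4 with the path where pandoc was actually installed.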
## Some Features of MarkStat
MarkStat allows for most features of
[Markdown](https://daringfireball.net/projects/markdown/syntax),
which is a lightweight and readable **text-based** language that allows
files to be easily converted to nice-looking pdf, html, or even Word
documents. Some features you will likely want to use:
* Equations and Math Notation using latex math
* Headers
* Emphasizing text (bold and italics)
* Numeric and bulleted lists
* Turning Stata output on and off
* Pagebreaks can be inserted using `\newpage` on a separate line
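
For the output item in the list above, one plain-Stata option is the `quietly` prefix, which runs a command without printing its results. This is a minimal sketch using the auto data that ships with Stata; MarkStat may offer its own ways to hide code or output, which are not shown here.

```stata
* Load the example data that ships with Stata
sysuse auto, clear

* Output off: the regression runs, but nothing is printed
quietly regress price mpg

* Output on: stored results are still available, so report only what you need
display "R-squared: " %5.3f e(r2)
display "Observations: " e(N)
```

This keeps long estimation tables out of the compiled document while still letting the text report selected results.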
\newpage
## A simple example analysis using MarkStat
Below we'll be modeling the following regression equation for cars back in the day:
$$
price_i = \beta_0 + \beta_1 mpg_i + \beta_2 foreign_i + \epsilon_i
$$
### Load Data and Summarize

    cd ~/Dropbox/Current/Teaching/courses/ECON407/do_files/reproducible_research/markstat
    webuse auto
    reg price mpg
    sum
    hist price
    graph export price.png, replace

![Histogram of Price](price.png){width=60%}
### Regression Model
Here are the regression results:

    reg price mpg foreign

#### Discussion
Looks like back in the day, foreign cars sold for more (holding mpg constant)!
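
One way to back up this reading is to test the coefficient on `foreign` directly. The sketch below is an illustration rather than part of the original analysis; it refits the same model and then uses Stata's `test` and `lincom` commands.

```stata
* Refit the model so the block is self-contained
webuse auto, clear
regress price mpg foreign

* Test H0: the coefficient on foreign equals zero
test foreign

* Point estimate and 95% confidence interval for the foreign price difference
lincom foreign
```

A positive estimate with a confidence interval that excludes zero would support the statement above. Note that the comparison is between foreign and domestic cars with the same `mpg`.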
## MarkStat and Mata
Mata is the matrix algebra environment in Stata. We can combine markdown
(including equations) with Mata code too:
Define $\mathbf{A}_{2 \times 2}$ as
$$
\mathbf{A}=\begin{bmatrix} 1 & 2 \\
3 & 4 \end{bmatrix}
$$

    mata
    A = (1,2\3,4)
    A
    end
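
Continuing in Mata, here is a short sketch of a few matrix operations on $\mathbf{A}$. The determinant is $1 \cdot 4 - 2 \cdot 3 = -2$, so the inverse exists. The name `Ainv` and the choice of `luinv()` are illustrative, not part of the original example.

```stata
mata
// Redefine A so the block is self-contained
A = (1,2\3,4)

// Determinant: 1*4 - 2*3
det(A)

// Inverse computed via LU decomposition
Ainv = luinv(A)
Ainv

// Should return the 2 x 2 identity matrix, up to rounding
A*Ainv
end
```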
## Compiling your document
You will be creating a file with an `stmd` extension that contains your code and
writeup. You can create and edit this file in any text editor including the stata do file editor.
Suppose your problem set document called `script.stmd` contained this text:
```
% Problem Set 1
% Johnny Appleseed
% Sept 1, 2018

Let us read the fuel efficiency data that ships with Stata

    sysuse auto, clear

To study how fuel efficiency depends on weight it is useful to transform
the dependent variable from “miles per gallon” to “gallons per 100
miles”

    gen gphm = 100/mpg

We then obtain a fairly linear relationship

    twoway scatter gphm weight || lfit gphm weight ///
        , ytitle(Gallons per 100 Miles) legend(off)
    graph export auto.png, width(500) replace

![Fuel Efficiency by Weight](auto.png)

The regression equation estimated by OLS is

$$
gphm = \beta_0 + \beta_1 weight + \epsilon
$$

Estimating in Stata yields:

    regress gphm weight

Thus, a car that weighs 1,000 pounds more than another requires on
average an extra 1.4 gallons to travel 100 miles.
```
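
The 1.4 figure in the last sentence of the example need not be computed by hand. As a sketch, the commands below scale the estimated slope to a 1,000-pound difference in weight. They repeat the data steps from `script.stmd` so that they can be run on their own.

```stata
* Rebuild the variable used in the example
sysuse auto, clear
gen gphm = 100/mpg
quietly regress gphm weight

* Extra gallons per 100 miles for a car that is 1,000 pounds heavier
display %4.1f 1000*_b[weight]
```

Computing the number from `_b[weight]` keeps the text consistent with the regression if the data or model change.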
To compile `script.stmd` into a Word, pdf, or html document containing all code and results, run one of these
commands in Stata (assuming your current working directory contains `script.stmd`):

* `markstat using script, mathjax`: produces an html file
* `markstat using script, mathjax docx`: produces a Word document
* `markstat using script, mathjax pdf`: produces a pdf document (requires a working LaTeX environment)
Problem set responses produced by `markstat` in small fonts will be
immediately returned to the student and considered not turned in until
font sizes are fixed. Shoot for 11pt fonts.
## Document Details
This document is written entirely in `stata` using `markstat`. To see
the source code, [see http://rlhick.people.wm.edu/bin/reproducible_research.stmd (clickable)](http://rlhick.people.wm.edu/econ407/bin/reproducible_research.stmd).