ECON 407: Stata Primer
This page has been moved to https://econ.pages.code.wm.edu/407/notes/docs/index.html and is no longer being maintained here.
Here I briefly introduce matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we'll use Stata as the tool for all of our work. If you prefer to do your work in another mathematical package (e.g. R, Python, or Matlab), you are free to do so, but I might not be able to help with technical issues you run into.
Loading data into Stata
Loading stata datasets
Stata can load comma-delimited (csv), Excel (xls), and Stata (dta) files out of the box. It can also load data from the web:
use "https://rlhick.people.wm.edu/econ407/data/mroz"
sum
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         lfp |        753    .5683931    .4956295          0          1
        whrs |        753    740.5764    871.3142          0       4950
         kl6 |        753    .2377158     .523959          0          3
        k618 |        753    1.353254    1.319874          0          8
          wa |        753    42.53785    8.072574         30         60
-------------+---------------------------------------------------------
          we |        753    12.28685    2.280246          5         17
          ww |        753    2.374565    3.241829          0         25
        rpwg |        753    1.849734    2.419887          0       9.98
        hhrs |        753    2267.271    595.5666        175       5010
          ha |        753    45.12085    8.058793         30         60
-------------+---------------------------------------------------------
          he |        753    12.49137    3.020804          3         17
          hw |        753    7.482179    4.230559      .4121     40.509
      faminc |        753    23080.59     12190.2       1500      96000
         mtr |        753    .6788632    .0834955      .4415      .9415
        wmed |        753    9.250996    3.367468          0         17
-------------+---------------------------------------------------------
        wfed |        753    8.808765     3.57229          0         17
          un |        753    8.623506    3.114934          3         14
         cit |        753    .6427623    .4795042          0          1
          ax |        753    10.63081     8.06913          0         45
Loading files from disk is a slight variation on the above command. Supposing that your Stata data file mroz.dta is in the folder /some/place, in Linux or MacOS we would use the command
use "/some/place/mroz.dta"
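For csv and Excel files, the import commands play the role that use plays for dta files. The sketch below is one way this could look. It is not tied to the course data: it writes Stata's built-in auto example dataset to two files, auto_copy.csv and auto_copy.xls (both names are made up for illustration), in the working directory and then reads them back.

```stata
* Illustrative only: auto_copy.csv and auto_copy.xls are hypothetical file names
sysuse auto, clear                                   // built-in example dataset
export delimited using "auto_copy.csv", replace      // write a csv file
export excel using "auto_copy.xls", firstrow(variables) replace

import delimited "auto_copy.csv", clear              // read the csv file back
describe, short
import excel "auto_copy.xls", firstrow clear         // first row holds variable names
describe, short
```

The clear option discards whatever data is currently in memory, which Stata otherwise refuses to overwrite.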
Viewing Data
If you are using the graphical version of Stata (recommended), viewing data is easy and I can show you how to do that. Listing data at the command line is achieved with the list command, and might be useful in your problem sets for showing a few lines of data. Here we'll view the first 5 rows of data:
list in 1/5
     +--------------------------------------------------------------------+
  1. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1610 |   1 |    0 | 32 | 12 |  3.354 | 2.65 | 2708 | 34 | 12 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 4.0288 |  16310 | .7215 |   12 |    7 |   5 |   0 | 14 |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  2. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1656 |   0 |    2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 |  9 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 8.4416 |  21800 | .6615 |    7 |    7 |  11 |   1 |  5 |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  3. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1980 |   1 |    3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 3.5807 |  21040 | .6915 |   12 |    7 |   5 |   0 | 15 |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  4. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 |  456 |   0 |    3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 3.5417 |   7300 | .7815 |    7 |    7 |   5 |   0 |  6 |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  5. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1568 |   1 |    2 | 31 | 14 | 4.5918 |  3.6 | 2000 | 32 | 12 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     |     10 |  27300 | .6215 |   12 |   14 | 9.5 |   1 |  7 |
     +--------------------------------------------------------------------+
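You can combine list with logical expressions to show only rows meeting a condition. Note that in 1/3 restricts attention to the first 3 rows, and the if condition is then applied within them. Let's look at which of the first 3 rows have a respondent with kids less than 6 years old: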
list if kl6>0 in 1/3
     +--------------------------------------------------------------------+
  1. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1610 |   1 |    0 | 32 | 12 |  3.354 | 2.65 | 2708 | 34 | 12 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 4.0288 |  16310 | .7215 |   12 |    7 |   5 |   0 | 14 |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  3. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1980 |   1 |    3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 |
     |--------------------------------------------------------------------|
     |     hw | faminc |   mtr | wmed | wfed |  un | cit | ax |
     | 3.5807 |  21040 | .6915 |   12 |    7 |   5 |   0 | 15 |
     +--------------------------------------------------------------------+
Creating and Modifying Variables
Creating Variables
In Stata, you create a new variable with generate (abbreviated gen).
gen newvar = lfp * ax
Modifying Variables
To modify an existing variable, use replace. Stata will not let you redefine an existing variable with gen:
replace newvar = newvar/10
Here is an example that creates a new dummy variable.
gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
(524 real changes made)

     +----------------------+
     | haskids   kl6   k618 |
     |----------------------|
  1. |       1     1      0 |
  2. |       1     0      2 |
  3. |       1     1      3 |
  4. |       1     0      3 |
  5. |       1     1      2 |
     |----------------------|
  6. |       0     0      0 |
  7. |       1     0      2 |
  8. |       0     0      0 |
  9. |       1     0      2 |
 10. |       1     0      2 |
     +----------------------+
Creating dummy variables
While the above example shows how to "manually" use logical checks to create dummy variables, a better way (particularly if you need to create many categories) is tab. Suppose a variable x takes on the values 1, 2, or 3. To create categorical (dummy) variables for each value, use
tab x, gen(dum_x)
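To see what this produces, here is a small self-contained sketch. The six-row dataset and the variable x are made up purely for illustration; the point is that tab creates one dummy per distinct value, named by appending 1, 2, 3 to the stub given in gen().

```stata
* Illustrative data: x is a made-up variable taking the values 1, 2, 3
clear
set obs 6
generate x = mod(_n, 3) + 1      // x = 2, 3, 1, 2, 3, 1
tab x, gen(dum_x)                // creates dum_x1, dum_x2, dum_x3
list x dum_x1 dum_x2 dum_x3
```

Each dum_xk equals 1 on rows where x takes its k-th smallest value and 0 elsewhere.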
Starting Over
Sometimes, you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the clear command.
Log Files
A very useful way to save your results is to have Stata automatically put everything in a log file. To initialize a log file and use it, issue
log using "/some/place/my_first.log", replace text
which will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don't want to replace your existing work, use this command instead
log using "/some/place/my_first.log", append text
and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command log close to close the file and save all changes. You may then open it using the text editor of your choosing.
Do Files
Do files allow you to put all of the relevant Stata commands for a project into one file, so that results can be easily replicated from one Stata session to the next. The use of do files is highly recommended for your own work, and is a required part of your assignments in the course. I will illustrate their use early in the class. Additionally, you are required to write literate "do" files, as I will show you on the first day of class.
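As a rough idea of what a small do file might contain, here is a minimal sketch. The file names my_analysis.do and my_analysis.log are hypothetical, and the built-in auto dataset stands in for whatever data your project uses; this is not the template used in class.

```stata
* my_analysis.do: a hypothetical do file (all file names are illustrative)
capture log close                       // close any log left open by an earlier run
log using "my_analysis.log", replace text

clear
sysuse auto                             // built-in example data, standing in for your own
summarize price mpg
generate heavy = (weight > 3000)        // dummy variable from a logical check
regress price mpg heavy

log close
```

Saved as my_analysis.do, the file can be run from the Stata prompt with do my_analysis.do, and every run reproduces the same log.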
Getting help in Stata
If you need to find general help in Stata, type help command where command is some Stata command. You can also do keyword searches: search keyword. To see the same set of results in a better help viewer, type view search keyword, for example view search reg.
Linear Algebra in Stata
Stata has a linear algebra environment that can be started using the mata command from the Stata command line. Notice that when you type mata from the Stata command window, the command prompt changes from . to :. This is really your only way of telling whether you are in the mata or Stata environment. At this point "normal" Stata commands (e.g. summarize, reg, or use) will not work and will lead to error messages. To exit mata, issue the command end. Commands for mata may also be nested inside Stata do files (command files) so long as all mata commands are between the commands mata and end.
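Inside a do file, the switch between the two environments might look like the following sketch. The matrix name A and the displayed strings are only illustrative.

```stata
* Stata commands run as usual here
display "now in Stata"

mata
// everything between mata and end is interpreted by mata
A = (1, 2 \ 3, 4)
A'
end

* back in Stata
display "back in Stata"
```

Matrices defined inside the block stay in mata's workspace after end, so a later mata block in the same session can still use A.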
Getting help in mata is similar to the normal Stata environment. Type help mata command where command is some mata command. You can also do keyword searches: search mata keyword. To see the same set of results in a better help viewer, type view search mata keyword. For example, view search mata inverse.
Once you have Stata running, you can invoke mata like this
mata
------------------------------------------------- mata (type end to exit) -----
Creating matrices, vectors, and scalars
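There are two ways to create a matrix. The first is to type its elements directly, separating columns with commas and rows with backslashes. Consider a two by two matrix,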
A = (1,2 \ 3,4)
A
       1   2
    +---------+
  1 |  1   2  |
  2 |  3   4  |
    +---------+
Or, you could create an empty matrix of the desired dimension
B=J(2,3,.)
B
       1   2   3
    +-------------+
  1 |  .   .   .  |
  2 |  .   .   .  |
    +-------------+
where B is of dimension rows=2 and columns=3. We can fill \(\mathbf{B}\) element by element:
B[1,1]=5
B[1,2]=6
B[1,3]=7
B[2,1]=8
B[2,2]=9
B[2,3]=10
B
        1    2    3
    +----------------+
  1 |   5    6    7  |
  2 |   8    9   10  |
    +----------------+
Building a matrix from submatrices
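Suppose you have the matrices A to D defined as below. The matrix \(\mathbf{E} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}\) is then built by joining them: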
A=(1,2 \ 3,4)
B=(5,6,7 \ 8,9,10)
C=(3,4 \ 5,6)
D=(1,2,3 \ 4,5,6)
E=(A,B \ C,D)
E
        1    2    3    4    5
    +--------------------------+
  1 |   1    2    5    6    7  |
  2 |   3    4    8    9   10  |
  3 |   3    4    1    2    3  |
  4 |   5    6    4    5    6  |
    +--------------------------+
Creating Vectors
Row and column vectors can also be created using the same basic syntax:
f = (1, 2, 3)
f
       1   2   3
    +-------------+
  1 |  1   2   3  |
    +-------------+
or, a column vector can be created by
g=(3\ 4 \5)
g
       1
    +-----+
  1 |  3  |
  2 |  4  |
  3 |  5  |
    +-----+
The command below constructs a column vector of consecutive integers, here from 1 to 5 (the same syntax with 1::100 would give 1,2,3,…,99,100).
id_rows=(1::5)
id_rows
       1
    +-----+
  1 |  1  |
  2 |  2  |
  3 |  3  |
  4 |  4  |
  5 |  5  |
    +-----+
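The double colon builds a column; mata's companion range operator, two dots, builds a row. A brief sketch of the difference follows, written with the mata and end wrapper so it can be pasted into a do file. The names r and c are just placeholders.

```stata
mata
r = (1..5)          // row vector, 1 x 5
c = (1::5)          // column vector, 5 x 1
rows(r), cols(r)    // dimensions of r
rows(c), cols(c)    // dimensions of c
r' == c             // transposing the row gives the column
end
```

Mixing the two up is a common source of conformability errors, so it is worth checking dimensions when a product fails.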
Creating Scalars
These are easy. To define a scalar variable called u:
u = 3
u
3
Creating a vector of zeros or ones
Suppose we have 1000 observations and we wish to create a column of ones (this is especially useful for estimating a constant term), use this command
ones=J(1000, 1, 1)
ones[1::5]
       1
    +-----+
  1 |  1  |
  2 |  1  |
  3 |  1  |
  4 |  1  |
  5 |  1  |
    +-----+
This command can be combined with what we have seen previously to create the full matrix of independent variables (with the constant in the first column) using
X=(ones=J(1000, 1, 1),x)
so long as your matrix of independent variables x exists in mata and has 1000 rows.
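If you would rather not hard-code the number of observations, one option is to size the column of ones from x itself with rows(). The sketch below assumes a made-up 1000 by 2 matrix x (integers and their squares) purely so that it runs on its own.

```stata
mata
// x is a made-up 1000 x 2 matrix standing in for real regressors
x = ((1::1000), ((1::1000):^2))
X = (J(rows(x), 1, 1), x)      // constant in column 1, then the columns of x
rows(X), cols(X)
X[1::3,]                       // first three rows
end
```

Because the constant is built from rows(x), the same line works unchanged when the sample size changes.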
Creating the Identity Matrix
The command below creates an identity matrix with 5 rows and columns.
identity = I(5)
identity
[symmetric]
       1   2   3   4   5
    +---------------------+
  1 |  1                  |
  2 |  0   1              |
  3 |  0   0   1          |
  4 |  0   0   0   1      |
  5 |  0   0   0   0   1  |
    +---------------------+
Note, Stata only shows the lower triangular part of any symmetric matrix.
Stata datasets in Mata
Once you have loaded data into Stata as described above, it is easy to access that information from within mata. To bring the Mroz data (which we already loaded into Stata) into mata, there are two ways to proceed. One can copy the data, or one can create a view that always refers back to the original Stata dataset. Views use less memory and are useful if you want to modify the data in mata and then return to Stata with the original dataset changed, while working with a copy is typically faster at the cost of more memory. If you need to do all your work in mata and don't need to change any of the underlying .dta data, I recommend the copy method. The command to copy everything in the Stata workspace into mata is
X=st_data(.,.)
X[1::5,]
                 1             2             3             4             5
    +-----------------------------------------------------------------------
  1 |            1          1610             1             0            32
  2 |            1          1656             0             2            30
  3 |            1          1980             1             3            35
  4 |            1           456             0             3            34
  5 |            1          1568             1             2            31
    +-----------------------------------------------------------------------
                 6             7             8             9            10
     -----------------------------------------------------------------------
  1             12   3.354000092   2.650000095          2708            34
  2             12   1.388900042   2.650000095          2310            30
  3             12   4.545499802   4.039999962          3072            40
  4             12   1.096500039          3.25          1920            53
  5             14   4.591800213   3.599999905          2000            32
     -----------------------------------------------------------------------
                11            12            13            14            15
     -----------------------------------------------------------------------
  1             12   4.028800011         16310   .7214999795            12
  2              9   8.441599846         21800   .6614999771             7
  3             12   3.580699921         21040   .6915000081            12
  4             10   3.541699886          7300   .7814999819             7
  5             12            10         27300   .6215000153            12
     -----------------------------------------------------------------------
                16            17            18            19            20
     -----------------------------------------------------------------------+
  1              7             5             0            14             1  |
  2              7            11             1             5             1  |
  3              7             5             0            15             1  |
  4              7             5             0             6             1  |
  5             14           9.5             1             7             1  |
     -----------------------------------------------------------------------+
Note, columns aren't labeled and you need to keep track of variable order in Stata to know which columns are important for your work.
Alternatively, you can include only selected columns, in the order you define. Here we do that and view the first 5 rows:
X=st_data(.,("kl6","k618","faminc"))
X[1::5,]
           1       2       3
    +-------------------------+
  1 |      1       0   16310  |
  2 |      0       2   21800  |
  3 |      1       3   21040  |
  4 |      0       3    7300  |
  5 |      1       2   27300  |
    +-------------------------+
Remember that st_data makes a copy, so changes you make to that copy in mata are never written back to the Stata dataset. The st_view function takes the same row and column arguments as st_data, but receives the matrix to be filled as its first argument (e.g. st_view(V=., ., .)), and changes made through the view are preserved once back in Stata. In this course, it is sufficient to use st_data to load data into mata as described above.
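The difference between a copy and a view can be seen in a short experiment. This sketch uses Stata's built-in auto dataset rather than the Mroz data, and the names C and V are arbitrary; it deliberately overwrites two values, so run it only on scratch data.

```stata
sysuse auto, clear               // built-in example data (scratch copy in memory)
mata
C = st_data(., "mpg")            // copy of the mpg column
C[1,1] = -1                      // changes the copy only
st_view(V=., ., "mpg")           // view onto the same column
V[2,1] = -2                      // written through to the Stata dataset
end
list mpg in 1/2                  // row 1 is untouched, row 2 now holds -2
```

Only the assignment made through the view shows up in the list output.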
The mata workspace
The command mata describe will list all the matrices, vectors, and scalars currently defined.
mata describe
      # bytes   type                        name and extent
-------------------------------------------------------------------------------
           32   real matrix                 A[2,2]
           48   real matrix                 B[2,3]
           32   real matrix                 C[2,2]
           48   real matrix                 D[2,3]
          160   real matrix                 E[4,5]
       18,072   real matrix                 X[753,3]
           24   real rowvector              f[3]
           24   real colvector              g[3]
           80   real colvector              id1[10]
           40   real colvector              id_cols[5]
           40   real colvector              id_rows[5]
          200   real matrix                 identity[5,5]
        8,000   real colvector              ones[1000]
            8   real scalar                 u
-------------------------------------------------------------------------------
To delete all of these, issue mata clear. To delete only a few matrices, vectors, or scalars, issue mata drop X f g.
Getting Information about your matrices and vectors
Stata offers three functions useful for checking conformability conditions. The functions rows(X) and cols(X) return the number of rows and columns of X respectively,
rows(X)
cols(X)
753
3
while length() returns the total number of elements in matrix X, equal to (# rows) × (# columns):
length(X)
2259
Linear Algebra Operations
Important Commands
Operation | Command |
---|---|
Transpose of B | B' |
Inverse of B | luinv(B) |
Inverse of B (if symmetric) | invsym(B) |
Diagonal Elements of matrix B | diagonal(B) |
Put vector B into diagonal square matrix | diag(B) |
Upper Triangular Elements of B | uppertriangle(B) |
Lower Triangular Elements of B | lowertriangle(B) |
Sort based on values in column i | sort(B,i) |
Sort based on values in column i and j | sort(B,(i,j)) |
Cross product B'A, omitting rows with missing elements | cross(B,A) |
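A few of these commands are tried out below on small made-up matrices. The names S and M are chosen only to avoid overwriting the matrices defined earlier, and the block is wrapped in mata and end so it can be pasted into a do file.

```stata
mata
S = (2, 1 \ 1, 3)                 // a small symmetric, positive-definite matrix
S'                                // transpose (equals S here because S is symmetric)
luinv(S)                          // general-purpose inverse
invsym(S)                         // inverse that exploits symmetry
diagonal(S)                       // column vector holding the diagonal, (2 \ 3)
diag((2, 3))                      // 2 x 2 matrix with 2 and 3 on the diagonal
M = (3, 10 \ 1, 30 \ 2, 20)
sort(M, 1)                        // rows reordered so that column 1 is ascending
cross(M, M)                       // equals M'*M when nothing is missing
end
```

For a symmetric positive-definite matrix such as S, luinv() and invsym() should agree up to rounding error.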
Multiplication and Addition
For matrices A and C of the same dimensions, matrix addition is given by
D = A + C
D
        1    2
    +-----------+
  1 |   4    6  |
  2 |   8   10  |
    +-----------+
Subtraction follows in a similar way. Multiplication (assuming conformability of A and B) is given by
D = A * B
D
        1    2    3
    +----------------+
  1 |  21   24   27  |
  2 |  47   54   61  |
    +----------------+
Combinations of these operators are also possible. For example, the OLS estimator \(\mathbf{(x'x)^{-1}x'y}\) that we discuss in Chapter 1 is computed by
invsym(x'*x)*x'*y
There are many more functions and tools in the mata environment that I won't describe here, but they are available to interested students.
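To tie the pieces together, the following sketch computes the OLS coefficients by hand in mata and lets you compare them with Stata's regress. It uses the built-in auto dataset and the variables price, mpg, and weight as stand-ins; these choices are assumptions made for illustration, not part of the course material.

```stata
sysuse auto, clear                       // built-in example data
regress price mpg weight                 // Stata's own estimates, for comparison

mata
y = st_data(., "price")                  // dependent variable as a column vector
x = st_data(., ("mpg", "weight"))        // regressors
X = (J(rows(x), 1, 1), x)                // add the constant in the first column
b = invsym(X'*X)*X'*y                    // OLS: (X'X)^(-1) X'y
b                                        // order: constant, mpg, weight
end
```

Note that regress reports the constant last while b has it first, and that regress uses a more careful numerical routine, so expect agreement to several digits rather than bit-for-bit equality.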