A Brief Stata Primer, with Mata and Linear Algebra

Here I briefly introduce matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we'll use Stata as the tool for all of our work. If you prefer to do your work in another mathematical package (e.g. S-PLUS, Matlab), you are free to do so.

Basic Stata Commands

Getting Help

If you need general help in Stata, type
help command
where command is some Stata command. You can also do keyword searches
search keyword
To see the same set of results in a better help viewer, type
view search keyword
for example
view search reg

Loading Data

Loading Stata Datasets from the Web

For all of our class exercises, we will be using datasets on my website. These are easily accessed from within Stata. For example, suppose you want to use the file mroz.dta from the first problem set. To load this file, issue the command
webuse set http://rlhick.people.wm.edu/econ407/data/
followed by
webuse mroz
You will then have the file loaded into Stata. My files are read-only, so if you plan on making changes, immediately save a local copy (with the save command) so that your changes to the data persist.

Loading stata files stored on your hard disk

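Suppose you have the file mroz.dta on your personal computer in a directory called /some/place. To load this file into Stata, issue the command
use "/some/place/mroz"
and the data will be loaded (this path style is for Mac/Unix). On Windows, if the file is in a directory \some\place, simply reverse the slash characters. If the data currently in memory have unsaved changes, Stata will refuse to load the new file unless you add the clear option: use "/some/place/mroz", clear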

Loading text files stored on your hard disk

Suppose you have the file mroz.csv on your personal computer in a directory called /some/place. To load this file into Stata, issue the command
insheet using "/some/place/mroz.csv", clear
and the data will be loaded (this path style is for Mac/Unix). On Windows, if the file is in a directory \some\place, simply reverse the slash characters. The clear option discards any data already in memory.
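In Stata 13 and later, insheet has been superseded by import delimited. Here is a minimal sketch of the same task with the newer command. The path is the same placeholder used above, and the options shown (varnames(1), clear) are ones I'd typically add rather than anything the file requires.

```stata
* "/some/place/mroz.csv" is a placeholder path; point it at your own file.
* varnames(1) tells Stata that the first row holds the variable names;
* clear discards any data already in memory.
import delimited using "/some/place/mroz.csv", varnames(1) clear

* Older equivalent, still accepted by current versions of Stata:
* insheet using "/some/place/mroz.csv", clear

describe
```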

Creating and Modifying Variables

The command
gen ln_gdp=log(gdp)
will generate the natural log of gdp for each row in your dataset. Watch for messages such as "(5 missing values generated)": log() returns missing wherever gdp is zero, negative, or itself missing.

Once a variable is created, it can be modified using the command
replace ln_gdp=ln_gdp+1
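Conditional commands are also possible. Suppose you want to create a dummy called old equal to 1 if age is 60 or greater and 0 otherwise. You can do this using conditional statements
gen old=1 if age>=60 & !missing(age)
followed by
replace old=0 if age<60
The !missing(age) condition matters because Stata treats a missing value (.) as larger than any number, so age>=60 is true when age is missing.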
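The same dummy can be built in a single line, because a logical expression such as (age>=60) evaluates to 1 when true and 0 when false. The sketch below uses a small made-up dataset (it clears memory first) that includes one missing age, to show why excluding missing values matters.

```stata
* Made-up data: five people, one with a missing age (.)
clear
input age
45
60
72
.
30
end

* (age >= 60) is 1 or 0; the if qualifier leaves old missing when age is missing
gen old = (age >= 60) if !missing(age)
list age old
```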

A better way to create dummy variables

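Suppose a variable x takes on the values 1, 2, or 3. To create a categorical (dummy) variable for each value, use
tab x, gen(dum_x)
This creates dum_x1, dum_x2, and dum_x3, each equal to 1 when x takes the corresponding value and 0 otherwise.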
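Here is a small sketch with made-up values of x (again clearing memory first) that shows which variables tab creates. If you later put all three dummies in a regression with a constant, leave one of them out to avoid perfect collinearity.

```stata
* Made-up data for illustration
clear
input x
1
2
3
2
1
end

* One 0/1 indicator per distinct value of x: dum_x1 (x==1), dum_x2 (x==2), dum_x3 (x==3)
tabulate x, generate(dum_x)
list x dum_x1 dum_x2 dum_x3
```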

Starting Over

Sometimes you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the command
clear

Log Files

A very useful way to save your results is to have Stata automatically write everything to a log file. To open a log file, issue
log using "/some/place/my_first.log", replace text
This will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don't want to overwrite your existing work, use this command instead
log using "/some/place/my_first.log", append text
and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command
log close
to close the file and save all changes. You may then open it using the text editor of your choice.

Do Files

Do files allow you to put all of the relevant Stata commands for a project into one file, so that results can be easily replicated from one Stata session to the next. The use of do files is highly recommended for your own work and is a required part of your assignments in this course. I will illustrate their use early in the class.
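As a sketch of what a do file might look like, the file below (call it my_first.do; the file name and the /some/place folder are placeholders) opens a log, loads the mroz data from the course website, summarizes it, and closes the log. The capture log close line simply closes any log left open from an earlier run and is harmless otherwise.

```stata
* my_first.do: a minimal do file (file and folder names are placeholders)
* Run it from Stata with:  do "/some/place/my_first.do"
clear all
capture log close
log using "/some/place/my_first.log", replace text

webuse set http://rlhick.people.wm.edu/econ407/data/
webuse mroz, clear
summarize

log close
```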

Linear Algebra in Stata¹

Stata has a linear algebra environment, Mata, that can be started using the
mata
command from the Stata command line. Notice that when you type mata in the Stata command window, the command prompt changes from a . to a :. This is the main visual cue for telling whether you are in the Mata or the Stata environment. Once you have finished using Mata, you can return to the normal Stata environment by typing
end
Mata commands may also be nested inside Stata do files (command files), so long as all Mata commands are placed between the commands
mata
and
end
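For example, a do file can switch into Mata and back as in the sketch below. Matrices created inside the block stay in the Mata workspace after end, so later mata blocks (or mata: one-liners) can still use A.

```stata
* Stata commands work as usual up here
clear all

mata
    // Everything between "mata" and "end" is read as Mata code
    A = (1, 2 \ 3, 4)
    A
    rows(A)
end

* Back in Stata
display "Mata block finished"
```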

Getting Help

If you need general help on Mata, type
help mata command
where command is some Mata command or function (e.g. help mata luinv). You can also do keyword searches
search mata keyword
To see the same set of results in a better help viewer, type
view search mata keyword
for example
view search mata inverse

Creating matrices, vectors, and scalars

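There are two ways to create a matrix. The first is to type its elements directly, separating elements within a row by commas and rows by backslashes (\). For example, for a two by two matrix,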
A=(1,2 \ 3,4)
will give you

$$
\begin{equation}
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
\end{equation}
$$

Alternatively, you can create a matrix of the desired dimension filled with missing values,
B=J(2,3,.)
where $B$ has 2 rows and 3 columns. To fill it element by element, use
B[1,1]=5
B[1,2]=6
B[1,3]=7
B[2,1]=8
B[2,2]=9
B[2,3]=10
To display your matrix, type its name (Mata is case-sensitive, so b and B are different matrices):
B
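Filling a matrix one element at a time gets tedious quickly. Mata also has for loops, so the same B can be filled programmatically. The formula 3*(i-1)+j+4 below is just one way to reproduce the values 5 through 10 used above.

```stata
mata
    B = J(2, 3, .)                          // 2 x 3 matrix of missing values
    for (i = 1; i <= rows(B); i++) {
        for (j = 1; j <= cols(B); j++) {
            B[i, j] = 3*(i - 1) + j + 4     // row 1: 5 6 7, row 2: 8 9 10
        }
    }
    B
end
```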

Building a matrix from submatrices

Suppose you have the matrices A to D defined as
A=(1,2 \ 3,4)
B=(5,6,7 \ 8,9,10)
C=(3,4 \ 5,6)
D=(1,2,3 \ 4,5,6)
The matrix $E=\begin{bmatrix}A & B \\ C & D \end{bmatrix}$
can be constructed by
E=(A,B \ C,D)
to yield

$$
\begin{equation}\label{eq:ps1_question}
\begin{bmatrix}
1&2&5&6&7 \\
3&4&8&9&10 \\
3&4&1&2&3 \\
5&6&4&5&6
\end{bmatrix}
\end{equation}
$$
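To check the result, the sketch below rebuilds E and confirms its dimensions, then extracts the top-right block with a range subscript, E[|1,3 \ 2,5|], which selects rows 1 to 2 and columns 3 to 5 and should return B.

```stata
mata
    A = (1, 2 \ 3, 4)
    B = (5, 6, 7 \ 8, 9, 10)
    C = (3, 4 \ 5, 6)
    D = (1, 2, 3 \ 4, 5, 6)
    E = (A, B \ C, D)
    rows(E), cols(E)          // 4 rows, 5 columns
    E[|1,3 \ 2,5|]            // top-right block: rows 1-2, columns 3-5
end
```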

Creating Vectors

Row and column vectors can also be created using the same basic syntax:
f=(1,2,3)
is

$$
\begin{equation}
\begin{bmatrix}
1 & 2 & 3
\end{bmatrix}
\end{equation}
$$

or, a column vector can be created by
g=(3\4\5)
The range operators in
id=(1::100)
and
id=(1..100)
are useful for constructing column and row vectors, respectively, of incremented integer values between 1 and 100 (i.e. $1,2,3,\ldots,99,100$).
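In Mata, .. builds a row vector and :: builds a column vector. A quick way to confirm which operator gives which shape is to build a short range both ways and compare:

```stata
mata
    r = (1..5)          // 1 x 5 row vector: 1 2 3 4 5
    c = (1::5)          // 5 x 1 column vector
    rows(r), cols(r)    // 1 5
    rows(c), cols(c)    // 5 1
    r' == c             // 1: transposing the row vector gives the column vector
end
```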

Creating Scalars

These are easy. Simply define a scalar by
u=3

Creating a vector of zeros or ones

Suppose we have 1000 observations and wish to create a column of ones (this is especially useful for estimating a constant term). Use the command
ones=J(1000, 1, 1)
This can be combined with what we have seen previously to create the full matrix of independent variables (with the constant in the first column) using
X=(ones,x)
so long as your matrix of independent variables x has 1000 rows.
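Rather than hard-coding 1000, you can size the column of ones from the data itself with rows(). Here is a sketch using the auto data that ships with Stata. The choice of weight and length as regressors is purely illustrative.

```stata
sysuse auto, clear
mata
    x = st_data(., ("weight", "length"))   // n x 2 matrix of regressors
    X = (J(rows(x), 1, 1), x)               // column of ones first, then x
    rows(X), cols(X)                        // 74 rows, 3 columns for auto.dta
end
```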

Creating the Identity Matrix

The command
Id=I(1000)
will create a 1000 by 1000 identity matrix and store it in Id.
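Because a 1000 by 1000 identity matrix is large to print, it is easier to see how I() behaves on a small case. The sketch checks that multiplying by a 2 by 2 identity leaves A unchanged. I call the result Id rather than I simply to avoid confusion with the function name.

```stata
mata
    A  = (1, 2 \ 3, 4)
    Id = I(2)           // 2 x 2 identity matrix
    Id
    Id*A == A           // 1: premultiplying by the identity leaves A unchanged
    A*Id == A           // 1: so does postmultiplying
end
```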

Stata datasets in Mata

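Once you have loaded data into Stata (.dta or .csv files), it is easy to access that information from within Mata. For example, suppose we use the Stata sample dataset auto.dta, accessed by
webuse auto.dta
(auto.dta also ships with Stata, so sysuse auto loads it too, even if you have pointed webuse at the course website.)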
There are generally two ways to get your Stata data into Mata: one can copy the data, or one can create a view that always refers back to the original Stata dataset. Views are useful if you want to modify the data in Mata and have those changes show up in the Stata dataset when you return to Stata. Copying is generally faster, while views use less memory because the data are not duplicated. If you need to do all your work in Mata and don't need to change any of the underlying .dta data, I recommend the copy method. The command
X=st_data(.,.)
copies the entire currently loaded Stata dataset into Mata, while
X=st_data(.,("mpg","rep78","weight"))
copies all rows but only the three listed variables (columns) into X. It is also possible to copy in only a subset of rows and columns, as in
X=st_data((1::5 \ 7::9),("mpg","rep78","weight"))
which copies rows 1 to 5 and 7 to 9 of the three variables into X. Remember that X is only a copy: changes you make to it in Mata are never written back to the Stata dataset. The st_view() function takes the same row and column arguments but stores the view in its first argument, as in st_view(V,.,("mpg","rep78","weight")), and changes made to V are preserved in the Stata dataset. In this course, it is sufficient to use the command
X=st_data(.,.)
to load data into Mata.
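The difference between copying and viewing is easiest to see by changing one value each way. In this sketch, which uses auto.dta, the change made through the copy Xc never reaches the dataset, while the change made through the view V does. V is initialized to missing before the st_view() call, which is a safe habit. Run sysuse auto, clear afterwards to get the original data back.

```stata
sysuse auto, clear
mata
    // Copy: editing Xc does not touch the Stata dataset
    Xc = st_data(., ("mpg", "weight"))
    Xc[1, 1] = 99

    // View: editing V writes through to the Stata dataset
    V = .
    st_view(V, ., ("mpg", "weight"))
    V[2, 1] = 99
end

* Observation 1 keeps its original mpg; observation 2 now shows 99
list mpg weight in 1/2
```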

The mata workspace

The command
mata describe
will list all the matrices, vectors, and scalars currently defined. To delete all of these, issue
mata clear
To delete only a few matrices, vectors, or scalars, issue
mata drop X f g

Getting Information about your matrices and vectors

Mata offers three functions useful for checking conformability conditions. The functions
rows(X) and cols(X)
return the number of rows and columns of X, respectively, while
length(X)
returns the total number of elements in X, equal to (# rows) $\times$ (# columns).
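Here is a small sketch of these functions, plus an assert() that stops with an error if A and B are not conformable for multiplication:

```stata
mata
    A = (1, 2 \ 3, 4)
    B = (5, 6, 7 \ 8, 9, 10)
    rows(B)                       // 2
    cols(B)                       // 3
    length(B)                     // 6, i.e. 2 x 3
    assert(cols(A) == rows(B))    // errors out if A*B is not conformable
    C = A*B                       // 2 x 3 result: (21, 24, 27 \ 47, 54, 61)
    C
end
```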

Linear Algebra Operations

Important Commands

Operation                                              Command
Transpose of B                                         B'
Inverse of B                                           luinv(B)
Inverse of B (if B is symmetric)                       invsym(B)
Diagonal elements of B (as a column vector)            diagonal(B)
Square matrix with vector B on its diagonal            diag(B)
Upper triangular elements of B                         uppertriangle(B)
Lower triangular elements of B                         lowertriangle(B)
Sort rows of B on the values in column i               sort(B,i)
Sort rows of B on the values in columns i, then j      sort(B,(i,j))
Cross product B'A (see help mata cross() on missing values)   cross(B,A)
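The sketch below tries several of these commands on a small symmetric matrix. It uses mreldif(), which returns the maximum relative difference between two matrices, to confirm that invsym() and luinv() agree up to rounding.

```stata
mata
    B = (4, 1 \ 1, 3)                           // symmetric, positive definite
    B'                                          // transpose (equal to B here)
    Binv = invsym(B)
    mreldif(Binv, luinv(B))                     // essentially 0: same inverse
    mreldif(Binv*B, I(2))                       // essentially 0: Binv*B is I
    diagonal(B)                                 // column vector (4 \ 3)
    diag((4, 3))                                // 2 x 2 matrix with 4, 3 on the diagonal
    M = (3, 1 \ 1, 2 \ 2, 9)
    sort(M, 1)                                  // rows ordered by column 1: (1,2 \ 2,9 \ 3,1)
end
```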

Multiplication and Addition

For matrices A and B of the same dimensions, matrix addition is given by
C=A+B
Subtraction works the same way. Multiplication (assuming A and B are conformable, i.e. the number of columns of A equals the number of rows of B) is given by
C=A*B
Of course, these operators can be combined. For example, if x is the matrix of independent variables and y the column vector of the dependent variable,
invsym(x'x)*x'y
is the OLS estimator $(x'x)^{-1}x'y$ that we discuss in Chapter 1. There are many more functions and tools in the Mata environment that I won't describe here, but interested students can explore them through help mata.
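To make this concrete, the sketch below computes OLS by hand in Mata for a regression of price on mpg and weight in auto.dta (the variables are chosen purely for illustration) and passes the coefficients back to Stata with st_matrix() so they can be compared with regress. Note that b lists the constant first, while regress reports _cons last.

```stata
sysuse auto, clear
mata
    y = st_data(., "price")
    x = st_data(., ("mpg", "weight"))
    X = (J(rows(x), 1, 1), x)               // constant in the first column
    b = invsym(X'X) * X'y                   // OLS: (X'X)^(-1) X'y
    b
    b2 = invsym(cross(X, X)) * cross(X, y)  // same estimator via cross()
    mreldif(b, b2)                          // essentially 0
    st_matrix("b_mata", b')                 // 1 x 3 Stata matrix: _cons, mpg, weight
end

regress price mpg weight
matrix list b_mata
```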

Maximum Likelihood in Stata

Under Construction.


  1. This section is an abbreviated version of an excellent Stata tutorial by Kurt Schmidheiny at the Universitat Pompeu Fabra, Barcelona, Spain.