ECON 407: Stata Primer

Here I briefly introduce matrix algebra and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we'll use Stata as the tool for all of our work. If you prefer to do your work in another mathematical package (e.g. S-PLUS, Matlab, etc.), you are free to do so.

Basic Stata Commands

Getting Help

If you need general help in Stata, type

help command

where command is some Stata command. You can also do keyword searches with

search keyword

To see the same set of results in a better help viewer, type

view search keyword

for example

view search reg

Loading Data

Loading Stata Datasets from the Web

For all of our class exercises, we will be using datasets on my website. These are easily accessed from within Stata. For example, suppose you want to use the file mroz.dta that is used in the first problem set. To load this file, issue the command

webuse set http://rlhick.people.wm.edu/econ407/data/

followed by

webuse mroz

You will then have the file loaded into Stata. My files are read-only, so if you plan on making changes, you should immediately save a local copy so that your changes to the data persist.
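The steps above can be sketched as a short sequence of commands. The local file name mroz_local.dta is an illustrative assumption, and the final webuse set (with no argument) simply restores Stata's default location for its own example datasets.

```stata
* Point webuse at the course data directory, then load mroz.dta
webuse set http://rlhick.people.wm.edu/econ407/data/
webuse mroz

* Save a local, writable copy (file name is a placeholder; it lands in the working directory)
save "mroz_local.dta", replace

* Restore the default webuse location
webuse set
```

After this, use "mroz_local.dta" reloads your own copy without going back to the website.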

Loading Stata files stored on your hard disk

Suppose you have the file mroz.dta on your personal computer in a directory called /some/place. To load this file into Stata on Mac/Unix, issue the command

use "/some/place/mroz"

and the data will be loaded. On Windows, if the file is in a directory \some\place, simply reverse the slash characters.

Loading text files stored on your hard disk

Suppose you have the file mroz.csv on your personal computer in a directory called /some/place. To load this file into Stata on Mac/Unix, issue the command

insheet using "/some/place/mroz.csv"

and the data will be loaded (in recent versions of Stata, import delimited does the same job). On Windows, if the file is in a directory \some\place, simply reverse the slash characters.

Creating and Modifying Variables

The command

gen ln_gdp=log(gdp)

will generate the natural log of gdp for each row in your dataset. Watch for messages about missing values being generated: log() returns missing wherever gdp is zero, negative, or missing.

Once a variable is created, it can be modified using the command

replace ln_gdp=ln_gdp+1

Conditional commands are also possible. Suppose you want to create a dummy called old that equals 1 if age is 60 or greater and zero otherwise. You can do this using conditional statements

gen old=1 if age>=60

followed by

replace old=0 if age<60
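One caveat with the conditional approach: Stata treats a missing numeric value as larger than any number, so age>=60 is true when age is missing. The sketch below, which uses a made-up four-row dataset, builds the same dummy in one line and leaves it missing when age is missing.

```stata
clear
input age
45
60
72
.
end

* (age >= 60) evaluates to 1 or 0; the if qualifier keeps old missing when age is missing
gen old = (age >= 60) if !missing(age)
list age old
```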

A better way to create dummy variables

Suppose a variable x takes on the values 1, 2, or 3. To create categorical (dummy) variables for each value, use

tab x, gen(dum_x)

This creates the variables dum_x1, dum_x2, and dum_x3, one for each value of x.
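A small illustration, using a made-up variable x entered by hand:

```stata
clear
input x
1
2
3
2
end

* Creates dum_x1, dum_x2, dum_x3: one indicator per distinct value of x
tab x, gen(dum_x)
list x dum_x1 dum_x2 dum_x3
```

In each row exactly one of the three indicators equals 1.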

Starting Over

Sometimes, you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the clear command

Log Files

A very useful way to save your results is to have Stata automatically put everything in a log file. To initialize a log file and use it, issue

log using "/some/place/my_first.log", replace text

This will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don't want to replace your existing work, use this command instead

log using "/some/place/my_first.log", append text

and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command

log close

to close the file and save all changes. You may then open it using the text editor of your choosing.

Do Files

Do files allow you to put all of the relevant Stata commands for a project into one file, so that results can be easily replicated from one Stata session to the next. The use of do files is highly recommended for your own work, and is a required part of your assignments in the course. I will illustrate their use early in the class.
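As a rough sketch of what a do file might contain, the example below opens a log, loads data, runs a few commands, and closes the log. The file names are placeholders, and the auto dataset (shipped with Stata and loaded with sysuse) stands in for whatever data your project uses.

```stata
* my_first.do : a minimal do-file skeleton (file names are placeholders)
clear
capture log close            // close any log left open by an earlier run
log using "my_first.log", replace text

sysuse auto                  // example dataset shipped with Stata
summarize mpg weight
regress mpg weight

log close
```

Saving these lines as my_first.do and typing do my_first.do reruns the whole analysis and rewrites the log each time.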

Linear Algebra in Stata [1]

Stata has a linear algebra environment, Mata, that can be started using the

mata

command from the Stata command line. Notice that when you type mata in the Stata command window, the command prompt changes from a . to a :. This is really your only way of telling whether you are in the Mata or Stata environment. Once you have finished using Mata, you can return to the normal Stata environment by typing

end

Commands for Mata may also be nested inside Stata do files (command files), so long as all Mata commands are between the commands mata and end.
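A minimal sketch of Mata commands nested inside a do file follows. It assumes the auto example dataset that ships with Stata; the name n is arbitrary.

```stata
* Stata commands run as usual
sysuse auto, clear

* Everything between mata and end is interpreted by Mata
mata
n = st_nobs()          // number of observations in the loaded dataset
n
end

* A single Mata statement can also be run with the mata: prefix
mata: n
```

Objects created in Mata, such as n here, remain available the next time you enter Mata in the same session.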

Getting Help

If you need general help on Mata, type

help mata command

where command is some Mata command. You can also do keyword searches with

search mata keyword

To see the same set of results in a better help viewer, type

view search mata keyword

for example

view search mata inverse

Creating matrices, vectors, and scalars

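There are two ways to create a matrix. First, you can enter the elements directly, using commas to separate columns and backslashes to separate rows. For a two by two matrix,

A=(1,2 \ 3,4)

will give you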

$$ \begin{equation} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \end{equation} $$

Or, you could create a matrix of the desired dimension filled with missing values

B=J(2,3,.)

where $B$ is of dimension rows=2 and columns=3. To fill it element by element use

B[1,1]=5
B[1,2]=6
B[1,3]=7
B[2,1]=8
B[2,2]=9
B[2,3]=10

To display your matrix, type

B

(Mata is case sensitive, so b and B are different names.)

Building a matrix from submatrices

Suppose you have the matrices A to D defined as

A=(1,2 \ 3,4)
B=(5,6,7 \ 8,9,10)
C=(3,4 \ 5,6)
D=(1,2,3 \ 4,5,6)

The matrix $E=\begin{bmatrix}A & B \\ C & D \end{bmatrix}$ can be constructed by

E=(A,B \ C,D)

to yield

$$ \begin{equation}\label{eq:ps1_question} \begin{bmatrix} 1&2&5&6&7 \\ 3&4&8&9&10 \\ 3&4&1&2&3 \\ 5&6&4&5&6 \end{bmatrix} \end{equation} $$
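The construction above can be run as a single Mata block. This sketch also checks the dimensions of the result with rows() and cols().

```stata
mata
A = (1,2 \ 3,4)
B = (5,6,7 \ 8,9,10)
C = (3,4 \ 5,6)
D = (1,2,3 \ 4,5,6)

// the comma joins columns and the backslash joins rows
E = (A,B \ C,D)
E
rows(E), cols(E)
end
```

The blocks must be conformable: A and B need the same number of rows, as do C and D, and the two stacked halves need the same number of columns.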

Creating Vectors

Row and column vectors can also be created using the same basic syntax. The row vector

f=(1,2,3)

is

$$ \begin{equation} \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \end{equation} $$

or, a column vector can be created by

g=(3 \ 4 \ 5)

The special commands id=(1..100) and id=(1::100) are useful for constructing row and column vectors, respectively, of incremented integer values between 1 and 100 (e.g. $1,2,3,\ldots,99,100$).
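In Mata, the .. operator builds a row vector and the :: operator builds a column vector. A short sketch with the range 1 to 5 (the names r and c are arbitrary):

```stata
mata
r = (1..5)      // row vector 1, 2, 3, 4, 5
c = (1::5)      // column vector with the same values
rows(r), cols(r)
rows(c), cols(c)
r' == c         // transposing the row vector gives the column vector
end
```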

Creating Scalars

These are easy. Simply define a scalar by u=3

Creating a vector of zeros or ones

Suppose we have 1000 observations and we wish to create a column of ones (this is especially useful for estimating a constant term). Use this command

ones=J(1000, 1, 1)

This can be combined with what we have done previously to create the full matrix of independent variables (with the constant in the first column) using

X=(ones,x)

so long as your matrix of independent variables x has 1000 rows.
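A small sketch of the same idea, using a made-up 3 by 2 matrix x and sizing the column of ones with rows(x) so that the two pieces always conform:

```stata
mata
x = (2,3 \ 4,5 \ 6,7)            // toy regressors: 3 observations, 2 variables
ones = J(rows(x), 1, 1)          // column of ones with as many rows as x
X = (ones, x)                    // constant in the first column
X
end
```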

Creating the Identity Matrix

The command I(1000) will create a 1000 by 1000 identity matrix.

Stata datasets in Mata

Once you have loaded data into Stata (from .dta or .csv files), it is easy to access that information from within Mata. For example, suppose we use the Stata sample dataset auto.dta, accessed by

webuse auto.dta

There are generally two ways to get your Stata data into Mata. One can copy the data, or one can create a view that always refers back to the original Stata dataset. Views use less memory and are useful if you want to modify the data in Mata and then return to Stata with the original dataset changed, while a copy is faster to work with but takes additional memory. If you need to do all your work in Mata and don't need to change any of the underlying .dta data, I recommend the copy method. The command

X=st_data(.,.)

copies the entire currently loaded Stata dataset into Mata, while

X=st_data(.,("mpg","rep78","weight"))

copies all rows and only the three variables listed (columns) into X. It is also possible to copy in only a subset of rows and columns:

X=st_data(((1::4) \ (7::9)),("mpg","rep78","weight"))

copies in rows 1 to 4 and 7 to 9 with the three variables into X. Remember that X is a copy: changes you make to it in Mata are not written back to the Stata dataset. The st_view command takes the same row and column arguments as st_data, but the matrix is passed as the first argument rather than returned, as in st_view(X,.,.), and changes made to a view are preserved once back in Stata. In this course, it is sufficient to use the command X=st_data(.,.) to load data into Mata.
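The difference between a copy and a view can be seen in a short sketch. It assumes the auto dataset that ships with Stata; the names Xc and V and the values 0 and 99 are arbitrary. Note that st_view() fills in the matrix given as its first argument instead of returning one.

```stata
sysuse auto, clear

mata
// copy: Xc is a separate matrix, so edits stay inside Mata
Xc = st_data(., ("mpg", "weight"))
Xc[1,1] = 0

// view: V refers to the Stata data, so assignments change the dataset
V = .
st_view(V, ., ("mpg", "weight"))
V[1,1] = 99
end

* mpg in the first observation is now 99, not 0
list mpg weight in 1
```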

The Mata workspace

The command

mata describe

will list all the matrices, vectors, and scalars currently defined. To delete all of these, issue

mata clear

To delete only a few matrices, vectors, or scalars, issue

mata drop X f g

Getting Information about your matrices and vectors

Mata offers three functions useful for checking conformability conditions. The functions rows(X) and cols(X) return the number of rows and columns of X, respectively, while length(X) returns the total number of elements in matrix X, equal to (# rows) $\times$ (# columns).

Linear Algebra Operations

Important Commands

| Operation | Command |
|---|---|
| Transpose of B | B' |
| Inverse of B | luinv(B) |
| Inverse of B (if symmetric) | invsym(B) |
| Diagonal elements of matrix B | diagonal(B) |
| Put vector B into a diagonal square matrix | diag(B) |
| Upper triangular elements of B | uppertriangle(B) |
| Lower triangular elements of B | lowertriangle(B) |
| Sort based on values in column i | sort(B,i) |
| Sort based on values in columns i and j | sort(B,(i,j)) |
| Multiplication B'A when there are missing elements | cross(B,A) |

Multiplication and Addition

For matrices A and B of the same dimensions, matrix addition is given by

C=A+B

Subtraction follows in a similar way. Multiplication (assuming conformability of A and B) is given by

C=A*B

Of course, combinations of these operators are possible as well. For example,

invsym(x'x)*x'y

would be the OLS estimator we discuss in Chapter 1. There are many more functions and tools in the Mata environment that I won't describe here, but that are accessible to interested students.
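To tie the pieces together, here is a sketch of the OLS calculation in Mata. The choice of the auto dataset (shipped with Stata) and of mpg, weight, and length as variables is an assumption made purely for illustration; the regress command at the end gives Stata's built-in estimates for comparison.

```stata
sysuse auto, clear

mata
y = st_data(., "mpg")
x = st_data(., ("weight", "length"))
X = (J(rows(x), 1, 1), x)          // constant in the first column

// conformability check before multiplying (displays 1 if true)
rows(X) == rows(y)

b = invsym(X'X)*X'y                // OLS coefficients: constant, weight, length
b
end

* Compare with Stata's built-in estimator (it lists the constant last)
regress mpg weight length
```

The coefficients in b should agree with the regress output up to rounding, with the constant appearing first in b because the column of ones was placed first in X.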


  1. This section is an abbreviated version of an excellent Stata tutorial by Kurt Schmidheiny at the Universitat Pompeu Fabra, Barcelona, Spain.