# ECON 407: Stata Primer

Here I briefly introduce matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we'll focus on Stata as the tool for all of our work. If you prefer to do your work in other mathematical packages (e.g. S-Plus, Matlab, etc.) you are free to do so.

## Basic Stata Commands

### Getting Help

If you need general help in Stata, type
`help command`

where `command` is some Stata command. You can also do keyword searches with
`search keyword`

To see the same set of results in a better help viewer, type
`view search keyword`

for example
`view search reg`

### Loading Data

#### Loading Stata Datasets from the Web

For all of our class exercises, we will be using datasets on my website. These are easily accessed from within Stata. For example, suppose you want to use the file `mroz.dta` that is used in the first problem set. To load this file, issue the command
`webuse set http://rlhick.people.wm.edu/econ407/data/`

followed by
`webuse mroz`

You will then have the file loaded into Stata. My files are read-only, so if you plan on making changes, you should immediately save a local copy so that your changes persist.
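As a minimal sketch of the load-then-save-locally workflow described above: the local file name `mroz_local.dta` is a hypothetical choice, and the file is written to Stata's current working directory.

```stata
* Point webuse at the course data directory and load the file
webuse set http://rlhick.people.wm.edu/econ407/data/
webuse mroz, clear

* Save a local copy (hypothetical file name) so later changes persist
save "mroz_local.dta", replace

* Reset webuse to its default location for Stata's own example datasets
webuse set
```

From then on you can work from the local copy with `use "mroz_local.dta", clear`.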

#### Loading Stata files stored on your hard disk

Suppose you have the file `mroz.dta` on your personal computer in a directory called `/some/place`. To load this file into Stata on mac/unix, issue the command
`use "/some/place/mroz"`

and the data will be loaded. For windows, if the file is in a directory `\some\place`, simply reverse the slash characters.

#### Loading text files stored on your hard disk

Suppose you have the file `mroz.csv` on your personal computer in a directory called `/some/place`. To load this file into Stata on mac/unix, issue the command
`insheet using "/some/place/mroz.csv"`

and the data will be loaded. For windows, if the file is in a directory `\some\place`, simply reverse the slash characters.

### Creating and Modifying Variables

The command
`gen ln_gdp=log(gdp)`

will generate the natural log of gdp for each row in your dataset. Beware of missing value warning messages: Stata sets `ln_gdp` to missing wherever `gdp` is missing or not positive.

Once a variable is created, it can be modified using the command
`replace ln_gdp=ln_gdp+1`

Conditional commands are also possible. Suppose you want to create a dummy called `old` that equals 1 if age is 60 or greater and zero otherwise. You can do this using conditional statements
`gen old=1 if age>=60`

followed by
`replace old=0 if age<60`
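One thing to watch with conditions like these: Stata treats a missing value as larger than any number, so a `>=` comparison is true for missing observations. The sketch below shows a one-line alternative that guards against this. It assumes Stata's bundled `auto` dataset (whose `rep78` variable has some missing values) rather than a dataset with an `age` variable, and the variable name `good_repair` is made up for illustration.

```stata
* Load Stata's bundled example data (an assumption; any dataset works)
sysuse auto, clear

* A logical expression evaluates to 1 or 0, so one line is enough.
* The if qualifier leaves the dummy missing wherever rep78 is missing,
* instead of letting missing values count as "4 or more".
gen good_repair = (rep78 >= 4) if !missing(rep78)

tab good_repair, missing
```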

### A better way to create dummy variables

Suppose a variable `x` takes on the values 1, 2, or 3. To create categorical (dummy) variables for each value, use
`tab x, gen(dum_x)`

which creates the dummies `dum_x1`, `dum_x2`, and `dum_x3`.
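Here is a small sketch of the same idea on real data. It assumes Stata's bundled `auto` dataset, where `rep78` takes the values 1 to 5; the stub name `rep_` is a hypothetical choice.

```stata
* Load Stata's bundled example data (an assumption for illustration)
sysuse auto, clear

* One dummy per distinct value of rep78: rep_1, rep_2, ..., rep_5
tab rep78, gen(rep_)

* Inspect the new variables
describe rep_*
```

For observations where `rep78` is missing, all of the generated dummies are missing as well.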

### Starting Over

Sometimes you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the
`clear`

command.

### Log Files

A very useful way to save your results is to have Stata automatically put everything in a log file. To initialize a log file and use it, issue
`log using "/some/place/my_first.log", replace text`

which will create (or if it exists, will replace) the file `my_first.log` in the folder `/some/place`. If you don't want to replace your existing work, use this command instead
`log using "/some/place/my_first.log", append text`

and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command
`log close`

to close the file and save all changes. You may then open it using the text editor of your choosing.

### Do Files

Do files allow you to put all of the relevant Stata commands for a project into one file, so that results can be easily replicated from one Stata session to the next. The use of do files is highly recommended for your own work, and is a required part of your assignments in the course. I will illustrate their use early in the class.
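To make this concrete, here is a minimal sketch of what a do file might look like. The file names `my_first.do` and `my_first.log` are hypothetical, and the analysis uses Stata's bundled `auto` dataset purely as a stand-in for your own data.

```stata
* my_first.do: a hypothetical do file
capture log close            // close any log left open by an earlier run
log using "my_first.log", replace text

* Load data and run the analysis
sysuse auto, clear
summarize mpg weight
regress mpg weight

log close
```

Saving these lines as `my_first.do` and typing `do my_first.do` at the Stata prompt reruns the whole analysis and rewrites the log.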

## Linear Algebra in Stata^{1}

Stata has a linear algebra environment that can be started using the `mata` command from the Stata command line. Notice that when you type `mata` in the Stata command window, the command prompt changes from a `.` to a `:`. This is really your only way of telling whether you are in the Mata or Stata environment. Once you have finished using `mata` you can return to the normal Stata environment by typing
`end`

Commands for Mata may also be nested inside of Stata do files (command files) so long as all Mata commands are between the commands
`mata`

and
`end`

### Getting Help

If you need general help on Mata, type
`help mata command`

where `command` is some Mata command. You can also do keyword searches with
`search mata keyword`

To see the same set of results in a better help viewer, type
`view search mata keyword`

for example
`view search mata inverse`

### Creating matrices, vectors, and scalars

There are two ways to create a matrix. Consider a two by two matrix,
`A=(1,2 \ 3,4)`

will give you

$A=\begin{bmatrix}1 & 2 \\ 3 & 4 \end{bmatrix}$

Or, you could create an empty matrix (every element set to missing) of the desired dimension
`B=J(2,3,.)`

where $B$ is of dimension rows=2 and columns=3. To fill it element by element use
`B[1,1]=5`

`B[1,2]=6`

`B[1,3]=7`

`B[2,1]=8`

`B[2,2]=9`

`B[2,3]=10`

To display your matrix, type
`B`

(Mata names are case sensitive, so `b` would not refer to this matrix.)

#### Building a matrix from submatrices

Suppose you have the matrices A to D defined as
`A=(1,2 \ 3,4)`

`B=(5,6,7 \ 8,9,10)`

`C=(3,4 \ 5,6)`

`D=(1,2,3 \ 4,5,6)`

The matrix $E=\begin{bmatrix}A & B \\ C & D \end{bmatrix}$
can be constructed by
`E=(A,B \ C,D)`

to yield

$E=\begin{bmatrix}1 & 2 & 5 & 6 & 7 \\ 3 & 4 & 8 & 9 & 10 \\ 3 & 4 & 1 & 2 & 3 \\ 5 & 6 & 4 & 5 & 6 \end{bmatrix}$

#### Creating Vectors

Row and column vectors can also be created using the same basic syntax:
`f=(1,2,3)`

is the row vector

$f=\begin{bmatrix}1 & 2 & 3\end{bmatrix}$

or, a column vector can be created by
`g=(3 \ 4 \ 5)`

The special commands
`id=(1::100)`

and
`id=(1..100)`

are useful for constructing column and row vectors, respectively, of incremented integer values between 1 and 100 (e.g. $1,2,3,\ldots,99,100$).
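A quick way to keep the two range operators straight is to check the dimensions of what they return. The short sketch below does this for the integers 1 to 5; the names `c` and `r` are arbitrary.

```stata
mata
// a::b builds a column vector, a..b builds a row vector
c = (1::5)
r = (1..5)

// dimensions of each: (rows, cols)
rows(c), cols(c)
rows(r), cols(r)

// transposing the column vector reproduces the row vector
c' == r
end
```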

#### Creating Scalars

These are easy. Simply define a scalar by
`u=3`

#### Creating a vector of zeros or ones

Suppose we have 1000 observations and we wish to create a column of ones (this is especially useful for estimating a constant term). Use this command
`ones=J(1000, 1, 1)`

This can be combined with what we have seen previously to create the full matrix of independent variables (with the constant in the first column) using
`X=(ones,x)`

so long as your matrix of independent variables `x` has 1000 rows.
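Rather than typing the number of observations by hand, you can let Mata work it out from the data. The sketch below assumes Stata's bundled `auto` dataset and two of its variables, chosen only for illustration.

```stata
sysuse auto, clear
mata
// copy two regressors from the Stata dataset into Mata
x = st_data(., ("weight", "length"))

// J(r, c, v) builds an r x c matrix filled with v, so
// J(rows(x), 1, 1) is a column of ones with as many rows as x
X = (J(rows(x), 1, 1), x)

// dimensions of the result: (rows, cols)
rows(X), cols(X)
end
```

Sizing the column of ones with `rows(x)` means the join is always conformable, whatever the sample size.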

#### Creating the Identity Matrix

The command
`I(1000)`

will create an identity matrix with 1000 rows and 1000 columns.

### Stata datasets in Mata

Once you have loaded data into Stata (.dta or .csv files), it is easy to access that information from within Mata. For example, suppose we use the Stata sample dataset auto.dta, accessed by
`webuse auto.dta`

There are generally two ways to get your .dta Stata data into Mata. One can copy the data or one can create a view that always refers back to the original Stata dataset. Views use less memory and are useful if you want to modify the data in Mata and then return to Stata with the original dataset *changed* based on operations in Mata, while a copy is faster to work with but requires more memory. If you need to do all your work in Mata and don't need to change any of the underlying .dta data, I recommend the copy method. The command

`X=st_data(.,.)`

copies the entire currently loaded Stata dataset into Mata, while

`X=st_data(.,("mpg","rep78","weight"))`

copies all rows and only the three variables listed (columns) into X. It is also possible to copy in only a subset of rows and columns:

`X=st_data(((1::5) \ (7::9)),("mpg","rep78","weight"))`

copies rows 1 to 5 and 7 to 9 of the three variables into X. Remember, changes made in Mata to data copied with `st_data` are never written back to the Stata dataset. The `st_view` command takes the same row and column arguments, preceded by the matrix to create (for example `st_view(X=.,.,.)`), and allows changes to be preserved once back in Stata. In this course, it is sufficient to use the command
`X=st_data(.,.)`

to load data into Mata.
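The difference between a copy and a view is easiest to see by changing one element of each. The sketch below assumes Stata's bundled `auto` dataset; the names `Xcopy` and `Xview` are made up for illustration. Note that `st_view` takes the matrix to be created as its first argument.

```stata
sysuse auto, clear
mata
// copy: changing Xcopy leaves the Stata dataset untouched
Xcopy = st_data(., "mpg")
Xcopy[1] = 0

// view: changing Xview changes mpg in the Stata dataset itself
st_view(Xview = ., ., "mpg")
Xview[1] = 99
end

* Back in Stata: the first observation reflects the view, not the copy
list mpg in 1
```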

### The mata workspace

The command
`mata describe`

will list all the matrices, vectors, and scalars currently defined. To delete all of these, issue
`mata clear`

To delete only a few matrices, vectors, or scalars, issue
`mata drop X f g`

### Getting Information about your matrices and vectors

Mata offers three functions useful for checking conformability conditions. The functions
`rows(X)`

and `cols(X)` return the number of rows and columns of X respectively, while
`length(X)`

calculates the total number of elements in matrix X, equal to (# rows) $\times$ (# columns).
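As a small sketch of how these functions help with conformability, the example below checks that a product is defined before computing it. The matrices are the same `A` and `B` used earlier in this primer.

```stata
mata
A = (1,2 \ 3,4)
B = (5,6,7 \ 8,9,10)

// A*B is defined only when cols(A) equals rows(B); 1 means true
cols(A) == rows(B)

// the product has rows(A) rows and cols(B) columns
C = A*B
rows(C), cols(C), length(C)
end
```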

### Linear Algebra Operations

### Important Commands

Operation | Command |
---|---|
Transpose of B | `B'` |
Inverse of B | `luinv(B)` |
Inverse of B (if symmetric) | `invsym(B)` |
Diagonal elements of matrix B | `diagonal(B)` |
Put vector B into diagonal square matrix | `diag(B)` |
Upper triangular elements of B | `uppertriangle(B)` |
Lower triangular elements of B | `lowertriangle(B)` |
Sort based on values in column i | `sort(B,i)` |
Sort based on values in columns i and j | `sort(B,(i,j))` |
Cross product B'A, omitting rows with missing elements | `cross(B,A)` |

### Multiplication and Addition

For matrices A and B of the same dimensions, matrix addition is given by
`C=A+B`

Subtraction follows in a similar way. Multiplication (assuming conformability of A and B) is given by
`C=A*B`

Of course, combinations of these operators are possible as well. For example,
`invsym(x'x)*x'y`

would be the OLS estimator we discuss in Chapter 1. There are many more functions and tools in the Mata environment that I won't describe here, but that are accessible to interested students.
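Putting the pieces of this primer together, here is a minimal sketch of the OLS calculation from start to finish. It assumes Stata's bundled `auto` dataset and an arbitrary choice of dependent and independent variables; the final `regress` command is there only so you can compare the results.

```stata
sysuse auto, clear
mata
// dependent variable and regressors copied from the Stata dataset
y = st_data(., "mpg")
x = st_data(., ("weight", "length"))

// add a column of ones so the first coefficient is the constant
X = (J(rows(x), 1, 1), x)

// OLS coefficients: (X'X)^(-1) X'y
b = invsym(X'X)*X'y
b
end

* Stata's built-in estimator, for comparison
regress mpg weight length
```

The elements of `b` should match the `regress` coefficients, keeping in mind that `b` lists the constant first while `regress` reports `_cons` last.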

^{1} This section is an abbreviated version of an excellent Stata tutorial by Kurt Schmidheiny at the Universitat Pompeu Fabra, Barcelona, Spain.