# ECON 407: Stata Primer

Here I briefly introduce the use of matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably more adept at these tasks, but in this class we'll focus on Stata as the tool for all of our work. If you prefer to do your work in other mathematical packages (e.g. R, Python, or Matlab), you are free to do so, but I might not be able to support any technical issues you run into.

## Loading data into Stata

### Loading stata datasets

Stata can load comma-delimited (`csv`), Excel (`xls`), and Stata (`dta`) files out of the box. It can also load data from the web:

```
use "http://rlhick.people.wm.edu/econ407/data/mroz"
sum
```

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| lfp | 753 | .5683931 | .4956295 | 0 | 1 |
| whrs | 753 | 740.5764 | 871.3142 | 0 | 4950 |
| kl6 | 753 | .2377158 | .523959 | 0 | 3 |
| k618 | 753 | 1.353254 | 1.319874 | 0 | 8 |
| wa | 753 | 42.53785 | 8.072574 | 30 | 60 |
| we | 753 | 12.28685 | 2.280246 | 5 | 17 |
| ww | 753 | 2.374565 | 3.241829 | 0 | 25 |
| rpwg | 753 | 1.849734 | 2.419887 | 0 | 9.98 |
| hhrs | 753 | 2267.271 | 595.5666 | 175 | 5010 |
| ha | 753 | 45.12085 | 8.058793 | 30 | 60 |
| he | 753 | 12.49137 | 3.020804 | 3 | 17 |
| hw | 753 | 7.482179 | 4.230559 | .4121 | 40.509 |
| faminc | 753 | 23080.59 | 12190.2 | 1500 | 96000 |
| mtr | 753 | .6788632 | .0834955 | .4415 | .9415 |
| wmed | 753 | 9.250996 | 3.367468 | 0 | 17 |
| wfed | 753 | 8.808765 | 3.57229 | 0 | 17 |
| un | 753 | 8.623506 | 3.114934 | 3 | 14 |
| cit | 753 | .6427623 | .4795042 | 0 | 1 |
| ax | 753 | 10.63081 | 8.06913 | 0 | 45 |

Loading files from disk is a slight variation on the command above. Suppose your Stata data file mroz.dta is in the folder /some/place. In Linux or MacOS we would use the Stata command

```
use "/some/place/mroz.dta"
```
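The comma-delimited and Excel formats mentioned at the start of this section are read with `import` commands rather than `use`. The sketch below assumes two hypothetical files, mroz.csv and mroz.xlsx, in the placeholder folder /some/place. Neither file is part of the course materials.

```stata
* Hypothetical files: mroz.csv and mroz.xlsx are placeholders, not course files
* Comma-delimited text file
import delimited using "/some/place/mroz.csv", clear

* Excel workbook; firstrow treats the first row as variable names
import excel using "/some/place/mroz.xlsx", firstrow clear
```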

## Viewing Data

If you are using the graphical version of Stata (recommended), viewing data is easy and I can show you how to do that (the `browse` command opens the Data Browser). To list data at the command line, use the `list` command. It is useful for showing a few lines of data in your problem sets. Here we'll view the first 5 rows of data:

```
list in 1/5
```

| | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he | hw | faminc | mtr | wmed | wfed | un | cit | ax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | 1 | 1610 | 1 | 0 | 32 | 12 | 3.354 | 2.65 | 2708 | 34 | 12 | 4.0288 | 16310 | .7215 | 12 | 7 | 5 | 0 | 14 |
| 2. | 1 | 1656 | 0 | 2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 | 9 | 8.4416 | 21800 | .6615 | 7 | 7 | 11 | 1 | 5 |
| 3. | 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 | 3.5807 | 21040 | .6915 | 12 | 7 | 5 | 0 | 15 |
| 4. | 1 | 456 | 0 | 3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 | 3.5417 | 7300 | .7815 | 7 | 7 | 5 | 0 | 6 |
| 5. | 1 | 1568 | 1 | 2 | 31 | 14 | 4.5918 | 3.6 | 2000 | 32 | 12 | 10 | 27300 | .6215 | 12 | 14 | 9.5 | 1 | 7 |
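Listing every variable produces very wide output. You can list only the variables you care about by putting their names after `list`. This sketch uses four variables from the Mroz data chosen purely for illustration. The first row should match the listing above.

```stata
* Show only four variables for the first 5 observations
list lfp whrs kl6 faminc in 1/5
```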

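You can combine `list` with logical expressions to show only the rows meeting a condition. In the command below, `in 1/3` restricts attention to observations 1 through 3, and `if kl6>0` then keeps those where the respondent has kids younger than 6. The result is which of the first 3 rows have young kids (rows 1 and 3), not the first 3 such rows: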

```
list if kl6>0 in 1/3
```

| | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he | hw | faminc | mtr | wmed | wfed | un | cit | ax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | 1 | 1610 | 1 | 0 | 32 | 12 | 3.354 | 2.65 | 2708 | 34 | 12 | 4.0288 | 16310 | .7215 | 12 | 7 | 5 | 0 | 14 |
| 3. | 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 | 3.5807 | 21040 | .6915 | 12 | 7 | 5 | 0 | 15 |
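To get the first three rows that *do* have kids under 6, one approach is to number the qualifying rows with a running sum. This is a sketch; the helper variables `young` and `young_n` are hypothetical names and are dropped at the end. In these data the qualifying rows are observations 1, 3, and 5.

```stata
* Flag rows with at least one child under 6
gen byte young = kl6 > 0
* Running count of flagged rows, in the current sort order
gen long young_n = sum(young)
* List only the first three flagged rows
list lfp whrs kl6 if young & young_n <= 3
drop young young_n
```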

## Creating and Modifying Variables

### Creating Variables

In Stata, you create a new variable with `generate` (usually abbreviated `gen`):

```
gen newvar = lfp * ax
```
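`gen` accepts any Stata expression on the right-hand side, including functions. Two typical transformations are shown below. The new variable names `lnfaminc` and `wa2` are illustrative choices. faminc has a minimum of 1500 in the summary above, so its log is never missing.

```stata
* Log of family income
gen lnfaminc = ln(faminc)
* Square of wa, e.g. for a quadratic term in a regression
gen wa2 = wa^2
summarize lnfaminc wa2
```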

### Modifying Variables

To modify an existing variable, use `replace`. Stata will not let you `gen` a variable that already exists, so changing a variable always requires `replace`:

```
replace newvar = newvar/10
```
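To see why `replace` is needed, try to `gen` a variable that already exists. The sketch below uses `capture` to trap the error instead of stopping. Stata's return code for this error is 110 ("already defined").

```stata
* newvar already exists, so gen fails; capture stores the error code in _rc
capture gen newvar = 1
display _rc
```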

Here is an example that creates a new dummy variable.

```
gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
```

(524 real changes made)

| | haskids | kl6 | k618 |
|---|---|---|---|
| 1. | 1 | 1 | 0 |
| 2. | 1 | 0 | 2 |
| 3. | 1 | 1 | 3 |
| 4. | 1 | 0 | 3 |
| 5. | 1 | 1 | 2 |
| 6. | 0 | 0 | 0 |
| 7. | 1 | 0 | 2 |
| 8. | 0 | 0 | 0 |
| 9. | 1 | 0 | 2 |
| 10. | 1 | 0 | 2 |
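A logical expression in Stata evaluates to 1 (true) or 0 (false), so the same dummy can be built with a single `gen`. In this sketch, `haskids2` is a hypothetical name used for comparison. The `count` should equal the 524 changes reported above, because `replace` changed exactly the rows that have kids.

```stata
* One-step version of the haskids dummy
gen haskids2 = (kl6 > 0) | (k618 > 0)
* Both versions agree on every observation
assert haskids2 == haskids
count if haskids2 == 1
```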

### Creating dummy variables

While the example above shows how to create dummy variables "manually" using logical checks, a better way (particularly if you need to create many categories) is `tab`. Suppose a variable x takes on the values 1, 2, or 3. To create categorical (dummy) variables for each value, use

```
tab x, gen(dum_x)
```
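Applied to the Mroz data, `tab` with the `gen()` option creates one indicator per distinct value, numbered in ascending order of the values. This sketch uses `cit`, which is 0/1 in the listing above. It should therefore produce `dum_cit1` (cit==0) and `dum_cit2` (cit==1). The stub `dum_cit` is an arbitrary choice.

```stata
* One dummy per value of cit: dum_cit1 for cit==0, dum_cit2 for cit==1
tab cit, gen(dum_cit)
describe dum_cit*
```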

## Starting Over

Sometimes you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the `clear` command.
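`use` refuses to replace data in memory that have unsaved changes. Adding the `clear` option discards the current data first, so you can start over and reload in one step. This sketch reuses the course URL from above.

```stata
* Discard whatever is in memory and reload the Mroz data
use "http://rlhick.people.wm.edu/econ407/data/mroz", clear
describe, short
```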

## Log Files

A very useful way to save your results is to have Stata automatically put everything in a log file. To initialize a log file and use it, issue

```
log using "/some/place/my_first.log", replace text
```

This creates (or, if it exists, replaces) the file my_first.log in the folder /some/place. If you don't want to replace your existing work, use this command instead

```
log using "/some/place/my_first.log", append text
```

and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command `log close` to close the file and save all changes. You may then open it using the text editor of your choice.

## Do Files

Do files allow you to put all of the relevant Stata commands for a project into one file, so that results can be easily replicated from one Stata session to the next. Do files are highly recommended for your own work, and they are a required part of your assignments in the course. I will illustrate their use early in the class. Additionally, you are required to write literate "do" files, as I will show you on the first day of class.
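A minimal do-file combining the log and data commands from above might look like the sketch below. The file name my_first.do and the folder /some/place are placeholders. `capture log close` closes any log left open from an earlier run without raising an error if none is open.

```stata
* my_first.do (hypothetical name); /some/place is a placeholder path
capture log close
log using "/some/place/my_first.log", replace text
use "http://rlhick.people.wm.edu/econ407/data/mroz", clear
summarize whrs kl6 k618
log close
```

You would then run it from the Stata command line with `do "/some/place/my_first.do"`.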

## Getting help in Stata

If you need to find general help in Stata, type `help command`, where command is some Stata command. You can also do keyword searches: `search keyword`. To see the same set of results in a better help viewer, type `view search keyword`, for example `view search reg`.

## Linear Algebra in Stata

Stata has a linear algebra environment called Mata, which you start by typing `mata` at the Stata command line. Notice that when you type `mata` in the Stata command window, the command prompt changes from `.` to `:`. This is really your only way of telling whether you are in the Mata or the Stata environment. At this point "normal" Stata commands (e.g. `summarize`, `reg`, or `use`) will not work and will lead to error messages. To exit Mata, issue the command `end`. Mata commands may also be included in Stata do files (command files), so long as all Mata commands are placed between the commands `mata` and `end`.
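Here is a minimal do-file sketch of that pattern. It assumes the Mroz data at the web address used earlier. The Mata part copies `whrs` into a vector and computes its mean, which should match the Mean column reported by `summarize` (740.5764).

```stata
* Stata commands
use "http://rlhick.people.wm.edu/econ407/data/mroz", clear
summarize whrs

* Everything between mata and end is read by Mata
mata
y = st_data(., "whrs")     // copy whrs into a Mata column vector
mean(y)                    // column mean, same as summarize reports
end

* Back in Stata
describe, short
```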

Getting help in Mata is similar to the normal Stata environment. Type `help mata command`, where command is some Mata command. You can also do keyword searches: `search mata keyword`. To see the same set of results in a better help viewer, type `view search mata keyword`, for example `view search mata inverse`.

Once you have Stata running, you can invoke `mata` like this:

```
mata
```

Stata responds with the banner `mata (type end to exit)` and switches to the `:` prompt.

### Creating matrices, vectors, and scalars

There are two ways to create a matrix. Consider a two by two matrix,

```
A = (1,2 \ 3,4)
A
```

| | 1 | 2 |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 3 | 4 |

Or, you could create an empty matrix of the desired dimension

```
B=J(2,3,.)
B
```

| | 1 | 2 | 3 |
|---|---|---|---|
| 1 | . | . | . |
| 2 | . | . | . |

Here `J(2,3,.)` returns a matrix with 2 rows and 3 columns, with every element set to missing (`.`). We can then fill \(\mathbf{B}\) element by element:

```
B[1,1]=5
B[1,2]=6
B[1,3]=7
B[2,1]=8
B[2,2]=9
B[2,3]=10
B
```

| | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 |
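For larger matrices, typing each element is tedious, so the same fill can be written as a Mata loop. This is a sketch. The formula `3*(i-1) + j + 4` is simply one way to reproduce the values 5 through 10 above, and `i` and `j` are ordinary loop counters.

```stata
B = J(2, 3, .)
for (i = 1; i <= rows(B); i++) {
    for (j = 1; j <= cols(B); j++) {
        // row 1 gets 5, 6, 7 and row 2 gets 8, 9, 10
        B[i, j] = 3*(i - 1) + j + 4
    }
}
B
```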

### Building a matrix from submatrices

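Suppose you have the matrices A to D defined below. The block matrix \(\mathbf{E}=\begin{bmatrix}\mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D}\end{bmatrix}\) is built by separating blocks in a row with `,` and starting a new row of blocks with `\`: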

```
A=(1,2 \ 3,4)
B=(5,6,7 \ 8,9,10)
C=(3,4 \ 5,6)
D=(1,2,3 \ 4,5,6)
E=(A,B \ C,D)
E
```

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 5 | 6 | 7 |
| 2 | 3 | 4 | 8 | 9 | 10 |
| 3 | 3 | 4 | 1 | 2 | 3 |
| 4 | 5 | 6 | 4 | 5 | 6 |
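Going the other way, blocks can be pulled back out of E with Mata subscripts. In the sketch below, a range subscript `[| r1,c1 \ r2,c2 |]` gives the top-left and bottom-right corners of the block. A list subscript selects specific rows, with `.` meaning "all columns".

```stata
E[|1,3 \ 2,5|]     // rows 1-2, columns 3-5: the block B
E[(3, 4), .]       // rows 3-4, all columns: C and D side by side
```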

### Creating Vectors

Row and column vectors can also be created using the same basic syntax:

```
f = (1, 2, 3)
f
```

| | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 1 | 2 | 3 |

or, a column vector can be created by

```
g=(3\ 4 \5)
g
```

| | 1 |
|---|---|
| 1 | 3 |
| 2 | 4 |
| 3 | 5 |

The `::` operator constructs a column vector of consecutive integers. For example, `1::5` gives 1, 2, 3, 4, 5, and `1::100` would give the integers 1 through 100.

```
id_rows=(1::5)
id_rows
```

| | 1 |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
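The companion operator `..` builds the same sequence as a row vector. This sketch stores it in `v_row`, a hypothetical name, and checks that it equals the transpose of `1::5`. The comparison displays 1 (true).

```stata
v_row = (1..5)      // row vector 1, 2, 3, 4, 5
v_row
(1::5)' == v_row    // transposed column vector equals the row vector
```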

### Creating Scalars

These are easy. To define a scalar variable called `u`:

```
u = 3
u
```

3

### Creating a vector of zeros or ones

Suppose we have 1000 observations and wish to create a column of ones (this is especially useful for estimating a constant term). Use this command:

```
ones=J(1000, 1, 1)
ones[1::5]
```

| | 1 |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |

This command can be combined with what we have covered previously to create the full matrix of independent variables (with the constant in the first column) using

```
X = (J(1000, 1, 1), x)
```

so long as your matrix of independent variables `x` exists in Mata and has 1000 rows.
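With real data, it is safer to size the column of ones from the data than to hard-code 1000. This sketch assumes the Mroz data are still loaded in Stata. It uses `kl6` and `k618` as illustrative regressors, so X should be 753 × 3 with ones in the first column.

```stata
x = st_data(., ("kl6", "k618"))   // 753 x 2 regressors from the Mroz data
X = (J(rows(x), 1, 1), x)         // prepend a column of ones
rows(X), cols(X)                  // dimensions of X
```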

### Creating the Identity Matrix

This command creates an identity matrix with 5 rows and 5 columns.

```
identity = I(5)
identity
```

[symmetric]

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1 | | | | |
| 2 | 0 | 1 | | | |
| 3 | 0 | 0 | 1 | | |
| 4 | 0 | 0 | 0 | 1 | |
| 5 | 0 | 0 | 0 | 0 | 1 |

Note that Mata displays only the lower triangle of a symmetric matrix and labels it `[symmetric]`.

### Stata datasets in Mata

Once you have loaded data into Stata as described above, it is easy to access that information from within Mata. There are two ways to bring the Mroz data (already loaded in Stata) into Mata. You can copy the data, or you can create a view that always refers back to the original Stata dataset. Views are useful if you want to modify the data in Mata and return to Stata with the original dataset changed by the Mata operations. They also use less memory, because the data are not duplicated. Copying the data uses more memory but is typically faster to work with. If you need to do all your work in Mata and don't need to change any of the underlying .dta data, **I recommend the copy method**. The command to load everything in the Stata workspace into Mata is

```
X=st_data(.,.)
X[1::5,]
```

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1610 | 1 | 0 | 32 | 12 | 3.354000092 | 2.650000095 | 2708 | 34 | 12 | 4.028800011 | 16310 | .7214999795 | 12 | 7 | 5 | 0 | 14 | 1 |
| 2 | 1 | 1656 | 0 | 2 | 30 | 12 | 1.388900042 | 2.650000095 | 2310 | 30 | 9 | 8.441599846 | 21800 | .6614999771 | 7 | 7 | 11 | 1 | 5 | 1 |
| 3 | 1 | 1980 | 1 | 3 | 35 | 12 | 4.545499802 | 4.039999962 | 3072 | 40 | 12 | 3.580699921 | 21040 | .6915000081 | 12 | 7 | 5 | 0 | 15 | 1 |
| 4 | 1 | 456 | 0 | 3 | 34 | 12 | 1.096500039 | 3.25 | 1920 | 53 | 10 | 3.541699886 | 7300 | .7814999819 | 7 | 7 | 5 | 0 | 6 | 1 |
| 5 | 1 | 1568 | 1 | 2 | 31 | 14 | 4.591800213 | 3.599999905 | 2000 | 32 | 12 | 10 | 27300 | .6215000153 | 12 | 14 | 9.5 | 1 | 7 | 1 |

Note that columns aren't labeled, so you need to keep track of the variable order in Stata to know which columns are important for your work.
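Mata can report the variable order for you. `st_varname()` returns the names of the variables at given positions. `st_varindex()` returns the position of a named variable, which is its column in `st_data(.,.)`. In the table above, faminc (16310 in row 1) sits in column 13.

```stata
st_varname(1..st_nvar())    // names of all variables, in column order
st_varindex("faminc")       // column holding faminc
```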

Alternatively, you can include only selected columns, **in the order you define**, as below (again viewing the first 5 rows):

```
X=st_data(.,("kl6","k618","faminc"))
X[1::5,]
```

| | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 1 | 0 | 16310 |
| 2 | 0 | 2 | 21800 |
| 3 | 1 | 3 | 21040 |
| 4 | 0 | 3 | 7300 |
| 5 | 1 | 2 | 27300 |

Remember that `st_data` makes a copy: changes you make to that copy in Mata never reach the Stata dataset. The `st_view` command takes the same row and column arguments as `st_data`, but the view is passed as its first argument (e.g. `st_view(V, ., .)`). Changes made through a view are preserved once you are back in Stata. In this course, it is sufficient to use `st_data` to load data into Mata as described above.
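For completeness, here is a sketch of a view writing back to Stata. It works on a hypothetical copy, `faminc_k`, so the original `faminc` is untouched. Assigning into `V[., .]` writes through to the Stata variable. Assigning `V = ...` would instead just rebind V to an ordinary Mata matrix. Run the `gen` line from Stata, not from the Mata prompt.

```stata
* In Stata: work on a copy so faminc itself is untouched
gen double faminc_k = faminc

mata
V = .                           // placeholder; st_view() turns it into a view
st_view(V, ., "faminc_k")       // V refers to the Stata variable faminc_k
V[., .] = V :/ 1000             // writes through: faminc_k now in thousands
end

* Back in Stata the variable itself has changed
summarize faminc_k
```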

### The mata workspace

The command `mata describe` will list all the matrices, vectors, and scalars currently defined.

```
mata describe
```

| # bytes | type | name and extent |
|---|---|---|
| 32 | real matrix | A[2,2] |
| 48 | real matrix | B[2,3] |
| 32 | real matrix | C[2,2] |
| 48 | real matrix | D[2,3] |
| 160 | real matrix | E[4,5] |
| 18,072 | real matrix | X[753,3] |
| 24 | real rowvector | f[3] |
| 24 | real colvector | g[3] |
| 80 | real colvector | id1[10] |
| 40 | real colvector | id_cols[5] |
| 40 | real colvector | id_rows[5] |
| 200 | real matrix | identity[5,5] |
| 8,000 | real colvector | ones[1000] |
| 8 | real scalar | u |

To delete all of these, issue `mata clear`. To delete only a few matrices, vectors, or scalars, list them after `mata drop`, for example `mata drop X f g`.

### Getting Information about your matrices and vectors

Mata offers three functions useful for checking conformability conditions. The functions `rows(X)` and `cols(X)` return the number of rows and columns of X, respectively,

```
rows(X)
cols(X)
```

which return 753 and 3. The function `length(X)` returns the total number of elements in X, equal to (# rows) × (# columns):

```
length(X)
```

which here is 2259 (= 753 × 3).

### Linear Algebra Operations

#### Important Commands

| Operation | Command |
|---|---|
| Transpose of B | `B'` |
| Inverse of B | `luinv(B)` |
| Inverse of B (if symmetric) | `invsym(B)` |
| Diagonal elements of matrix B | `diagonal(B)` |
| Put vector B into a diagonal square matrix | `diag(B)` |
| Upper triangular elements of B | `uppertriangle(B)` |
| Lower triangular elements of B | `lowertriangle(B)` |
| Sort rows based on values in column i | `sort(B,i)` |
| Sort rows based on values in columns i and j | `sort(B,(i,j))` |
| B'A, omitting rows with missing values | `cross(B,A)` |
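Several of these commands are applied to a small 2 × 2 matrix in the sketch below. The expected results in the comments can be checked by hand; for example, the inverse of (1, 2 \ 3, 4) is (-2, 1 \ 1.5, -.5). The sort example uses an arbitrary 3 × 2 matrix.

```stata
A = (1, 2 \ 3, 4)
A'                              // transpose: (1, 3 \ 2, 4)
luinv(A)                        // inverse: (-2, 1 \ 1.5, -.5)
diagonal(A)                     // column vector (1 \ 4)
diag((1, 2))                    // (1, 0 \ 0, 2)
sort((3, 1 \ 1, 2 \ 2, 9), 1)   // rows ordered by column 1: (1, 2 \ 2, 9 \ 3, 1)
```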

### Multiplication and Addition

For matrices A and C of the same dimensions, matrix addition is given by

```
D = A + C
D
```

| | 1 | 2 |
|---|---|---|
| 1 | 4 | 6 |
| 2 | 8 | 10 |

Subtraction follows in a similar way. Multiplication (assuming conformability of A and B) is given by

```
D = A * B
D
```

| | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 21 | 24 | 27 |
| 2 | 47 | 54 | 61 |
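Note that `*` is the matrix product. For element-by-element operations, Mata uses colon operators such as `:*` and `:/`. This sketch reuses A and C from the addition example to show the difference.

```stata
A = (1, 2 \ 3, 4)
C = (3, 4 \ 5, 6)
A :* C       // element by element: (3, 8 \ 15, 24)
A * C        // matrix product: (13, 16 \ 29, 36)
A :/ C       // element-by-element division
```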

Combinations of these operators are also possible. For example, \((\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y}\) is computed as

```
invsym(x'*x)*x'*y
```

This is the OLS estimator we discuss in Chapter 1. There are many more functions and tools in the Mata environment that I won't describe here, but they are available to interested students.
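The sketch below puts the pieces together as a do-file: load data, build y and X in Mata, compute OLS by hand, and compare with Stata's `regress`. The choice of `whrs` as the dependent variable and `kl6`, `k618`, `wa`, `we` as regressors is purely illustrative. `regress` reports the constant last, so the Mata estimates are reordered before being sent back to Stata as the hypothetical matrix `b_mata`.

```stata
use "http://rlhick.people.wm.edu/econ407/data/mroz", clear

mata
y = st_data(., "whrs")                          // dependent variable
x = st_data(., ("kl6", "k618", "wa", "we"))     // regressors
X = (J(rows(x), 1, 1), x)                       // constant in the first column
b = invsym(X'*X)*X'*y                           // (X'X)^(-1) X'y
b'
// regress lists _cons last, so move the constant to the end
st_matrix("b_mata", (b[2::5]', b[1]))
end

regress whrs kl6 k618 wa we
matrix b_ols = e(b)
* Relative difference between the two coefficient vectors
display mreldif(b_ols, b_mata)
```

The displayed relative difference should be essentially zero. An equivalent Mata expression that also skips rows with missing values is `invsym(cross(X, X))*cross(X, y)`.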