# Monte Carlo Simulation, Random Number Generation, and Parallel Processing in Python

Monte Carlo analysis is a big part of my research, both for testing the properties of new estimators and for debugging code. Computers with multiple processors and cores offer a quick way to run dozens if not hundreds of Monte Carlo replicates in parallel. This can reduce runtimes by a factor of up to (# of processors) \(\times\) (# of cores per processor), although due to overhead, things don't always scale linearly. This note serves two purposes: it demonstrates how to run parallel jobs in Python in a Monte Carlo setting, and **more importantly** it shows how randomness is treated by default in Python parallel libraries like `joblib` or `multiprocess`, and what this means for proper Monte Carlo analysis.
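
As a preview, here is a minimal sketch of running replicates in parallel with `joblib` (assuming `joblib` and `numpy` are installed; `one_replicate` is a hypothetical stand-in for a full data-generation-plus-estimation step):

```python
import numpy as np
from joblib import Parallel, delayed

def one_replicate(seed):
    # Seed each replicate explicitly; relying on inherited default random
    # state across workers can produce duplicate draws between replicates.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=100)
    return x.mean()

# prefer="threads" keeps the sketch portable; the explicit per-replicate
# seeding, not the backend, is what makes results reproducible and distinct.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(one_replicate)(s) for s in range(10)
)
```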

The typical workflow is something like this:

1. Develop the estimator
2. Develop a data generation process for creating a dataset with known statistical properties (the toy dataset). This process produces data from
    - known parameters \(\pmb{\theta}\) and \(\epsilon\) (although some of these may be randomly drawn inside the data generation process)
    - randomly generated independent variable data (\(\mathbf{x}\))
    - a known functional relationship between the dependent variable (\(\mathbf{y}\)) and all other model information (\(f(\mathbf{x},\pmb{\theta},\epsilon)\))
    - other parameters that matter for the data generation process (e.g. the sample size \(N\)); usually these will be fixed for all replicates within a particular Monte Carlo experiment
3. Given the estimator and a toy dataset, run the model and collect model estimates of \(\pmb{\theta}\)
4. Repeat \(R\) times and summarize the results
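
The steps above can be sketched end to end. This is only an illustrative example, assuming a simple linear DGP \(y = \theta x + \epsilon\) with OLS as the estimator under study:

```python
import numpy as np

def dgp(rng, N=200, theta=2.0, sigma=1.0):
    # Data generation process: known parameters, random x, known error
    # distribution, and a known functional relationship y = theta * x + eps.
    x = rng.normal(size=N)
    eps = rng.normal(scale=sigma, size=N)
    y = x * theta + eps
    return x, y

def estimator(x, y):
    # OLS slope without intercept: (x'x)^{-1} x'y
    return float(x @ y / (x @ x))

# Run R replicates and summarize the estimates of theta.
R = 500
rng = np.random.default_rng(42)
estimates = [estimator(*dgp(rng)) for _ in range(R)]
print(np.mean(estimates))  # should be close to theta = 2.0
```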

It should be apparent that step (2) involves quite a bit of random number generation.
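
To see why that matters, here is a small `numpy` example of the failure mode this note builds toward: replicates whose generators are constructed from the same seed reproduce the exact same "random" dataset, contributing no new information to the Monte Carlo.

```python
import numpy as np

def draw(seed):
    # One replicate's worth of random data from an explicitly seeded generator.
    rng = np.random.default_rng(seed)
    return rng.normal(size=5)

# Same seed -> identical replicates, which is exactly what must be avoided.
a, b = draw(0), draw(0)
assert np.array_equal(a, b)

# Distinct seeds -> distinct draws.
c = draw(1)
assert not np.array_equal(a, c)
```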