I like software libraries. A software library means that someone has already done some of the heavy lifting for me. A software library means that I won’t have to reinvent the wheel. It means that I might not have to spend my afternoon fixing bugs and stringently validating my own code*.
It also means that I don’t necessarily have a good idea of what the computer is doing “behind the scenes”; that I’m potentially unaware of the clever (or not) algorithm that is being implemented, and what it’s limitations (or bugs) are.
For a long time now, I’ve been blithely accepting that numpy.random.multivariate_normal applies some sensible, but mysterious, procedure in order to generate random variables from a multivariate Gaussian (normal) distribution. And it does – I just never tried to unravel said mystery. So, prompted by Dag Sverre, and courtesy of Wikipedia, I present a Python snippet that draws multivariate Gaussians given (a) some way of generating a set of standard univariate Gaussian random numbers, and (b) a Cholesky decomposition routine. In all, it’s about 10% faster than the NumPy routine, which applies a bunch of validity tests that I don’t.
import numpy as np
# Define mean, covariance matrix, and no. of samples required
mean = ... # some array of length N
cov = ... # some positive-definite NxN matrix
Ndraws = 1000
# Do factorisation (can store this and use it again later)
L = np.linalg.cholesky(cov)
# Get 3*Ndraws Gaussian random variables (mean=0, variance=1)
norm = np.random.normal(size=Ndraws*3).reshape(3, Ndraws)
# Construct final set of random numbers (with correct mean)
rand = mean + np.dot(L, norm)
Of course, this now introduces two new mysteries: How do you efficiently implement Cholesky decomposition, and how do you generate univariate Gaussian random numbers? Those are exercises for another day, I think.
* It depends on how sloppy I’m feeling; of course, you should validate everything that matters, regardless of who wrote it.