Gambling, physics, and one of my favorite simulations
A general notebook to introduce details and practical examples of the Monte Carlo method for simulations.
Introduction
Today, I decided to talk about one of my favorite numerical methods in computational physics, just for fun: the Monte Carlo method. The Monte Carlo method (MCM) refers to a broad class of statistical methods based on massive random sampling to obtain numerical results. In short, these methods use randomness to produce answers to problems that are, a priori, deterministic.
Legend says that the name "Monte Carlo" came about during the Manhattan Project in World War II. While working on the atomic bomb, Stanisław Ulam, von Neumann, and Fermi considered using the method for the direct simulation of probabilistic problems related to neutron diffusion in certain materials. Although it attracted the attention of these scientists in 1948, the logic of the method had been known for a long time: there is a record of an article written by Lord Kelvin decades earlier that already used Monte Carlo techniques in a discussion of Boltzmann's equations.
Before the Monte Carlo method was developed, simulations were essentially deterministic, with statistical sampling used only to estimate the uncertainties in the results. Monte Carlo simulations invert this approach: they solve deterministic problems using probabilistic sampling. In principle, Monte Carlo methods can be used to solve any problem with a probabilistic interpretation. By the law of large numbers, an integral described by the expected value of some random variable can be approximated by the empirical average of independent samples of that variable.
As you've probably already guessed if you've read this far, the method is named after Monte Carlo, the casino district of the city-state of Monaco, famous for its many casinos, which are directly related to the study of chance and randomness within human history.
Estimating the value of Pi
A simple simulation example is calculating $\pi$. Take a dart board. Draw a square, and inside it a circle whose diameter equals the length of the side of the square. Throw some darts at it. Assuming you're as good at darts as I am, they will land at random positions inside the square (ignore any that land outside the square...). How many darts land inside the circle as well? It's a question of relative area:
The area of the square is $d^2$, and the area of the circle is $\pi (d/2)^2$. The circle therefore covers $\pi/4$ of the area of the square. So, assuming uniform distributions, the ratio of darts that land inside the circle to the total inside the square will also be $\pi/4$, which we can then rearrange to find $\pi$.
So how many darts do you need to throw to get an accurate answer? Let's set up a program in Python to explore this system empirically. First, let's assemble the system described above: a dartboard and its bounding square.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Circle
from IPython.display import display
#create the plot so we can visualise it
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
#add a square
Rpatch = Rectangle((0,0),1,1,ec='k',fc='grey',alpha=0.35)
ax.add_patch(Rpatch)
#add circle
Cpatch = Circle((0.5,0.5),0.5,ec='k',fc='b',alpha=0.25)
ax.add_patch(Cpatch)
plt.draw()
Now we make use of random number generators to simulate our dart thrower. Let's add a few to the plot as an example to see what happens
xs = np.random.random(20)
ys = np.random.random(20)
ax.plot(xs,ys,'ro')
display(fig)
To decide if the points are inside the circle, we just need to look at their radius (relative to the center coordinate of the circle):
rad = np.sqrt((xs-0.5)**2+(ys-0.5)**2)
inside = rad[rad<0.5] # use logical statements to index lists, this selects the elements of rad whose value is less than 0.5
print(len(inside), len(rad))
ratio = float(len(inside))/len(rad)
pi_guess = 4 * ratio
print ('Estimate of pi:',pi_guess)
print ('Relative error (%):', 100*abs(pi_guess-np.pi)/np.pi)
That was really not too bad for 20 points! Let's try increasing the numbers of darts though...
We'll rewrite the above code into a routine we can call with an argument that represents the number of points:
def estimate_pi(n):
""" Estimate pi using the drunken dart player model.
Argument n (integer) is the number of 'darts' thrown
"""
xs = np.random.random(n)
ys = np.random.random(n)
rad = np.sqrt((xs-0.5)**2+(ys-0.5)**2)
inside = rad[rad<0.5]
ratio = float(len(inside))/len(rad)
pi_guess = 4 * ratio
return pi_guess
Ok, so let's try with a few different values to see how it evolves:
for n in [20,200,2000,20000,200000]:
pi_guess = estimate_pi(n)
print ('Number of points:', n)
print ('Estimate of pi:',pi_guess)
print ('Relative error (%):', 100*abs(pi_guess-np.pi)/np.pi)
print ('')
So as more points are added, the estimate gets closer to the real value. One should note that sometimes low numbers appear to get lucky, so let's have a look at the variability:
pi_guesses = []
for i in range(20):
pi_guesses.append(estimate_pi(200))
print ('Number of points: 200')
print (pi_guesses)
print ('Mean value:', np.array(pi_guesses).mean())
print ('Standard deviation:', np.array(pi_guesses).std())
pi_guesses = []
for i in range(20):
pi_guesses.append(estimate_pi(200000))
print ('Number of points: 200000')
print (pi_guesses)
print ('Mean value:', np.array(pi_guesses).mean())
print ('Standard deviation:', np.array(pi_guesses).std())
Ok, so not only does the average get closer, but the standard deviation also decreases considerably when more points are used. That's really interesting!
Let's look at that graphically, using histograms to analyse the spread of values:
pi_guesses = []
for i in range(500):
pi_guesses.append(estimate_pi(200))
print ('200 darts (blue), 10000 darts (red)')
fig2 = plt.figure(figsize=(6,5))
ax = fig2.add_subplot(111)
ax.axvline(np.pi,lw=3,linestyle='dashed',color='k')
bad_hist = ax.hist(pi_guesses,bins=30,color='b',alpha=0.65,lw=0.2)
ax.set_xlim(2.7,3.5)
pi_guesses = []
for i in range(500):
pi_guesses.append(estimate_pi(10000))
good_hist = ax.hist(pi_guesses,bins=30,color='r',alpha=0.65,lw=0.2)
ax.set_xlabel('Estimate of $\pi$')
ax.set_ylabel('Number of occurrences')
plt.show()
Wrap up
So, now that we've been introduced to a simple Monte Carlo simulation and had the opportunity to visualize it (and to see how accurate it can be made through computational brute force), let's summarize the general algorithm we used for our system:
- Initialize circle_points, square_points, and the total number of points N to be generated.
- For each of the N points:
  - Generate a random x coordinate.
  - Generate a random y coordinate.
  - Increment square_points.
  - Calculate d = x*x + y*y, the squared distance from the origin.
  - If d <= 1, increment circle_points.
- Calculate pi = 4*(circle_points/square_points).
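As a minimal sketch, this recipe translates almost line for line into Python. Here I sample the unit square against the quarter circle of radius 1 centered at the origin, which gives the same $\pi/4$ area ratio as the dartboard above:
import numpy as np
def estimate_pi_counts(N):
    """ Direct transcription of the recipe above (quarter-circle version). """
    circle_points, square_points = 0, 0
    for _ in range(N):
        x = np.random.random()   # random x in [0, 1)
        y = np.random.random()   # random y in [0, 1)
        square_points += 1
        d = x*x + y*y            # squared distance from the origin
        if d <= 1:
            circle_points += 1
    return 4*circle_points/square_points
print(estimate_pi_counts(100000))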
Monte Carlo Integration
In the previous example, notice that a particular case of Monte Carlo integration was used to obtain the area corresponding to the value of $\pi/4$. This integration method can be applied analogously to any other function, within some integration requirements. To help the reader properly visualize what is happening here, imagine that we want to measure the area of a pond of arbitrary shape.
Suppose that this pond is in the middle of a field with known area $A$. If we throw $N$ stones randomly, such that they land within the boundaries of the field, and we count the number of stones that fall in the pond, $N_{in}$, the area of the pond will be approximately the fraction of stones that make a splash, multiplied by $A$:
$$A_{pond}=\frac{N_{in}}{N}A.$$This simple procedure is an example of the “Monte Carlo” method.
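As a quick illustration (with a made-up pond: an ellipse of semi-axes 0.3 and 0.2 inside a 1x1 field, so the exact area is $\pi \cdot 0.3 \cdot 0.2 \approx 0.188$), the stone-throwing estimate looks like this:
import numpy as np
N = 100000
field_area = 1.0                                      # known area A of the 1x1 field
xs, ys = np.random.random(N), np.random.random(N)     # random "stones"
# hypothetical pond: ellipse centered at (0.5, 0.5) with semi-axes 0.3 and 0.2
in_pond = ((xs-0.5)/0.3)**2 + ((ys-0.5)/0.2)**2 <= 1
pond_area = in_pond.sum()/N * field_area
print(pond_area, np.pi*0.3*0.2)                       # estimate vs exact area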
Simple Monte Carlo integration
More generally, imagine a rectangle of height $H$ over the integration interval $[a,b]$, chosen such that the function $f(x)$ lies within its boundaries. Generate $N$ pairs of random numbers $(x_i,y_i)$ uniformly distributed inside this rectangle. The fraction of points that fall within the area below $f(x)$, i.e., that satisfy $y_i \leq f(x_i)$, is an estimate of the ratio of the integral of $f(x)$ to the area of the rectangle. Hence, the estimate of the integral will be given by:
$$\int _a^b{f(x)dx} \simeq I(N) = \frac{N_{in}}{N}H(b-a).$$
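A generic sketch of this hit-and-miss estimator (the function name and arguments here are just illustrative, assuming $0 \leq f(x) \leq H$ on $[a,b]$):
import numpy as np
def hit_and_miss(f, a, b, H, n):
    """ Hit-and-miss estimate of the integral of f on [a, b], assuming 0 <= f <= H. """
    xs = a + (b - a)*np.random.random(n)
    ys = H*np.random.random(n)
    n_in = np.sum(ys <= f(xs))         # points that land below the curve
    return n_in/n * H*(b - a)
# example: the integral of sqrt(1-x^2) on [0,1] is pi/4
print(4*hit_and_miss(lambda x: np.sqrt(1-x**2), 0., 1., 1., 100000))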
Another Monte Carlo procedure is based on the definition:
$$\langle f \rangle=\frac{1}{(b-a)} \int _a^b{f(x)dx}.$$In order to determine this average, we sample the value of $f(x)$:
$$\langle f \rangle \simeq \frac{1}{N}\sum_{i=1}^{N}f(x_i),$$where the $N$ values $x_i$ are distributed uniformly in the interval $[a,b]$. The integral will be given by
$$I(N)=(b-a) \langle f \rangle .$$
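A matching sketch of this mean-value (sample-mean) estimator, with illustrative names:
import numpy as np
def mc_mean_value(f, a, b, n):
    """ Sample-mean estimate: I = (b-a)*<f> over n uniform points in [a, b]. """
    xs = a + (b - a)*np.random.random(n)
    return (b - a)*np.mean(f(xs))
print(4*mc_mean_value(lambda x: np.sqrt(1-x**2), 0., 1., 100000))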
Monte Carlo error analysis
The Monte Carlo method clearly yields approximate results. The accuracy depends on the number of values $N$ that we use for the average. A possible measure of the error is the “variance” $\sigma^2$ defined by:
$$\sigma ^2=\langle f^2 \rangle - \langle f \rangle ^2,$$where
$$\langle f \rangle = \frac{1}{N} \sum_{i=1}^N f(x_i)$$and
$$\langle f^2 \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i)^2.$$The “standard deviation” is $\sigma$. However, we should expect the error to decrease as the number of points $N$ increases, and the quantity $\sigma$ defined above does not. Hence, this cannot be a good measure of the error.
Imagine that we perform several measurements of the integral, each of them yielding a result $I_n$. These values have been obtained with different sequences of $N$ random numbers. According to the central limit theorem, these values should be normally distributed around a mean $\langle I \rangle$. Suppose that we have a set of $M$ such measurements $\{I_n\}$. A convenient measure of the spread of these measurements is the “standard deviation of the means” $\sigma_M$:
$$\sigma_M ^2=\langle I^2 \rangle - \langle I \rangle ^2,$$where
$$\langle I \rangle = \frac{1}{M} \sum_{n=1}^M I_n$$and
$$\langle I^2 \rangle = \frac{1}{M} \sum_{n=1}^{M} I_n^2.$$It can be proven that
$$\sigma_M \approx \sigma/\sqrt{N}.$$
This relation becomes exact in the limit of a very large number of measurements. Note that this expression implies that the error decreases with the square root of the number of points, meaning that if we want to reduce the error by a factor of 10, we need 100 times more points for the average. Now let's get our hands dirty and solve some practical exercises in code.
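Before diving into the exercises, here is a quick numerical check of the $1/\sqrt{N}$ scaling, reusing the estimate_pi routine defined earlier (the number of repetitions, 200, is an arbitrary choice):
import numpy as np
for n in [1000, 10000]:
    runs = np.array([estimate_pi(n) for _ in range(200)])   # 200 independent measurements
    print(n, runs.mean(), runs.std())   # the spread should shrink roughly as 1/sqrt(n)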
Problem 1) One-dimensional integration example
- Write a program that implements the “hit and miss” Monte Carlo integration algorithm. Find the estimate $I(N)$ for the integral of
$$f(x)=4\sqrt{1-x^2}$$
as a function of $N$, in the interval $(0,1)$. Choose $H=1$, sample only the $x$-dependent part $\sqrt{1-x^2}$, and multiply the result by 4. Calculate the difference between $I(N)$ and the exact result $\pi$. This difference is a measure of the error associated with the Monte Carlo estimate. Make a log-log plot of the error as a function of $N$. What is the approximate functional dependence of the error on $N$ for large $N$?
- Estimate the integral of $f(x)$ using simple Monte Carlo integration by averaging over $N$ points, and compute the error as a function of $N$, for $N$ up to 10,000. Determine the approximate functional dependence of the error on $N$ for large $N$. How many trials are necessary to determine $I_N$ to two decimal places?
- Perform 10 measurements $I_n(N)$, with $N=10,000$ points each, using different random sequences. Show in a table the values of $I_n$ and $\sigma$ obtained as in parts [1] and [2]. Use these measurements to estimate the standard deviation of the means, and compare it to the value obtained from part [2] using the full set of 100,000 values.
- To verify that your result for the error is independent of the number of sets used to divide your data, repeat the previous item grouping your results into 20 groups of 5,000 points each.
First, let's plot the function that we intend to integrate within the desired range:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0,1,0.02)
plt.plot(x, 4*np.sqrt(1-x**2))
# Hit-and-miss Monte Carlo estimate, doubling the number of points at each step
ngroups = 16
I = np.zeros(ngroups)
N = np.zeros(ngroups)
E = np.zeros(ngroups)
n0 = 100
for i in range(ngroups):
N[i] = n0
x = np.random.random(n0)
y = np.random.random(n0)
I[i] = 0.
Nin = 0
for j in range(n0):
if(y[j] < np.sqrt(1-x[j]**2)):
Nin += 1
I[i] = 4.*float(Nin)/float(n0)
E[i] = abs(I[i]-np.pi)
print (n0,Nin,I[i],E[i])
n0 *= 2
plt.plot(N,E,ls='-',c='red',lw=3);
plt.plot(N,0.8/np.sqrt(N),ls='-',c='blue',lw=3);
plt.xscale('log')
plt.yscale('log')
# Mean-value (sample-mean) Monte Carlo estimate with the same doubling of points
ngroups = 16
I = np.zeros(ngroups)
N = np.zeros(ngroups)
E = np.zeros(ngroups)
n0 = 100
for i in range(ngroups):
N[i] = n0
r = np.random.random(n0)
I[i] = 0.
for j in range(n0):
x = r[j]
I[i] += np.sqrt(1-x**2)
I[i] *= 4./float(n0)
E[i] = abs(I[i]-np.pi)
print (n0,I[i],E[i])
n0 *= 2
plt.plot(N,E,ls='-',c='red',lw=3);
plt.plot(N,0.8/np.sqrt(N),ls='-',c='blue',lw=3);
plt.xscale('log')
plt.yscale('log')
# Generate 100,000 samples of f(x) = 4*sqrt(1-x^2) and group them to study the error
n0 = 100000
I = np.zeros(n0)
r = np.random.random(n0)
for j in range(n0):
x = r[j]
I[j] = 4*np.sqrt(1-x**2)
def group_measurements(ngroups):
global I,n0
nmeasurements = n0//ngroups
for n in range(ngroups):
Ig = 0.
Ig2 = 0.
for i in range(n*nmeasurements,(n+1)*nmeasurements):
Ig += I[i]
Ig2 += I[i]**2
Ig /= nmeasurements
Ig2 /= nmeasurements
sigma = Ig2-Ig**2
print(Ig,Ig2,sigma)
group_measurements(10)
print("=============================")
group_measurements(20)
print("=============================")
group_measurements(1)
Variance reduction
If the function being integrated does not fluctuate too much in the interval of integration, and does not differ much from its average value, then the standard Monte Carlo mean-value method should work well with a reasonable number of points. Otherwise, the variance will be very large, meaning that some points make small contributions to the integral while others make large ones, and the algorithm becomes very inefficient. The method can be improved by splitting the function $f(x)$ into two parts, $f(x)=f_1(x)+f_2(x)$, such that the integral of $f_1(x)$ is known and $f_2(x)$ has a small variance. The “variance reduction” technique then consists in evaluating the integral of $f_2(x)$ by Monte Carlo to obtain:
$$\int _a^b{f(x)dx}=\int _a^b {f_1(x)dx} + \int _a^b{f_2(x)dx} = \int _a^b{f_1(x)dx}+J.$$
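As a small illustration of my own (not one of the exercises below): to integrate $e^x$ on $[0,1]$ we can take $f_1(x)=1+x$, whose integral is known to be $3/2$, and use Monte Carlo only for the small remainder $f_2(x)=e^x-1-x$:
import numpy as np
n = 100000
xs = np.random.random(n)
plain   = np.mean(np.exp(xs))                     # direct sample-mean estimate of the full integral
reduced = 1.5 + np.mean(np.exp(xs) - 1 - xs)      # known integral of f1 plus MC estimate of f2
print(plain, reduced, np.exp(1) - 1)              # both should approach e - 1
print(np.std(np.exp(xs)), np.std(np.exp(xs) - 1 - xs))   # f2 has a noticeably smaller spread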
Importance Sampling
Imagine that we want to sample the function $f(x)=e^{-x^2}$ in the interval $[0,1]$. Many of our points will fall in the region where the value of $f(x)$ is comparatively small, and therefore we will need a large number of values to achieve decent accuracy. A way to improve the measurement by reducing the variance is “importance sampling”. As the name says, the idea is to sample more often the regions with larger contributions to the integral. For this goal, we introduce a probability distribution $P(x)$ normalized in the interval of integration:
$$\int _a^b{P(x)dx} = 1.$$Then, we can rewrite the integral of $f(x)$ as
$$I=\int _a^b{\frac{f(x)}{P(x)}P(x)dx}$$
We can evaluate this integral, by sampling according to the probability distribution $P(x)$ and evaluating the sum
$$I(N)=\frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{P(x_i)}.$$
Note that for the uniform case $P(x)=1/(b-a)$, the expression reduces to the simple Monte Carlo integral.
We are now free to choose $P(x)$. We wish to do so in a way that minimizes the variance of the integrand $f(x)/P(x)$. The way to do this is to pick a $P(x)$ that mimics $f(x)$ where $f(x)$ is large. If we are able to determine an appropriate $P(x)$, the integrand will be slowly varying, and hence the variance will be reduced. Another consideration is that generating points according to the distribution $P(x)$ should be a simple task. As an example, let us consider again the integral
$$I=\int _0^1 {e^{-x^2}dx}.$$A reasonable choice for a weight function is $P(x)=Ae^{-x}$, where $A$ is a normalization constant.
Notice that for $P(x)=f(x)$ the variance is zero! This is known as the zero variance property. There is a catch, though: The probability function $P(x)$ needs to be normalized, implying that in reality, $P(x)=f(x)/\int f(x)dx$, which assumes that we know in advance precisely the integral that we are trying to calculate!
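Here is a minimal sketch of the example above: we normalize $P(x)=Ae^{-x}$ on $[0,1]$, draw $x$ from it via the inverse-CDF transform, and average $f(x)/P(x)$ (the spread of the samples should come out smaller than with plain uniform sampling):
import numpy as np
n = 100000
A = 1./(1. - np.exp(-1.))                   # normalization of P(x) = A*exp(-x) on [0, 1]
u = np.random.random(n)
x = -np.log(1. - u*(1. - np.exp(-1.)))      # inverse-CDF sampling from P(x)
weights = np.exp(-x**2)/(A*np.exp(-x))      # f(x)/P(x)
print(weights.mean(), weights.std())
# compare with plain uniform sampling on [0, 1]
xu = np.random.random(n)
print(np.exp(-xu**2).mean(), np.exp(-xu**2).std())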
What do you think about trying out these new ideas in another round of programming?
Problem 2) Importance sampling
- Choose the weight function $P(x)=e^{-x}$ and evaluate the integral:
$$\int _0^{\infty} {x^{3/2}e^{-x}dx}.$$
- Choose $P(x)=e^{-ax}$ and estimate the integral
$$\int _0^{\pi} \frac{dx}{x^2+\cos ^2{x}}.$$
Determine the value of $a$ that minimizes the variance of the integral.
plt.xlim(0,10)
x = np.arange(0,10,0.1)
plt.plot(x,np.exp(-x), label = '$e^{-x}$')
plt.plot(x,np.exp(-x**2), label = '$e^{-x^2}$')
plt.plot(x,x**1.5*np.exp(-x), label = '$x^{3/2}e^{-x}$')
plt.legend()
plt.show()
def trapezoids(func, xmin, xmax, nmax):
Isim = func(xmin)+func(xmax)
h = (xmax-xmin)/nmax
for i in range(1,nmax):
x = xmin+i*h
Isim += 2*func(x)
Isim *= h/2
return Isim
def f(x):
return x**1.5*np.exp(-x)
print("Trapezoids: ", trapezoids(f, 0., 20., 100000))
# Simple Monte Carlo integration (uniform samples on [0,1] only, so it misses the tail of the integrand)
n0 = 1000000
r = np.random.random(n0)
Itot = np.sum(r**1.5*np.exp(-r))
print("Simple Monte Carlo: ", Itot/n0)
# Importance sampling: x = -log(r) is distributed according to P(x) = exp(-x) on [0, infinity)
x = -np.log(r)
Itot = np.sum(x**1.5)   # f(x)/P(x) = x^(3/2)
print("Importance Sampling: ", Itot/n0)
plt.xlim(0,np.pi)
x = np.arange(0,np.pi,0.05)
plt.plot(x,1./(x**2+np.cos(x)**2), label = r'$1/(x^2+\cos^2{x})$')
plt.plot(x,np.exp(-x), label = '$e^{-x}$')
plt.plot(x,np.exp(-2*x), label = '$e^{-2x}$')
plt.plot(x,np.exp(-0.2*x), label = '$e^{-0.2x}$')
plt.legend()
plt.show()
def g(x):
return 1./(x**2+np.cos(x)**2)
print("Trapezoids: ", trapezoids(g, 0., np.pi, 1000000))
# Simple Monte Carlo integration
n0 = 1000000
a = np.arange(0.1,2.1,0.1)
I = np.arange(0.1,2.1,0.1)
r = np.random.random(n0)
I0 = np.sum(1./((r*np.pi)**2+np.cos(r*np.pi)**2))
print("Simple Monte Carlo: ", I0/n0*np.pi)
# Importance Sampling
print("Importance Sampling:")
x = -np.log(r)
i = 0
for ai in a:
norm = (1.-np.exp(-ai*np.pi))/ai
x1 = norm*x/ai
Itot = 0.
Nin = 0
I2 = 0.
for xi in x1:
if(xi <= np.pi):
Nin += 1
Itot += g(xi)*np.exp(xi*ai)
I2 += (g(xi)*np.exp(xi*ai))**2
Itot *= norm
I2 *= norm
I[i] = Itot/Nin
i += 1
print(ai,Itot/Nin,np.sqrt(abs(Itot**2/Nin**2-I2/Nin))/np.sqrt(Nin))
plt.plot(a,I,ls='-',marker='o',c='red',lw=3)
Bonus code: The Metropolis algorithm
Use the Metropolis algorithm to sample points according to a distribution and estimate the integral
$$\int _0^4 {x^2e^{-x}dx},$$with $P(x)=e^{-x}$ for $0 \leq x \leq 4$. Plot the number of times the walker is at points $x_0$, $x_1$, $x_2$, ... Is the integrand sampled uniformly? If not, what is the approximate region of $x$ where the integrand is sampled more often?
delta = 2
xmin = 0.
xmax = 4.
def f(x):
return x**2*np.exp(-x)
def P(x):
global xmin, xmax
if(x < xmin or x > xmax):
return 0.
return np.exp(-x)
def metropolis(xold):
global delta
xtrial = np.random.random()
xtrial = xold+(2*xtrial-1)*delta
weight = P(xtrial)/P(xold)
xnew = xold
if(weight >= 1): #Accept
xnew = xtrial
elif(weight != 0):
r = np.random.random()
if(r <= weight): #Accept
xnew = xtrial
return xnew
xwalker = (xmax+xmin)/2.
for i in range(100000):
xwalker = metropolis(xwalker)
I0 = 0.
N = 300000
x = np.zeros(N)
x[0] = xwalker
for i in range(1,N):
for j in range(20):
xwalker = metropolis(xwalker)
x[i] = xwalker
I0 += x[i]**2
binwidth = 0.1
plt.hist(x,bins=np.arange(xmin-1, xmax+1, binwidth))
print("Trapezoids: ", trapezoids(f,xmin,xmax,100000))
print("Metropolis: ", I0*(1.-np.exp(-4.))/N)
plt.hist(x**2,bins=np.arange(xmin**2-1, xmax**2+1, 0.1))