Gambling, physics, and one of my favorite simulations
A general notebook to introduce details and practical examples of the Monte Carlo method for simulations.
Introduction
Today, I decided to talk about one of my favorite numerical methods in computational physics, just for fun: the Monte Carlo method. The Monte Carlo method (MCM) refers to a broad class of statistical methods based on massive random sampling to obtain numerical results. In short, these methods use randomness to produce answers to problems that are, a priori, deterministic.
Legend says that the name "Monte Carlo" came about during the Manhattan Project in World War II. While working on the atomic bomb, Stanisław Ulam, von Neumann, and Fermi considered using the method for the direct simulation of probabilistic problems related to neutron diffusion in certain materials. Although it attracted the attention of these scientists in 1948, the logic of the method had been known for a long time: there is a record of an article written by Lord Kelvin decades earlier that already used Monte Carlo techniques in a discussion of Boltzmann's equations.
Before the Monte Carlo method was developed, simulations were essentially deterministic, with statistical sampling used only to estimate the uncertainties in the results. Monte Carlo simulations invert this approach: they solve deterministic problems using probabilistic sampling. In principle, Monte Carlo methods can be used to solve any problem with a probabilistic interpretation. By the law of large numbers, an integral described by the expected value of some random variable can be approximated by the empirical average of independent samples of that variable.
As you've probably already guessed if you've read this far, the method is named after Monte Carlo, the casino district of the city-state of Monaco, famous for its many casinos, which are directly related to the study of chance and randomness within human history.
Estimating the value of Pi
A simple simulation example is calculating $\pi$. Take a dart board. Draw a square, and inside it a circle whose diameter equals the length of the side of the square. Throw some darts at it. Assuming you're as good at darts as I am, they will land at random positions inside the square (ignore any that land outside the square...). How many darts land inside the circle as well? It's a question of relative area:
The area of the square is $d^2$, and the area of the circle is $\pi (d/2)^2$. The circle therefore covers $\pi/4$ of the area of the square. So, assuming uniform distributions, the ratio of darts that land inside the circle to the total inside the square will also be $\pi/4$, which we can then rearrange to find $\pi$.
So how many darts do you need to throw to get an accurate answer? Let's set up a program in Python to explore this system empirically. First, let's assemble the system described above: a dartboard and its bounding square.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Circle
from IPython.display import display
#create the plot so we can visualise it
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
#add a square
Rpatch = Rectangle((0,0),1,1,ec='k',fc='grey',alpha=0.35)
ax.add_patch(Rpatch)
#add circle
Cpatch = Circle((0.5,0.5),0.5,ec='k',fc='b',alpha=0.25)
ax.add_patch(Cpatch)
plt.draw()
Now we make use of random number generators to simulate our dart thrower. Let's add a few to the plot as an example to see what happens
xs = np.random.random(20)
ys = np.random.random(20)
ax.plot(xs,ys,'ro')
display(fig)
To decide if the points are inside the circle, we just need to look at their radius (relative to the center coordinate of the circle):
rad = np.sqrt((xs-0.5)**2+(ys-0.5)**2)
inside = rad[rad<0.5] # use logical statements to index lists, this selects the elements of rad whose value is less than 0.5
print(len(inside), len(rad))
ratio = float(len(inside))/len(rad)
pi_guess = 4 * ratio
print ('Estimate of pi:',pi_guess)
print ('Relative error (%):', 100*abs(pi_guess-np.pi)/np.pi)
That was really not too bad for 20 points! Let's try increasing the numbers of darts though...
We'll rewrite the above code into a routine we can call with an argument that represents the number of points:
def estimate_pi(n):
""" Estimate pi using the drunken dart player model.
Argument n (integer) is the number of 'darts' thrown
"""
xs = np.random.random(n)
ys = np.random.random(n)
rad = np.sqrt((xs-0.5)**2+(ys-0.5)**2)
inside = rad[rad<0.5]
ratio = float(len(inside))/len(rad)
pi_guess = 4 * ratio
return pi_guess
Ok, so let's try with a few different values to see how it evolves:
for n in [20,200,2000,20000,200000]:
pi_guess = estimate_pi(n)
print ('Number of points:', n)
print ('Estimate of pi:',pi_guess)
print ('Relative error (%):', 100*abs(pi_guess-np.pi)/np.pi)
print ('')
So as more points are added, the estimate gets closer to the real value. One should note that sometimes low numbers appear to get lucky, so let's have a look at the variability:
pi_guesses = []
for i in range(20):
pi_guesses.append(estimate_pi(200))
print ('Number of points: 200')
print (pi_guesses)
print ('Mean value:', np.array(pi_guesses).mean())
print ('Standard deviation:', np.array(pi_guesses).std())
pi_guesses = []
for i in range(20):
pi_guesses.append(estimate_pi(200000))
print ('Number of points: 200000')
print (pi_guesses)
print ('Mean value:', np.array(pi_guesses).mean())
print ('Standard deviation:', np.array(pi_guesses).std())
Ok, so not only does the average get closer, but the standard deviation also decreases considerably when more points are used. That's really interesting!
Let's look at that graphically, using histograms to analyse the spread of values:
pi_guesses = []
for i in range(500):
pi_guesses.append(estimate_pi(200))
print ('200 darts (blue), 10000 darts (red)')
fig2 = plt.figure(figsize=(6,5))
ax = fig2.add_subplot(111)
ax.axvline(np.pi,lw=3,linestyle='dashed',color='k')
bad_hist = ax.hist(pi_guesses,bins=30,color='b',alpha=0.65,lw=0.2)
ax.set_xlim(2.7,3.5)
pi_guesses = []
for i in range(500):
pi_guesses.append(estimate_pi(10000))
good_hist = ax.hist(pi_guesses,bins=30,color='r',alpha=0.65,lw=0.2)
ax.set_xlabel('Estimate of $\pi$')
ax.set_ylabel('Number of occurrences')
plt.show()
Wrap up
So, now that we've been introduced to a simple Monte Carlo simulation and had the opportunity to visualize it (and to see how accurate it can be made through computational brute force), let's summarize the general algorithm we used for our system:
- Initialize circle_points, square_points, and the total number of points N to be generated.
- For each of the N points:
  - Generate a random x coordinate.
  - Generate a random y coordinate.
  - Increment square_points.
  - Calculate d = x*x + y*y, the squared distance from the origin.
  - If d <= 1, increment circle_points.
- Calculate pi = 4*(circle_points/square_points).
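As a minimal sketch, this recipe translates almost line for line into Python. Here I sample the unit square against the quarter circle of radius 1 centered at the origin, which gives the same $\pi/4$ area ratio as the dartboard above:
import numpy as np
def estimate_pi_counts(N):
    """ Direct transcription of the recipe above (quarter-circle version). """
    circle_points, square_points = 0, 0
    for _ in range(N):
        x = np.random.random()   # random x in [0, 1)
        y = np.random.random()   # random y in [0, 1)
        square_points += 1
        d = x*x + y*y            # squared distance from the origin
        if d <= 1:
            circle_points += 1
    return 4*circle_points/square_points
print(estimate_pi_counts(100000))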
Monte Carlo Integration
In the previous example, notice that a particular case of Monte Carlo integration was used to obtain the area corresponding to the value of $\pi/4$. This integration method can be applied analogously to any other function, within some integration requirements. To help the reader properly visualize what is happening here, imagine that we want to measure the area of a pond of arbitrary shape.
Suppose that this pond is in the middle of a field with known area $A$. If we throw $N$ stones randomly, such that they land within the boundaries of the field, and we count the number of stones that fall in the pond, $N_{in}$, the area of the pond will be approximately the fraction of stones that make a splash, multiplied by $A$:
$$A_{pond}=\frac{N_{in}}{N}A.$$This simple procedure is an example of the “Monte Carlo” method.
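As a quick illustration (with a made-up pond: an ellipse of semi-axes 0.3 and 0.2 inside a 1x1 field, so the exact area is $\pi \cdot 0.3 \cdot 0.2 \approx 0.188$), the stone-throwing estimate looks like this:
import numpy as np
N = 100000
field_area = 1.0                                      # known area A of the 1x1 field
xs, ys = np.random.random(N), np.random.random(N)     # random "stones"
# hypothetical pond: ellipse centered at (0.5, 0.5) with semi-axes 0.3 and 0.2
in_pond = ((xs-0.5)/0.3)**2 + ((ys-0.5)/0.2)**2 <= 1
pond_area = in_pond.sum()/N * field_area
print(pond_area, np.pi*0.3*0.2)                       # estimate vs exact area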
Simple Monte Carlo integration
More generally, imagine a rectangle of height $H$ over the integration interval $[a,b]$, chosen such that the function $f(x)$ lies within its boundaries. Generate $N$ pairs of random numbers $(x_i,y_i)$ uniformly distributed inside this rectangle. The fraction of points that fall within the area below $f(x)$, i.e., that satisfy $y_i \leq f(x_i)$, is an estimate of the ratio of the integral of $f(x)$ to the area of the rectangle. Hence, the estimate of the integral will be given by:
$$\int _a^b{f(x)dx} \simeq I(N) = \frac{N_{in}}{N}H(b-a).$$
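A generic sketch of this hit-and-miss estimator (the function name and arguments here are just illustrative, assuming $0 \leq f(x) \leq H$ on $[a,b]$):
import numpy as np
def hit_and_miss(f, a, b, H, n):
    """ Hit-and-miss estimate of the integral of f on [a, b], assuming 0 <= f <= H. """
    xs = a + (b - a)*np.random.random(n)
    ys = H*np.random.random(n)
    n_in = np.sum(ys <= f(xs))         # points that land below the curve
    return n_in/n * H*(b - a)
# example: the integral of sqrt(1-x^2) on [0,1] is pi/4
print(4*hit_and_miss(lambda x: np.sqrt(1-x**2), 0., 1., 1., 100000))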
Another Monte Carlo procedure is based on the definition:
$$\langle f \rangle=\frac{1}{(b-a)} \int _a^b{f(x)dx}.$$In order to determine this average, we sample the value of $f(x)$:
$$\langle f \rangle \simeq \frac{1}{N}\sum_{i=1}^{N}f(x_i),$$where the $N$ values $x_i$ are distributed uniformly in the interval $[a,b]$. The integral will be given by
$$I(N)=(b-a) \langle f \rangle .$$
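A matching sketch of this mean-value (sample-mean) estimator, with illustrative names:
import numpy as np
def mc_mean_value(f, a, b, n):
    """ Sample-mean estimate: I = (b-a)*<f> over n uniform points in [a, b]. """
    xs = a + (b - a)*np.random.random(n)
    return (b - a)*np.mean(f(xs))
print(4*mc_mean_value(lambda x: np.sqrt(1-x**2), 0., 1., 100000))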
Monte Carlo error analysis
The Monte Carlo method clearly yields approximate results. The accuracy depends on the number of values $N$ that we use for the average. A possible measure of the error is the “variance” $\sigma^2$ defined by:
$$\sigma ^2=\langle f^2 \rangle - \langle f \rangle ^2,$$where
$$\langle f \rangle = \frac{1}{N} \sum_{i=1}^N f(x_i)$$and
$$\langle f^2 \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i)^2.$$The “standard deviation” is $\sigma$. However, we should expect the error to decrease as the number of points $N$ increases, and the quantity $\sigma$ defined above does not. Hence, this cannot be a good measure of the error.
Imagine that we perform several measurements of the integral, each of them yielding a result $I_n$. These values have been obtained with different sequences of $N$ random numbers. According to the central limit theorem, these values should be normally distributed around a mean $\langle I \rangle$. Suppose that we have a set of $M$ such measurements $\{I_n\}$. A convenient measure of the spread of these measurements is the “standard deviation of the means” $\sigma_M$:
$$\sigma_M ^2=\langle I^2 \rangle - \langle I \rangle ^2,$$where
$$\langle I \rangle = \frac{1}{M} \sum_{n=1}^M I_n$$and
$$\langle I^2 \rangle = \frac{1}{M} \sum_{n=1}^{M} I_n^2.$$It can be proven that
$$\sigma_M \approx \sigma/\sqrt{N}.$$
This relation becomes exact in the limit of a very large number of measurements. Note that this expression implies that the error decreases with the square root of the number of points, meaning that if we want to reduce the error by a factor of 10, we need 100 times more points for the average. Now let's get our hands dirty and solve some practical exercises in code.
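Before diving into the exercises, here is a quick numerical check of the $1/\sqrt{N}$ scaling, reusing the estimate_pi routine defined earlier (the number of repetitions, 200, is an arbitrary choice):
import numpy as np
for n in [1000, 10000]:
    runs = np.array([estimate_pi(n) for _ in range(200)])   # 200 independent measurements
    print(n, runs.mean(), runs.std())   # the spread should shrink roughly as 1/sqrt(n)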
Problem 1) One-dimensional integration example
- Write a program that implements the “hit and miss” Monte Carlo integration algorithm. Find the estimate $I(N)$ for the integral of
$$f(x)=4\sqrt{1-x^2}$$
as a function of $N$, in the interval $(0,1)$. Choose $H=1$, sample only the $x$-dependent part $\sqrt{1-x^2}$, and multiply the result by 4. Calculate the difference between $I(N)$ and the exact result $\pi$. This difference is a measure of the error associated with the Monte Carlo estimate. Make a log-log plot of the error as a function of $N$. What is the approximate functional dependence of the error on $N$ for large $N$?
- Estimate the integral of $f(x)$ using simple Monte Carlo integration by averaging over $N$ points, and compute the error as a function of $N$, for $N$ up to 10,000. Determine the approximate functional dependence of the error on $N$ for large $N$. How many trials are necessary to determine $I_N$ to two decimal places?
- Perform 10 measurements $I_n(N)$, with $N=10,000$ points each, using different random sequences. Show in a table the values of $I_n$ and $\sigma$ obtained as in parts [1] and [2]. Use these measurements to estimate the standard deviation of the means, and compare it to the value obtained from part [2] using the full set of 100,000 values.
- To verify that your result for the error is independent of the number of sets used to divide your data, repeat the previous item grouping your results into 20 groups of 5,000 points each.
First, let's plot the function that we intend to integrate within the desired range:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0,1,0.02)
plt.plot(x, 4*np.sqrt(1-x**2))
# Hit-and-miss Monte Carlo estimate, doubling the number of points at each step
ngroups = 16
I = np.zeros(ngroups)
N = np.zeros(ngroups)
E = np.zeros(ngroups)
n0 = 100
for i in range(ngroups):
N[i] = n0
x = np.random.random(n0)
y = np.random.random(n0)
I[i] = 0.
Nin = 0
for j in range(n0):
if(y[j] < np.sqrt(1-x[j]**2)):
Nin += 1
I[i] = 4.*float(Nin)/float(n0)
E[i] = abs(I[i]-np.pi)
print (n0,Nin,I[i],E[i])
n0 *= 2
plt.plot(N,E,ls='-',c='red',lw=3);
plt.plot(N,0.8/np.sqrt(N),ls='-',c='blue',lw=3);
plt.xscale('log')
plt.yscale('log')
# Mean-value (sample-mean) Monte Carlo estimate with the same doubling of points
ngroups = 16
I = np.zeros(ngroups)
N = np.zeros(ngroups)
E = np.zeros(ngroups)
n0 = 100
for i in range(ngroups):
N[i] = n0
r = np.random.random(n0)
I[i] = 0.
for j in range(n0):
x = r[j]
I[i] += np.sqrt(1-x**2)
I[i] *= 4./float(n0)
E[i] = abs(I[i]-np.pi)
print (n0,I[i],E[i])
n0 *= 2
plt.plot(N,E,ls='-',c='red',lw=3);
plt.plot(N,0.8/np.sqrt(N),ls='-',c='blue',lw=3);
plt.xscale('log')
plt.yscale('log')
# Generate 100,000 samples of f(x) = 4*sqrt(1-x^2) and group them to study the error
n0 = 100000
I = np.zeros(n0)
r = np.random.random(n0)
for j in range(n0):
x = r[j]
I[j] = 4*np.sqrt(1-x**2)
def group_measurements(ngroups):
global I,n0
nmeasurements = n0//ngroups
for n in range(ngroups):
Ig = 0.
Ig2 = 0.
for i in range(n*nmeasurements,(n+1)*nmeasurements):
Ig += I[i]
Ig2 += I[i]**2
Ig /= nmeasurements
Ig2 /= nmeasurements
sigma = Ig2-Ig**2
print(Ig,Ig2,sigma)
group_measurements(10)
print("=============================")
group_measurements(20)
print("=============================")
group_measurements(1)
Variance reduction
If the function being integrated does not fluctuate too much in the interval of integration, and does not differ much from its average value, then the standard Monte Carlo mean-value method should work well with a reasonable number of points. Otherwise, the variance will be very large, meaning that some points make small contributions to the integral while others make large ones, and the algorithm becomes very inefficient. The method can be improved by splitting the function $f(x)$ into two parts, $f(x)=f_1(x)+f_2(x)$, such that the integral of $f_1(x)$ is known and $f_2(x)$ has a small variance. The “variance reduction” technique then consists in evaluating the integral of $f_2(x)$ by Monte Carlo to obtain:
$$\int _a^b{f(x)dx}=\int _a^b {f_1(x)dx} + \int _a^b{f_2(x)dx} = \int _a^b{f_1(x)dx}+J.$$
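As a small illustration of my own (not one of the exercises below): to integrate $e^x$ on $[0,1]$ we can take $f_1(x)=1+x$, whose integral is known to be $3/2$, and use Monte Carlo only for the small remainder $f_2(x)=e^x-1-x$:
import numpy as np
n = 100000
xs = np.random.random(n)
plain   = np.mean(np.exp(xs))                     # direct sample-mean estimate of the full integral
reduced = 1.5 + np.mean(np.exp(xs) - 1 - xs)      # known integral of f1 plus MC estimate of f2
print(plain, reduced, np.exp(1) - 1)              # both should approach e - 1
print(np.std(np.exp(xs)), np.std(np.exp(xs) - 1 - xs))   # f2 has a noticeably smaller spread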
Importance Sampling
Imagine that we want to sample the function $f(x)=e^{-x^2}$ in the interval $[0,1]$. Many of our points will fall in the region where the value of $f(x)$ is comparatively small, and therefore we will need a large number of values to achieve decent accuracy. A way to improve the measurement by reducing the variance is “importance sampling”. As the name says, the idea is to sample more often the regions with larger contributions to the integral. For this goal, we introduce a probability distribution $P(x)$ normalized in the interval of integration:
$$\int _a^b{P(x)dx} = 1.$$Then, we can rewrite the integral of $f(x)$ as
$$I=\int _a^b{\frac{f(x)}{P(x)}P(x)dx}$$
We can evaluate this integral, by sampling according to the probability distribution $P(x)$ and evaluating the sum
$$I(N)=\frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{P(x_i)}.$$
Note that for the uniform case $P(x)=1/(b-a)$, the expression reduces to the simple Monte Carlo integral.
We are now free to choose $P(x)$. We wish to do so in a way that minimizes the variance of the integrand $f(x)/P(x)$. The way to do this is to pick a $P(x)$ that mimics $f(x)$ where $f(x)$ is large. If we are able to determine an appropriate $P(x)$, the integrand will be slowly varying, and hence the variance will be reduced. Another consideration is that generating points according to the distribution $P(x)$ should be a simple task. As an example, let us consider again the integral
$$I=\int _0^1 {e^{-x^2}dx}.$$A reasonable choice for a weight function is $P(x)=Ae^{-x}$, where $A$ is a normalization constant.
Notice that for $P(x)=f(x)$ the variance is zero! This is known as the zero variance property. There is a catch, though: The probability function $P(x)$ needs to be normalized, implying that in reality, $P(x)=f(x)/\int f(x)dx$, which assumes that we know in advance precisely the integral that we are trying to calculate!
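Here is a minimal sketch of the example above: we normalize $P(x)=Ae^{-x}$ on $[0,1]$, draw $x$ from it via the inverse-CDF transform, and average $f(x)/P(x)$ (the spread of the samples should come out smaller than with plain uniform sampling):
import numpy as np
n = 100000
A = 1./(1. - np.exp(-1.))                   # normalization of P(x) = A*exp(-x) on [0, 1]
u = np.random.random(n)
x = -np.log(1. - u*(1. - np.exp(-1.)))      # inverse-CDF sampling from P(x)
weights = np.exp(-x**2)/(A*np.exp(-x))      # f(x)/P(x)
print(weights.mean(), weights.std())
# compare with plain uniform sampling on [0, 1]
xu = np.random.random(n)
print(np.exp(-xu**2).mean(), np.exp(-xu**2).std())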
What do you think about trying out these new ideas in another round of programming?
Problem 2) Importance sampling
- Choose the weight function $P(x)=e^{-x}$ and evaluate the integral:
$$\int _0^{\infty} {x^{3/2}e^{-x}dx}.$$
- Choose $P(x)=e^{-ax}$ and estimate the integral
$$\int _0^{\pi} \frac{dx}{x^2+\cos ^2{x}}.$$
Determine the value of $a$ that minimizes the variance of the integral.
plt.xlim(0,10)
x = np.arange(0,10,0.1)
plt.plot(x,np.exp(-x), label = '$e^{-x}$')
plt.plot(x,np.exp(-x**2), label = '$e^{-x^2}$')
plt.plot(x,x**1.5*np.exp(-x), label = '$x^{3/2}e^{-x}$')
plt.legend()
plt.show()
def trapezoids(func, xmin, xmax, nmax):
Isim = func(xmin)+func(xmax)
h = (xmax-xmin)/nmax
for i in range(1,nmax):
x = xmin+i*h
Isim += 2*func(x)
Isim *= h/2
return Isim
def f(x):
return x**1.5*np.exp(-x)
print("Trapezoids: ", trapezoids(f, 0., 20., 100000))
# Simple Monte Carlo integration (uniform samples on [0,1] only, so it misses the tail of the integrand)
n0 = 1000000
r = np.random.random(n0)
Itot = np.sum(r**1.5*np.exp(-r))
print("Simple Monte Carlo: ", Itot/n0)
# Importance sampling: x = -log(r) is distributed according to P(x) = exp(-x) on [0, infinity)
x = -np.log(r)
Itot = np.sum(x**1.5)   # f(x)/P(x) = x^(3/2)
print("Importance Sampling: ", Itot/n0)
plt.xlim(0,np.pi)
x = np.arange(0,np.pi,0.05)
plt.plot(x,1./(x**2+np.cos(x)**2), label = r'$1/(x^2+\cos^2{x})$')
plt.plot(x,np.exp(-x), label = '$e^{-x}$')
plt.plot(x,np.exp(-2*x), label = '$e^{-2x}$')
plt.plot(x,np.exp(-0.2*x), label = '$e^{-0.2x}$')
plt.legend()
plt.show()
def g(x):
return 1./(x**2+np.cos(x)**2)
print("Trapezoids: ", trapezoids(g, 0., np.pi, 1000000))
# Simple Monte Carlo integration
n0 = 1000000
a = np.arange(0.1,2.1,0.1)
I = np.arange(0.1,2.1,0.1)
r = np.random.random(n0)
I0 = np.sum(1./((r*np.pi)**2+np.cos(r*np.pi)**2))
print("Simple Monte Carlo: ", I0/n0*np.pi)
# Importance Sampling
print("Importance Sampling:")
x = -np.log(r)
i = 0
for ai in a:
norm = (1.-np.exp(-ai*np.pi))/ai
x1 = norm*x/ai
Itot = 0.
Nin = 0
I2 = 0.
for xi in x1:
if(xi <= np.pi):
Nin += 1
Itot += g(xi)*np.exp(xi*ai)
I2 += (g(xi)*np.exp(xi*ai))**2
Itot *= norm
I2 *= norm
I[i] = Itot/Nin
i += 1
print(ai,Itot/Nin,np.sqrt(abs(Itot**2/Nin**2-I2/Nin))/np.sqrt(Nin))
plt.plot(a,I,ls='-',marker='o',c='red',lw=3)
Bonus code: The Metropolis algorithm
Use the Metropolis algorithm to sample points according to a distribution and estimate the integral
$$\int _0^4 {x^2e^{-x}dx},$$with $P(x)=e^{-x}$ for $0 \leq x \leq 4$. Plot the number of times the walker is at points $x_0$, $x_1$, $x_2$, ... Is the integrand sampled uniformly? If not, what is the approximate region of $x$ where the integrand is sampled more often?
delta = 2
xmin = 0.
xmax = 4.
def f(x):
return x**2*np.exp(-x)
def P(x):
global xmin, xmax
if(x < xmin or x > xmax):
return 0.
return np.exp(-x)
def metropolis(xold):
global delta
xtrial = np.random.random()
xtrial = xold+(2*xtrial-1)*delta
weight = P(xtrial)/P(xold)
xnew = xold
if(weight >= 1): #Accept
xnew = xtrial
elif(weight != 0):
r = np.random.random()
if(r <= weight): #Accept
xnew = xtrial
return xnew
xwalker = (xmax+xmin)/2.
for i in range(100000):
xwalker = metropolis(xwalker)
I0 = 0.
N = 300000
x = np.zeros(N)
x[0] = xwalker
for i in range(1,N):
for j in range(20):
xwalker = metropolis(xwalker)
x[i] = xwalker
I0 += x[i]**2
binwidth = 0.1
plt.hist(x,bins=np.arange(xmin-1, xmax+1, binwidth))
print("Trapezoids: ", trapezoids(f,xmin,xmax,100000))
print("Metropolis: ", I0*(1.-np.exp(-4.))/N)
plt.hist(x**2,bins=np.arange(xmin**2-1, xmax**2+1, 0.1))