Monte Carlo of Lennard-Jones atoms¶

In this notebook you will use the Monte Carlo method to compute the properties of a system of Lennard-Jones atoms.

There is an accompanying notebook MC-integration.ipynb which discusses Monte Carlo sampling and Monte Carlo integration. The two notebooks are independent, so you can study them in either order, although the other one introduces more basic ideas, so we recommend you do that one first.

A third notebook, MC-Pressure.ipynb, focused on constant-pressure Monte Carlo simulation of Lennard-Jones atoms, will be the subject of a later workshop. It follows on from some of the material covered here.

Setup (optional)¶

The next cell needs to be run only if you use google colab. Commands may work in other environments too but are not tested.

[ ]:

! pip install h5py data_tutorials weas_widget ase
! apt install gfortran libhdf5-dev

now grab the data needed for tutorial

[ ]:

from data_tutorials.data import get_data
get_data(
    url="https://raw.githubusercontent.com/ccp5UK/summerschool/main/Day_4/MC_Tutorial/",
    filename=["hdf5_module.py","eos_lj.py","dat_to_ase.py"],
    folder=".",
)

# get the files for MC programme

get_data(
    url="https://raw.githubusercontent.com/ccp5UK/summerschool/main/Day_4/MC_Tutorial/",
    filename=["Makefile", "config_old.dat","config_io_module.f90","hdf5_module.f90","maths_module.f90","mc_module.f90","mc_nvt.f90","potential_module.f90"],
    folder=".",
)

Preliminaries¶

Start by importing some useful Python modules and functions.

[ ]:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import trapezoid
from hdf5_module import read_file
from eos_lj import eos
from dat_to_ase import argon
from weas_widget import WeasWidget
plt.style.use(['seaborn-v0_8-talk','seaborn-v0_8-darkgrid','seaborn-v0_8-colorblind'])
plt.rc('image',cmap='viridis')
plt.rc('legend',frameon=True,framealpha=1.0)
plt.rc('hist',bins=100) # Default number of bins to use in histograms

Introduction¶

The Lennard-Jones potential is \begin{equation*} u_{\mathrm{LJ}}(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r}\right)^{12}-\left(\frac{\sigma}{r}\right)^6 \right] \end{equation*} where \(r\) is the distance between the atoms, \(\varepsilon\) is an energy characterizing the strength of the interaction, and \(\sigma\) is a length scale that characterizes the size of the atoms. In the following, reduced units are adopted, so \(\varepsilon=1\) and \(\sigma=1\), and in addition Boltzmann’s constant is taken to be unity \(k_{\text{B}}=1\).

The system of interest here consists of atoms interacting through the cut-and-shifted Lennard-Jones potential defined by \begin{equation*} u(r) = \begin{cases} u_{\mathrm{LJ}}(r) - u_{\mathrm{LJ}}(r_{\text{cut}}) & r \leq r_{\text{cut}} \\ 0 & r> r_{\text{cut}} \end{cases} \end{equation*} where \(r_{\text{cut}}\) is the cutoff distance. The choice made here is \(r_{\text{cut}}= 2.5\sigma\). For this potential, the critical point is at \(T_{\text{c}}=1.0779\), \(\rho_{\text{c}}=0.3190\) (again in reduced units). This exercise concentrates on the supercritical state point, \(T=2.0\), \(\rho=0.5\).

Thermodynamic quantities, and the pair distribution function \(g(r)\), for this state point, were presented in lectures; in this exercise you will be comparing with those results. More generally, an accurate fitted equation of state for the fluid region has been developed by M Thol et al, Int J Thermophys, 36, 25 (2015). A Python program implementing their formulae is supplied in the file eos_lj.py. This can be run interactively from the command line, in which case it will ask for the values of \(T\) and \(\rho\), and will print out a range of thermodynamic data for that state point. Instead, this worksheet has imported a function eos from eos_lj, which takes \(T\) and \(\rho\) as arguments, and returns various thermodynamic quantities in a dictionary. The next cell prints out values for this state point, which should match the values given in the lecture reasonably well. We shall compare several of these fitted values with our simulations in this workshop.

[ ]:

eos_fit = eos(temperature=2.0,density=0.5)
for key, value in eos_fit.items():
    print(f'{key:20s} {value:10.4f}')

Monte Carlo program¶

The instruction in the following cell should build the program of interest to us here, mc_nvt.

[ ]:

!make mc_nvt

Now execute the mc_nvt program, with default parameters, by running the following cell. While it is running, read through the following description of some of the features of the program, looking at some of the program source files.

[ ]:

!echo '&nml  /' | ./mc_nvt

Take a look at the main program file mc_nvt.f90. This carries out a Monte Carlo simulation of \(N\) atoms, in a fixed volume \(V\), at specified temperature \(T\). Simulation parameters are provided through standard input in the form of a namelist, a feature of Fortran that allows default values to be specified within the program, while letting you change them, using keywords, if you wish. So, a run using default parameters is initiated with echo '&nml /' | ./mc_nvt, but you could choose a different maximum displacement by running with the command echo '&nml dr_max=0.3 /' | ./mc_nvt.

Have a look at the overall structure of the code in mc_nvt.f90: there is a loop over steps. Each step consists of an attempt to move \(N\) atoms; the appropriate routine is in the file mc_module.f90. In this case, the atoms are chosen randomly; you might like to consider whether this is the only valid approach. Also in that module is the routine for estimating the chemical potential by Widom test particle insertion.

The LJ potential details, including the value of \(r_{\text{cut}}\), are specified in the file potential_module.f90. The potential energy and virial functions are calculated here. You may also be interested in the routine which calculates the configurational temperature from the Laplacian and squared forces. These key properties, just mentioned, are stored at each step, and output at the end of the run to a file mc_nvt.hdf5, which you will read in shortly, for analysis. (We are using HDF5 format for this, but there will be no need to look closely at the details). Standard output is just used for the crucial information needed to confirm that the program is running: at (increasing) intervals, the step number, CPU time consumed so far, and the cumulative move acceptance ratio are printed, as you will see in the preceding cell. At regular intervals gap steps, the program stores all the atomic positions, and these are output to the mc_nvt.hdf5 file at the end. These will be used shortly to calculate the pair distribution function.

Think carefully about all the calculations done in the program, referring back to the lecture notes. If there is anything that seems unclear, feel free to ask! When you have finished examining the program source files, you can close them.

By now the run in the cell above should have finished; the statement Program ends will be printed, along with the CPU time taken by the run. Check this, and move on to the following cells. (Don’t start executing the program a second time!)

The program reads an input configuration, and writes an output configuration. This is handled by routines within config_io_module.f90, but there is no need to look at this file. An initial configuration of atoms was supplied in the file config_old.dat (and a backup copy is in config_old.bak, in case this gets overwritten at any stage, for example if you need to do equilibration followed by production runs). The format of the file is

n
xbox ybox zbox
x1   y1   z1
x2   y2   z2
x3   y3   z3
:    :    :
xn   yn   yn

where the first line gives the number of atoms, the second line gives the box dimensions (in this exercise the box is cubic) and the subsequent lines give the coordinates of each atom. All these are in LJ reduced units. An output file config.dat, in the same format, is written at the end. The following cell should allow you to visualize config_old.dat; feel free to look at config.dat as well.

[ ]:

atoms=argon('config_old.dat')
v=WeasWidget()
v.from_ase(atoms)
v

Reading the HDF5 file¶

We are using a Hierarchical Data Format (HDF5) file mc_nvt.hdf5 to store output from the simulation. HDF5 is a very flexible format for storing data, but we are only using some simple features: specifically mc_nvt.hdf5 is a completely flat file containing just two kinds of object: attributes and datasets. The attributes are used to store a few simulation parameters, such as the number of particles, box lengths, and temperature. Step-by-step values are stored in the datasets. These objects are key-value pairs, like Python dictionaries. The function read_file is provided to read these in. It prints a list of all the keys, returns the attributes as a dictionary of values, which we store in params, and returns the datasets as a dictionary of NumPy arrays, which we store in data.

[ ]:

params, data = read_file('mc_nvt.hdf5')

We store some of the parameters in named variables, (the number of atoms N, volume V, temperature T, and array of box lengths L).

[ ]:

print(params['Title'].astype(str))
print('Run steps',params['nstep'])
N = params['N']
V = params['V']
T = params['T']
L = params['L']
print(f'Number of atoms N = {N:10d}')
print(f'Volume          V = {V:10.4f}')
print(f'Temperature     T = {T:10.4f}')
print(f'Box lengths     L = {L[0]:10.4f}{L[1]:10.4f}{L[2]:10.4f}')

The dataset data['T'] contains the step-by-step values of the configurational temperature. It will be interesting to see if the average of this quantity agrees with the input temperature.

[ ]:

Tavg=data['T'].mean()
print(f'Simulation average T = {Tavg:10.4f}')
print(f'Specified value of T = {T:10.4f}')

Simulation Results¶

In the following we will look at some of the other quantities.

The activity is \(z=\exp(\mu/k_{\text{B}}T)\). However, the Widom test-particle insertion method gives us, after averaging, \(\exp(-\mu/k_{\text{B}}T)\), and this is what data['Z'] contains. So the simulation estimate of the activity \(z\) is the inverse of the average of data['Z']. Let’s see if it agrees with the value obtained from the fitted eos function. You can also compare with the value tabulated in the lecture notes for this state point.

[ ]:

zavg = 1/data['Z'].mean()
zeos = eos_fit['z']
print(f'Simulation estimate z = {zavg:10.4f}')
print(f'Fitted EOS value    z = {zeos:10.4f}')

Now: something for you to do! The average of the virial can be used to give the simulation pressure. Referring to the Statistical Mechanics lecture notes, here is the formula. \begin{equation*} P = \frac{ N k_{\text{B}} T}{V} + \frac{\langle W\rangle}{3V} \end{equation*} Calculate this in the cell below. Bear in mind that all the variables you need have already been read from the file, and that \(k_{\text{B}}=1\) in our reduced simulation units. Compare with the value quoted in the lecture notes for this state point, and with the value given by the approximate equation of state, which is already in the cell.

[ ]:

# Insert correct formula here
Pavg = 0.0
Peos = eos_fit['P']
print(f'Simulation average P = {Pavg:10.4f}')
print(f'Fitted EOS value   P = {Peos:10.4f}')

Hopefully the results are quite close. If we wanted, we could further analyze the step-by-step data for all these quantities to estimate the statistical error on the simulation averages, but this is not the topic of the current workshop.

Instead, we shall take a closer look at the potential energy \(U\) stored in data['U']. The next cell plots \(U\) as a function of step. You might like to plot a subset of the data, over fewer steps, to get an idea of how correlated successive values are.

[ ]:

fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel('step')
ax.set_ylabel(r'$U$')
ax.plot(data['U'])
plt.tight_layout()

The next cell plots a probability histogram \(\mathcal{P}(U)\). Take a close look at this, making sure that it looks sensible, especially compared with the mean value and standard deviation of the data. We also compare with the fitted EOS values of \(U\) and \(u=U/N\).

[ ]:

Uavg = data['U'].mean()
Ustd = data['U'].std()
uavg = Uavg/N
ueos = eos_fit['u']
print(f'Simulation average U = {Uavg:10.4f}')
print(f'Standard deviation U = {Ustd:10.4f}')
print(f'Simulation average u = {uavg:10.4f}')
print(f'Fitted EOS value   u = {ueos:10.4f}')
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel(r'$U$')
ax.set_ylabel(r'$\mathcal{P}(U)$')
ax.hist(data['U'],density=True,label='Simulation')
ax.axvline(N*eos_fit['u'],c='C1',label='Fitted EOS')
ax.legend()
plt.tight_layout()

Heat Capacity¶

Now: something more for you to do!

It should be possible to use this to estimate the heat capacity at constant volume \(C_V\), or \(c_V=C_V/N\) per atom. Referring to the lecture notes, here is the formula: \begin{equation*} c_V/k_{\text{B}} = C_V/N k_{\text{B}} = \frac{3}{2} + \frac{\langle U^2\rangle - \langle U\rangle^2}{N(k_{\text{B}} T)^2} . \end{equation*} Do this calculation in the cell below, to give \(c_V\) (heat capacity per atom) as calculated in your simulation. Remember, that \(k_{\text{B}}=1\) in our reduced units, and that the other required values were already read in above. Compare this \(c_V\) with the value quoted in the lecture for this state point, and also the value returned by the fitted EOS, which appears in the cell below.

[ ]:

# Insert correct formula here
cavg = 0.0
ceos = eos_fit['c_V']
print(f'Simulation average c_V = {cavg:10.4f}')
print(f'Fitted EOS value   c_V = {ceos:10.4f}')

Pair Distribution Function¶

Don’t worry if you run out of time before tackling the remaining topics: the pair distribution function and ensemble reweighting. You can always return to this notebook in later workshops (you should not need to re-run the simulation, and after you import the necessary Python modules and functions at the top, you may skip to this point).

The next cell re-reads the mc_nvt.hdf5 file (in case you are returning here afresh).

[ ]:

params, data = read_file('mc_nvt.hdf5')

Once more, we assign names to a few of the important parameters. Having checked that the box is cubic, L is redefined to be a scalar rather than an array.

[ ]:

print(params['Title'].astype(str))
print('Run steps',params['nstep'])
N = params['N']
V = params['V']
T = params['T']
L = params['L']
print(f'Number of atoms N = {N:10d}')
print(f'Volume          V = {V:10.4f}')
print(f'Temperature     T = {T:10.4f}')
assert np.allclose ( L, L[0] ), print('Error: we are assuming a cubic box')
L = L[0]
print(f'Box length      L = {L:10.4f}')

The data['r'] dataset contains a set of configurations (atom positions). Interestingly, but not unexpectedly, the order of indices of the data['r'] array is reversed, compared with the Fortran order. Here, the step index comes first.

[ ]:

print(data['r'].shape) # Should be (nr,N,3)
nr = data['r'].shape[0]

We are going to calculate \(g(r)\) out to half the box length (it is a cubic box). This will involve accumulating a histogram of pair distances. For this simple example we’ll specify the number of bins, and this determines the bin width. We do the counting in a crude way, including both \(ji\) and \(ij\) for each pair. This is not necessarily the fastest approach, but the aim here is to show clearly what we are calculating. The loop over configurations should take a few seconds.

[ ]:

D = L/2         # Half the box length
b = 200         # Number of bins
h = np.zeros(b) # Histogram initialized to zero
for r in data['r']:
    d     = r[:,np.newaxis,:] - r[np.newaxis,:,:] # Set of all separation vectors (N,N,3)
    d     = np.fabs(d)                            # Absolute values of vector components
    d     = np.where(d<D,d,L-d)                   # Simple PBC in this case
    d     = np.sqrt(np.sum(d**2,axis=-1))         # Set of all separation distances (N,N)
    h1,rr = np.histogram(d,bins=b,range=(0.0,D))  # Separation histogram & bin edges
    h     = h + h1                                # Accumulate histogram
h[0] = 0      # Remove the counts arising from the diagonal of d
h    = h / nr # Normalise by number of configurations
h    = h / N  # Normalise by number of atoms

Now stop to consider what we have calculated.

For each configuration, h1 counts the number of pair separations lying in the range corresponding to each histogram bin. We have counted each distinct pair twice (the \(N\times N\) matrix d is symmetric). This is equivalent to considering each atom \(i\) in turn: each bin of h1 counts the number of neighbours \(j\) whose distances from \(i\) fall into that bin, and the results are summed over all \(i\). The double counting corresponds to \(i\) being a neighbour of \(j\) as well as \(j\) being a neighbour of \(i\). There are unwanted counts corresponding to the diagonal elements \(i=j\), but these can be removed when the loop is finished.

The h array simply sums these results for all nr stored configurations, and after normalizing by nr it contains the average number of pair separations lying in the range corresponding to each histogram bin. Again, think of this by considering each atom \(i\) in turn: each bin of h contains the average number of neighbours \(j\) whose distances from \(i\) fall into that bin, summed over all such atoms \(i\).

After further normalizing by N, each bin of h contains the average number of neighbours whose distance from any given atom lies in the range covered by that bin.

The rr array gives the bin edges, i.e. the separation values delimiting each bin. It has b+1 elements. We can use these values to compute the volume of the spherical shell corresponding to each bin. This in turn allows us to calculate what h would be in an ideal gas of the same density as our system. We proceed to do this, calling the result h_id: the ratio h/h_id is \(g(r)\). Before plotting, we use the rr array again to compute the mid-point (\(r\)-value) of each bin. Have a look at the results. Does the \(g(r)\) plot look sensible? More importantly, have you understood the way we calculate \(g(r)\)? Feel free to ask if anything is unclear!

[ ]:

V_shell = np.diff((4.0*np.pi/3.0)*rr**3)
rho     = N/V
h_id    = rho * V_shell
g       = h / h_id
r       = ( rr[1:] + rr[:-1] ) / 2 # Mid-points of all the bins
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$g(r)$')
ax.plot(r,g)
ax.axhline(1,ls='dashed',c='C1')
plt.tight_layout()

Using the pair distribution function¶

This part is entirely optional, and can be skipped. An almost identical question appears in the MD tutorials: there is no need to answer both! In principle, the average potential energy may be calculated from \(g(r)\) using the formula \begin{equation*} u_{\text{avg}} = U/N = 2\pi\rho\int_0^{\infty} u(r) \, g(r) \, r^2 \, dr , \end{equation*} and a similar formula applies to the non-ideal contribution to the pressure. You can calculate this using a simple SciPy/NumPy numerical integration routine. Don’t forget that the interactions here are given by the cut-and-shifted LJ potential \(u(r)\); the next cell gives a suitable function for this. The integral ranges over \(0 \leq r \leq r_{\text{cut}}\) for this potential. How does the result compare with the average from the MC simulation (calculated in an earlier cell), and with the fitted EOS?

[ ]:

def u(r):
    """Lennard-Jones cut-and-shifted potential, r may be scalar or NumPy array."""
    rc  = 2.5 # Assumed cutoff distance
    rc2 = 1.0 / rc**2
    rc6 = rc2**3
    uc  = 4.0*(rc6-1.0)*rc6
    r2  = 1.0 / r**2
    r6  = r2**3
    ulj = 4.0*(r6-1.0)*r6
    return np.where ( r<rc, ulj-uc, 0.0 )

Use the above function, and the arrays g and r, in the next cell to calculate the desired integral by quadrature. For convenience, the trapezoid(f,r) function has been imported from the scipy.integrate sub-package. This uses the trapezoidal rule, where r is an array containing the sample points, and f is an array containing values of the integrand evaluated at those points.

Again, we can compare with the fitted EOS.

[ ]:

# Insert your code here
uavg = 0.0
eos_fit = eos(temperature=T,density=N/V)
ueos = eos_fit['u']
print(f'g(r) integral for u = {uavg:10.4f}')
print(f'Fitted EOS value  u = {ueos:10.4f}')

Ensemble Reweighting¶

You may prefer to skip this, and come back to it after the third MC lecture, where we go into the topic in more detail.

For convenience, the next cell re-reads the HDF5 file. Then the probability histogram for the potential energy, \(\mathcal{P}(U)\), is re-calculated.

[ ]:

params, data = read_file('mc_nvt.hdf5')

[ ]:

N = params['N']
V = params['V']
T = params['T']

[ ]:

P, U = np.histogram(data['U'],bins=100,density=True)
dU   = np.diff(U)       # Get differences in bin edges for later use
U    = (U[:-1]+U[1:])/2 # Convert bin edges into midpoint values

This is the distribution for the simulation run at \(T=2.0\). It is possible to use this data to estimate \(\mathcal{P}(U)\), and hence calculate \(\langle U\rangle\) and other functions of potential energy, at nearby temperatures.

The distribution function may be expressed as \(\mathcal{P}(U)\propto \Omega(U) \exp(-\beta U)\) where \(\Omega(U)\) is the density of states and \(\beta=1/k_{\text{B}}T\). From the same formula at a nearby temperature \(T_1\), it follows that the distribution \(\mathcal{P}_1(U)\) at this nearby temperature is \begin{equation*} \mathcal{P}_1(U) \propto \mathcal{P}(U) \times \exp\bigl[(\beta-\beta_1) U\bigr] \end{equation*} where \(\beta_1=1/k_BT_1\). The function \(\mathcal{P}_1(U)\) needs to be normalized, after calculating it this way, so that \(\int \mathcal{P}_1(U) \, dU = 1\). The following cell attempts to do this for \(T_1=1.8\), using the just-calculated \(\mathcal{P}(U)\) curve. It also compares the results with the fitted equation-of-state function at both temperatures, adding the expected values of \(\langle U\rangle\) as vertical dashed lines in the plot.

One subtlety, in general, is that the exponential function may produce very large or very small values, possibly leading to underflow and overflow issues. We should be dealing with float64 datasets, and hence the variables derived from them should also be float64. Provided \(\beta\) and \(\beta_1\) are not too different from one another, we expect this to be sufficient, but we print the normalization factors just to emphasize this danger.

[ ]:

beta  = 1/T
T1    = 1.8
beta1 = 1/T1

# Confirm that P(U) is already normalized
norm = np.sum(P*dU) # This is the integral of P
print(f'P(U)  norm = {norm:.5g}')

# Compute distribution at T1, and normalize
P1 = P * np.exp((beta-beta1)*U)
norm = np.sum(P1*dU) # This is the integral of P1
print(f'P1(U) norm = {norm:.5g}')
P1 = P1/norm

# Compute expected mean values from fitted EOS
Ueos  = N * eos(temperature=T, density=N/V)['u']
U1eos = N * eos(temperature=T1,density=N/V)['u']

# Plot distributions
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel(r'$U$')
ax.set_ylabel(r'$\mathcal{P}(U)$')
line, = ax.plot(U,P,label='T={:5.2f}'.format(T))
ax.axvline(Ueos,ls='--',c=line.get_color())
line, = ax.plot(U,P1,label='T={:5.2f}'.format(T1))
ax.axvline(U1eos,ls='--',c=line.get_color())
ax.set_ylim(bottom=0)
ax.legend()
plt.tight_layout()

Now, experiment with \(T_1\). Try, say, \(T_1=2.2\), \(1.5\) or \(2.5\). What is the effect on \(\mathcal{P}_1(U)\) as \(T_1\) gets further from \(T\)?

You may also like to compare with the results of a fresh MC run at temperature \(T_1\) (taking care to equilibrate at the new temperature first).

Further work¶

There’s plenty of scope to experiment with the mc_nvt program. For example, you might tinker with the parameter dr_max, observing its effect on the move acceptance ratio. Consider the question: how should the “optimal” value of dr_max be determined? What do we want to optimize?

This concludes the notebook.

[ ]: