
A first example

by Riccardo Di Meo last modified 2008-08-25 15:49

Parallelization of example06 from the quantum-espresso package, which deals with phonon computation. The same approach can be reused for more productive purposes.

Phonon computation on the grid.

Here we present a simple port, based on example06 of the quantum espresso package, in which the dynamic matrix is computed line by line, in parallel, on different Worker Nodes (abbr. WNs: the computers which actually run the jobs on the grid).

The entire chain of computations for the original example takes on the order of minutes and, for such small cases, parallelizing is of course a waste of time; the same approach, however, can be used for more complex systems, even by simply adapting example06 to bigger systems where grid execution can be put to good use.

Though just an example, this simple implementation also gives all the clues necessary to port a large number of similar applications to the grid, and work is ongoing to implement scientifically significant phonon computations on the grid.

We will not dwell on the details of the quantum espresso package, nor will we explain the scientific significance of the computation: for more details you can check the quantum espresso page.

About the example.

Example06 is a chain of calculations which, at the end, produces the dynamic matrix for an AlAs system: first a self-consistent computation is performed with pw.x, then the entire dynamic matrix is computed at once with ph.x.

In order to split the calculation of the dynamic matrix into lines, some overhead needs to be added: another computation with pw.x needs to be performed for each q-point (adding a &phonon section and specifying calculation='phonon') before starting ph.x on it. This lowers the gain of the parallelization but, under the assumption that the number of q-points is large enough (and that the ph.x computation takes much longer than the pw.x one), the gain should well outweigh the added computing time.

About the executables.

Both pw.x and ph.x behave similarly from the computational point of view: they are executed by passing a configuration file on standard input, which specifies the input parameters and the location of potentials and other prerequisites, and they return some logs on standard output as well as writing their results to disk in set directories.

Both executables can be compiled with MPI support (which could also be used on the grid with reserved SMP nodes), though we chose to use the serial versions, compiled statically to overcome possible compatibility issues with the libraries on the grid.
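As an illustration of this calling convention, here is a minimal Python sketch of driving either executable from a script; the input and output file names here are purely illustrative:

import subprocess

def run(exe, infile, outfile):
    # Both pw.x and ph.x read their configuration on standard input
    # and write their logs to standard output.
    with open(infile) as stdin, open(outfile, "w") as stdout:
        subprocess.check_call([exe], stdin=stdin, stdout=stdout)

# One line of the dynamic matrix: a pw.x 'phonon' run for the q-point,
# followed by the ph.x run proper (file names are illustrative).
run("./pw.x", "pw_phonon.in", "pw_phonon.out")
run("./ph.x", "ph.in", "ph.out")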

Outline of the porting.

We decided to skip the first self-consistent computation, which is fast and performed only once, and to concentrate on the second part of the example (the phonon computation). We therefore assume that a user willing to compute the dynamic matrix on the grid has already obtained the output of pw.x, even though running that step on the grid too would have been completely within our reach: this makes the execution more transparent from the user's point of view.

In order to start the execution of our system, the user needs:

  1. The ph.x and pw.x binaries, compiled to suit grid execution (statically linked), uploaded to some Storage Element (abbr. SE: hosts on the grid whose only purpose is to store data)
  2. The output of the self-consistent computation from pw.x (in a directory named "data", saved in a tar.bz2 archive), uploaded to a SE
  3. A mesh_k file generated with the quantum espresso kpoints utility, containing the q-points where the phonons need to be computed.
  4. A template of the .in file for the phonon calculation of pw.x, derived from the self-consistent one, with calculation='phonon' and a &phonon section added, in which xqq(1), xqq(2) and xqq(3) are set to the strings "XXX", "YYY" and "ZZZ" instead of the coordinates of a q-point (a sketch of how these placeholders are filled in is shown below).
  5. A template for the .in file for the ph.x computation, with ldisp set to .false. but without the q-point specified at the end of the &inputph section (it will be added automatically when required).

Also, all .in files should have the outdir parameter set to "data" (other values can be used if you know what you are doing) so that the programs find the output of the self-consistent computation as well as the potentials.
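To make the role of the placeholders concrete, here is a minimal sketch of how the pw.x template can be filled in for one q-point (this is our illustration, not the distributed client code, whose internals may differ):

def fill_pw_template(template_path, qpoint, output_path):
    # Replace the XXX/YYY/ZZZ placeholders with the coordinates of a
    # single q-point, given as a tuple of three floats.
    with open(template_path) as f:
        text = f.read()
    for placeholder, value in zip(("XXX", "YYY", "ZZZ"), qpoint):
        text = text.replace(placeholder, "%.9f" % value)
    with open(output_path, "w") as f:
        f.write(text)

fill_pw_template("alas.preph.in", (-0.5, -1.0, 0.0), "alas.preph.q1.in")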

The porting consists of two programs, a server and a client, communicating via the XML-RPC protocol: the first resides on a host with a resolvable address, usually the User Interface (abbr. UI: the machine from which grid users submit their jobs to the grid; it is always provided with both inbound and outbound connectivity), and the second runs on the WNs.

Both programs can be found in a tar.bz2 package here.

The user starts the server, passing it the right parameters (the location of the binaries and data files on the grid, as well as the local location of the mesh and .in files, among others), and then submits the clients to the grid in large numbers.

As soon as the clients reach the WNs, they contact the server, requesting the location of the data and binaries (which they then download from the SE); after this preliminary step they repeatedly request a different q-point, compute it, and return the output (in the form of a .dyn file) to the server, until the dynamic matrix is complete.

As soon as the whole dynamic matrix has been computed, the server shuts down and all connected clients (as well as any upcoming ones) kill themselves, releasing all resources on the grid. At that point the user will find, in the server's directory, a list of .dyn files: one for each line of the matrix.
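For the curious, the client side of this exchange boils down to a loop like the following Python 3 sketch. The method names (get_prerequisites, get_task, submit_result) are hypothetical; the actual protocol spoken by phonon_server.py may differ:

import xmlrpc.client

def compute_point(task):
    # Placeholder for the real work: run pw.x and then ph.x for the
    # q-point described by `task`, returning the .dyn file contents.
    return "dummy .dyn contents"

server = xmlrpc.client.ServerProxy("http://my-ui.example.org:23017")
prereq = server.get_prerequisites("foobar")  # SE locations of binaries and data
while True:
    task = server.get_task("foobar")         # next q-point, or an empty value when done
    if not task:
        break
    server.submit_result("foobar", task, compute_point(task))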

Preparation of the input data.

As already mentioned, the user is expected to execute the self-consistent part of the computation on his/her own; however, it's better to extract the data from the run_example script than to actually execute it. Doing so (and replacing the variables appropriately) should result in two input files: alas.scf.in and alas.ph.in.

alas.scf.in should look something like this:

 &control
    calculation='scf'
    restart_mode='from_scratch',
    tstress = .true.
    tprnfor = .true.
    prefix='alas',
    pseudo_dir = '/home/user/espresso/pseudo/',
    outdir='/home/user/tmp/'
 /
 &system
    ibrav=  2, celldm(1) =10.50, nat=  2, ntyp= 2,
    ecutwfc =16.0
 /
 &electrons
    conv_thr =  1.0d-8
    mixing_beta = 0.7
 /
ATOMIC_SPECIES
 Al  26.98  Al.vbc.UPF
 As  74.92  As.gon.UPF
ATOMIC_POSITIONS
 Al 0.00 0.00 0.00
 As 0.25 0.25 0.25
K_POINTS
 2
 0.25 0.25 0.25 1.0
 0.25 0.25 0.75 3.0

Particularly important is the (infamous) choice of the user's tmp directory made by the script: it is a static path, and the script completely erases it without even prompting the user. We can override this choice by creating a "data" dir in the example06 directory and replacing the line:

    outdir='/home/user/tmp/'

with:

    outdir='data'

which will instruct pw.x to use the "data" dir in the example06 directory.

At this point, running the pw.x executable like this:

# ./pw.x <alas.scf.in >alas.scf.out

should produce the following files:

data/alas.save
data/alas.save/As.gon.UPF
data/alas.save/charge-density.dat
data/alas.save/K00001
data/alas.save/K00001/eigenval.xml
data/alas.save/data-file.xml
data/alas.save/Al.vbc.UPF
data/alas.save/K00002
data/alas.save/K00002/eigenval.xml
data/alas.wfc

which should be archived and uploaded, along with the executables, to a Storage Element:

# tar cvjhf data_alas.tar.bz2 data
# globus-url-copy file:`pwd`/data_alas.tar.bz2 gsiftp://fictitious-se.grid.it/somewhere/data_alas.tar.bz2
# cd ~/espresso/bin
# globus-url-copy file:`pwd`/pw.x gsiftp://fictitious-se.grid.it/somewhere/pw.x
# globus-url-copy file:`pwd`/ph.x gsiftp://fictitious-se.grid.it/somewhere/ph.x

Now we need to use the kpoints program in the quantum-espresso package to generate a mesh file, which we will call mesh_k, containing the list of q-points where the dynamic matrix will be computed.

Here is the one used in our example:

6
          1  -0.500000000  -1.000000000   0.000000000 1
          2   0.500000000   1.000000000   0.000000000 1
          3  -1.000000000   0.000000000  -0.500000000 1
          4   1.000000000   0.000000000   0.500000000 1
          5   0.000000000  -0.500000000  -1.000000000 1
          6   0.000000000   0.500000000   1.000000000 1
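The format is easy to handle programmatically: the first line carries the number of points, followed by one "index qx qy qz weight" line per q-point. A minimal parsing sketch (the file name matches our example):

def read_mesh(path):
    # First line: number of points; then 'index qx qy qz weight' lines.
    with open(path) as f:
        n = int(f.readline())
        return [tuple(float(x) for x in f.readline().split()[1:4])
                for _ in range(n)]

print(read_mesh("mesh_k"))   # [(-0.5, -1.0, 0.0), (0.5, 1.0, 0.0), ...]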

The last two steps are to create the two .in files we will use on the grid. The first one is the file we will pass to pw.x, and is a modification of the first input file:

 &control
    calculation='phonon'
    restart_mode='from_scratch',
    tstress = .true.
    tprnfor = .true.
    prefix='alas',
    pseudo_dir= 'data/alas.save/',
    outdir='data'
 /
 &system
    ibrav=  2, celldm(1) =10.50, nat=  2, ntyp= 2,
    ecutwfc =16.0
 /
 &electrons
    conv_thr =  1.0d-8
    mixing_beta = 0.7
 /
 &phonon
    xqq(1)=XXX,xqq(2)=YYY,xqq(3)=ZZZ
/
ATOMIC_SPECIES
 Al  26.98  Al.vbc.UPF
 As  74.92  As.gon.UPF
ATOMIC_POSITIONS
 Al 0.00 0.00 0.00
 As 0.25 0.25 0.25
K_POINTS
 2
 0.25 0.25 0.25 1.0
 0.25 0.25 0.75 3.0

The differences are:

  1. The type of calculation has been changed to phonon
  2. The pseudo potentials directory has been pointed to the local data/alas.save directory
  3. A &phonon section has been added
  4. Inside the new section the xqq(1..3) parameters have been added and their values set to XXX, YYY and ZZZ (uppercase!).

The file should be saved with the name alas.preph.in.

Last but not least, we need to slightly modify the alas.ph.in file (which you should already have extracted from the run_example script) so that it looks like this:

phonons of AlAs
 &inputph
  tr2_ph=1.0d-12,
  prefix='alas',
  nq1=4, nq2=4, nq3=4
  amass(1)=26.98,
  amass(2)=74.92,
  ldisp=.false.,
  outdir='data',
  fildyn='alas.dyn',
 /

This is obtained by removing the ldisp line from the original (or setting it to .false.) and replacing the original outdir with "data". Keep in mind that this is not a valid input for the ph.x binary, since the q-point is missing (it will be added at the end of the file on each WN at run time).
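The completion performed on the WN amounts to appending one line with the q-point coordinates at the end of this template. A sketch of the idea (the helper name is ours; the distributed client does this automatically):

def make_ph_input(template_path, qpoint, output_path):
    # Copy the ph.x template and append the missing q-point line.
    with open(template_path) as f:
        text = f.read()
    with open(output_path, "w") as f:
        f.write(text)
        f.write("%f %f %f\n" % qpoint)

make_ph_input("alas.ph.in", (-0.5, -1.0, 0.0), "alas.ph.q1.in")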

Running the computation on the grid

Now that all the inputs are ready, all we need to do to start the computation is to start the server and submit the jobs to the grid.

The server accepts many arguments, which can be shown using the -h option:

# ./phonon_server.py -h
./phonon_server.py [options]

Start the server which distributes the phonon computations.

The available options are:

        -p    port where the server will listen
        -P    a password to protect the server
        -m    file with the output of kpoints
        -b    gsi location of the dir. with the executables
              (pw.x and ph.x)
        -d    gsi location for the tarball with the data
        -i    location of the .in file for pw.x in the right
              format (calculation='phonon',
               xqq(1)=XXX,xqq(2)=YYY,xqq(3)=ZZZ)
        -I    location of the .in file for ph

        -h or -?  print this help message.

All of them, with the exception of -h and -?, are required.

If the files have the same names as in the previous section and are in the same directory as the server, the server can be started with a line like this:

# ./phonon_server.py -p 23017 -P foobar -m mesh_k  \
      -b gsiftp://fictitious-se.grid.it/somewhere  \
      -d gsiftp://fictitious-se.grid.it/somewhere  \
      -i alas.preph.in -I alas.ph.in

where we chose port 23017 for the incoming connections and the password 'foobar' to protect our server (choose a better one, but don't use one of your precious account passwords for it!).

If the server exits immediately, it is likely that you have selected a port which cannot be assigned (e.g. a port in the range 1-1024, which can be used only by the administrator, or a port which is already in use): we recommend picking a port randomly in the interval 23000-25000 (the Globus port range, which should be open for both inbound and outbound connectivity on the UI, and for outbound connectivity only on the Worker Nodes).
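If you want to be sure in advance, a quick way to pick a usable port is to try binding it first; a small sketch:

import random, socket

def pick_free_port(low=23000, high=25000, attempts=20):
    # Try random ports in the Globus range until one can be bound.
    for _ in range(attempts):
        port = random.randint(low, high)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))
            return port
        except OSError:
            pass
        finally:
            s.close()
    raise RuntimeError("no free port found in range")

print(pick_free_port())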

Another reason for the server to exit is that it didn't find the files specified in the command line arguments: the error message reported should be self-explanatory.

Otherwise the server should write something like this:

6 points in the requirements/mesh_k file

and hang there.

The last step before submission is to modify the jdl provided in the tar to match the arguments passed to the server; specifically, we have to instruct the client to connect to the correct host and port using the right password.

After our changes, the jdl should look like this:

Executable = "./phonon_client.py";
Arguments  = "-H my-ui.with.theserverrunning.org -p 23017 -P foobar";
InputSandbox ={"phonon_client.py","common.py"};

where the Arguments line has been changed to carry the host where the server runs, together with the port and password it was started with.

Now we are ready to submit the jobs: as soon as a job hits the grid, the WN will run our python script, which will contact the server and request and execute tasks for as long as computations are available.

A nice side effect of this approach is that we can submit jobs "blindly", in a "fire and forget" way: if the server is not available (because the machine has gone down due to a power failure, or simply because the server exited since no more tasks were required), the client will simply exit without wasting CPU time.
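In practice this means the clients can be submitted in bulk with a simple loop; for instance, assuming the gLite command line tools are available (the exact submission command depends on your middleware, and "phonon.jdl" is our name for the jdl shown above):

import subprocess

# Submit 20 identical clients; extra ones simply exit if no work is left.
for _ in range(20):
    subprocess.call(["glite-wms-job-submit", "-a", "phonon.jdl"])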

After submission, if everything goes smoothly and at least one client starts running on the grid, you should see messages like these:

Sending binaries...
Sending binaries...
Sending binaries...
Results for point 0 saved
Results for point 1 saved
Results for point 4 saved
Results for point 3 saved
Results for point 2 saved
Results for point 5 saved

where each 'Sending binaries...' line indicates that a new client has connected to the server, and each 'Results for point x saved' line that a point of the mesh has been computed.

Running in another environment

It should be clear by now that the same scripts can be used to run the same simulation on any other system, as long as outbound connectivity to the server is provided.

To do so, just run the server somewhere as described above and, discarding the jdl, run the client on the hosts like this:

# ./phonon_client.py -H portal.sissa.it -p 23000 -P foobar
Requesting an ID...
Requesting the prerequisites...
Requesting a job from the server...
phonons of AlAs
 &inputph
  tr2_ph=1.0d-12,
  prefix='alas',
  nq1=4, nq2=4, nq3=4
                    (...)

which will print a lot of output if everything goes fine.

This is useful both to perform real computations (e.g. if your desktop neighbour lets you use his/her computer while he/she's away, or if you have a cluster at your disposal) and to test the setup of your simulation (you can run a client locally and wait for one point to be computed, to check that each step of the configuration was carried out correctly).
