
New version of BEMuSE: how to configure it with the fast_configure package

by Riccardo Di Meo last modified 2008-06-01 19:12

This is a VERY rough account of how to use the fast_configure interface to configure and submit a simulation. BETA.

All examples are for the VO euindia and a grid using the gLite middleware.

What is "fast configure"?

"fast configure" it's the name of the interface which comes with the new version of BEMuSE and is therefore part of the BEMUSE package: it may be that future version of it will come with a more refined interface though.

For brevity, the name "fast configure" will be used from now on to identify the BEMuSE package with said interface as well.

How is "fast configure" licensed?

BEMuSE has been created by Riccardo Di Meo for the EUIndiaGRID Project (see www.euindiagrid.ictp.it) at the International Centre for Theoretical Physics (ICTP - see www.ictp.it) under the supervision (and with the wisdom) of Stefano Cozzini (INFM - SISSA).

This version of BEMuSE is distributed under the Creative Commons "Attribution-Noncommercial-No Derivative Works 2.5 Italy" License.

Keep in mind that BEMuSE is a front-end to the executables which actually perform the bias-exchange meta-dynamics; these are not distributed with BEMuSE and are not covered by the same licence either!

To obtain the executables required to perform BEM simulations, contact the creators of the algorithm: Alessandro Laio and Stefano Piana.

FIXME: pointers to Alessandro and Stefano P.

Many people contributed to various degrees to the creation and development of this project: thanks to Alessandro Laio, who provided the algorithm which allowed this project to start, as well as invaluable assistance when it came to integrating it with BEMuSE, and to Fabio Pietrucci, whose collaboration and help has been critical to the success of this software; a special thank you goes to Moreno Baricevic for his patience and expertise in the times of most dire need.

Where can I find the "fast configure" package?

It is freely available for download at the URL:

http://www.ictp.it/~dimeo/BEMuSE_fast_configure.tar.bz2

Which steps do I need to perform to start a simulation?

In order to be able to take advantage of BEMuSE, you will need to configure a simulation with the set_up_sim.py script.

Though the number of options available makes BEMuSE a highly customizable system, set_up_sim.py greatly simplifies the task of setting up a simulation, at the expense of limited flexibility (only a common setup is supported).

A suggested approach, even for more advanced users, is to use the script to set up a simulation, and then modify the configuration files (which are in the Windows .ini format) to adjust it to their needs.

What is required to start a simulation?

You will need:

  • access to a grid: in our case we will discuss a gLite based system, though the only real requirement is Worker Nodes with outbound connectivity and gsiftp transfer.
  • Python2.5 with bzip2 and gzip support compiled in.
  • access to a Storage Element where data can be downloaded and uploaded with the gsiftp protocol (other protocols are supported, though will only be briefly covered here).
  • an archive containing a Python interpreter compiled in a way suitable for grid execution (in practice, you have to compile it on a UI...).
  • a statically compiled, bias-exchange meta-dynamics enabled binary of gromacs and a statically compiled tpbconv.
  • the file aminoacids.dat
  • a set of archives, each containing:
    • a tpr file
    • a META_INP.0 file, with the parameters LABEL and RESTART specified
    • a file HILLS.0 consistent with the META_INP.0 definition, with at least one line (with the definition of the variables and label).

The python package, the gromacs binaries and aminoacids.dat should be put on the Storage Element in a specified location before the simulation can start. See below for more about this.

The restart packages must be put in a directory on the server's host, and they must follow a strict naming convention: they must be gzip-compressed tar packages, with names in the form restart_M.tar.gz, where M is an integer in the range [1,N] and N is the number of restarts required for the simulation.
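
For example, assuming each restart's input files (the tpr, META_INP.0 and HILLS.0 files) sit in their own directory named restart_M (the directory layout here is only an assumption; the convention above only mandates the archive names), the packages for an 8-restart simulation could be produced with something like:

$ for((i=1;i<=8;i++));do tar czf restart_$i.tar.gz -C restart_$i . ;done

Here 8 is just an example value of N.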

IMPORTANT: the restart packages are read only the first time the server gets started; subsequent restarts will make the server ignore the .tar.gz files in favor of its own "data" directory (in this way a simulation can be restarted where it was interrupted).

What's the logging service?

The logging service is a small server that receives the logs from the remote clients running on the grid (or in any host with outbound connectivity) and saves them on the local file-system.

The fast_configure package allows you to configure and start one automatically: since the bandwidth consumed is not very large and the output is often very useful, it is warmly recommended to always turn this feature on when requested.

The logs are saved in the fast_configure/simulations/<simname>/logging directory, each in its own directory, in the form:

<remote address>/<local bound port>/<simname>_ProteinClient.log-<Unix time of creation>

e.g.:

gw-net.pd.infn.it/45670/sh3_ProteinClient.log-1211064572.0454929

which ensures that the path is unique.

Though no guarantee is provided that the server is 100% secure, it uses a long shared hash to authenticate incoming connections (through a challenge mechanism).

How can I turn the logging on and off?

As soon as the logging support is enabled, after the server for the simulation is started, a file start_logger.sh is created and executed in the logging directory of the simulation.

To stop the server, get its PID from the file log_server.pid and kill the process (keep in mind that this will NOT stop the simulation from running, it will only prevent the logs from being sent by the clients!)

To re-start the logger, just execute start_logger.sh. Keep in mind that clients which were unable to contact the logging service (either because it was not yet started or because it was shut down while they were sending the logs) will not try to contact the logging service anymore: therefore their logs will not appear!
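
Assuming you are in the fast_configure/simulations/<simname>/logging directory and that log_server.pid contains nothing but the PID, the stop/restart cycle looks roughly like:

$ kill $(cat log_server.pid)
$ ./start_logger.sh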

Which commands can I use to manage data on the grid?

We will cover here only gsiftp transfer; the first command to be used is globus-url-copy, which transfers data to and from an SE, in the form:

$ globus-url-copy gsiftp://prod-se-01.pd.infn.it/proc/cpuinfo file:`pwd`/cpuinfo

The inverse operation is also possible:

$ globus-url-copy file:/proc/cpuinfo gsiftp://prod-se-01.pd.infn.it/tmp/foo

If you want a more user-friendl(ish...) approach, or you want to perform more complex operations, then the command uberftp can be used.

$ uberftp prod-se-01.pd.infn.it
220 prod-se-01.pd.infn.it GridFTP Server 1.12 GSSAPI type Globus/GSI
wu-2.6.2 (gcc32dbg, 1062606889-42) ready.
230 User euindia002 logged in.
uberftp> cd /
uberftp> ls
drwxr-xr-x     3  root    4096 Dec  1 04:42  .EU_STORAGE_DIR
drwxr-xr-x     3  root    4096 Dec  1 04:42  .INFN.IT_STORAGE_DIR
-rw-r--r--     1  root       0 Jan  7 08:12  .autofsck
drwxr-xr-x     2  root    4096 Apr 26  2005  afs
drwxr-xr-x     2  root    4096 Apr  9 15:49  bin
drwxr-xr-x     3  root    4096 Jan 16  2007  boot
drwxr-xr-x    24  root  118784 Apr  9 12:36  dev
drwxr-xr-x    58  root    8192 Apr  9 15:52  etc
drwxr-xr-x     4  root    4096 Apr  4  2007  flatfiles
drwxr-xr-x  1626  root   32768 Apr  2 12:18  home
(...)

The command is very similar to a standard ftp client and supports (m)get/(m)put, (l)cd etc.

How can a proper Storage Element be found?

Use the command:

$ lcg-infosites --vo euindia se

to get a list of Storage Elements for your VO.

Then, after creating a proxy certificate (see below), use the command:

$ uberftp

to probe the SE and ensure that it is functional (e.g. you are able to log in), then search it for a proper location to save the data.

If you are using gsiftp for transfer (the only method covered in this small howto) you'll need a persistent directory where you can write and save your data.

Browse the file-system; typical suitable locations look like:

/flatfiles/euindia/
/flatfiles/SE00/euindia
/data/euindia
/scratch/euindia

and so on. Treat those examples as suggestions; the location should be found manually, since it is usually SE-dependent.

After finding a proper location, create in it a directory where you will put your data (you can name it after your user name); this will be your "personal location" on that Storage Element.

How can the support files be put in the proper location (grid only)?

Use uberftp to create a directory in your newly created "personal location" on a Storage Element and put there the support files for the simulation (python.tar.bz2, aminoacids.dat, tpbconv and mdrun). From now on we'll assume that the name of the directory will be "download".

Additionally, create a directory for each simulation you'll want to run, which will be used to upload the trajectories.
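
A possible interactive session for this step, reusing the SE host and paths from the earlier examples purely as placeholders (your own personal location and simulation name will differ):

$ uberftp prod-se-01.pd.infn.it
uberftp> cd /flatfiles/euindia/johndoe
uberftp> mkdir download
uberftp> cd download
uberftp> put python.tar.bz2
uberftp> put aminoacids.dat
uberftp> put tpbconv
uberftp> put mdrun
uberftp> mkdir ../sh3
uberftp> quit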

How can I figure out if something is going wrong on the remote hosts?

A first clue can be gleaned from the server's log in fast_configure/simulations/<simname>/server/ProteinServer.log: this file should periodically be checked for lines starting with the ERROR label (which contain the time and an explanation of the issue).

If the server starts rejecting a large number of WNs and produces a lot of ERRORs, then this *may* be an indicator of some problem, depending on the reason.

However, most of the time this is an indicator of some network problem between the server and the WNs (e.g. the connectivity stopped working or something like that), therefore it's always a good idea not to be too jumpy and to try to submit again before starting to worry about bugs.

Reasons like "Connection reset by peer" or "Timeout while waiting for an answer" are very likely just temporary network problems.

BEMuSE has been designed in order not to suffer from such incidents: therefore just resubmit and wait.

Though the logs on the server give some insight about what's happening on the grid, the most complete way to obtain information is to use the logging service and search the directory fast_configure/simulations/<simname>/logging for the file belonging to the client that failed on the grid (the format of which is similar to the one for the server).

This can be quite a task at first glance, but with a little knowledge of bash scripting and regular expressions it can be done quite easily.

e.g.

Let's assume that a client in a simulation died, with the server reporting:

(...)
INFO  20/05 00:29:46: nat-1-out-1.lnl.infn.it:46517 replied to get_data
INFO  20/05 00:29:46: COLVAR, HILLS synchronized for 2!
ERROR 20/05 00:29:46: 2.get_data(): checkpoint failed!
INFO  20/05 00:29:46: Closing nat-1-out-1.lnl.infn.it:46517
(...)

Since we are curious, we decide to search the logs for more information.

The directory logging, however, contains 28 directories and 168 log files...

$ find -iname "*Client*"
./dz04.ct.infn.it/42117/sh3_ProteinClient.log-1211065129.9021821
./dz04.ct.infn.it/42090/sh3_ProteinClient.log-1211065010.4804249
./alicegrid12.ba.infn.it/54878/sh3_ProteinClient.log-1211184732.239048
./alicegrid12.ba.infn.it/54818/sh3_ProteinClient.log-1211184690.548388
./gwn02.ilc.cnr.it/37787/sh3_ProteinClient.log-1211192552.227711
./gwn02.ilc.cnr.it/37903/sh3_ProteinClient.log-1211192665.6836879
./gwn02.ilc.cnr.it/37931/sh3_ProteinClient.log-1211192692.033484
./gwn02.ilc.cnr.it/37729/sh3_ProteinClient.log-1211192498.2540619
./gwn02.ilc.cnr.it/37959/sh3_ProteinClient.log-1211192719.245604
./gwn02.ilc.cnr.it/38107/sh3_ProteinClient.log-1211192851.1367271
./gwn02.ilc.cnr.it/37871/sh3_ProteinClient.log-1211192635.211324
./gwn02.ilc.cnr.it/38163/sh3_ProteinClient.log-1211192903.791203
(and much, much more...)

Searching those manually would be quite an ordeal; however, we know that the index of the client was "2", and by browsing the sh3_ProteinClient.log_xxx files a little we learn that a line in the form:

... Im the processor  ...

is present in each file.

By doing a simple:

$ grep -r Im\ the\ processor */*/* |grep \ 2
...
gw-net.pd.infn.it/45670/sh3_ProteinClient.log-1211064572.0454929:INFO 18/05 00:49:32: Im the processor 2
...

and matching the time and day of the error, we get the right file, search the ERROR lines and discover that there was nothing to be worried about (the client simply closed the connection since the queue expired).
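
If you want to avoid accidental matches on indices like 20 or 21, anchoring the second grep at the end of the line works too (assuming the log line ends with the index, as in the output above):

$ grep -r "Im the processor" */*/* | grep " 2$"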

Errors in the code are usually shown in a format similar to this:

ERROR 20/05 00:29:46: ------- Traceback ------
ERROR 20/05 00:29:46: nasty messages
ERROR 20/05 00:29:46: more nasty messages
ERROR 20/05 00:29:46: (...)
ERROR 20/05 00:29:46: -------------------------

they can appear in both the server and client logs (although they shouldn't) and they have to be reported to the developer (simply copy and paste the error in the mail with the report, if the file with the logs is too big).

How can the server be stopped and restarted?

The server gets started at the end of the configuration with set_up_sim.py: if it crashes or the machine goes down (e.g. due to a power failure or a reboot) the server will, of course, stop.

To start it again, simply re-execute the set_up_sim.py script providing the simulation you want to re-start.

You can use the same method to stop the server, although a simple kill on the right PID should do; if you are wondering which PID is currently owned by a server, simply inspect the file:

fast_configure/simulations/<simname>/server/ProteinServer_Lock
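
If the lock file holds just the PID (an assumption; inspect it first to be sure), the server can be stopped with:

$ kill $(cat fast_configure/simulations/<simname>/server/ProteinServer_Lock)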

Although all clients connected to the server will follow its doom as soon as it dies, it's a good idea to cancel all the submitted jobs: this avoids wasting the resources each job will take, once on the grid, to establish a connection to the server before noticing that the server is down.

How can I add more advanced support for multiple storages in the server?

If a simulation is very large (i.e. it is designed to run many different walkers at once) a single SE may not be sufficient to satisfy the demands of all the clients on the grid at checkpoint time.

To work around this problem, more than a single SE may be specified, both to provide downloading and uploading for the WNs: this scenario is already supported by set_up_sim.py in a simple way (read the help provided by the script itself for more info about this).

It is very important to know that, with the script approach, the algorithm which assigns the SEs to the WNs is very simple: each time an SE is needed to carry out an operation on a WN, the server randomly picks one of the available SEs.

This simple approach is sufficient for most cases: e.g. it is adequate if a simulation is run on a grid which is not de-centralized (unlike euindia), since each SE will be near enough to each WN.

However, for large decentralized structures, like the euindia one, which spans Europe and India, this approach doesn't work well. This is because, though adding an Indian repository will help the transactions of the Indian WNs, it will greatly slow down the ones of the European WNs (and a similar problem will show up between Indian WNs and European SEs).

This problem may be overcome by enforcing a stricter match between WNs and SEs: to do that, you should stop the server, modify the server.ini file as follows and restart it.

The following section in the server.ini holds the locations where the binaries may be found by the clients, and where the trajectories are saved:


[Repositories]
Download = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/download#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/download#grid
Upload = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/sh3#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/sh3#grid

The #grid at the end is added by the setup script and is a label which is used to match the clients which request a repository.

When a client needs an SE to perform an operation, it provides its label(s) to the server, and the server matches them against the list, returning only the matching repositories.

This can be used to our advantage: let's assume we want to provide a repository for Indian WNs only; we can simply get a proper location on the Indian SE, and add to the file something like this:

[Repositories]
Download = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/download#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/download#grid
        gsiftp://indian-se-03.pune.in/flatfiles/euindia/johndoe/protein/download#grid,india
Upload = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/sh3#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/sh3#grid
        gsiftp://indian-se-03.pune.in/flatfiles/euindia/johndoe/protein/sh3#grid,india

in this way, only clients with both the labels grid and india will use the newly added repositories, while the ones with only the grid label will keep using only the old ones.

This may also be used to support intermixing of grid and HPC resources (or even to include your desktop computer) in the computation:

[Repositories]
Download = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/download#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/download#grid
        gsiftp://indian-se-03.pune.in/flatfiles/euindia/johndoe/protein/download#grid,india
        file:/scratch/johndoe/download#local
Upload = gsiftp://s2-se-03.infn.it/data1/euindia/johndoe/sh3#grid
        gsiftp://prod-se-03.ct.infn.it/flatfiles/SE00/euindia/johndoe/protein/sh3#grid
        gsiftp://indian-se-03.pune.in/flatfiles/euindia/johndoe/protein/sh3#grid,india
        file:/scratch/johndoe/sh3#local

In this way, clients with the local label will use the local filesystem to save and retrieve the relevant data (local to the client running the gromacs executable!): just be sure to save the various prerequisites in the right places...

After the configuration of the server (which should be restarted), you must make sure that the clients you are running use the correct labels to request the SEs (otherwise they will always request the grid ones).

To do so, you must modify the script bootstrap.py in the client directory, line 20, from:

LABEL="grid"

to, e.g.:

LABEL="grid,india"

and from now on, all the jobs submitted will present themselves with the grid and india labels.
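
A quick, purely optional way to apply the edit from inside the client directory (assuming line 20 still contains the original LABEL="grid" assignment):

$ sed -i 's/^LABEL="grid"$/LABEL="grid,india"/' bootstrap.py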

This procedure may be used to:

  • Balance the load on different storages
  • Support different architectures at once
  • Provide binaries very specifically optimized for the different architectures on the grid (or simply for AMD/Intel processors)
  • Mix usually non-interoperable resources (e.g. grid WNs and computers which cannot access an SE due to the lack of middleware and/or certificate)
  • And so on...

Which protocols are supported for the storage?

For the upload of the data: gsiftp:, lfn:, ftp: and file: urls are supported.

For the download, http: is also supported in addition.

An lfn: URL can also be specified in a non-standard way:

lfn:<vo>:<server>@<path>

e.g.:

lfn:euindia:se-01.somewhere.morename.it@/grid/euindia/johndoe/protein/download

With this format, BEMuSE can use the additional information to work around some kinds of ill-configured SEs (specifically the ones with no VO_<VO>_DEFAULT_STORAGE variable, or with the variable set to "Classic SE Host"; other errors are not detected); in the remaining cases the server provided will be ignored.

Where can I find the data generated by a simulation?

Two kinds of data are produced by each running simulation: the data locally stored on the server (HILLS.0, COLVAR.0 and the tpr file) and the remotely stored data (the trajectories and a backup copy of the tpr).

The trajectories, as already explained, can be retrieved using uberftp; the local data is stored in a directory called "data" in the server directory: it's important to realize that no backup copy of that data exists, therefore losing it corresponds to losing the simulation.

The data directory is used by the server to handle the simulation and therefore should under no circumstances be modified while the server is running.

However, it is fairly safe to read the data in it (though I'd advise copying the entire directory elsewhere first, either as a precaution or as a backup) in order to inspect the simulation.

How do I retrieve the data from an SE (grid only)?

Use the command uberftp and search your personal locations for the directory where you instructed the server to upload the data (each simulation should dump its trajectories in a different directory, possibly on multiple SEs).

The restrictions which hold for the "data" directory don't apply to the trajectories and tpr backups on the grid: after writing them on the SE, the clients forget completely about them, therefore they can be retrieved and removed at leisure: just try not to remove files while a client is writing them, and leave files newer than 1 hour alone.
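
For example, to fetch all the trajectories of the sh3 simulation from one of the SEs used in the examples above (host and path are placeholders, the mget command is the one mentioned earlier):

$ uberftp prod-se-03.ct.infn.it
uberftp> cd /flatfiles/SE00/euindia/johndoe/protein/sh3
uberftp> mget *
uberftp> quit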

How can I check if resources for my job are available?

Use the command:

$ lcg-infosites --vo euindia ce
valor del bdii: eu-india-03.pd.infn.it:2170
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
   2       1       2              1        1    g9.ctsf.cdac.org.in:2119/jobmanager-lcgpbs-euindia
   4       1       2              2        0    gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-grid
  18      15       3              3        0    vecce01.vecc.eu-india.res.in:2119/jobmanager-lcgpbs-euindia
 180       0     490             61      429    gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long
 180       0      87             28       59    gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short
 180       0     510             90      420    gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite
 255       0      38             37        1    grid012.ct.infn.it:2119/jobmanager-lcglsf-euindia
 174       0      58             58        0    t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-euindia
 174       0      57             57        0    t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-euindia
  48       1     406            117      289    prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-grid
  48       1     406            117      289    prod-ce-02.pd.infn.it:2119/jobmanager-lcglsf-grid
   8       8       0              0        0    ce01.euig-cdac-pnq.ernet.in:2119/jobmanager-lcgpbs-euindia
  10       0      12             10        2    ce-01.grid.sissa.it:2119/jobmanager-lcgpbs-euindia
 146      85       5              5        0    serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-euindia
   8       8       0              0        0    ce01.unipune.ernet.in:2119/jobmanager-lcgpbs-euindia

This will give you a rough (and highly inaccurate) picture of the situation on the grid. Don't trust it too much: just submit and hope for the best.

No, better: let's forget about this whole lcg-infosites thing... deal?

How can I create a proxy certificate?

Use the command:

$ voms-proxy-init --voms euindia --valid 1000:0

You can verify that the proxy has been correctly created with the following command:

$ voms-proxy-info --all
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=SISSA/CN=John Doe/CN=proxy
issuer    : /C=IT/O=INFN/OU=Personal Certificate/L=SISSA/CN=John Doe
identity  : /C=IT/O=INFN/OU=Personal Certificate/L=SISSA/CN=John Doe
type      : proxy
strength  : 512 bits
path      : /tmp/x509up_u501
timeleft  : 999:99:24
=== VO euindia extension information ===
VO        : euindia
subject   : /C=IT/O=INFN/OU=Personal Certificate/L=SISSA/CN=John Doe
issuer    : /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it
attribute : /euindia/Role=NULL/Capability=NULL
timeleft  : 71:99:24

The certificate will allow you to operate on the grid for almost 72 hours.

How can I submit a job (grid only)?

Be sure to have an active proxy certificate.

To submit a single job from a jdl called villin.jdl to a non-specified resource (which will be chosen automatically) and save the identifier in a file called id_grid.txt:

$ glite-wms-job-submit -a -o id_grid.txt villin.jdl

To do the same thing requesting a specific computing element:

$ glite-wms-job-submit -a -o id_grid.txt -r prod-ce-02.pd.infn.it:2119/jobmanager-lcglsf-grid villin.jdl

To do it 10 times:

$ for((i=0;i<10;i++));do glite-wms-job-submit -a -o id_grid.txt -r prod-ce-02.pd.infn.it:2119/jobmanager-lcglsf-grid villin.jdl;done

How can I check a job status on the grid?

Be sure to have an active proxy certificate.

To check the status of a job whose id you have:

$ glite-wms-job-status https://eu-india-04.pd.infn.it:9000/aZptGwRgdmaiSk0BWpdMPQ

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://eu-india-04.pd.infn.it:9000/aZptGwRgdmaiSk0BWpdMPQ
Current Status:     Done (Success)
Logged Reason(s):
    -
    - Job terminated successfully
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-grid
Submitted:          Mon May 19 21:16:18 2008 CEST
*************************************************************

To check all jobs in a file:

$ glite-wms-job-status -i --noint id_grid.txt
(status of all jobs)

How can I cancel a job on the grid?

Use the command glite-wms-job-cancel: the syntax is identical to that of the glite-wms-job-status command.
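
For example, to cancel a single job or all the jobs whose identifiers were saved in id_grid.txt (mirroring the status examples above):

$ glite-wms-job-cancel https://eu-india-04.pd.infn.it:9000/aZptGwRgdmaiSk0BWpdMPQ
$ glite-wms-job-cancel -i --noint id_grid.txt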

How can "fast configure" be used to run the BEM on non grid resources?

First, be sure to have read this entire FAQ (and the paragraph "How can I add more advanced support for multiple storages in the server?" with special care) to get a hint of how BEMuSE works.

To execute BEMuSE on a local resource, only a subset of the requirements is needed:

  • Python2.5 with bzip2 and gzip support compiled in on both the server (the machine where the set up script will be run) and the clients (the machines which will provide the CPU time for the simulation: WNs from now on, in analogy with the grid case).
  • Outbound connectivity from the WNs to the server.
  • A shared directory among the WNs (not a strict requirement, it will simplify the configuration though) which will be used by the WNs to get the BEM executables and save the trajectories.
  • The BEM enabled mdrun executable, the tpbconv utility and the file aminoacids.dat
  • a set of archives, each containing:
    • a tpr file
    • a META_INP.0 file, with the parameters LABEL and RESTART specified
    • a file HILLS.0 consistent with the META_INP.0 definition, with at least one line (with the definition of the variables and label).

Set up a simulation exactly as you would to perform a grid simulation (you can skip the logging service if the resources are local and you have direct access to the running directories): when the repositories (gsiftp URLs) are requested, give instead strings in the form:

file:/scratch/bemuse/download

where "/scratch/bemuse/download" is the shared directory on the WNs where the executables have been saved, and:

file:/scratch/bemuse/simname

where "/scratch/bemuse/simname" is the shared directory on the WNs created, with appropriate permissions, for the upload of the trajectories (as you have probably already understood, the directories may not be shared, as long as the paths are the same on all WNs and all the "download" ones contain the same, required, files).

The setup will create a server.ini file with some lines like this:

[Repositories]
Download = file:/scratch/bemuse/download#grid
Upload = file:/scratch/bemuse/simname#grid

which will not need to be modified (since the #grid part is just a label you may leave it as it is).

Now check the "client.ini" file in the "client" directory, and be sure that the lines in the "Server" section are set to an address which can be reached from the WNs:

[Server]
Host = server.machine.org   <---- should be reachable from the WNs!!!
Port = 24371                <---- should be open from the WNs to Host!!!

Since the port is randomly selected in the GLOBUS_TCP_PORT_RANGE interval, it may not be suited to your configuration: if you need to modify it, you will also need to:

  • Stop the server, if already running
  • Edit the "server.ini" file and change the "Port" option in the "Network" section to make it consistent with the new value in the "client.ini" file.
  • Re-start the server.

The same operations should be performed if you have enabled the logging support, for the "Logging" sections of each ".ini" file.

At this point you will need to take the "proteinclient.tar.bz2" package and the "client.ini" configuration file to the system where you want to execute the simulation, put them in an empty directory and run the client with the lines:

$ tar xvjf proteinclient.tar.bz2
$ nohup python2.5 proteinclient.py grid &

which can be embedded into a submission script for your own queue management system.

Almost immediately, the client should contact the server and the simulation should start (if this doesn't happen, check the .log files in the client and server directories, as well as the nohup.out file in the client directory, to debug the problem).

Be particularly careful to NEVER ever start the protein client in a directory which already contains a COLVAR.0 or HILLS.0 file, since this may irreparably ruin your simulation!

The former warning is due to the fact that, after authentication, the server considers the clients "trusted" and doesn't double-check their input for errors or incorrect data (which is what a client will send if it gets started in an unclean directory).
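
If you embed the start-up lines in a submission script, a small defensive check along these lines (purely a suggestion, not something BEMuSE itself requires) prevents that mistake:

if [ -e COLVAR.0 ] || [ -e HILLS.0 ]; then
    echo "refusing to start: COLVAR.0 or HILLS.0 already present" >&2
    exit 1
fi
tar xvjf proteinclient.tar.bz2
nohup python2.5 proteinclient.py grid &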

Soon after the first exchange, the trajectories will appear in the (possibly shared) upload directory on the WNs: HILLS and COLVAR will instead be periodically updated in the "data" dir on the server, as for the grid simulation.

More complex setups may be explored by skilled users.
