EUIndiaGrid Tools

Tools developed inside the project.

In order to achieve some of the goals described in the previous sections, we created programs which exploit the available resources in ways that can be considered "non-standard", since they use the infrastructure with patterns that were not foreseen in its design.

The ideas behind these programs proved general and useful enough to be worth some extra development time, in order to make the resulting applications self-contained and reusable (e.g. by adding command line options and presenting them as separate tools).

The results have been tested and integrated in different scenarios, proving useful in enriching the user's experience and capabilities without increasing his/her workload.

Here we present a short description of some of the work done in this field, with suggestions on how to integrate it into other applications.

Reserve SMP nodes.

Today most (if not all) CEs on the grid have multi-processor nodes and therefore the capability to run tightly coupled codes, at least in a limited fashion (using up to 4 CPUs at once, though 2 processors per node is the more common setup). However, the current middleware offers no tool that allows a user to submit this kind of job.

To overcome this limitation, we developed an application (written in Python) that allows a user to reserve an entire node on the grid and run computations on it without conflicting with other users' jobs.

The application, called reserve_smp_nodes, consists of two separate Python scripts: a server which runs locally and a client which is executed on the grid.

The tool works on a job reservation basis: the server submits a number of jobs to a CE in order to increase the chances that more than one ends up on the same node. Once this happens, a script is executed on the node (now completely reserved), while all the other jobs expire in order to free the resources.

Although this tool does not add anything beyond what a careless user could already do to inconvenience others, we acknowledge that job reservation should be used sparingly, since it increases the chance of wasting CPU time. For this reason we invested considerable effort in security features that substantially reduce this risk, as well as the chance of keeping resources booked by mistake. Moreover, from our analysis it seems that no other approach exists (or, at least, none is easy to find) that provides the same service at the user level.

Therefore it seems that, until the middleware provides (if ever) the ability to book an entire node (e.g. through a JDL tag), our application will be the only way to run multi-threaded or shmem MPI codes.

The reserve_smp_nodes program can be found at the address:

http://www.ictp.it/~dimeo/reserve_smp_nodes-1.10.tar.bz2

Usage

reserve_smp_nodes can be used in two ways: interactively and via command line options (a third interface, based on Tkinter, should be considered alpha quality).

In order to make this application as simple and general as possible, we stripped it of all unnecessary features (like complex data transfer or logging), leaving the bare minimum needed to run an application: once a node is reserved, the user provides a single executable (usually a script) along with its options, and this code starts running independently on the WN, so the user is even free to shut down his/her computer afterwards.

The user needs a configured UI with a valid proxy certificate, as well as a script (though a binary can be used) that carries out all the necessary tasks: preparation of the environment, execution of the computation and subsequent saving of the data on the grid (since, as said, nothing other than the user script is sent to the WN).
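
As a reminder, the proxy can be created on the UI with the usual VOMS commands (here we assume the euindia VO and the standard gLite client tools):

# Create a VOMS proxy for the euindia VO (here valid for 24 hours)
voms-proxy-init --voms euindia --valid 24:00

# Check that the proxy is in place and still valid
voms-proxy-info --all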

Here is a very simple example of a bare-bones script that performs some computation:

#!/bin/bash

# Load the input data. The Catalog may also be used
globus-url-copy  gsiftp://.../input.dat file:`pwd`/input.dat

# Get the executable, which will use threads (or fork)
globus-url-copy gsiftp://.../threaded_bin file:`pwd`/threaded_bin
chmod +x threaded_bin

# Run it, with the options provided by the user
# (we assume the -n option is the number of threads to spawn)
./threaded_bin -n ${RESERVE_SMP_NODES_CPU} "$@"

# Save the output back
globus-url-copy file:`pwd`/output.dat gsiftp://.../output.dat

The clever user will find in the previous script enough hints on how to adapt other applications in a more efficient way (e.g. using compressed archives and chaining several computations up to the maximum queue length), as sketched below.
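
Purely as an illustration of that idea (the archive layout and the -i/-o options of the example binary are hypothetical), the same structure could be rearranged as follows:

#!/bin/bash

# Fetch a single compressed archive containing the binary and all the inputs
# (hypothetical archive name and layout)
globus-url-copy gsiftp://.../workload.tar.gz file:`pwd`/workload.tar.gz
tar xzf workload.tar.gz
chmod +x threaded_bin

# Run one computation per input file, staying within the queue time limit
for input in input_*.dat; do
    ./threaded_bin -n ${RESERVE_SMP_NODES_CPU} -i "$input" -o "out_${input}"
done

# Pack all the results and save them back in a single transfer
tar czf results.tar.gz out_*
globus-url-copy file:`pwd`/results.tar.gz gsiftp://.../results.tar.gz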

As can be plainly seen, delegating to the script all the steps required to run a simulation does not involve more lines than the ones needed to write a JDL.
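
For comparison, a minimal JDL for an equivalent ordinary job would look roughly like this (attribute values are of course placeholders):

Executable    = "test.sh";
Arguments     = "option1 option2 option3";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"test.sh"};
OutputSandbox = {"std.out", "std.err"};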

Using reserve_smp_nodes with MPICH and the shmem device

Since among scientists MPI seems to be far more common than plain threads, we also provide a couple of solutions for porting applications, like Quantum ESPRESSO or RegCM, which are already MPI enabled and would greatly benefit from the very small latency and large bandwidth of the shmem device (and which cannot be run efficiently, if at all, with the normal p4 device).

The first, straightforward solution is to compile the MPI package with the shmem device enabled and install it as experimental software on as many CEs as possible.
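
As an indication of what such an installation involves, the build boils down to something like the following (the MPICH-1 version and the installation prefix are placeholders; we assume the usual VO_<VO>_SW_DIR convention for the experimental software area):

# Build MPICH-1 with the shared-memory device and install it in the
# experimental software area of the site (prefix is a placeholder)
tar xzf mpich-1.2.7p1.tar.gz
cd mpich-1.2.7p1
./configure --with-device=ch_shmem --prefix=${VO_EUINDIA_SW_DIR}/mpich-shmem
make
make install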

We developed a script that takes care of this task (as well as of the installation of other software) and ran it on a number of CEs. This option, however, lacks scalability, since quite a few CEs have setups which do not allow an easy or automatic installation of applications (and in some cases the site administrator has to be contacted personally). On the other hand, it is the cleanest, most standard and fastest solution from the point of view of code execution.

Due to the aforementioned limitations, we also studied the feasibility of an MPICH package which could be relocated to a directory different from the one it was compiled in, and we easily succeeded in creating one by a simple substitution of some variables in the "mpirun" executable.

This package (which can be trivially recreated from scratch) is provided at the address:

http://www.ictp.it/~dimeo/relocatable_mpich_shmem.tar.gz

and contains an MPICH environment compiled with the shmem device, suitable for execution on the EUIndia grid, as well as a script called "relocate.sh" which adapts the package to the current directory.
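
Conceptually, the relocation amounts to little more than replacing the installation prefix recorded at compile time inside the mpirun script with the new location; a minimal sketch of the idea (file names and the original prefix are indicative, not the exact content of the shipped relocate.sh) is:

#!/bin/bash
# relocate.sh <dir>: point the MPICH scripts in <dir> to their new location
NEW_PREFIX=`pwd`/$1
OLD_PREFIX=/opt/mpich_shmem   # placeholder: prefix used when the package was built
sed -i "s|${OLD_PREFIX}|${NEW_PREFIX}|g" $1/bin/mpirun
# Export the relocated installation for the rest of the job
echo "export PATH=${NEW_PREFIX}/bin:\$PATH" > mpich.env.sh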

Here is a simple example of how to execute an MPI code with shared memory using the latter approach (as will be evident, some steps overlap with the previous script):

#!/bin/bash

# Load the input data. The Catalog may be used
globus-url-copy  gsiftp://.../input.dat file:`pwd`/input.dat

# Get the executable (compiled against shmem!)
globus-url-copy  gsiftp://.../shmem_bin file:`pwd`/shmem_bin
chmod +x shmem_bin

# This is the new step (not needed if shmem mpi is installed as
# experimental software): get the relocatable mpi package and adapt it
# to the current directory
globus-url-copy                                 \
  gsiftp://.../relocatable_mpich_shmem.tar.gz   \
  file:`pwd`/relocatable_mpich_shmem.tar.gz 
tar xvzf relocatable_mpich_shmem.tar.gz
./relocate.sh mpich_smp
source mpich.env.sh

# We are ready to run the code!
mpirun -np  ${RESERVE_SMP_NODES_CPU} ./shmem_bin

# Save the output
globus-url-copy file:`pwd`/output.dat gsiftp://.../output.dat

As can be plainly seen, once the relocatable package has been uploaded to an SE, only 4 extra commands are required to execute an MPI code with this approach.

A simple session.

To show how easy it is to use reserve_smp_nodes, here is an example session with the interactive interface, where we submit 10 jobs in order to reserve a 2-processor node on ictpgrid-ce-1:

$ ./reserve_smp_nodes  -i
Listening port (23000)? 23790
VO (euindia): [enter]
NS type: edg/glite/glite-wms (Default glite-wms)? [enter]
Checking the resources available...
Destination:
--------------------------------------------------------------
0) grid0.fe.infn.it:2119/jobmanager-lcgpbs-grid
1) grid012.ct.infn.it:2119/jobmanager-lcglsf-euindia
2) gridce.sns.it:2119/jobmanager-lcgpbs-grid
3) gridce2.pi.infn.it:2119/jobmanager-lcglsf-grid4
4) ictpgrid-ce-1.ictp.it:2119/jobmanager-pbs-euindia
5) prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-grid
6) serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-euindia
7) vecce01.vecc.eu-india.res.in:2119/jobmanager-lcgpbs-euindia
8) t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-euindia
9) gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite
10) gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long
11) gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short
12) * Use the matchmaking
Select an option(12): 4
How many cpus do you want to reserve (1)? 2
How many jobs do you want to submit (1)? 10
How long should i try to reserve the cpus (300")? [enter]
Script to execute? test.sh
Arguments to pass to it ("")? option1 option2 option3 
------------------------------------------
All jobs correctly submitted!
** New connection established from 140.105.46.200:38695
   + Hostname received: node037.beowulf.ictp.it
** New connection established from 140.105.46.200:38752
   + Hostname received: node038.beowulf.ictp.it
** New connection established from 140.105.46.200:38939
   + Hostname received: node039.beowulf.ictp.it
** New connection established from 140.105.46.200:38940
   + Hostname received: node039.beowulf.ictp.it
Script 'test.sh' sent.

At this point the program gives the prompt back and the user is free to execute another task (or even to shut the computer down): the test.sh script has been sent and will run on its own on the WN node039.beowulf.ictp.it.

Logging tool

Along with the ability to submit SMP jobs on the grid, one of the features most requested by scientists who would like to benefit from the grid's large quantity of resources is the ability to keep track of the status of their simulations, so that they can interrupt them or obtain preliminary results.

The LCG middleware already provides a tag that allows the user to control the standard streams of a job, so that he/she can feed the program with input parameters, if needed, and receive its standard output and error as well (this is achieved with the JobType = "interactive" attribute in the JDL).
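
For reference, the relevant JDL fragment is just the single attribute:

JobType = "Interactive";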

However, the previous approach has severe limitations: besides some instabilities (let's just say that your mileage with interactive jobs may vary...), the connection and the listener provided by the middleware command seem to be designed for a persistent session only.

Because of this, a user is forced to keep a terminal open at all times to receive the output of his/her job, and as soon as the listener application is closed, the program stops as well. This is usually not what the user wants.

Another limitation is that the interactive job type is mutually exclusive with other types (e.g. MPI ones), since the JobType tag can be specified only once inside the JDL.

Therefore we created a small Python application that can be used to pipe data from a stream (like stdout or stderr) to a selected host (much like netcat does), where the user can start a listener to view all the data produced by his/her application (or only what he/she has not yet seen).
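
To make the analogy concrete, with plain netcat the equivalent setup would be something like the following (traditional netcat syntax; the simulation name is a placeholder, the host and port match the example later on), which is exactly the behavior our tool improves upon, as listed below:

# on the WN: pipe the output of the simulation to the user's machine
./simulation 2>&1 | nc userhost.edu 24000
# on the user's machine: listen for the incoming data
nc -l -p 24000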

This design was chosen in order to keep the application simple (dynamic input is not a requested feature) and to mimic the behavior of a local cluster, where scientists can usually log in to check the progress of their simulations. The user may choose either to follow the application in real time, receiving its output periodically, or to simply retrieve the data produced so far and get the prompt back.

Compared to standard netcat, our tool has the following advantages:

  • data is read from the simulation even while there is no connection with the user's computer;
  • the simulation is not blocked if output is produced faster than it can be sent over the network;
  • the tool is not stateless: it keeps track of all the logs produced by the simulation, and the user may choose to see only the new entries or the whole history;
  • it works in two modes: "one shot", to read the information produced up to the moment the listener is executed, and "follow", to remain connected to the application (much like the LCG listener), with the difference that the connection may be interrupted and resumed at any time;
  • both the transmitter and the receiver (the command, run by the user, which listens for the data) are protected by a password.

The application can be found at the address:

http://www.ictp.it/~dimeo/logging_tool-0.8.tar.bz2

Integrating and using the log transmitter in a generic application.

It is pretty straightforward to add logging support to a generic application: it is sufficient (for a JDL job) to add the logs_sender application to the InputSandbox and, in the script used as executable, pipe the desired stream to it.
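
A minimal JDL of this kind could look as follows (wrapper.sh is a placeholder name for the user's script, which pipes the desired stream to ./logs_sender exactly as in the script shown below):

Executable    = "wrapper.sh";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"wrapper.sh", "logs_sender"};
OutputSandbox = {"std.out", "std.err"};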

If logs_sender is to be used with reserve_smp_nodes, the InputSandbox cannot be used and the file has to be retrieved by other means. This scenario is particularly interesting given that reserve_smp_nodes jobs provide no feedback by default.

Here is a simple example script (a slightly modified version of the first example in the reserve_smp_nodes section) which sends the standard output and error to the UI of the user who submitted it (the machine userhost.edu, though any machine reachable from the internet will do) on port 24000, protected by the password foo:

#!/bin/bash

# Load the input data. The Catalog may also be used
globus-url-copy  gsiftp://.../input.dat file:`pwd`/input.dat

# Get the executable, which will use threads (or fork)
globus-url-copy gsiftp://.../threaded_bin file:`pwd`/threaded_bin
chmod +x threaded_bin

# Get the logs_sender command. 
globus-url-copy gsiftp://.../logs_sender file:`pwd`/logs_sender
chmod +x ./logs_sender

# Run the command and send both standard error and output streams 
# to the user.
./threaded_bin -n ${RESERVE_SMP_NODES_CPU} "$@" 2>&1 \
   |./logs_sender -H userhost.edu -p 24000 -P foo

# Save the output back
globus-url-copy file:`pwd`/output.dat gsiftp://.../output.dat

At this point, a few moments after reserve_smp_nodes reports that the script has been sent (it starts running immediately thereafter), we are ready to execute the check_grid_logs command (the pause is necessary to allow the script above to download the data from the SE and start the command).

The user on 'userhost.edu' will execute the command:

./check_grid_logs -p 24000 -P foo 

and the command will print all the logs that the user has not yet viewed, or:

./check_grid_logs -p 24000 -P foo -a 

to see all the logs produced since the beginning of the execution.

After the command has printed the logs, the prompt is returned, unless the -f flag is also specified, in which case check_grid_logs keeps querying logs_sender for new logs periodically, until the simulation on the WN stops or the user presses Ctrl+C.
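
In follow mode the invocation simply adds the flag:

./check_grid_logs -p 24000 -P foo -f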

If check_grid_logs is executed after the remote application has stopped (or before it starts), it will wait 20 seconds for new connections (this period can be changed from the command line) and, if no incoming connection is detected, it will exit cleanly.

Both the logs_sender and check_grid_logs commands print a help message showing all the options when executed with the -h flag.
