
Logging tools

by Riccardo Di Meo last modified 2007-12-19 20:07

A tool that can be used to retrieve the output streams of a simulation through the network.

Logging tool

Along with the ability to submit SMP jobs on the grid, one of the features most requested by scientists who would like to benefit from the grid's large pool of resources is the ability to keep track of the status of their simulations, so that they can interrupt them or obtain preliminary results.

The interactive job type

The LCG middleware already provides a tag that lets the user control the standard streams of a job, both to feed the program with input parameters, if needed, and to receive the standard output and error of the application (this can be achieved with the JobType = "Interactive" JDL line).
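
For reference, a minimal JDL for an interactive job looks something like the sketch below (the executable name is illustrative; the submit command then starts a listener on the UI which stays attached to the job's streams):

[
  JobType = "Interactive";
  Executable = "my_sim.sh";
  InputSandbox = {"my_sim.sh"};
]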

However, this approach has severe limitations besides some instability (let's just say that your mileage with interactive jobs may vary...), and the listener provided by the middleware seems to be tailored for a persistent connection only.

Because of this, a user is forced to keep a terminal open at all times to receive the output of his/her job, and as soon as the listener application is closed, the program stops as well (which is usually not what users want).

Another limitation is that the interactive job type is mutually exclusive with other job types (e.g. MPI ones), since the JobType tag can be specified only once inside the JDL.

File Perusal attributes

The new WMS allows files to be periodically copied from the WNs to the RB while the job is running, to be saved by the user on demand (see the FilePerusal tags in the JDL specification).
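
For reference, enabling perusal takes a couple of extra JDL attributes, plus an explicit retrieval command on the UI. The sketch below assumes the gLite names PerusalFileEnable, PerusalTimeInterval and glite-wms-job-perusal, with illustrative file names:

[
  Executable = "my_sim.sh";
  StdOutput = "sim.out";
  StdError = "sim.err";
  InputSandbox = {"my_sim.sh"};
  OutputSandbox = {"sim.out", "sim.err"};
  PerusalFileEnable = true;
  PerusalTimeInterval = 1200; // seconds between uploads to the WMS
]

While the job runs, the user enables and then fetches the file with something like:

glite-wms-job-perusal --set -f sim.out <jobID>
glite-wms-job-perusal --get -f sim.out <jobID>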

Though this feature is quite useful and matches the requirements of most users, it has a few disadvantages which could make it impractical in some situations:

  • It is fully supported only by the glite-wms-* commands (other commands have limited or no support for it).
  • The data is transmitted to the WMS periodically, even when the user doesn't need it (considerably increasing the network traffic between the WMS and the CE), especially if the interval specified by the user for the automatic upload is small.
  • It increases the load on the WMS, especially if the amount of data to transmit is large, so it would be wise not to use it on a large scale.
  • All the data is retrieved at each automatic upload: if the requested file is very large, this is not convenient.

To overcome these limitations, we created a small Python application that can be used to pipe data from a stream (like stdout or stderr) to a selected host (much like netcat does), where the user may start an application to view all the data produced by his/her job (or only the part he/she hasn't seen yet).

To keep the application simple, we didn't bother with the stdin stream (dynamic input is not a requested feature) and focused on providing a tool which mimics the behavior of a job on a local cluster (where scientists can usually log in to control the flow of their simulations): the user may choose to follow the application in real time, periodically receiving its output (more or less like the standard Unix tail command does when executed with the -f flag), or to simply retrieve the data produced so far and get the prompt back.
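
The core of the scheme is easy to sketch. The toy Python program below is not the actual logs_sender (it has no password check, and the host, port and file names are illustrative); it only shows the buffering idea: stdin is spooled to a local file, and the not-yet-delivered part is pushed to the user's listener whenever a connection succeeds:

#!/usr/bin/env python
# Toy illustration of the logs_sender buffering scheme (not the real tool).
import socket, sys, threading, time

HOST, PORT = "userhost.edu", 24000   # illustrative destination
SPOOL = "logs.spool"                 # local buffer: nothing is lost

def reader():
    # Drain stdin into the spool file as the output arrives, so the
    # simulation is never blocked by a slow or absent network.
    with open(SPOOL, "ab") as spool:
        for line in sys.stdin:
            spool.write(line.encode())
            spool.flush()

t = threading.Thread(target=reader)
t.daemon = True
t.start()

sent = 0  # bytes already delivered to the listener
while t.is_alive():
    time.sleep(5)
    try:
        with open(SPOOL, "rb") as spool:
            spool.seek(sent)
            chunk = spool.read()
        if chunk:
            conn = socket.create_connection((HOST, PORT), timeout=10)
            conn.sendall(chunk)
            conn.close()
            sent += len(chunk)  # advance only on successful delivery
    except OSError:
        pass  # listener unreachable: keep spooling and retry later
# (A real tool would also flush any data left after stdin closes.)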

Logging tool vs netcat

Compared to standard netcat, our tool has the following advantages:

  • the output of the simulation is saved even when there is no connection with the user's computer (no data is lost).
  • it doesn't block the simulation if the output is produced faster than it can be sent over the network.
  • the logging tool is not stateless and keeps track of all the logs produced by the simulation: the user may choose to see only the new ones or the whole history.
  • it works in two ways: "one shot" mode, which prints the output produced up to the moment the listener is executed, and "follow" mode, which stays connected to the application (much like the listener from LCG), though the connection may be interrupted and resumed at any time.
  • both the transmitter and the receiver (the command, executed by the user, which listens for the data) are protected by a password.

Logging tool vs File Perusal

Though some users might prefer the File Perusal approach, our tool can be interesting in several scenarios, since:

  • it can be used with any middleware flavor (in fact, it doesn't even need the middleware to be installed...)
  • the connection is established directly from the WN to the UI, or to any other machine with inbound connectivity
  • it doesn't increase the load on the WMS
  • it doesn't increase the traffic considerably, since the data is transmitted on demand
  • it can easily be put into a pipe to filter the data as it comes (which can be handy at times)
  • only the "new" parts of the data need to be retrieved, making the tool both efficient and user friendly

The application can be found at the address:

http://www.ictp.it/~dimeo/logging_tool-0.8.tar.bz2
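
If downloaded from the command line, it can be unpacked in the usual way:

wget http://www.ictp.it/~dimeo/logging_tool-0.8.tar.bz2
tar xjf logging_tool-0.8.tar.bz2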

Integrating and using the log transmitter in a generic application

It's pretty straightforward to add logging support to a generic application: for a JDL job, it's sufficient to add the logs_sender executable to the InputSandbox and, in the script used as the executable, pipe the desired stream to it.
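
For a plain JDL job, the wiring could look like the sketch below (executable and file names are illustrative; the essential parts are the logs_sender entry in the InputSandbox and the pipe in the wrapper):

[
  Executable = "wrapper.sh";
  InputSandbox = {"wrapper.sh", "logs_sender", "my_sim"};
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
]

where wrapper.sh simply runs:

#!/bin/bash
chmod +x ./logs_sender ./my_sim
./my_sim 2>&1 | ./logs_sender -H userhost.edu -p 24000 -P foo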

If logs_sender is to be used with reserve_smp_nodes, the InputSandbox cannot be used and the file must be retrieved by other means instead (in the example below it is fetched from a Storage Element with globus-url-copy). This scenario is particularly interesting, given that reserve_smp_nodes jobs provide no feedback by default.

Here is a simple example script (a slightly modified version of the first example in the reserve_smp_nodes section) which sends the standard output and error streams to the user interface of the user who submitted it (to the fictional machine "userhost.edu", though any machine reachable from the internet will do) on port 24000, protected by the password "foo":

#!/bin/bash

# Load the input data. The Catalog may also be used
globus-url-copy gsiftp://.../input.dat file:`pwd`/input.dat

# Get the executable, which will use threads (or fork)
globus-url-copy gsiftp://.../threaded_bin file:`pwd`/threaded_bin
chmod +x threaded_bin

# Get the logs_sender command.
globus-url-copy gsiftp://.../logs_sender file:`pwd`/logs_sender
chmod +x ./logs_sender

# Run the command and send both standard error and output streams 
# to the user.
./threaded_bin -n ${RESERVE_SMP_NODES_CPU} "$@"  2>&1 \
   |./logs_sender -H userhost.edu -p 24000 -P foo

# Save the output back
globus-url-copy file:`pwd`/output.dat gsiftp://.../output.dat

A few moments after reserve_smp_nodes reports that the script has been sent (since it will start running immediately thereafter), we will be ready to execute the check_grid_logs command (the pause is necessary to allow the script above to download the data from the SE and start the command).

The user on "userhost.edu" will execute the command:

./check_grid_logs -p 24000 -P foo

and the command will print all the logs that the user hasn't already viewed, or:

./check_grid_logs -p 24000 -P foo -a

to see all the logs from the beginning of the execution of the command.

After the command has printed the logs, the prompt is returned, unless the -f flag is also specified, in which case check_grid_logs will keep querying logs_sender periodically for new logs, until the simulation on the WN stops or the user presses Ctrl+C.
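
For example, to stay attached to the running simulation:

./check_grid_logs -p 24000 -P foo -f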

After the remote application has stopped (or before it starts), check_grid_logs, if executed, will wait 20 seconds for new connections (though this period can be changed from the command line); if no incoming connection is detected, it will exit cleanly.

Both the logs_sender and check_grid_logs commands print a help message listing all the options when executed with the -h flag.
