MonALISA Grid Monitoring
Last update on: Dec 03, 2015

ApMon User Guide


Chapter 1. ApMon - General User Guide

1. ApMon Initialization

There are several ways to initialize ApMon:

The first method is to initialize ApMon from a configuration file. The file lists the IP addresses or DNS names of the hosts running MonALISA to which the data will be sent, optionally together with the ports on which the MonALISA services listen on those hosts. The same file can also contain lines that configure xApMon (see Section 3, “xApMon - Automatically Sending Monitoring Information”). The lines that specify the destination hosts have the following syntax:

IP_address|DNS_name[:port] [password]

Examples:

        rb.rogrid.pub.ro:8884 mypassword
        rb.rogrid.pub.ro:8884
        ui.rogrid.pub.ro mypassword
        ui.rogrid.pub.ro
     

If the port is not specified, the default value 8884 is assumed. If the password is not specified, an empty string is sent as the password; the MonALISA host then accepts the datagram if either it does not require a password for ApMon packets or the machine that sent the packet is in the host's "accept" list. The configuration file may also contain blank lines and comment lines (starting with "#"); these lines are ignored, as is any leading and trailing whitespace on the other lines.
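The destination-line rules above (optional port with default 8884, optional password, ignored comments, blank lines and surrounding whitespace) can be illustrated with a small parser. This is a minimal sketch, not ApMon's own code; it assumes IPv4 addresses or DNS names (an IPv6 literal would confuse the ":" split) and treats any line starting with "xApMon_" as an option line to be handled elsewhere.

```python
from typing import List, NamedTuple, Optional

DEFAULT_PORT = 8884  # default port stated in the guide


class Destination(NamedTuple):
    host: str
    port: int
    password: str  # empty string when no password is given


def parse_destination(line: str) -> Optional[Destination]:
    """Parse one 'IP_address|DNS_name[:port] [password]' line.

    Returns None for blank lines, comments and xApMon_* option lines.
    """
    line = line.strip()  # leading and trailing whitespace is ignored
    if not line or line.startswith("#"):
        return None
    if line.startswith("xApMon_"):
        return None  # assumption: option lines are parsed separately
    fields = line.split()
    address = fields[0]
    password = fields[1] if len(fields) > 1 else ""
    host, sep, port = address.partition(":")
    return Destination(host, int(port) if sep else DEFAULT_PORT, password)


def parse_config(text: str) -> List[Destination]:
    """Collect every destination line of a configuration file."""
    dests = []
    for raw in text.splitlines():
        dest = parse_destination(raw)
        if dest is not None:
            dests.append(dest)
    return dests
```

For example, `parse_destination("ui.rogrid.pub.ro")` yields host `ui.rogrid.pub.ro`, port 8884 and an empty password, matching the defaults described above.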

Another method is to initialize ApMon from a list whose entries are host names and ports (in the format described above) and/or URLs. The URLs point to plain-text configuration files in the same format. A URL may also be a request to a servlet or CGI script that automatically provides the best configuration, taking into account the geographical zone in which the machine running ApMon is located and the application for which ApMon is used. The geographical zone is determined from the machine's IP address; the application name is supplied by the user as the value of the "appName" parameter included in the URL.
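A list that mixes host entries and URLs can be expanded into plain configuration lines as sketched below. This is an illustration under stated assumptions: the helper names are hypothetical, any entry starting with "http://" or "https://" is treated as a URL, and the server URL in the usage note is a placeholder, not a real MonALISA endpoint.

```python
from typing import Callable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_app_name(url: str, app_name: str) -> str:
    """Return url with an appName=<app_name> query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append(("appName", app_name))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_url(url: str) -> str:
    """Download a plain-text configuration file."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def expand_entries(entries: List[str],
                   fetch: Callable[[str], str] = fetch_url) -> List[str]:
    """Replace each URL entry with the configuration lines it returns."""
    lines: List[str] = []
    for entry in entries:
        if entry.startswith(("http://", "https://")):
            lines.extend(fetch(entry).splitlines())
        else:
            lines.append(entry)  # already a host[:port] [password] line
    return lines
```

Usage might look like `expand_entries(["rb.rogrid.pub.ro:8884", with_app_name("http://example.org/conf", "myapp")])`; passing `fetch` as a parameter keeps the expansion testable without network access.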

2. Sending Datagrams with User Parameters

A datagram sent to the MonALISA module has the following structure:

  • a header with the following syntax:
        v:<ApMon_version>p:<password>
    (the password is sent in plaintext; if the MonALISA host does not require a password, a zero-length string should be sent instead of the password)
  • cluster name (string) - the name of the monitored cluster
  • node name (string) - the name of the monitored node
  • number of parameters (int)
  • for each parameter: name (string), value type (int), value (double, int or string)
  • optionally, a timestamp (int), if the user wants to specify the time associated with the data; if no timestamp is given, the current time on the destination host running MonALISA is used. Including a timestamp is possible since version 2.0.

The configuration file and/or URLs used for initialization can be checked periodically for changes, in a background thread or process. This option is disabled by default; it can be enabled from the configuration file as follows:

  • to enable/disable the periodic checking of the configuration file or URLs:
    xApMon_conf_recheck = on/off
    
  • to set the time interval at which the file/URLs are checked for changes:
    xApMon_recheck_interval = number_of_seconds
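The periodic recheck can be pictured as a background thread that re-reads the configuration source and reacts when its text changes. This is a sketch with hypothetical names (`parse_options`, `ConfigWatcher`); it compares file contents rather than modification times, and because the guide gives no default interval, the caller must supply a fallback used when `xApMon_recheck_interval` is absent.

```python
import threading
from typing import Callable, Dict


def parse_options(text: str) -> Dict[str, str]:
    """Collect 'xApMon_name = value' lines into a dict."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("xApMon_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key.strip()] = value.strip()
    return opts


class ConfigWatcher:
    """Re-reads a configuration source and calls on_change when it changes."""

    def __init__(self, read: Callable[[], str],
                 on_change: Callable[[str], None]) -> None:
        self.read = read
        self.on_change = on_change
        self.last = read()
        self._stop = threading.Event()

    def check_once(self) -> bool:
        text = self.read()
        if text == self.last:
            return False
        self.last = text
        self.on_change(text)
        return True

    def start(self, fallback_interval: float) -> None:
        opts = parse_options(self.last)
        if opts.get("xApMon_conf_recheck", "off") != "on":
            return  # rechecking is disabled by default
        interval = float(opts.get("xApMon_recheck_interval", fallback_interval))

        def loop() -> None:
            # Event.wait returns False on timeout, True once stop() is called
            while not self._stop.wait(interval):
                self.check_once()

        threading.Thread(target=loop, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()
```

In this sketch the interval is read once when the thread starts; a fuller version would re-read it after every detected change.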
    

3. xApMon - Automatically Sending Monitoring Information

ApMon can be configured to automatically send, in a background thread, monitoring information about the application or the system. The system information is obtained from the /proc filesystem, and the job information is obtained by parsing the output of the ps command. If job monitoring is requested for a process, all of its sub-processes are also taken into account (i.e., the resources consumed by the process and by all its sub-processes are summed).
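Summing resources over a process and its sub-processes amounts to walking the process tree built from ps output. This sketch is an assumption about how such a summation can be done, not ApMon's code: it reads pid, parent pid, resident size and virtual size with `ps -eo pid=,ppid=,rss=,vsz=` (the trailing "=" suppresses the header) and adds up rss and vsz, which procps reports in KB.

```python
import subprocess
from typing import Dict, List, Tuple

PsTable = Dict[int, Tuple[int, int, int]]  # pid -> (ppid, rss_kb, vsz_kb)


def read_ps() -> str:
    """Run ps for all processes, without a header line."""
    return subprocess.run(["ps", "-eo", "pid=,ppid=,rss=,vsz="],
                          capture_output=True, text=True, check=True).stdout


def parse_ps(text: str) -> PsTable:
    table: PsTable = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            pid, ppid, rss, vsz = map(int, fields)
            table[pid] = (ppid, rss, vsz)
    return table


def tree_totals(table: PsTable, root: int) -> Tuple[int, int]:
    """Sum rss and vsz over root and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for pid, (ppid, _, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    rss = vsz = 0
    seen = set()
    stack = [root]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # guard against self-parented entries such as pid 0
        seen.add(pid)
        if pid in table:
            rss += table[pid][1]
            vsz += table[pid][2]
        stack.extend(children.get(pid, []))
    return rss, vsz
```

A call such as `tree_totals(parse_ps(read_ps()), job_pid)` (where `job_pid` is the monitored process) would give values comparable to the rss and virtualmem job parameters listed below.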

There are three categories of monitoring datagrams that ApMon can send:

a) job monitoring information - contains the following parameters:

  • run_time: elapsed time from the start of this job
  • cpu_time: processor time spent running this job
  • cpu_usage: percent of the processor used for this job, as reported by ps
  • virtualmem: virtual memory occupied by the job (in KB)
  • rss: resident image size of the job (in KB)
  • mem_usage: percent of the memory occupied by the job, as reported by ps
  • workdir_size: size in MB of the working directory of the job
  • disk_total: size in MB of the disk partition containing the working directory
  • disk_used: size in MB of the used disk space on the partition containing the working directory
  • disk_free: size in MB of the free disk space on the partition containing the working directory
  • disk_usage: percentage of the disk partition containing the working directory that is in use
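The disk-related job parameters can be approximated with the Python standard library, as sketched below. This is an illustration, not ApMon's method: it assumes "MB" means 2^20 bytes, measures workdir_size as the sum of regular file sizes (ignoring block overhead, so it may differ from `du`), and computes disk_usage as used/total.

```python
import os
import shutil
from typing import Dict

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def workdir_size_mb(path: str) -> float:
    """Sum the sizes of the regular files under path, in MB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not follow symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # the file vanished or is unreadable
    return total / MB


def partition_params(workdir: str) -> Dict[str, float]:
    """disk_* parameters for the partition that holds workdir."""
    usage = shutil.disk_usage(workdir)  # named tuple: total, used, free (bytes)
    return {
        "disk_total": usage.total / MB,
        "disk_used": usage.used / MB,
        "disk_free": usage.free / MB,
        "disk_usage": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }
```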

b) system monitoring information - contains the following parameters:

  • cpu_usr: percent of the time spent by the CPU in user mode
  • cpu_sys: percent of the time spent by the CPU in system mode
  • cpu_nice: percent of the time spent by the CPU in nice mode
  • cpu_idle: percent of the time spent by the CPU in idle mode
  • cpu_usage: CPU usage percent
  • pages_in: the number of pages paged in per second (average for the last time interval)
  • pages_out: the number of pages paged out per second (average for the last time interval)
  • swap_in: the number of swap pages brought in per second (average for the last time interval)
  • swap_out: the number of swap pages swapped out per second (average for the last time interval)
  • load1: average system load over the last minute
  • load5: average system load over the last 5 min
  • load15: average system load over the last 15 min
  • mem_used: amount of currently used memory, in MB
  • mem_free: amount of free memory, in MB
  • mem_usage: used system memory in percent
  • swap_used: amount of currently used swap, in MB
  • swap_free: amount of free swap, in MB
  • swap_usage: swap usage in percent
  • net_in: network (input) transfer rate in kBps
  • net_out: network (output) transfer rate in kBps
  • net_errs: number of network errors (net_in, net_out and net_errs are reported for each network interface, as parameters called sys_ethX_in, sys_ethX_out and sys_ethX_errs)
  • processes: current number of processes (this also produces parameters called processes_D, processes_R, processes_T, processes_S and processes_Z, the number of processes in the D (uninterruptible sleep), R (running), T (traced/stopped), S (sleeping) and Z (zombie) states)
  • uptime: system uptime in days
  • net_sockets: the number of open TCP, UDP, ICMP and Unix sockets (this produces parameters called sockets_tcp, sockets_udp, ...)
  • net_tcp_details: the number of TCP sockets in each possible state (this produces parameters called sockets_tcp_ESTABLISHED, sockets_tcp_LISTEN, ...)
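Percentages such as cpu_usr, cpu_sys, cpu_nice and cpu_idle are naturally computed from two samples of the aggregate "cpu" line of /proc/stat, whose first four counters are user, nice, system and idle time. This sketch is Linux-specific and is not ApMon's implementation; in particular, defining cpu_usage as the non-idle share of all counters is an assumption.

```python
import time
from typing import Dict, List


def read_cpu_line(stat_text: str) -> List[int]:
    """Return the counters on the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(before: str, after: str) -> Dict[str, float]:
    """Percent of CPU time per mode between two /proc/stat snapshots."""
    b, a = read_cpu_line(before), read_cpu_line(after)
    delta = [x - y for x, y in zip(a, b)]
    total = sum(delta) or 1  # avoid division by zero
    usr, nice, sys_, idle = delta[:4]  # /proc/stat order: user nice system idle
    return {
        "cpu_usr": 100.0 * usr / total,
        "cpu_nice": 100.0 * nice / total,
        "cpu_sys": 100.0 * sys_ / total,
        "cpu_idle": 100.0 * idle / total,
        "cpu_usage": 100.0 * (total - idle) / total,  # assumption: non-idle share
    }


def sample(interval: float = 1.0) -> Dict[str, float]:
    """Take two snapshots interval seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return cpu_percentages(before, after)
```

The same two-snapshot pattern (counter difference divided by elapsed time) applies to rate parameters such as pages_in or net_in.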

c) general system information - contains the following parameters:

  • hostname: the machine's hostname
  • ip: the IP address of each network interface (produces ethX_ip parameters)
  • cpu_MHz: CPU frequency, in MHz
  • no_CPUs: number of CPUs
  • total_mem: total amount of memory, in MB
  • total_swap: total amount of swap, in MB
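Most of the general system information can be read on Linux from /proc and the standard library, as in the sketch below. It is an illustration only: the parsing helpers are hypothetical, it assumes /proc/meminfo values are in kB, converts them to MB by dividing by 1024, and omits the per-interface ethX_ip parameters.

```python
import os
import socket
from typing import Dict, Optional


def meminfo_mb(meminfo_text: str, key: str) -> Optional[float]:
    """Read a kB value such as 'MemTotal' from /proc/meminfo, in MB."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0]) / 1024
    return None


def cpu_mhz(cpuinfo_text: str) -> Optional[float]:
    """Return the first 'cpu MHz' value found in /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == "cpu MHz":
            return float(rest)
    return None


def general_info() -> Dict[str, object]:
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    return {
        "hostname": socket.gethostname(),
        "cpu_MHz": cpu_mhz(cpuinfo),
        "no_CPUs": os.cpu_count(),
        "total_mem": meminfo_mb(meminfo, "MemTotal"),
        "total_swap": meminfo_mb(meminfo, "SwapTotal"),
    }
```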

Each parameter can be enabled or disabled individually from the configuration file (disabled parameters are not included in the datagrams). To enable or disable a parameter, write lines of the following form in the configuration file:

xApMon_job_parametername = on/off

(for job parameters) or:

xApMon_sys_parametername = on/off

(for system parameters) or:

xApMon_parametername = on/off

(for general system parameters). Example:

        xApMon_job_run_time = on
        xApMon_sys_load1 = off
        xApMon_no_CPUs = on
     

By default, all the parameters are enabled.
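The enable/disable rules and the "on by default" behaviour can be expressed as a small lookup on the parsed options, mapping each category to its prefix (xApMon_job_, xApMon_sys_ or plain xApMon_). The helper names are hypothetical, and treating any value other than "off" as enabled is an assumption.

```python
from typing import Dict, Mapping

PREFIX = {"job": "xApMon_job_", "sys": "xApMon_sys_", "general": "xApMon_"}


def is_enabled(options: Mapping[str, str], category: str, name: str) -> bool:
    """A parameter is enabled unless its option is explicitly 'off'."""
    value = options.get(PREFIX[category] + name, "on")  # default: enabled
    return value.strip().lower() != "off"


def filter_params(options: Mapping[str, str], category: str,
                  values: Dict[str, object]) -> Dict[str, object]:
    """Drop disabled parameters before they are put into a datagram."""
    return {k: v for k, v in values.items() if is_enabled(options, category, k)}
```

With the example lines above, `filter_params(opts, "sys", {"load1": 0.5, "load5": 0.4})` keeps only load5.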

The job/system monitoring can be enabled/disabled by including the following lines in the configuration file:

        xApMon_job_monitoring = on/off
        xApMon_sys_monitoring = on/off
     

General system information datagrams are sent only if system monitoring is enabled, and at longer time intervals than the system monitoring datagrams. To enable or disable sending general system information datagrams, write the following line in the configuration file:

xApMon_general_info = on/off

The time interval at which job/system monitoring datagrams are sent can be set with:

        xApMon_job_interval = number_of_seconds
        xApMon_sys_interval = number_of_seconds
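The interplay of xApMon_job_monitoring, xApMon_sys_monitoring, xApMon_general_info and the two intervals can be sketched as a scheduler that decides which datagrams are due. The guide says only that general information is sent "at greater time intervals" than system datagrams, so sending it on every N-th system datagram (`general_every`, default 10) is a hypothetical choice; the caller is assumed to have already parsed the on/off options and intervals.

```python
from typing import Dict, List

# Hypothetical: the guide does not state how much longer the interval is.
GENERAL_INFO_EVERY = 10  # send general info with every 10th system datagram


class SendSchedule:
    """Decides which datagram categories are due at time `now` (seconds)."""

    def __init__(self, job_interval: float, sys_interval: float,
                 job_on: bool, sys_on: bool, general_on: bool,
                 general_every: int = GENERAL_INFO_EVERY) -> None:
        self.intervals: Dict[str, float] = {}
        if job_on:
            self.intervals["job"] = job_interval
        if sys_on:
            self.intervals["sys"] = sys_interval
        # general information is only sent when system monitoring is on
        self.general_on = general_on and sys_on
        self.general_every = general_every
        self.next_at = {name: 0.0 for name in self.intervals}
        self.sys_count = 0

    def due(self, now: float) -> List[str]:
        out = []
        for name, interval in self.intervals.items():
            if now >= self.next_at[name]:
                out.append(name)
                self.next_at[name] = now + interval
        if "sys" in out:
            if self.general_on and self.sys_count % self.general_every == 0:
                out.append("general")
            self.sys_count += 1
        return out
```

A background thread could call `due(time.monotonic())` once per second and build and send the corresponding datagrams.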
     

Chapter 2. Version Specific User Guides

In the following pages you can find specific information for each ApMon version:


This and other documents can be downloaded from http://monalisa.cacr.caltech.edu/

For questions about MonALISA, write to <support@monalisa.cern.ch>.