There are several ways to initialize ApMon:
A first method to initialize ApMon is from a
configuration file, which contains the IP addresses or
DNS names of the hosts running MonALISA, to which the
data will be sent; the ports on which the MonALISA
services listen on the destination hosts should also be
specified in the file. The configuration file can also
contain lines that configure xApMon (see
Section 3, “xApMon - Automatically Sending
Monitoring Information”). The lines that
specify the destination hosts have the following
syntax:
IP_address|DNS_name[:port] [password]
Examples:
rb.rogrid.pub.ro:8884 mypassword
rb.rogrid.pub.ro:8884
ui.rogrid.pub.ro mypassword
ui.rogrid.pub.ro
If the port is not specified, the default value 8884
will be assumed. If the password is not specified, an
empty string will be sent as password to the MonALISA
host (and the host will accept the datagram either if it
does not require a password for the ApMon packets or if
the machine from which the packet was sent is in the
host's "accept" list). The configuration file may contain
blank lines and comment lines (starting with "#"); these
lines are ignored, as is any leading or trailing
whitespace on the other lines.
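The destination-line rules above can be illustrated with a short Python sketch. It is not ApMon's own parser: the function name parse_destinations is hypothetical, and the sketch assumes that lines starting with "xApMon_" are settings to be skipped and that addresses are host names or IPv4 addresses (an IPv6 literal containing ":" would need extra handling).

```python
DEFAULT_PORT = 8884  # default port stated in the text above


def parse_destinations(config_text):
    """Return a list of (host, port, password) tuples from configuration text.

    Hypothetical helper for illustration; not part of the ApMon API.
    """
    destinations = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()  # leading/trailing white space is ignored
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if line.startswith("xApMon_"):
            continue  # assumption: xApMon settings are handled separately
        fields = line.split(None, 1)
        address = fields[0]
        # A missing password is sent as an empty string.
        password = fields[1] if len(fields) > 1 else ""
        host, _, port_text = address.partition(":")
        port = int(port_text) if port_text else DEFAULT_PORT
        destinations.append((host, port, password))
    return destinations


sample = """
# destination hosts
rb.rogrid.pub.ro:8884 mypassword
   ui.rogrid.pub.ro
"""
for destination in parse_destinations(sample):
    print(destination)
```

A non-numeric port raises ValueError in this sketch; a real implementation would probably report the offending line instead.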
Another method to initialize ApMon is to provide a
list which contains hostnames and ports as explained
above, and/or URLs; the URLs point to plain text
configuration files which have the format described
above. The URLs may also represent requests to a servlet
or a CGI script which can automatically provide the best
configuration, taking into account the geographical zone
in which the machine which runs ApMon is situated, and
the application for which ApMon is used. The geographical
zone is determined from the machine's IP and the
application name is given by the user as the value of the
"appName" parameter included in the URL.
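As a rough illustration of the list-based initialization, the Python sketch below separates URLs from host[:port] entries and appends an "appName" parameter to a URL. The helper names and the example URL are hypothetical, and treating only "http://" and "https://" prefixes as URLs is an assumption of this sketch, not something the text specifies.

```python
from urllib.parse import urlencode


def split_entries(entries):
    """Split an initialization list into (urls, host_entries)."""
    urls, hosts = [], []
    for entry in entries:
        entry = entry.strip()
        # Assumption: URLs are recognised by their http(s) scheme prefix.
        if entry.startswith(("http://", "https://")):
            urls.append(entry)
        else:
            hosts.append(entry)
    return urls, hosts


def with_app_name(base_url, app_name):
    """Append the appName query parameter to a configuration URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"appName": app_name})


entries = [
    "rb.rogrid.pub.ro:8884 mypassword",
    "http://example.org/cgi-bin/apmon_conf",  # hypothetical servlet/CGI URL
]
urls, hosts = split_entries(entries)
print(hosts)
print([with_app_name(url, "myjob") for url in urls])
```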
2. Sending Datagrams with User Parameters
A datagram sent to the MonALISA module has the
following structure:
The configuration file and/or URLs can be periodically
checked for changes, in a background thread or process,
but this option is disabled by default. It can be enabled
from the configuration file as follows:
3. xApMon - Automatically Sending Monitoring Information
ApMon can be configured to send automatically, in a
background thread, monitoring information regarding the
application or the system. The system information is
obtained from the /proc filesystem and the job
information is obtained by parsing the output of the ps
command. If job monitoring for a process is requested,
all its sub-processes will be taken into consideration
(i.e., the resources consumed by the process and all the
subprocesses will be summed).
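The summing of resources over a process and its sub-processes can be sketched in Python as follows. This is only an illustration of the idea, not ApMon's implementation: the function name is hypothetical, and the input is assumed to be "pid ppid rss" triples such as those printed by `ps -e -o pid=,ppid=,rss=`.

```python
from collections import defaultdict


def sum_tree_rss(ps_output, root_pid):
    """Sum the RSS (in KB) of root_pid and all of its descendants.

    ps_output holds one "pid ppid rss" triple per line (assumed format).
    """
    children = defaultdict(list)  # ppid -> list of child pids
    rss = {}                      # pid -> resident set size in KB
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue  # skip headers and malformed lines
        pid, ppid, rss_kb = (int(f) for f in fields)
        rss[pid] = rss_kb
        children[ppid].append(pid)

    # Walk the process tree iteratively, starting from the monitored job.
    total = 0
    stack = [root_pid]
    seen = set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children[pid])
    return total


sample = """\
  100     1  2000
  101   100   500
  102   101   250
  200     1  9000
"""
print(sum_tree_rss(sample, 100))  # 2000 + 500 + 250
```

The same traversal would apply to other per-process values, such as CPU time or virtual memory.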
There are three categories of monitoring datagrams
that ApMon can send:
a) job monitoring information - contains the following
parameters:
- run_time:
elapsed time from the start of this job
- cpu_time:
processor time spent running this job
- cpu_usage:
percent of the processor used for this job, as
reported by ps
- virtualmem: virtual memory
occupied by the job (in KB)
- rss:
resident image size of the job (in KB)
- mem_usage:
percent of the memory occupied by the job, as
reported by ps
- workdir_size: size in MB
of the working directory of the job
- disk_total: size in MB of
the disk partition containing the working
directory
- disk_used:
size in MB of the used disk space on the partition
containing the working directory
- disk_free:
size in MB of the free disk space on the partition
containing the working directory
- disk_usage: percent of the
used disk partition containing the working
directory
b) system monitoring information - contains the
following parameters:
- cpu_usr:
percent of the time spent by the CPU in user
mode
- cpu_sys:
percent of the time spent by the CPU in system
mode
- cpu_nice:
percent of the time spent by the CPU in nice
mode
- cpu_idle:
percent of the time spent by the CPU in idle
mode
- cpu_usage:
CPU usage percent
- pages_in:
the number of pages paged in per second (average for
the last time interval)
- pages_out:
the number of pages paged out per second (average for
the last time interval)
- swap_in:
the number of swap pages brought in per second
(average for the last time interval)
- swap_out:
the number of swap pages brought out per second
(average for the last time interval)
- load1:
average system load over the last minute
- load5:
average system load over the last 5 min
- load15:
average system load over the last 15 min
- mem_used:
amount of currently used memory, in MB
- mem_free:
amount of free memory, in MB
- mem_usage:
used system memory in percent
- swap_used:
amount of currently used swap, in MB
- swap_free:
amount of free swap, in MB
- swap_usage: swap usage in
percent
- net_in:
network (input) transfer in kBps
- net_out:
network (output) transfer in kBps
- net_errs:
number of network errors (these will produce params
called sys_ethX_in, sys_ethX_out, sys_ethX_errs,
corresponding to each network interface)
- processes:
current number of processes (this will also produce
parameters called processes_{D,R,T,S,Z}: the number of
processes in the D (uninterruptible sleep), R
(running), T (traced/stopped), S (sleeping), Z (zombie)
states)
- uptime:
system uptime in days
- net_sockets: the number of
open TCP, UDP, ICMP, Unix sockets (this will produce
parameters called sockets_tcp, sockets_udp, ...)
- net_tcp_details: the
number of TCP sockets in each possible state (this
will produce parameters called
sockets_tcp_ESTABLISHED, sockets_tcp_LISTEN,
...)
c) general system information - contains the following
parameters:
- hostname:
the machine's hostname
- ip: will
produce ethX_ip params for each interface
- cpu_MHz:
CPU frequency
- no_CPUs:
number of CPUs
- total_mem:
total amount of memory, in MB
- total_swap: total amount
of swap, in MB
The parameters can be enabled/disabled from the
configuration file (if they are disabled, they will not
be included in the datagrams). In order to enable/disable
a parameter, the user should write in the configuration
file lines of the following form:
xApMon_job_parametername = on/off
(for job parameters) or:
xApMon_sys_parametername = on/off
(for system parameters) or:
xApMon_parametername = on/off
(for general system parameters).
Example:
xApMon_job_run_time = on
xApMon_sys_load1 = off
xApMon_no_CPUs = on
By default, all the parameters are enabled.
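To make the naming scheme concrete, the following Python sketch reads such on/off lines into three categories and treats any parameter that is not mentioned as enabled, matching the default described above. The function names are hypothetical and this is not ApMon's own parsing code; settings that switch a whole monitoring module, or that set intervals, are not treated specially here.

```python
def parse_parameter_toggles(config_text):
    """Collect xApMon on/off lines into job, sys and general categories."""
    toggles = {"job": {}, "sys": {}, "general": {}}
    for raw_line in config_text.splitlines():
        key, sep, value = raw_line.strip().partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key.startswith("xApMon_"):
            continue  # not an xApMon setting
        if value not in ("on", "off"):
            continue  # only on/off lines are handled in this sketch
        name = key[len("xApMon_"):]
        if name.startswith("job_"):
            toggles["job"][name[len("job_"):]] = (value == "on")
        elif name.startswith("sys_"):
            toggles["sys"][name[len("sys_"):]] = (value == "on")
        else:
            toggles["general"][name] = (value == "on")
    return toggles


def is_enabled(toggles, category, name):
    """Parameters not listed in the configuration count as enabled."""
    return toggles[category].get(name, True)


config = """
xApMon_job_run_time = on
xApMon_sys_load1 = off
xApMon_no_CPUs = on
"""
toggles = parse_parameter_toggles(config)
print(is_enabled(toggles, "sys", "load1"))
print(is_enabled(toggles, "sys", "load5"))
```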
The job/system monitoring can be enabled/disabled by
including the following lines in the configuration
file:
xApMon_job_monitoring = on/off
xApMon_sys_monitoring = on/off
The datagrams with general system information are only
sent if system monitoring is enabled, at greater time
intervals than the system monitoring datagrams. To
enable/disable the sending of general system information
datagrams, the following line should be written in the
configuration file:
xApMon_general_info = on/off
The time interval at which job/system monitoring
datagrams are sent can be set with:
xApMon_job_interval = number_of_seconds
xApMon_sys_interval = number_of_seconds
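The module-level settings above could be read into a small structure as in the Python sketch below. The class and function names are hypothetical. Since the text does not state default values for these settings, the sketch leaves every unset field as None rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoringSettings:
    """Values read from the configuration; None means "not specified"."""
    job_monitoring: Optional[bool] = None
    sys_monitoring: Optional[bool] = None
    general_info: Optional[bool] = None
    job_interval: Optional[int] = None  # seconds
    sys_interval: Optional[int] = None  # seconds


def parse_monitoring_settings(config_text):
    """Extract the monitoring switches and intervals from configuration text."""
    settings = MonitoringSettings()
    for raw_line in config_text.splitlines():
        key, sep, value = raw_line.strip().partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key.startswith("xApMon_"):
            continue
        name = key[len("xApMon_"):]
        if name in ("job_monitoring", "sys_monitoring", "general_info"):
            if value in ("on", "off"):
                setattr(settings, name, value == "on")
        elif name in ("job_interval", "sys_interval"):
            if value.isdigit():
                setattr(settings, name, int(value))
    return settings


config = """
xApMon_job_monitoring = on
xApMon_sys_monitoring = on
xApMon_general_info = off
xApMon_sys_interval = 40
"""
print(parse_monitoring_settings(config))
```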