ApMon is a set of flexible APIs that can be used by any application to
send monitoring information to MonALISA services. The monitoring data
is sent as UDP datagrams to one or more hosts running MonALISA
services. Applications can periodically report any type of information
the user wants to collect, monitor or use in the MonALISA framework to
trigger alarms or activate decision agents. ApMon implementations are
available for five programming languages: C, C++, Java, Perl and
Python. The library is easy to use from complex data processing
programs as well as from scripts and utility programs. Its advantages
are flexibility, dynamic configuration, high communication performance
and structured storage of the information in MonALISA databases.
ApMon's installation process is simple and does not require other
software components, which makes it a quick and inexpensive solution
for job/system monitoring. The routines provided by the library encode
the monitoring data in the XDR representation, then build and send the
UDP datagrams. As shown in the following diagram, applications can use
the API to send any parameter values to one or more MonALISA services.
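To make the encoding step concrete, here is a minimal Python sketch of
XDR encoding followed by a UDP send. It does not use the ApMon library,
and the datagram layout (cluster name, node name, parameter count, then
name/value pairs) is an assumption made for illustration, not ApMon's
actual wire format. The function names are hypothetical.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_int(value):
    """XDR signed integer: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_double(value):
    """XDR double: 8 bytes, IEEE 754, big-endian."""
    return struct.pack(">d", value)


def build_datagram(cluster, node, params):
    """Assumed layout for this sketch only: cluster, node, count, name/value pairs."""
    payload = xdr_string(cluster) + xdr_string(node) + xdr_int(len(params))
    for name, value in params.items():
        payload += xdr_string(name) + xdr_double(float(value))
    return payload


def send_datagram(payload, destinations):
    """Send the same payload to every (host, port) destination over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in destinations:
            sock.sendto(payload, (host, port))
```

A caller would build one payload per report and pass a list of
(host, port) pairs, one for each MonALISA service that should receive it.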
Using UDP gives high communication performance: hundreds or even
thousands of datagrams per second can be exchanged between ApMon and
the MonALISA services. The loss of some datagrams is not fatal, because
the goal is to obtain statistical information rather than to record
100% of the collected data.
The addresses of the MonALISA services to which ApMon sends the
monitoring data can be specified directly in the code or in local or
remote (http-accessible) files. ApMon is able to re-configure itself
dynamically, by periodically reloading the files and web pages.
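The reloading behaviour can be sketched in Python as follows. This is
an illustration, not ApMon's implementation: the file format (one
"host[:port]" entry per line, with "#" comments), the default port
value and the names parse_destinations, load_source and ReloadingConfig
are all assumptions of this sketch.

```python
import time
import urllib.request

DEFAULT_PORT = 8884  # assumed default for this sketch; check your service settings


def parse_destinations(text):
    """Parse 'host[:port]' lines (hostnames or IPv4 only); '#' starts a comment."""
    destinations = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, port = line.partition(":")
        destinations.append((host.strip(), int(port) if port else DEFAULT_PORT))
    return destinations


def load_source(source):
    """Read a configuration source: an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=10) as response:
            return response.read().decode("utf-8")
    with open(source, encoding="utf-8") as handle:
        return handle.read()


class ReloadingConfig:
    """Re-reads every source once the reload interval has elapsed."""

    def __init__(self, sources, interval_seconds=300.0):
        self.sources = list(sources)
        self.interval_seconds = interval_seconds
        self._loaded_at = None
        self._destinations = []

    def destinations(self):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at >= self.interval_seconds:
            fresh = []
            for source in self.sources:
                fresh.extend(parse_destinations(load_source(source)))
            self._destinations = fresh
            self._loaded_at = now
        return list(self._destinations)
```

Reloading lazily, on the next call after the interval expires, keeps the
sketch free of threads; a background timer would work equally well.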
To avoid security problems, a MonALISA service can apply a policy that
controls which sources it accepts messages from. There are two ways to
specify which ApMon datagrams are accepted: by setting a password that
the datagrams must contain, or by setting a list of IP addresses that
are allowed or denied. The two methods may be used simultaneously.
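A combined check of this kind might look like the Python sketch below.
It is not MonALISA's code: the function name, its parameters and the
rule that a deny entry takes precedence over an allow entry are
assumptions made for illustration.

```python
import ipaddress


def is_accepted(source_ip, datagram_password, *, required_password=None,
                allowed=(), denied=()):
    """Return True if a datagram from source_ip passes both checks.

    allowed and denied are iterables of addresses or CIDR networks.
    An empty allowed list means "no allow-list restriction".
    """
    address = ipaddress.ip_address(source_ip)
    # Assumption of this sketch: a deny entry always wins.
    if any(address in ipaddress.ip_network(net) for net in denied):
        return False
    if allowed and not any(address in ipaddress.ip_network(net) for net in allowed):
        return False
    # The password check applies only when a password has been configured.
    if required_password is not None and datagram_password != required_password:
        return False
    return True
```

For example, is_accepted("10.0.0.5", "s3cret", required_password="s3cret",
allowed=["10.0.0.0/8"]) passes both the IP check and the password check.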
Automated Job & System Monitoring
Since version 2.0, ApMon can send, in a background thread, additional
datagrams that contain monitoring information about the system and/or
the application that uses ApMon. The system monitoring datagrams include
parameters such as the CPU load, the number of processes currently
running, the amount of free memory and disk usage. For parameters like
memory/swap usage or the number of running processes, ApMon reports the
current value. Other parameters are averaged over the interval between
the previous monitoring datagram and the current moment; examples are
the percentage of CPU user/system/nice time out of the total time and
the average number of KB transferred per second through each network
interface. The job monitoring datagrams contain values for parameters
like the amount of memory, disk and CPU time consumed by the
application. Multiple jobs (identified by their parent PID and working
directories) can be monitored simultaneously; if a job is multithreaded
or has created child processes, all its threads and sub-processes are
counted when calculating the resources consumed.
All the ApMon versions are written for Linux and obtain the monitoring
information from the /proc filesystem. The exception is the Java
version, which can be used on both Linux and Windows.
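As an illustration of a parameter averaged over the interval between
two reports, the Python sketch below derives CPU time percentages from
two samples of the aggregate "cpu" line in /proc/stat. It is not taken
from ApMon; the function names are hypothetical, and the sketch
simplifies by counting only the user, nice, system and idle fields.

```python
def read_cpu_times(path="/proc/stat"):
    """Return cumulative (user, nice, system, idle) ticks from the aggregate 'cpu' line."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            fields = line.split()
            if fields and fields[0] == "cpu":
                user, nice, system, idle = (int(value) for value in fields[1:5])
                return user, nice, system, idle
    raise RuntimeError("no aggregate 'cpu' line found")


def cpu_percentages(previous, current):
    """Share of each CPU state over the interval between two samples."""
    names = ("user", "nice", "system", "idle")
    deltas = [now - before for before, now in zip(previous, current)]
    total = sum(deltas)
    if total <= 0:
        # No ticks elapsed (or counters went backwards): report zeros.
        return {name: 0.0 for name in names}
    return {name: 100.0 * delta / total for name, delta in zip(names, deltas)}
```

A monitoring loop would keep the sample taken when the last datagram
was sent, call read_cpu_times() again before the next one, and report
cpu_percentages(previous, current).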
Performance Measurements
A set of measurements was made to evaluate the capacity of MonALISA to
receive and process ApMon datagrams at high rates, by estimating the
CPU usage (the percentage of the total running time during which
MonALISA used the CPU). MonALISA was run on a Pentium 4 (2.6 GHz)
machine with 1 GB RAM, and the ApMon datagrams were sent from hosts in
the same LAN. The results (see the graph below) show the MonALISA CPU
usage, which did not exceed 70%. ApMon's CPU usage was close to 0. As
the graph shows, MonALISA is able to process messages at very high
rates (up to 5000 messages per second). To obtain this performance, the
size of the receive buffer for the socket on which MonALISA listens for
ApMon messages was increased to 512 KB. At rates above 5000 messages
per second, some of the messages were lost.
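The receive buffer setting corresponds to the standard SO_RCVBUF socket
option. The Python sketch below shows the option on a UDP socket; it is
an illustration only, not the way MonALISA itself is configured, and
the function name is hypothetical.

```python
import socket


def open_receiver(host, port, receive_buffer_bytes=512 * 1024):
    """Open a UDP socket and request a larger kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The operating system may cap or adjust the requested size, so the
    # effective value should be read back with getsockopt.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, receive_buffer_bytes)
    sock.bind((host, port))
    return sock
```

Reading the value back with sock.getsockopt(socket.SOL_SOCKET,
socket.SO_RCVBUF) shows what the kernel actually granted.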
ApMonConfGen - Configuration generators for ApMon
The addresses of the MonALISA services to which the monitoring data is
sent, together with other ApMon settings, can be generated dynamically
by a servlet or CGI script. See the downloads section for a simple
example of such configuration generators. The example demonstrates how
to generate different configurations for ApMon based on the IP address
of the request or on parameters received in the query. Generating the
configuration from the IP address of the request is useful if, for
example, you have a distributed system that allows users to run jobs,
and you want each worker node where a job runs to report its data to
the closest MonALISA service without knowing a priori at which site the
job will run. Generating the configuration from query parameters is
useful, for example, for sending the information from two or more
different applications to different dedicated MonALISA services.
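The selection logic of such a generator could be sketched in Python as
below. This is not the example from the downloads section: the host
names, networks, the "app" parameter and the one-line "host:port"
output format are all hypothetical.

```python
import ipaddress

# Hypothetical site networks and the service closest to each of them.
SITE_SERVICES = [
    (ipaddress.ip_network("192.0.2.0/24"), "monalisa.site-a.example.org:8884"),
    (ipaddress.ip_network("198.51.100.0/24"), "monalisa.site-b.example.org:8884"),
]

# Hypothetical dedicated services selected by an "app" query parameter.
APP_SERVICES = {
    "render": "monalisa-render.example.org:8884",
    "transfer": "monalisa-transfer.example.org:8884",
}

FALLBACK_SERVICE = "monalisa.example.org:8884"


def generate_config(client_ip, app=None):
    """Return the configuration text for one request."""
    # A known application name takes priority over the client's location.
    if app in APP_SERVICES:
        return APP_SERVICES[app] + "\n"
    address = ipaddress.ip_address(client_ip)
    for network, service in SITE_SERVICES:
        if address in network:
            return service + "\n"
    return FALLBACK_SERVICE + "\n"
```

In a CGI script, client_ip would typically come from the REMOTE_ADDR
environment variable and app from the parsed query string.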
To configure MonALISA to listen for UDP datagrams, please see Section 2.3.5 of the
Service Configuration Guide.
ApMon can be obtained from the
MonALISA download page.