MonALISA Grid Monitoring
Last update on: Dec 03, 2015

Application Monitoring API

ApMon is a set of flexible APIs that can be used by any application to send monitoring information to MonALISA services. The monitoring data is sent as UDP datagrams to one or more hosts running MonALISA services. Applications can periodically report any type of information the user wants to collect, monitor or use in the MonALISA framework to trigger alarms or activate decision agents. We provide ApMon implementations for 5 programming languages: C, C++, Java, Perl and Python. The library is easy to use in complex data processing programs as well as in scripts or utility programs, and offers flexibility, dynamic configuration, high communication performance and structured storage of the information in MonALISA databases.
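To illustrate the transport model described above, the following minimal sketch sends one datagram to several destinations over UDP using only the Python standard library. It does not use the ApMon library itself: the function name `send_to_destinations` is hypothetical, and the payload is an opaque byte string rather than a real ApMon message.

```python
import socket


def send_to_destinations(payload: bytes, destinations: list[tuple[str, int]]) -> int:
    """Send one UDP datagram to each (host, port) pair (hypothetical helper).

    UDP is fire-and-forget: sendto() returns once the datagram is handed to
    the kernel, with no acknowledgement from the receiver. Returns how many
    destinations the datagram was handed off for.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in destinations:
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # A failing destination (e.g. an unresolvable host name)
                # must not prevent reporting to the remaining ones.
                pass
    return sent
```

Because nothing waits for a reply, the sender's cost per destination is essentially one system call, which is what makes this model cheap for the monitored application.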

ApMon's installation process is simple and does not require other software components, which makes it a quick and inexpensive solution for job and system monitoring. The routines provided by the library handle the encoding of the monitoring data in the XDR representation, as well as building and sending the UDP datagrams. As shown in the following diagram, applications can use the API to send any specific parameter values to one or more MonALISA services.
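The XDR rules themselves (RFC 4506) are simple: integers are 4 bytes big-endian, doubles are 8-byte big-endian IEEE 754 values, and strings are a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The sketch below applies those rules to a cluster name, a node name and a set of named numeric parameters. The field order and the type tags `TYPE_INT` and `TYPE_DOUBLE` are assumptions made for illustration; this is not the exact ApMon wire format.

```python
import struct

# Hypothetical type tags, used only in this sketch.
TYPE_INT = 2
TYPE_DOUBLE = 5


def xdr_int(value: int) -> bytes:
    # XDR signed integer: 4 bytes, big-endian, two's complement.
    return struct.pack(">i", value)


def xdr_double(value: float) -> bytes:
    # XDR double: 8 bytes, IEEE 754, big-endian.
    return struct.pack(">d", value)


def xdr_string(value: str) -> bytes:
    # XDR string: 4-byte length, then the bytes, zero-padded to a multiple of 4.
    data = value.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_parameters(cluster: str, node: str, params: dict) -> bytes:
    """Encode cluster, node and named numeric parameters in XDR.

    The layout (cluster, node, count, then name/type/value triples) is an
    illustrative assumption, not the real ApMon datagram layout.
    """
    out = [xdr_string(cluster), xdr_string(node), xdr_int(len(params))]
    for name, value in params.items():
        out.append(xdr_string(name))
        if isinstance(value, int):
            out.append(xdr_int(TYPE_INT) + xdr_int(value))
        else:
            out.append(xdr_int(TYPE_DOUBLE) + xdr_double(float(value)))
    return b"".join(out)
```

Since every XDR item is a multiple of 4 bytes long, the resulting datagram is too, and a receiver on any architecture can decode it without knowing the sender's byte order.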

The use of the UDP protocol provides high communication performance (hundreds or even thousands of datagrams per second can be exchanged between ApMon and the MonALISA services). The loss of some datagrams is not fatal, because the goal is to obtain statistical information rather than to record 100% of the collected data.

The addresses of the MonALISA services to which ApMon sends the monitoring data can be specified directly in the code, or in local or remote (HTTP-accessible) files. ApMon can reconfigure itself dynamically by periodically reloading these files and web pages.
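A minimal sketch of file-based destinations with periodic reloading might look like the following. It assumes a simple line format, `host[:port] [password]` with `#` comments, and a default port of 8884; both the format and the default are assumptions for illustration. The names `parse_destinations`, `load_text` and `DestinationConfig` are hypothetical.

```python
import time
import urllib.request

DEFAULT_PORT = 8884  # assumption: default UDP port used when none is given


def parse_destinations(text: str) -> list[tuple[str, int, str]]:
    """Parse 'host[:port] [password]' lines; '#' starts a comment (assumed format)."""
    result = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host, _, port = fields[0].partition(":")
        password = fields[1] if len(fields) > 1 else ""
        result.append((host, int(port) if port else DEFAULT_PORT, password))
    return result


def load_text(source: str) -> str:
    # Remote (http/https) sources are fetched; anything else is a local file.
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=10) as resp:
            return resp.read().decode("utf-8")
    with open(source, encoding="utf-8") as f:
        return f.read()


class DestinationConfig:
    """Re-reads the configuration source once it is older than `interval` seconds."""

    def __init__(self, source: str, interval: float = 300.0):
        self.source = source
        self.interval = interval
        self.destinations: list[tuple[str, int, str]] = []
        self._loaded_at = float("-inf")

    def get(self) -> list[tuple[str, int, str]]:
        now = time.monotonic()
        if now - self._loaded_at >= self.interval:
            try:
                self.destinations = parse_destinations(load_text(self.source))
            except (OSError, ValueError):
                pass  # keep the previous destinations if reloading fails
            self._loaded_at = now
        return self.destinations
```

Keeping the last good list when a reload fails means a temporarily unreachable web server does not stop monitoring data from being sent.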

To address security concerns, MonALISA services can enforce a policy that controls which sources are allowed to send messages. There are two ways to specify which ApMon datagrams are accepted: by requiring a password that the datagrams must contain, or by setting a list of IP addresses that are allowed or denied; the two methods can be used simultaneously.
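The combination of a password check with allow/deny IP lists can be sketched as a single receiver-side filter. The class `AcceptPolicy` is hypothetical, and the evaluation order (deny list first, then allow list, then password) is a choice made for this sketch, not documented MonALISA behaviour.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcceptPolicy:
    """Hypothetical receiver-side filter combining a password and IP lists."""

    password: Optional[str] = None  # None: no password required
    allow: list[str] = field(default_factory=list)  # empty: any source allowed
    deny: list[str] = field(default_factory=list)  # addresses or CIDR networks

    def accepts(self, source_ip: str, datagram_password: str) -> bool:
        ip = ipaddress.ip_address(source_ip)
        # An explicitly denied source is rejected regardless of other rules.
        if any(ip in ipaddress.ip_network(net) for net in self.deny):
            return False
        # With a non-empty allow list, the source must match one of its entries.
        if self.allow and not any(ip in ipaddress.ip_network(net) for net in self.allow):
            return False
        # If a password is configured, the datagram must carry it.
        if self.password is not None and datagram_password != self.password:
            return False
        return True
```

For example, `AcceptPolicy(password="s3cret", allow=["10.0.0.0/8"], deny=["10.0.0.13"])` accepts datagrams carrying the right password from the 10.0.0.0/8 network, except those from 10.0.0.13.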

Automated Job & System Monitoring

Since version 2.0, ApMon can also send, from a background thread, additional datagrams that contain monitoring information about the system and/or the application that uses ApMon. The system monitoring datagrams include parameters such as the CPU load, the number of running processes, the amount of free memory and disk usage. For some parameters, such as memory/swap usage or the number of running processes, ApMon reports the current values. For others, the values are averaged over the interval between the moment the last monitoring datagram was sent and the current moment; examples are the percentage of CPU user/system/nice time out of the total time, and the average number of KB transferred per second through each network interface.

The job monitoring datagrams contain values for parameters such as the amount of memory, disk space and CPU time consumed by the application. Multiple jobs (identified by their parent PID and working directory) can be monitored simultaneously; if a job is multithreaded or has created child processes, all of its threads and sub-processes are taken into account when calculating the resources it consumes.
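The interval averaging described above works on cumulative counters: take two samples and divide the differences. The sketch below does this for the CPU counters on the first line of Linux `/proc/stat` and for a network interface byte counter. The function names and the output keys (`cpu_usr`, `cpu_nice`, `cpu_sys`) are hypothetical and are not claimed to match ApMon's parameter names.

```python
def read_cpu_counters(path: str = "/proc/stat") -> list[int]:
    # First line: "cpu  user nice system idle iowait irq softirq steal guest ..."
    with open(path) as f:
        return [int(x) for x in f.readline().split()[1:]]


def cpu_percentages(prev: list[int], curr: list[int]) -> dict[str, float]:
    """CPU time percentages over the interval between two samples.

    Only the first 8 counters are summed: on Linux, guest time is already
    included in user/nice, so adding it again would double-count it.
    """
    deltas = [c - p for p, c in zip(prev[:8], curr[:8])]
    total = sum(deltas)
    if total <= 0:
        return {"cpu_usr": 0.0, "cpu_nice": 0.0, "cpu_sys": 0.0}
    user, nice, system = deltas[0], deltas[1], deltas[2]
    return {
        "cpu_usr": 100.0 * user / total,
        "cpu_nice": 100.0 * nice / total,
        "cpu_sys": 100.0 * system / total,
    }


def kb_per_second(prev_bytes: int, curr_bytes: int, seconds: float) -> float:
    """Average KB/s through an interface, from two cumulative byte counters."""
    return (curr_bytes - prev_bytes) / 1024.0 / seconds
```

For instance, if the user, nice, system and idle counters advance by 100, 0, 50 and 850 ticks between two datagrams, the reported values are 10% user and 5% system time for that interval.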

All ApMon versions are written for Linux and obtain the monitoring information from the /proc filesystem; the exception is the Java version, which can be used on both Linux and Windows.
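As an illustration of how job resources can be read from /proc and summed over a process tree, the sketch below parses a `/proc/<pid>/stat` line (field positions as documented in proc(5)) and totals CPU ticks and resident pages over a root process and all its descendants. The names `parse_proc_stat` and `job_totals` are hypothetical; this sketch identifies a job only by its root PID and ignores the working directory. Summing RSS across processes counts shared pages more than once, so the memory total is an approximation.

```python
def parse_proc_stat(line: str) -> tuple[int, int, int, int]:
    """Return (pid, ppid, cpu_ticks, rss_pages) from a /proc/<pid>/stat line."""
    pid = int(line.split(" ", 1)[0])
    # The command name is in parentheses and may contain spaces or ')',
    # so split only what follows the last ')'. rest[0] is field 3 (state).
    rest = line[line.rfind(")") + 2:].split()
    ppid = int(rest[1])  # field 4
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14-15: utime + stime
    rss_pages = int(rest[21])  # field 24, in pages
    return pid, ppid, cpu_ticks, rss_pages


def job_totals(procs: list[tuple[int, int, int, int]], root_pid: int) -> tuple[int, int]:
    """Sum CPU ticks and RSS pages over root_pid and all its descendants."""
    children: dict[int, list[int]] = {}
    for pid, ppid, _, _ in procs:
        children.setdefault(ppid, []).append(pid)
    by_pid = {p[0]: p for p in procs}
    total_ticks = total_rss = 0
    stack, seen = [root_pid], set()
    while stack:  # iterative depth-first walk down the parent/child links
        pid = stack.pop()
        if pid in seen or pid not in by_pid:
            continue
        seen.add(pid)
        _, _, ticks, rss = by_pid[pid]
        total_ticks += ticks
        total_rss += rss
        stack.extend(children.get(pid, []))
    return total_ticks, total_rss
```

To convert the results, divide ticks by `os.sysconf("SC_CLK_TCK")` to get seconds and multiply pages by `os.sysconf("SC_PAGE_SIZE")` to get bytes.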

Performance Measurements

A set of measurements was performed to evaluate the capacity of MonALISA to receive and process ApMon datagrams at high rates, by estimating its CPU usage (the percentage of the total running time during which MonALISA used the CPU). MonALISA ran on a Pentium 4 (2.6 GHz) machine with 1 GB of RAM, and the ApMon datagrams were sent from hosts on the same LAN. The results (see the graph below) show the MonALISA CPU usage, which reached a maximum of 70%; ApMon's CPU usage was close to 0. As the graph shows, MonALISA is able to process messages at very high rates (up to 5000 messages per second). To obtain this performance, the size of the receive buffer of the socket on which MonALISA listens for ApMon messages was increased to 512 KB. At rates above 5000 messages per second, we observed that some of the messages were lost.
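Enlarging a UDP socket's receive buffer is done with the standard `SO_RCVBUF` socket option. The Python sketch below shows the same idea as the 512 KB setting above; it is an illustration, not MonALISA's own code, and `open_udp_listener` is a hypothetical name.

```python
import socket

RECV_BUFFER_BYTES = 512 * 1024  # the 512 KB value used in the measurements


def open_udp_listener(host: str, port: int, rcvbuf: int = RECV_BUFFER_BYTES) -> socket.socket:
    """Bind a UDP socket after requesting an enlarged kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A larger buffer lets bursts of datagrams queue in the kernel instead
    # of being dropped while the application is busy processing earlier ones.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    return sock
```

On Linux the requested size is capped by the `net.core.rmem_max` sysctl, and `getsockopt(SOL_SOCKET, SO_RCVBUF)` reports twice the value actually granted, so it is worth checking the effective size rather than assuming the request was honoured.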

ApMonConfGen - Configuration generators for ApMon

The addresses of the MonALISA services to which the monitoring data is sent, together with other ApMon settings, can be generated dynamically by a servlet or CGI script. See the downloads section for a simple example of such configuration generators. It demonstrates how to generate different ApMon configurations based on the IP address of the request or on parameters received in the query. Generating the configuration based on the IP address of the request is useful, for example, in a distributed system where users run jobs on many worker nodes: each job can report its data to the closest MonALISA service without knowing a priori at which site it will run. Generating the configuration based on query parameters is useful, for example, to send the information from two or more different applications to different dedicated MonALISA services.
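The two selection strategies above can be sketched as one function that picks a destination from a query parameter or, failing that, from the client's subnet. Everything here is hypothetical: the host names, the subnets, the `app` query parameter, the port 8884 and the one-`host:port`-per-line output format are illustrative assumptions, not the format produced by the downloadable examples.

```python
import ipaddress
from urllib.parse import parse_qs

# Hypothetical site map: client subnet -> closest MonALISA service.
SITE_SERVICES = {
    "10.1.0.0/16": "ml-site-a.example.org:8884",
    "10.2.0.0/16": "ml-site-b.example.org:8884",
}
# Hypothetical dedicated services, selected by an "app" query parameter.
APP_SERVICES = {
    "transfers": "ml-transfers.example.org:8884",
}
FALLBACK_SERVICE = "ml-central.example.org:8884"


def generate_config(client_ip: str, query_string: str = "") -> str:
    """Return destination lines for one configuration request."""
    # An explicit application name takes precedence over location.
    app = parse_qs(query_string).get("app", [""])[0]
    if app in APP_SERVICES:
        return APP_SERVICES[app] + "\n"
    # Otherwise choose the service that serves the client's subnet.
    ip = ipaddress.ip_address(client_ip)
    for subnet, service in SITE_SERVICES.items():
        if ip in ipaddress.ip_network(subnet):
            return service + "\n"
    return FALLBACK_SERVICE + "\n"
```

In a CGI script, the client address and the query would come from the standard `REMOTE_ADDR` and `QUERY_STRING` environment variables, e.g. `generate_config(os.environ["REMOTE_ADDR"], os.environ.get("QUERY_STRING", ""))`.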

To configure MonALISA to listen for UDP datagrams, see Section 2.3.5 of the Service Configuration Guide.

ApMon can be obtained from the MonALISA download page.