The Data Collection Engine
The system monitors and tracks site computing farms, network links, routers and switches using SNMP, and it dynamically
loads modules that allow it to interface with existing monitoring applications and tools (e.g. Ganglia, MRTG, LSF, PBS,
Hawkeye). The core of the monitoring service is based on a multithreaded system used to perform the many data collection tasks
in parallel, independently. The modules used for collecting different sets of information, or interfacing with other monitoring tools,
are dynamically loaded and executed in independent threads. In order to reduce the load on systems running MonALISA, a dynamic pool
of threads is created once, and each thread is reused as soon as its assigned task completes. This makes it possible to run
a large number of monitoring modules concurrently and independently, and to adapt dynamically to the load and the response time of
the components in the system. If a monitoring task fails or hangs due to I/O errors, the other tasks are not delayed or disrupted,
since they are executing in other, independent threads. A dedicated control thread is used to properly stop the threads in case of
I/O errors, and to reschedule those tasks that have not been successfully completed. A priority queue is used for the tasks that
need to be performed periodically. A schematic view of this data collection mechanism is shown in the figure below.
This approach makes it relatively easy to monitor a large number of heterogeneous nodes with different response times, and at
the same time to handle monitored units which are down or not responding, without affecting the other measurements.
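The mechanism described above can be illustrated with a minimal sketch. It assumes a fixed-size pool from Python's standard library rather than MonALISA's own dynamic pool, and the names CollectionScheduler, add and run_due are hypothetical. Python threads cannot be forcibly stopped, so where the text describes a control thread that stops failed threads, this sketch only abandons a task that exceeds its timeout and reschedules it.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor


class CollectionScheduler:
    """Illustrative only: periodic tasks kept in a priority queue and run on a reusable pool."""

    def __init__(self, workers=4, timeout=1.0):
        self.pool = ThreadPoolExecutor(max_workers=workers)  # threads are created once and reused
        self.timeout = timeout  # seconds to wait for one task before giving up on it
        self.queue = []         # heap of (next_run, seq, name, task, period)
        self.seq = 0            # tie-breaker so the heap never compares callables
        self.results = {}       # name -> last value, or None if the task failed or timed out

    def add(self, name, task, period):
        """Register a periodic task that is due immediately."""
        self._push(time.monotonic(), name, task, period)

    def _push(self, when, name, task, period):
        heapq.heappush(self.queue, (when, self.seq, name, task, period))
        self.seq += 1

    def run_due(self):
        """Submit every due task to the pool, collect results, and reschedule all of them."""
        now = time.monotonic()
        started = []
        while self.queue and self.queue[0][0] <= now:
            _, _, name, task, period = heapq.heappop(self.queue)
            started.append((name, task, period, self.pool.submit(task)))
        for name, task, period, future in started:
            try:
                self.results[name] = future.result(timeout=self.timeout)
            except Exception:
                # A timeout or error in one task is recorded; the other tasks are unaffected.
                self.results[name] = None
            self._push(now + period, name, task, period)
```

Because all due tasks are submitted before any result is awaited, a slow or failing task does not delay the execution of the others; it only shows up as a missing value for that one measurement.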
A Monitoring Module is a dynamically loadable unit which executes a procedure (or runs a script or program, or
performs an SNMP request) to collect a set of parameters (monitored values) by properly parsing the output of the procedure. In
general, a monitoring module is a simple class that uses a certain procedure to obtain a set of parameters and reports them
in a simple, standard format.
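As a rough illustration of this idea, the sketch below shows a module that runs a procedure, parses its text output and reports the values in one uniform record. The names Result, LoadAverageModule and collect are hypothetical and do not reflect MonALISA's actual module interface; the input is assumed to look like the contents of /proc/loadavg on Linux.

```python
import time
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical standard record shared by all modules."""
    node: str
    module: str
    timestamp: float
    values: dict


class LoadAverageModule:
    """Hypothetical module that parses text in the /proc/loadavg layout."""

    name = "LoadAverage"

    def __init__(self, node, procedure):
        self.node = node
        self.procedure = procedure  # any callable returning the raw text to parse

    def collect(self):
        raw = self.procedure()
        fields = raw.split()
        # The first three fields are the 1, 5 and 15 minute load averages.
        values = {
            "load1": float(fields[0]),
            "load5": float(fields[1]),
            "load15": float(fields[2]),
        }
        return Result(self.node, self.name, time.time(), values)
```

Passing the procedure in as a callable keeps the parsing logic independent of how the raw data is obtained, whether from a file, a script or a remote request.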
Monitoring Modules can be used for pulling data, in which case they must be executed with a predefined frequency
(e.g. a pull module which queries a web service), or for "installing" (this has to run only once) pushing scripts (programs) which send the monitoring results (via SNMP, UDP or
TCP/IP) periodically back to the Monitoring Service. Dynamically loading these modules from a (few) centralized sites
when they are needed makes it much easier to keep large monitoring systems updated and to provide new functionality dynamically.
Users can easily implement new dedicated modules and use them in the MonALISA framework.
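Loading a module by name at run time can be sketched as follows. This is only an assumption-laden illustration: it resolves a class from a dotted name on the local import path with Python's importlib, whereas the text describes fetching modules from centralized sites, which is not covered here. The function name load_module_class is hypothetical.

```python
import importlib


def load_module_class(dotted_name):
    """Return the class named by a 'package.module.ClassName' string."""
    module_path, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_path)  # imported only when requested
    return getattr(module, class_name)
```

A service configured with a list of such names could instantiate each class on demand, so that adding a new module requires only a configuration change rather than a change to the service itself.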
ApMon can also be used by any application to push monitoring information to MonALISA services.
The monitoring data is sent as UDP datagrams to one or more hosts running MonALISA services.
Applications can periodically report any type of information the user wants to collect, monitor or use in the MonALISA framework to trigger alarms or activate decision agents.
We provide ApMon implementations for five programming languages: C, C++, Java, Perl and Python.
More details about ApMon can be found in this section.
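The push model can be illustrated with a plain UDP sender built on Python's standard socket module. This sketch does not use or reproduce ApMon's actual API or wire format: the function send_metrics and the JSON payload layout are assumptions chosen only to show one datagram being sent to each of several destinations.

```python
import json
import socket
import time


def send_metrics(destinations, cluster, node, params):
    """Send one UDP datagram with the given parameters to each (host, port) pair."""
    payload = json.dumps(
        {"cluster": cluster, "node": node, "time": time.time(), "params": params}
    ).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in destinations:
            # UDP is connectionless: the sender does not wait for any acknowledgement.
            sock.sendto(payload, (host, port))
    return payload
```

Since UDP gives no delivery guarantee, an application using this pattern keeps running normally even if a receiving host is down; the cost is that individual reports may be lost.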