MonALISA Grid Monitoring
Last update on: Dec 03, 2015

MonALISA Extensions Guide

Chapter 1. Pluggable Components Interfaces

1. Monitoring Modules

1.1. Introduction

New monitoring modules can be developed easily. These modules may use SNMP requests, or they may simply run any script (locally or on a remote system) to collect the requested values. They inherit several mechanisms from a basic monitoring class:

- running in independent threads;
- interacting with the operating system;
- controlling an SNMP session.

The user only needs to provide the code that collects the values, parses the output and generates a result object, together with the names of the parameters the module collects.

While the modules currently provided with MonALISA are integrated in the service binary distribution, the source code of some example modules is provided in the ${MonaLisa_HOME}/Service/usr_code directory. This is also the directory in which the users can develop their own modules. The next section contains instructions for creating and running new modules.

1.2. How to Write a New Module

Creating a new module means writing a class that extends the lia.Monitor.monitor.cmdExec class and implements the lia.Monitor.monitor.MonitoringModule interface.

This interface has the following structure:

     package lia.Monitor.monitor;

     public interface MonitoringModule extends lia.util.DynamicThreadPoll.SchJobInt {
         public MonModuleInfo init(MNode node, String args);
         public String[] ResTypes();
         public String getOsName();
         public Object doProcess() throws Exception;
         public MNode getNode();
         public String getClusterName();
         public String getFarmName();
         public boolean isRepetitive();
         public String getTaskName();
         public MonModuleInfo getInfo();
     }

The doProcess method collects and returns the results. The return type is usually a Vector of lia.Monitor.monitor.Result objects, but it can also be a single Result object.

The init method initializes the information the module needs. This includes the name of the cluster that contains the monitored nodes, the name of the farm and the module's parameters. It is the first method called when the farm loads the module. Its second argument is the list of parameters given for the module in the farm configuration file (see the section on activating modules). The module must parse this list to obtain the parameter values.

The isRepetitive method tells whether the module collects results only once or repeatedly; it returns the module's isRepetitive boolean variable. If it is true, the module is called periodically. The repetition interval is specified in the <farm>.conf file; if it is not set there, the default interval is 30 s.
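The repetition rule can be illustrated with a small sketch. It is not MonALISA's own scheduler: RepetitionSketch and its methods are hypothetical. The 30 s default comes from the text above. Using fixed-rate scheduling rather than fixed-delay scheduling is an assumption.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not MonALISA's own scheduler: it only illustrates
// the documented isRepetitive() contract and the 30 s default interval.
class RepetitionSketch {
    // Default repetition interval used when <farm>.conf sets none.
    static final long DEFAULT_INTERVAL_MS = 30_000L;

    // configuredMs stands for a value read from <farm>.conf, or null if absent.
    static long resolveIntervalMillis(Long configuredMs) {
        return configuredMs != null ? configuredMs : DEFAULT_INTERVAL_MS;
    }

    // Runs collect once if repetitive is false, otherwise every intervalMs
    // (fixed-rate scheduling is an assumption of this sketch).
    // The returned handle lets the caller cancel a repetitive module.
    static ScheduledFuture<?> schedule(ScheduledExecutorService pool,
                                       boolean repetitive,
                                       long intervalMs,
                                       Runnable collect) {
        if (!repetitive) {
            return pool.schedule(collect, 0, TimeUnit.MILLISECONDS);
        }
        return pool.scheduleAtFixedRate(collect, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

A caller would typically pass Executors.newScheduledThreadPool(n) as the pool. It would cancel the returned ScheduledFuture when the module is unloaded.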

The other methods return various pieces of module information, which are usually set in the init() method. The source code examples in usr_code provide models for writing these methods.
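The skeleton below sketches the shape of a module under several assumptions:

- To compile without the MonALISA jars, it does not extend cmdExec or implement MonitoringModule.
- ResultSketch is a hypothetical stand-in for lia.Monitor.monitor.Result.
- init takes the farm, cluster and node names directly instead of an MNode.
- The "key=value;..." argument syntax is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical stand-in for lia.Monitor.monitor.Result: one node's
// named parameter values at a given time.
class ResultSketch {
    final String farm, cluster, node;
    final long time;
    final Map<String, Double> params = new HashMap<>();

    ResultSketch(String farm, String cluster, String node, long time) {
        this.farm = farm; this.cluster = cluster; this.node = node; this.time = time;
    }
}

// Sketch of a module's structure only: it does NOT extend cmdExec or
// implement MonitoringModule, so it compiles without MonALISA jars.
class JvmMemoryModuleSketch {
    // Parameter names this module produces (what ResTypes() reports).
    private static final String[] RES_TYPES = {"jvm_free_mb", "jvm_total_mb"};

    private String farmName, clusterName, nodeName;
    private boolean repetitive = true;

    // Mirrors init(MNode, String args): node identity plus the raw argument
    // string from the farm configuration file. The "key=value;..." syntax
    // parsed here is a hypothetical format for illustration.
    void init(String farm, String cluster, String node, String args) {
        farmName = farm; clusterName = cluster; nodeName = node;
        if (args == null) return;
        for (String kv : args.split(";")) {
            String[] p = kv.split("=", 2);
            if (p.length == 2 && p[0].trim().equals("repetitive")) {
                repetitive = Boolean.parseBoolean(p[1].trim());
            }
        }
    }

    String[] ResTypes()     { return RES_TYPES.clone(); }
    boolean isRepetitive()  { return repetitive; }
    String getClusterName() { return clusterName; }
    String getFarmName()    { return farmName; }

    // Collects the values and wraps them in a Vector of results,
    // the usual return type of doProcess().
    Vector<ResultSketch> doProcess() {
        Runtime rt = Runtime.getRuntime();
        ResultSketch r = new ResultSketch(farmName, clusterName, nodeName,
                                          System.currentTimeMillis());
        r.params.put(RES_TYPES[0], rt.freeMemory() / (1024.0 * 1024.0));
        r.params.put(RES_TYPES[1], rt.totalMemory() / (1024.0 * 1024.0));
        Vector<ResultSketch> out = new Vector<>();
        out.add(r);
        return out;
    }
}
```

In a real module, the parameter names returned by ResTypes() should match the keys stored in each result. That lets clients know in advance which values the module provides.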

1.3. How to Activate a New Module

.... (myFarm.conf) ...

For MonALISA to be able to load the new module, the path to the module's directory must be added to the CLASSURLs property in the ${MonaLisa_HOME}/Service/ file. For example:


Multiple directories can be specified, separated by commas.

1.4. Examples

Examples of new modules can be found in ${MonaLisa_HOME}/Service/usr_code.

In usr_code/MDS there is an example that writes the received values into MDS. It uses a Unix pipe to communicate between the dynamically loadable Java module and the script that performs the update in the LDAP server.

A simpler example, which just prints all the values to standard output, can be found in usr_code/SimpleWriter.

An example that writes the values to UDP sockets is in usr_code/UDPWriter.
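The UDPWriter idea can be roughly illustrated by sending each value as one text datagram. The class name is hypothetical, and so is the farm/cluster/node/param=value encoding. The actual usr_code/UDPWriter example may use a different format.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch in the spirit of usr_code/UDPWriter; the
// "farm/cluster/node/param=value" text encoding is an assumption,
// not the format of the shipped example.
class UdpWriterSketch implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;

    UdpWriterSketch(InetAddress host, int port) throws IOException {
        this.socket = new DatagramSocket(); // bound to an ephemeral local port
        this.host = host;
        this.port = port;
    }

    // Encodes one value as a UTF-8 line and sends it as a single datagram.
    void write(String farm, String cluster, String node, String param, double value)
            throws IOException {
        String line = farm + "/" + cluster + "/" + node + "/" + param + "=" + value;
        byte[] data = line.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, host, port));
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

UDP gives no delivery guarantee. A writer like this suits consumers that can tolerate occasional lost values.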

2. Data Filters / Event Triggers

Filters make it possible to dynamically create any new type of value derived from the collected values. For example, a filter can evaluate the integrated traffic over the last n minutes, or count the nodes whose load is less than x. Filters may also send an email to a list of recipients, or SMS messages, when predefined complex conditions occur. Filters run in independent threads, and any client can register for their output. They can help applications react when certain conditions occur, or help present global values for large computing facilities.

Each filter has its own thread in the MonALISA Service, so filters run independently of each other.
To write your own filters/triggers, follow these steps:
  1. Your filter MUST extend lia.Monitor.Filters.GenericMLFilter
  2. It must have a constructor with a String parameter (the farm name), in which you must call super(farmName). This constructor is used to instantiate your filter dynamically at runtime.
  3. Your filter MUST override the following methods:
    • public String getName()
      returns the Filter name

      It is a short name to identify data sent by your filter in the client. It is also used by MonALISA clients to inform the Service that they are interested in the data processed by this filter. It MUST be unique because all the filters in ML are identified by their name.

    • public monPredicate[] getFilterPred()
      returns an array of monPredicate objects

      These predicates select, from the entire data flow, only the results the filter is interested in. If the method returns null, the filter receives all the monitoring information.

    • public void notifyResult(Object o)

      This method is called every time a Result matches one of the predicates returned by getFilterPred(). The filter can save the Result in a local buffer for later analysis. If it is a trigger, it can instead take real-time decisions or actions.

    • public Object expressResults()
      returns a Vector of Gresults and/or Results

      This method is called periodically to let the filter process the data it has received. It should return a Vector of Gresults and/or Results, which is then sent to all the registered clients. It returns null if no data should be sent to clients (e.g. when the filter is a trigger).

    • public long getSleepTime()
      returns the interval, in milliseconds, between calls to expressResults()

      For example, if this method returns 2*60*1000, expressResults() is called every 2 minutes.

  4. In your file, add the path to the directory that contains the filter's .class files. The parameter is lia.Monitor.CLASSURLs (if there are several filters or directories, separate them with commas).
  5. In your file, specify which filters should be loaded, separated by commas.
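The steps above can be sketched as follows, in the spirit of the ExLoadFilter example. Several pieces are hypothetical stand-ins:

- GenericFilterSketch stands in for GenericMLFilter.
- LoadSample stands in for a Result carrying a Load5 value.
- The predicate array uses Object instead of monPredicate.
- A Vector of plain Doubles replaces the Gresults a real filter would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in for lia.Monitor.Filters.GenericMLFilter,
// reduced to the methods listed above (monPredicate replaced by Object).
abstract class GenericFilterSketch {
    protected final String farmName;
    GenericFilterSketch(String farmName) { this.farmName = farmName; }
    abstract String getName();
    abstract Object[] getFilterPred();
    abstract void notifyResult(Object o);
    abstract Object expressResults();
    abstract long getSleepTime();
}

// Hypothetical value type standing in for a Result carrying one Load5 value.
class LoadSample {
    final String node;
    final double load5;
    LoadSample(String node, double load5) { this.node = node; this.load5 = load5; }
}

// Buffers Load5 samples and periodically reports {min, max, mean}.
class LoadStatsFilterSketch extends GenericFilterSketch {
    private final List<Double> buffer = new ArrayList<>();

    // Step 2: constructor taking the farm name and calling super(farmName).
    LoadStatsFilterSketch(String farmName) { super(farmName); }

    @Override String getName() { return "LoadStatsFilterSketch"; } // must be unique

    @Override Object[] getFilterPred() { return null; } // null: receive everything

    // Called for every matching result; it only buffers the value.
    @Override synchronized void notifyResult(Object o) {
        if (o instanceof LoadSample) buffer.add(((LoadSample) o).load5);
    }

    // Called every getSleepTime() ms; returns {min, max, mean} or null.
    @Override synchronized Object expressResults() {
        if (buffer.isEmpty()) return null; // nothing to send to clients
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0;
        for (double v : buffer) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        Vector<Double> out = new Vector<>();
        out.add(min);
        out.add(max);
        out.add(sum / buffer.size());
        buffer.clear(); // each report covers one interval
        return out;
    }

    @Override long getSleepTime() { return 2 * 60 * 1000; } // every 2 minutes
}
```

notifyResult and expressResults are synchronized on the assumption that they may be called from different threads. The result delivery would be one thread and the filter's own timer another.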
The Service/usr_code/FilterExamples directory contains some simple examples of dynamic filters. One of them (ExTrigger) is a simple alarm that sends an email if the Load5 parameter on the master node reaches a threshold value. The other (ExLoadFilter) computes the min, max and mean values for a cluster. The data flow between the MonALISA Service and clients can contain, more or less, the following two classes:

3. Autonomous Agents

Agents are entities loaded into the MonALISA service that process the gathered monitoring data. They communicate with each other to solve a distributed task based on these data.

An agent must respect a given interface: writing an agent means creating a class that implements the lia.Monitor.monitor.AgentI interface. This interface has the following structure:

     import lia.Monitor.DataCache.AgentsCommunication;
     import lia.Monitor.monitor.AgentInfo;

     public interface AgentI {
         public void init(AgentsCommunication comm);
         public void doWork();
         public String getName();
         public String getGroup();
         public String getAddress();
         public AgentInfo getAgentInfo();
         public void processMsg(Object msg);
         public void processErrorMsg(Object msg);
     }

For an agent to be able to communicate, the agent-to-agent communication environment has to be initiated. An agent can do this by implementing the init method. This method is called by the Agents Engine when first loading the agent.

Agents hosted on the monitoring service usually communicate through the agents communication platform, which is built over the TCP connections to all the proxy services. This communication is reliable, secure, fast and scalable.

The AgentsCommunication interface has methods for sending agent-to-agent messages (the sendMsg method) and agent-to-proxy messages (the sendCtrlMsg method). The latter are used to get information about other agents in the distributed system, such as the list of agents in a group or the number of agents in a group.

     package lia.Monitor.DataCache;

     public interface AgentsCommunication {
         public void sendMsg(Object o);
         public void sendCtrlMsg(Object o, String cmd);
     }

Messages sent between agents have the following format:

     public class AgentMessage implements {
         public Integer messageID;
         public Long timeStamp;
         public Integer messageType;
         public Integer priority;
         public String agentAddrS;
         public String agentAddrD;
         public String agentGroupS;
         public String agentGroupD;
         public Integer messageTypeAck;
         public Object message;
     }

Each message exchanged between agents contains the following fields:

- messageID - an integer giving the message's sequence number.

- timeStamp - the time, in milliseconds, when the message was sent from the source.

- messageType - the type of the message.

- priority - the message's priority, a number between 1 and 10 (default 5). A message with a high priority is forwarded faster by the proxy service than other messages.

- agentAddrS - the address of the source agent.

- agentAddrD - the address of the destination agent(s). It can be a multicast address, in which case the message is sent to all the agents registered in a group.

- agentGroupS - the group of the source agent. If the source agent has not registered in a group yet, this field is null. When it is specified for the first time, the agent registers in the group. If it is the first agent to register in that group, the proxy service creates the new group.

- agentGroupD - the group of the destination agent.

- messageTypeAck - if this is an ACK message, a confirmation is required when it reaches the destination.

- message - the actual message being transmitted. It can be any serializable object.
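The group semantics of agentGroupS and agentGroupD can be modeled with a small registry. This is a hypothetical illustration of the behavior described above, not the proxy service's implementation. The first registration creates the group. A group-addressed message is then delivered to every registered member.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical proxy-side registry illustrating agentGroupS / agentGroupD.
class GroupRegistrySketch {
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();

    // Called when a message carries a non-null agentGroupS.
    // Returns true if this registration created the group.
    synchronized boolean register(String group, String agentAddr) {
        boolean created = !groups.containsKey(group);
        groups.computeIfAbsent(group, g -> new LinkedHashSet<>()).add(agentAddr);
        return created;
    }

    // Destination addresses for a message addressed to a group;
    // returns a copy so callers cannot modify the registry.
    synchronized Set<String> multicastTargets(String group) {
        Set<String> members = groups.getOrDefault(group, Collections.emptySet());
        return new LinkedHashSet<>(members);
    }
}
```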

What an agent does is implemented in its doWork method. An agent is loaded on the monitoring service by calling the addAgent method of lia.Monitor.DataCache.AgentsEngine. Each time an agent is loaded, a new execution thread is created, and this thread executes the agent's doWork method.
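The loading behavior can be sketched minimally: each call to addAgent starts a new thread that runs the agent's doWork method. AgentSketchI and AgentsEngineSketch are hypothetical, reduced stand-ins for AgentI and AgentsEngine. Rejecting duplicate names reflects the rule that agents are identified by a unique name. Making the threads daemons is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, reduced version of AgentI with only what this sketch needs.
interface AgentSketchI {
    String getName();
    void doWork();
}

// Hypothetical engine mirroring the described addAgent behavior.
class AgentsEngineSketch {
    private final Set<String> loadedNames = new HashSet<>();

    // Starts one new thread per loaded agent; that thread runs doWork().
    synchronized Thread addAgent(AgentSketchI agent) {
        if (!loadedNames.add(agent.getName())) {
            throw new IllegalArgumentException("duplicate agent name: " + agent.getName());
        }
        Thread t = new Thread(agent::doWork, "agent-" + agent.getName());
        t.setDaemon(true); // assumption: agents should not keep the JVM alive
        t.start();
        return t;
    }
}
```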

An agent is identified in the monitoring service by its name, so every agent must have a unique name. This name, combined with the ID of the hosting monitoring service, gives the agent a distinct address in the whole distributed system: agentName@farmID. An agent can also register itself in an agent group. Agent groups make it possible to send multicast messages to all the agents registered in a group. An agent that does not want to register in a group simply leaves the group field unset. The agent's name, group and address can be obtained by calling the getName, getGroup, getAddress or getAgentInfo methods. The last of these returns an AgentInfo object containing all the information about the agent. The lia.Monitor.monitor.AgentInfo class has the following structure:

     public class AgentInfo {
         public String agentName;
         public String agentGroup;
         public String farmID;
         public String agentAddr;

         public AgentInfo(String agentName, String agentGroup, String farmID) {
             this.agentName = agentName;
             this.agentGroup = agentGroup;
             this.farmID = farmID;
             this.agentAddr = agentName + "@" + farmID;
         }
     }

Messages received from other agents in the distributed system are processed by the processMsg method.

If a message sent by the agent cannot reach its destination, an error message is returned to the sending agent to notify it of the communication failure. The error message is processed by the processErrorMsg method.

An abstract class, lia.Monitor.Agents.AbstractAgent, simplifies agent development. It implements the AgentI interface, defining all AgentI methods except processMsg and doWork. It also provides a method for creating messages:

     public AgentMessage createMsg( 
     int messageID, 
     int messageType, 
     int messageTypeAck, 
     int priority, 
     String agentAddrD, 
     String agentGroupD, 
     Object message);
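The following sketch shows how an AbstractAgent-style base class might fill in a message, leaving the subclass to supply only doWork and processMsg. All classes here are hypothetical stand-ins; AgentMessageSketch trims AgentMessage to plain fields. The reply's messageType and messageTypeAck values of 0 are placeholders, since their meanings are not documented here. A real agent would pass the reply to sendMsg instead of storing it.

```java
// Hypothetical, trimmed copy of the AgentMessage fields listed above.
class AgentMessageSketch {
    Integer messageID, messageType, priority, messageTypeAck;
    Long timeStamp;
    String agentAddrS, agentAddrD, agentGroupS, agentGroupD;
    Object message;
}

// Hypothetical analogue of AbstractAgent: fills in the source fields so
// subclasses only implement doWork() and processMsg().
abstract class AbstractAgentSketch {
    protected final String name, group, farmID;

    AbstractAgentSketch(String name, String group, String farmID) {
        this.name = name; this.group = group; this.farmID = farmID;
    }

    String getAddress() { return name + "@" + farmID; } // agentName@farmID

    AgentMessageSketch createMsg(int messageID, int messageType, int messageTypeAck,
                                 int priority, String agentAddrD, String agentGroupD,
                                 Object message) {
        AgentMessageSketch m = new AgentMessageSketch();
        m.messageID = messageID;
        m.messageType = messageType;
        m.messageTypeAck = messageTypeAck;
        m.priority = priority;
        m.timeStamp = System.currentTimeMillis(); // set when the message is created
        m.agentAddrS = getAddress();
        m.agentGroupS = group;
        m.agentAddrD = agentAddrD;
        m.agentGroupD = agentGroupD;
        m.message = message;
        return m;
    }

    abstract void doWork();
    abstract void processMsg(Object msg);
}

// Example agent: answers every message with a reply to its sender.
class EchoAgentSketch extends AbstractAgentSketch {
    AgentMessageSketch lastReply; // kept for inspection instead of calling sendMsg

    EchoAgentSketch(String farmID) { super("echo", "demoGroup", farmID); }

    @Override void doWork() { /* a real agent would do its periodic work here */ }

    @Override void processMsg(Object msg) {
        if (!(msg instanceof AgentMessageSketch)) return;
        AgentMessageSketch in = (AgentMessageSketch) msg;
        // Default priority 5; messageType and messageTypeAck 0 are placeholders.
        lastReply = createMsg(in.messageID + 1, 0, 0, 5,
                              in.agentAddrS, in.agentGroupS, in.message);
    }
}
```

Filling agentAddrS and agentGroupS in the base class means every outgoing message carries its source fields. The receiving agent needs them in order to reply, as processMsg does above.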