MonALISA Grid Monitoring
Last update on:
Dec 03, 2015


How MonALISA is used to Monitor, Administrate and Self-Organize the VRVS System

VRVS is a complex videoconferencing system developed by Caltech. The VRVS reflector is the key engine for distributing video and audio streams among the users connected in a videoconference. VRVS reflectors are monitored and controlled by MonALISA services and agents. The MonALISA system provides the overall framework and monitoring foundation that enables the system to take dynamic, automatic action to optimize the real-time data flow among the reflectors, as well as the clients connected in a collaborative session.

The monitoring system provides real-time information about the number of conferences and clients, the topological connectivity of the reflectors, the quality of the network connectivity along each path in the reflector network, along with information on the state of the servers in the network (I/O traffic, CPU load). Clients can also obtain historical data for any of these parameters. A typical monitoring display is shown in Figure 1.

For each VRVS reflector, a MonALISA service stores the results it monitors locally in an embedded database, in a mode that aims to minimize the reflector resources it uses (typically less than 16 MB of memory and practically no system load). Dedicated modules have been developed to interact with the VRVS reflectors to (1) collect information about the topology of the reflector network, (2) monitor and track the traffic among the reflectors and report communication errors with the peers, and (3) track the number of clients and active virtual rooms. In addition, overall system information is monitored and reported in real time for each reflector (e.g. load, CPU usage, and total traffic in and out).

The subscription mechanism allows an administrator or authorized user to monitor any measured parameter in the system in real time, because all updates are displayed dynamically in the open windows of the GUI. Examples of the information available are:

  • the number of clients and the active virtual rooms,
  • the traffic in and out of all the reflectors,
  • problems such as lost packets between reflectors.

Figure 1: Monitoring the VRVS reflectors and the connectivity among them.

Global Repository for the entire VRVS system

A generic framework for building "pseudo-clients" for the MonALISA services was developed. It has been used to create dedicated Web service repositories with selected information from specific groups of MonALISA services. The pseudo-clients use the same set of lookup services to find all the active MonALISA services in a specified set of groups. They subscribe to these services according to a list of predicates and filters that specify the information the pseudo-client wants to collect. The pseudo-client stores all the values it receives from the running services in a local MySQL database, and uses procedures written as Java threads to compress old data. Currently, a VRVS repository is used to keep the long-term history of all the monitoring information collected from the entire system.
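
The subscribe-and-compress behaviour described above can be illustrated with a minimal sketch. It is not the MonALISA pseudo-client code (which is written in Java); the field names (farm, cluster, node, parameter), the wildcard syntax, and the averaging scheme are assumptions made only for this example.

```python
from collections import defaultdict
from fnmatch import fnmatch


def matches(predicate, sample):
    """Return True if every field named in the predicate matches the sample.

    Predicate values are shell-style patterns ("*" matches anything).
    The field names are hypothetical, chosen for illustration.
    """
    return all(fnmatch(sample[key], pattern) for key, pattern in predicate.items())


def compress(samples, bin_seconds):
    """Average (timestamp, value) pairs into fixed-width time bins.

    This is one simple way to shrink old history; the real repository
    may use a different scheme.
    """
    bins = defaultdict(list)
    for timestamp, value in samples:
        bins[timestamp - timestamp % bin_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(bins.items())]
```

A predicate such as `{"cluster": "Reflector", "parameter": "Load*"}` would then select all load values from reflector clusters, and `compress(samples, 60)` would reduce them to one averaged point per minute.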

Secure automated control of the VRVS system

Maintaining and controlling large-scale distributed systems such as VRVS can be a very time-consuming job. For this reason, in addition to dedicated monitoring modules and filters for VRVS, we have developed agents that are able to supervise the running of the VRVS reflectors automatically. This is proving to be essential as the VRVS infrastructure and the size of its user community both continue to scale up.

If a VRVS reflector stops (or does not answer monitoring requests correctly), the agent will try to restart it automatically. This is done for the entire VRVS reflector or only for specific components such as the H.323 agent. If this operation fails twice, the agent will send a notification email or SMS message to a list of administrators. Such agents are the first generation of a family of modules capable of reacting to errors in the system and taking well-defined actions in response. They can be dynamically loaded with new operating code. For security reasons, the code sent to the agents is digitally signed with trusted certificates that are declared for each running service.
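
The restart-then-escalate policy can be sketched as follows. This is only an illustration of the control flow described above; the three callables (`is_alive`, `restart`, `notify`) are hypothetical hooks and do not correspond to actual MonALISA agent interfaces.

```python
def supervise(is_alive, restart, notify, max_attempts=2):
    """Check one component; restart it on failure, escalate after repeated failure.

    is_alive, restart and notify are caller-supplied callables (hypothetical
    hooks for this sketch). Returns "ok", "restarted", or "escalated".
    """
    if is_alive():
        return "ok"
    for _ in range(max_attempts):
        restart()
        if is_alive():
            return "restarted"
    # Every restart attempt failed: hand the problem to the administrators.
    notify(f"restart failed {max_attempts} times")
    return "escalated"
```

The same function could be applied to a whole reflector or to a single component, simply by passing different hooks.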

MonALISA also provides an administrative graphical user interface which connects to any reflector using SSL with X.509 certificates. Each MonALISA service keeps a list of trusted administrator certificates in a private keystore. An authenticated administrator is allowed to update the VRVS software, stop/restart a reflector, and load agents with new monitoring modules.

We developed a global service that subscribes to connectivity information from all the VRVS reflectors and analyzes it 24/7 in real time. It has several levels of alarm triggers and informs the relevant VRVS, site, or network administrators by email when it detects loops in the connection topology, when reflectors are asymmetrically connected, or when the quality of connections is too low.
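
Two of the checks mentioned above, asymmetric connections and loops, can be sketched on a simple adjacency map. The data layout (a dict mapping each reflector name to the set of peers it reports) is an assumption for this example, not the format used by the actual service.

```python
def asymmetric_links(peers):
    """Return sorted (a, b) pairs where a lists b as a peer but b does not list a."""
    return sorted(
        (a, b)
        for a, neighbours in peers.items()
        for b in neighbours
        if a not in peers.get(b, set())
    )


def has_loop(peers):
    """Return True if the undirected peer graph contains a cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Collapse a->b and b->a into a single undirected edge; ignore self-links.
    edges = {frozenset((a, b)) for a, nbrs in peers.items() for b in nbrs if a != b}
    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends already connected, so this edge closes a loop
        parent[root_a] = root_b
    return False
```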

Optimized Dynamic Routing

With the help of MonALISA, the VRVS system maintains a connectivity tree that links the reflectors. This tree is used to compute the optimal routes for the videoconferencing data streams dynamically, based on information about the quality of the alternative possible connections between each pair of reflectors. If one or more links go down or are substantially degraded, the tree is rebuilt and re-optimized in real time, making VRVS resilient to failures. A typical graph (a snapshot of a picture that evolves with time) illustrating the complexity of the set of interconnections managed by MonALISA is shown in Figure 2.

Figure 2: A MonALISA display of the connectivity of the VRVS reflector network

To find the set of interconnections that optimizes the global data flow, the reflectors and all the potential peer connections are represented as a graph (Figure 3). The problem is then to find the "minimum spanning tree" (MST): the tree that links all of the reflectors (represented by vertices in the graph) and minimizes the total connection "cost". We have developed monitoring agents to provide the information needed to compute the MST. These agents are deployed in all MonALISA services and run continuously, making measurements (typically every two seconds) with a selected set of potential peers. They use small UDP packets to evaluate the Round Trip Time (RTT), its jitter, and the percentage of lost packets. The "cost" of each connection between two reflectors is then evaluated in real time using the UDP measurements taken from both sides. If lost packets are detected or if the jitter is high on a link, the cost value for that connection in the tree increases rapidly.
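
The text does not give the cost formula, so the following is only a sketch of one function with the stated property, namely that the cost rises quickly when jitter or packet loss appears. The formula, the weights, and the field names (`rtt_ms`, `jitter_ms`, `loss`) are all assumptions.

```python
def link_cost(side_a, side_b, jitter_weight=2.0, loss_weight=50.0):
    """Combine the measurements taken at both ends of a link into one cost.

    Each side is a dict with rtt_ms, jitter_ms and loss (a fraction in 0..1).
    The formula and weights are illustrative assumptions, not the VRVS ones.
    """
    def one_side(m):
        base = m["rtt_ms"] + jitter_weight * m["jitter_ms"]
        # Packet loss multiplies the cost, so a lossy link becomes expensive fast.
        return base * (1.0 + loss_weight * m["loss"])

    # Take the worse of the two directions as the cost of the link.
    return max(one_side(side_a), one_side(side_b))
```

Taking the maximum of the two directions is a conservative choice for this sketch: a link that is poor in either direction is treated as poor overall.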

Based on the values provided by the deployed agents, the MST is calculated in near real time. The MST solution is obtained using an implementation of Borůvka's algorithm, which is well suited to parallel and distributed processing. Once a link is part of the MST, a "momentum" factor is attached to that link, which prevents the assigned cost from varying too rapidly. This avoids triggering too many reconnections in the tree in response to small fluctuations in the set of measurements. Such cases may occur when two possible peers have very similar parameters (for example, when they are at the same location). Figure 3 also shows an example of an MST for connecting the VRVS reflectors, represented by the thick lines in the figure.
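
As an illustration, here is a compact sequential sketch of Borůvka's algorithm together with one possible reading of the "momentum" factor. The damping formula in `damped_cost` is an assumption (simple exponential smoothing); the source does not specify how the momentum is applied, and the production implementation is not shown here.

```python
def boruvka_mst(nodes, edges):
    """Return the MST of a connected graph as a sorted list of (cost, u, v) edges.

    Ties between equal costs are broken by comparing the whole tuple, which
    keeps the algorithm correct when several links have the same cost.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = set()
    components = len(nodes)
    while components > 1:
        # Step 1: every component picks its cheapest outgoing edge.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            root_u, root_v = find(u), find(v)
            if root_u == root_v:
                continue
            for root in (root_u, root_v):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            raise ValueError("graph is not connected")
        # Step 2: add the chosen edges, merging the components they join.
        for edge in cheapest.values():
            _, u, v = edge
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                mst.add(edge)
                components -= 1
    return sorted(mst)


def damped_cost(previous, measured, in_tree, momentum=0.8):
    """Smooth the cost of links already in the tree (assumed damping scheme)."""
    if not in_tree or previous is None:
        return measured
    return momentum * previous + (1.0 - momentum) * measured
```

Step 1 is the part that parallelizes naturally, since each component can search for its cheapest outgoing edge independently of the others.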

Figure 3: The Minimum Spanning Tree connections and the peer-to-peer connection quality for a set of VRVS reflectors

LISA: End-System Agents to Extend Intelligence to the Edge

LISA is a lightweight monitoring agent that runs on any end-user's system (Linux, Windows, or Macintosh) using Java Web Start technology and is part of the MonALISA distribution. The LISA agent detects the architecture on which it is deployed and dynamically loads the binary applications necessary to perform monitoring and end-to-end network performance measurements. It uses MonALISA lookup services to discover and register with the services and applications it needs, based on a set of attributes (see Figure 4). As it monitors the end-system and network state, it reports all the monitored values to the relevant MonALISA services. When using an external MonALISA service, the LISA agent reports the real IP address and domain name of the computer on which the agent is running, and whether network address translation (NAT) is in use. This allows the external service to contact the end-system as needed.

LISA discovers the running reflectors (i.e., the Panda servers) that are good candidates to be used by the (Koala) client. This is done by detecting network proximity (Panda servers in the same network, region or country) as well as the load on each of the candidate servers and the current number of clients each one is serving. It creates a short list of candidate "best" reflectors, taking into account the load on each reflector as well as the network connectivity to it.

End-to-end network performance measurements are performed periodically with the reflectors on the short list, and based on these measurements the LISA agent provides the best candidate to the application program. The measurements continue in the background, and if the "best service" changes, the agent notifies the application to reconnect to the new one. This process is shown schematically in Figure 4.
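
The two-stage selection (short list first, then end-to-end measurement) can be sketched as below. The ranking rule, the record fields (`name`, `region`, `load`, `clients`), and the `measure_rtt` and `notify` callables are assumptions for this example; the actual LISA agent may weigh these factors differently.

```python
def shortlist(reflectors, client_region, size=3):
    """Rank reflectors by proximity first, then load, then client count."""
    def rank(r):
        # False sorts before True, so reflectors in the client's region come first.
        return (r["region"] != client_region, r["load"], r["clients"])
    return sorted(reflectors, key=rank)[:size]


def pick_best(candidates, measure_rtt):
    """Return the name of the candidate with the lowest measured round-trip time."""
    return min(candidates, key=lambda r: measure_rtt(r["name"]))["name"]


def update_best(current, candidates, measure_rtt, notify):
    """Re-measure the short list and notify the application only on a change."""
    best = pick_best(candidates, measure_rtt)
    if best != current:
        notify(best)
    return best
```

Calling `update_best` on each measurement cycle keeps the application connected to its current reflector unless a different candidate actually becomes the best one.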

Figure 4: A schematic view of how LISA agents discover MonALISA services, then use the MonALISA services to connect to the best candidate application instance (a Panda server in the case of EVO). As the selection process proceeds, the MonALISA services perform dynamic load balancing by considering the load on each server.

Figure 5: Administration and monitoring functionalities provided by MonALISA