VRVS (http://vrvs.org) is a complex videoconferencing system developed by Caltech.
The VRVS reflector is the key engine in distributing video and audio streams among
users connected in a video conference. VRVS reflectors are monitored and controlled
by MonALISA services and agents. The MonALISA system provides the overall framework
and monitoring foundation that enables the system to take dynamic, automatic action
to optimize the real-time dataflow among the reflectors, as well as among the clients connected
in a collaborative session.
The monitoring system provides real-time information about the number of conferences and
clients, the topological connectivity of the reflectors, the quality of the network connectivity
along each path in the reflector network, along with information on the state of the servers in
the network (I/O traffic, CPU load). Clients can also obtain historical data for any of these
parameters. A typical monitoring display is shown in Figure 1.
For each VRVS reflector, a MonALISA service stores its monitoring results locally in an
embedded database, in a mode that aims to minimize the reflector resources it uses (typically
less than 16 MB of memory and practically no system load). Dedicated modules have been developed
to interact with the VRVS reflectors to (1) collect information about the topology of the reflector
network, (2) monitor and track the traffic among the reflectors and report communication errors
with the peers, and (3) track the number of clients and active virtual rooms. In addition, overall
system information is monitored and reported in real time for each reflector (e.g. load, CPU usage,
and total traffic in and out).
The subscription mechanism allows an administrator or authorized user to monitor any measured
parameter in the system in real time, because all updates are displayed dynamically
in the open windows of the GUI. Examples of the information available are:
- the number of clients and active virtual rooms,
- the traffic in and out of all the reflectors,
- problems such as lost packets between reflectors.
Figure 1: Monitoring the VRVS reflectors and the connectivity among them.
Global Repository for the entire VRVS system
A generic framework for building "pseudo-clients" for the MonALISA services was developed.
This has been used for creating dedicated Web service repositories with selected information
from specific groups of MonALISA services. The pseudo-clients use the same set of lookup
services to find all the active MonALISA services in a specified set of groups. They subscribe to
these services according to a list of predicates and filters that specify the information the
pseudo-client wants to collect. The pseudo-client stores all the values it receives from the
running services in a local MySQL database. It uses procedures written as Java threads to
compress old data. Currently a VRVS repository (at http://pccit6.cern.ch:8080 ) is used to
keep the long-term history of all the monitoring information collected from the entire system.
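The compression of old data mentioned above can be illustrated with a minimal sketch. The source does not describe the actual procedure (which is implemented as Java threads), so the approach here, averaging samples older than a cutoff into fixed-width time bins, is an assumption, as are the function name `compress_old` and the parameters `keep_raw_s` and `bin_s`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix_time_seconds, value)


def compress_old(samples: List[Sample], now: int,
                 keep_raw_s: int = 3600, bin_s: int = 300) -> List[Sample]:
    """Keep recent samples as-is and average older ones into time bins.

    Hypothetical policy: anything older than `keep_raw_s` seconds is
    replaced by one averaged value per `bin_s`-second bin.
    """
    cutoff = now - keep_raw_s
    bins: Dict[int, List[float]] = defaultdict(list)
    recent: List[Sample] = []
    for t, value in samples:
        if t < cutoff:
            bins[t - t % bin_s].append(value)  # key is the start of the bin
        else:
            recent.append((t, value))
    averaged = [(start, sum(vs) / len(vs)) for start, vs in sorted(bins.items())]
    return averaged + sorted(recent)
```

With this kind of policy the repository keeps full resolution for recent history while the storage used by older data grows only with the number of bins.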
Secure automated control of the VRVS system
Maintaining and controlling large-scale distributed systems such as VRVS can be very
time-consuming. For this reason, in addition to dedicated monitoring modules and filters for
VRVS, we have developed agents that are able to supervise the running of the VRVS reflectors
automatically. This is proving to be essential, as the VRVS system infrastructure and the size
of its user community both continue to scale up.
If a VRVS reflector stops (or does not answer monitoring requests correctly) the agent will
try to restart it automatically. This is done for the entire VRVS reflector or only for
specific components such as the H.323 agent. If this operation fails twice, the agent will
send a notification email or SMS message to a list of administrators. Such agents are the
first generation of a family of modules that are capable of reacting to, and taking well-
defined actions following errors in the system. They can be dynamically loaded with new
operating code. For security reasons, the code sent to the agents is digitally signed
with trusted certificates that are declared for each running service.
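The restart-then-escalate behaviour described above can be sketched as follows. This is not the agents' actual code: the function `supervise` and the three callables it receives (`is_healthy`, `restart`, `notify`) are hypothetical placeholders for the health check, the restart action, and the email/SMS notification.

```python
from typing import Callable


def supervise(is_healthy: Callable[[], bool],
              restart: Callable[[], bool],
              notify: Callable[[str], None],
              max_attempts: int = 2) -> str:
    """Run one supervision pass and return a short description of the outcome."""
    if is_healthy():
        return "ok"
    for attempt in range(1, max_attempts + 1):
        # A restart counts as successful only if the reflector answers afterwards.
        if restart() and is_healthy():
            return f"restarted after {attempt} attempt(s)"
    # All attempts failed: escalate to the administrators.
    notify(f"reflector restart failed {max_attempts} times")
    return "escalated"
```

Passing the actions in as callables keeps the policy (retry twice, then notify) separate from the mechanism, so the same pass could supervise a whole reflector or a single component such as the H.323 agent.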
MonALISA also provides an administrative graphical user interface which connects to any
reflector using SSL with X.509 certificates. Each MonALISA service keeps a list of trusted
administrator certificates in a private keystore. An authenticated administrator is allowed
to update the VRVS software, stop/restart a reflector, and load agents with new monitoring modules.
We developed a global service which subscribes to connectivity information from all the VRVS
reflectors and analyzes it around the clock in real time. It has several levels of alarm triggers and
informs the relevant VRVS, site or network administrator(s) by email when it detects loops in
the connection topology, when reflectors are asymmetrically connected, or when the quality of
connections is too low.
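Two of the checks named above, loops in the connection topology and asymmetrically connected reflectors, can be illustrated with a small sketch. The data layout (a dictionary mapping each reflector to the set of peers it reports) and the function names are assumptions made for this example; the source does not describe how the global service represents the topology.

```python
from typing import Dict, List, Set, Tuple

Topology = Dict[str, Set[str]]  # reflector -> peers it reports being connected to


def asymmetric_links(topology: Topology) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a reports b as a peer but b does not report a."""
    return sorted((a, b) for a, peers in topology.items()
                  for b in peers if a not in topology.get(b, set()))


def has_loop(topology: Topology) -> bool:
    """Detect a cycle in the undirected connection graph using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Treat each reported connection as one undirected edge.
    edges = {tuple(sorted((a, b))) for a, peers in topology.items() for b in peers}
    for a, b in sorted(edges):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends are already connected: this edge closes a loop
        parent[root_a] = root_b
    return False
```

A service of this kind would run such checks each time new connectivity information arrives and raise an alarm when either function reports a problem.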
Optimized Dynamic Routing
With the help of MonALISA, the VRVS system maintains a connectivity tree that links the reflectors.
This tree is used to compute the optimal routes for the videoconferencing data streams dynamically,
based on information about the quality of alternative possible connections between each pair of
reflectors. If one or more links go down or are substantially degraded, the tree is rebuilt and
re-optimized in real-time, making VRVS resistant to failures. A typical graph (a snapshot of a
picture that evolves with time) illustrating the complexity of the set of interconnections managed
by MonALISA is shown in Figure 2.
Figure 2: A MonALISA display of the connectivity of the VRVS reflector network
In order to find the set of interconnections that optimizes the global data flow, the reflectors
and all the potential peer connections are represented as a graph (Figure 3). The problem is then
to find the "minimum spanning tree" (MST) that links all of the reflectors (represented by vertices
in the graph) for which the total connection "cost" is minimized. We have developed monitoring
agents to provide the information needed to compute the MST. These agents are deployed in all
MonALISA services, and run continuously (typically every two seconds) making measurements with
a selected set of potential peers, using small UDP packets to evaluate the Round Trip Time (RTT),
its jitter and the percentage of lost packets. The "cost" of each connection between two reflectors
is then evaluated in real-time using the UDP measurements taken from both sides. If lost packets are
detected or if the jitter is high on a link, the cost value for that connection in the tree
increases rapidly.
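The source does not give the cost formula, so the following is only a sketch of one function with the stated properties: it combines RTT, jitter and packet loss, uses measurements from both sides, and rises quickly when packets are lost. The weights, the quadratic loss term, and the choice of taking the worse of the two directions are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One side's UDP measurements toward a peer (field names are illustrative)."""
    rtt_ms: float
    jitter_ms: float
    loss_pct: float  # percentage of lost packets, 0..100


def one_way_cost(s: LinkSample, jitter_weight: float = 2.0,
                 loss_weight: float = 50.0) -> float:
    # Hypothetical formula: the quadratic loss term makes the cost
    # increase rapidly as soon as packets are lost.
    return s.rtt_ms + jitter_weight * s.jitter_ms + loss_weight * s.loss_pct ** 2


def link_cost(a_to_b: LinkSample, b_to_a: LinkSample) -> float:
    # Assumption: a link is only as good as its worse direction.
    return max(one_way_cost(a_to_b), one_way_cost(b_to_a))
```

For example, with these weights a clean 20 ms link with 1 ms of jitter costs 22, while the same link with 2% packet loss costs 222.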
Based on the values provided by the deployed agents, the MST is calculated in near real-time.
The MST solution is obtained using an implementation of Borůvka's algorithm, which is well suited
for parallel/distributed processing. Once a link is part of the MST a "momentum" factor is attached
to that link, which prevents the assigned cost from varying too rapidly. This is to avoid triggering
too many reconnections in the tree in response to small fluctuations in the set of measurements.
Such cases may occur when two possible peers have very similar parameters (or they may be at the
same location). Figure 3 also shows an example of an MST for connecting the VRVS reflectors,
represented by the thick lines in the figure.
Figure 3: The Minimum Spanning Tree connections and
the peer-to-peer connection quality for a set of VRVS reflectors
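As an illustration of the MST computation, here is a compact, sequential sketch of Borůvka's algorithm together with one possible reading of the "momentum" factor. It is not the MonALISA implementation: the edge representation, the function names, and the interpretation of momentum as exponential smoothing of the cost are assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, str, str]  # (cost, reflector_a, reflector_b)


def smooth_cost(previous: float, measured: float, momentum: float = 0.8) -> float:
    """Hypothetical momentum: blend the new measurement with the previous cost."""
    return momentum * previous + (1.0 - momentum) * measured


def boruvka_mst(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning tree (or forest if disconnected)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst: List[Edge] = []
    components = len(nodes)
    while components > 1:
        # Find the cheapest edge leaving each component. Comparing whole
        # tuples breaks cost ties consistently.
        cheapest: Dict[str, Edge] = {}
        for edge in edges:
            _, a, b = edge
            root_a, root_b = find(a), find(b)
            if root_a == root_b:
                continue
            for root in (root_a, root_b):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # no edge joins the remaining components
        # Merge components along the selected edges.
        for edge in cheapest.values():
            _, a, b = edge
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b
                mst.append(edge)
                components -= 1
    return mst
```

Each round selects the cheapest outgoing edge of every component independently, which is why the algorithm lends itself to parallel or distributed execution. Smoothing the cost of links already in the tree before recomputing the MST is one way to keep small measurement fluctuations from changing the result.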
LISA: End-System Agents to Extend Intelligence to the Edge
LISA is a lightweight monitoring agent that runs on any end-user's system (Linux, Windows,
or Macintosh) using Java Web Start technology and is part of the MonALISA distribution.
The LISA agent detects the architecture on which it is deployed and dynamically loads the
binary applications necessary to perform monitoring and end-to-end network performance measurements.
It uses MonALISA lookup services to discover and register with the services and applications
it needs, based on a set of attributes (see Figure 4). As it monitors the end-system and network
state, it reports all the monitored values to the relevant MonALISA services. When using an external
MonALISA service, the LISA agent reports the real IP address and domain name of the computer on
which the agent is running, and whether a network address translation (NAT) is being used. This
allows the external service to contact the end-system as needed.
LISA discovers the running reflectors (i.e., the Panda servers) that are good possible candidates
to be used by the (Koala) client. This is done by detecting network proximity (Panda servers in
the same network, region or country) as well as the load on each of the candidate servers and
the current number of clients each one is serving. It creates a short list of candidate "best"
reflectors, taking the load on each reflector as well as the network connectivity to it into
account.
End-to-end network performance measurements are performed periodically with the reflectors
on the short list, and based on these measurements the LISA agent provides the best candidate
to the application program. These measurements are performed continuously in the background,
and if the connectivity with the "best service" changes, the agent notifies the application so that it can
reconnect to the new "best service". This process is shown schematically in Figure 4.
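The two-stage selection described above (a short list built from proximity and load, then a final choice based on end-to-end measurements) can be sketched as follows. The `Reflector` fields, the ranking key, and the use of RTT alone for the final choice are assumptions made for this example; LISA's actual criteria are not specified in detail here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reflector:
    name: str
    load: float     # normalized server load, 0..1
    clients: int    # number of clients currently served
    proximity: int  # 0 = same network, 1 = same region, 2 = same country, 3 = elsewhere


def short_list(candidates: List[Reflector], size: int = 3) -> List[Reflector]:
    """Rank candidates by proximity first, then by load and client count."""
    ranked = sorted(candidates, key=lambda r: (r.proximity, r.load, r.clients))
    return ranked[:size]


def pick_best(shortlist: List[Reflector],
              measure_rtt_ms: Callable[[Reflector], float]) -> Optional[Reflector]:
    """Choose the short-listed reflector with the lowest measured RTT."""
    return min(shortlist, key=measure_rtt_ms, default=None)
```

Repeating `pick_best` periodically and comparing the result with the reflector currently in use gives the agent the signal it needs to tell the application to reconnect.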
Figure 4: A schematic view of how LISA agents discover MonALISA services, then use the MonALISA services to connect to the best candidate application instance (a Panda server in the case of EVO). As the selection process proceeds, the MonALISA services perform dynamic load balancing by considering the load on each server.
Figure 5: Administration and monitoring functionalities provided by MonALISA