Autonomic Systems

Autonomic computing aims at providing systems and applications with self-management capabilities, including self-configuration, self-optimization, self-healing, and self-protection. These functions are performed by feedback control loops. More details on our approach to autonomic computing may be found here.

We are currently investigating three aspects of self-management, described below, using Jade as a common platform for the experiments. The managed system is a multitier J2EE server deployed on a cluster.

Self-optimization

The goal of self-optimization is to maintain optimal (or near-optimal) system performance in spite of wide variations in the load or in the amount of available resources. Performance may be measured by various criteria, such as average response time or average throughput for an Internet service, or bounded jitter for a video server.

In the first experiments, using the Jade framework, we selected an entire node as the resource allocation unit and implemented a simple control loop based on thresholds. When the load (measured by CPU usage) exceeds a preset threshold, a new node is allocated. When the load falls below a second threshold, a node is released. Two instances of this control loop are used: one for the database tier, the other for the application server tier. As a result, the system keeps the response time fairly stable under wide variations of the load.
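As an illustration only, the threshold-based loop described above might be sketched as follows in Python. The class name, the threshold values, and the node bounds are assumptions made for this example; they are not taken from Jade.

```python
from dataclasses import dataclass


@dataclass
class ThresholdController:
    """Hypothetical threshold-based control loop for one tier."""
    upper: float        # CPU usage above which a node is allocated (assumed value)
    lower: float        # CPU usage below which a node is released (assumed value)
    nodes: int = 1      # nodes currently allocated to the tier
    min_nodes: int = 1  # never release the last node
    max_nodes: int = 8  # assumed cluster capacity for the tier

    def step(self, cpu_usage: float) -> int:
        """Apply one control decision and return the new node count."""
        if cpu_usage > self.upper and self.nodes < self.max_nodes:
            self.nodes += 1   # load too high: allocate a node
        elif cpu_usage < self.lower and self.nodes > self.min_nodes:
            self.nodes -= 1   # load low: release a node
        return self.nodes


# Two independent instances, one per tier, as in the experiment described above.
database_loop = ThresholdController(upper=0.80, lower=0.30)
app_server_loop = ThresholdController(upper=0.75, lower=0.25)

for sample in (0.90, 0.85, 0.50, 0.10):
    print(sample, database_loop.step(sample))
```

Keeping the two thresholds apart leaves a band of load values in which the loop takes no action, which avoids allocating and releasing nodes in rapid succession.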

Self-repair

The goal of self-repair is to restore the integrity of a system in the presence of failures. Up to now, we have considered a system running on a cluster of nodes, and recovery from node failures. Failures of individual software components will be investigated at a later stage.

The target system has the form of a set of interconnected components, equipped with wrappers according to the general scheme defined by Jade. Critical components are replicated, and the consistency of the replicas is maintained during execution. In addition, the knowledge base maintained by the repair manager contains a representation of the system state, including the configuration of each component, the bindings between the components, and the placement of the components on the nodes of the cluster. This representation contains all the information needed to reconstruct the system after a node failure.
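To make the role of this representation concrete, the following Python sketch shows one way such a system model could be used to derive the repair actions after a node failure. It is a simplified assumption, not the Jade repair manager: the `SystemModel` class, the `repair_plan` method, the action tuples, and the component and node names are all hypothetical, and replication is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SystemModel:
    """Hypothetical system representation kept by a repair manager."""
    config: dict = field(default_factory=dict)     # component -> configuration attributes
    bindings: set = field(default_factory=set)     # (client, server) component pairs
    placement: dict = field(default_factory=dict)  # component -> node

    def repair_plan(self, failed_node: str, spare_node: str) -> list:
        """Return the actions that rebuild the components lost with failed_node."""
        lost = sorted(c for c, n in self.placement.items() if n == failed_node)
        plan = []
        for component in lost:
            # Redeploy with the recorded configuration on the spare node.
            plan.append(("deploy", component, spare_node, self.config[component]))
            self.placement[component] = spare_node
        for client, server in sorted(self.bindings):
            # Re-establish every binding that involved a lost component.
            if client in lost or server in lost:
                plan.append(("bind", client, server))
        return plan


# Illustrative three-tier configuration (names are assumptions).
model = SystemModel(
    config={"web": {"port": 80}, "app": {"port": 8080}, "db": {"port": 3306}},
    bindings={("web", "app"), ("app", "db")},
    placement={"web": "node1", "app": "node2", "db": "node3"},
)
plan = model.repair_plan(failed_node="node2", spare_node="node4")
for action in plan:
    print(action)
```

Because the configuration, bindings, and placement are all recorded outside the failed node, the plan can be computed without any information from the node itself.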