3.3. Understanding ShaoLin InfiniCluster Components

3.3.1. Resources

A resource is anything in the system that an application can use, whether hardware or software, such as storage or an IP address. A resource can be shared by the nodes inside the application's failover domain. An application may have several resources, and a node must acquire all of the resources owned by the application before it can run that application. For the cluster system to take the appropriate action on a resource, the correct resource type must be selected to represent it. A resource type can be application, block device, file system, and so on. A defined resource belongs to a set of nodes within the cluster. This set of nodes is the failover domain: the nodes that can run the resource, ordered in a priority list. When a failure occurs on the node where the resource is running, the resource switches to another node in the failover domain according to the priority list. A resource, its resource type, and its failover domain together form a resource group.
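The priority-based failover described above can be pictured with a short sketch. It is illustrative only and is not part of ShaoLin InfiniCluster: the function name, the node names, and the rule "pick the first healthy node in the list" are assumptions made for the example.

```python
from typing import Optional, Sequence, Set


def pick_failover_node(failover_domain: Sequence[str],
                       failed_nodes: Set[str]) -> Optional[str]:
    """Return the first healthy node in priority order, or None.

    Assumption: index 0 of failover_domain is the highest priority.
    """
    for node in failover_domain:
        if node not in failed_nodes:
            return node
    return None  # no node in the failover domain can run the resource


# Hypothetical three-node failover domain.
domain = ["node1", "node2", "node3"]
target = pick_failover_node(domain, failed_nodes={"node1"})
```

In this sketch, a failure of node1 moves the resource to node2; if every node in the domain has failed, no target is returned.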

3.3.1.1. Resource Group

A service or application is usually made up of one or more resources, which form a Resource Group. The resources in a Resource Group are controlled in sequence to meet the needs of the application. The sequence is predetermined by the system administrator and is enforced by ShaoLin InfiniCluster at run time. For example, a web server Resource Group requires a virtual IP address, a file system mount for storing content, a volume device such as LVM to hold the file system, and a web server application. These resources are grouped and managed together with a predefined start/stop control sequence.
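The idea of a predefined start/stop sequence can be sketched as follows. This is a minimal illustration, not the product's implementation: the class, the resource names, the particular start order, and the choice to stop in reverse start order are all assumptions.

```python
from typing import List


class ResourceGroup:
    """Hypothetical model of a resource group with an ordered sequence."""

    def __init__(self, name: str, start_order: List[str]) -> None:
        self.name = name
        self.start_order = list(start_order)  # defined by the administrator
        self.online: List[str] = []

    def start(self) -> List[str]:
        """Bring resources online in the predefined order."""
        for resource in self.start_order:
            if resource not in self.online:
                # A real cluster would ask the resource's agent to start it.
                self.online.append(resource)
        return list(self.online)

    def stop(self) -> List[str]:
        """Take resources offline in reverse start order (assumption)."""
        stopped = []
        while self.online:
            stopped.append(self.online.pop())
        return stopped


# Illustrative order for the web server example: storage first, service last.
web = ResourceGroup("web", ["lvm_volume", "filesystem", "virtual_ip", "web_server"])
started = web.start()
stopped = web.stop()
```

Stopping in reverse order means the web server goes offline before the file system and volume it depends on.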

3.3.2. Service Monitoring Agents (SMA)

ShaoLin InfiniCluster is software-based high-availability management software that monitors and recovers specific applications and network services, and monitors resources. ShaoLin InfiniCluster Service Monitoring Agents (SMA) are SIC processes that detect and recover from application failures and detect resource availability. SMAs are designed to combine simplicity with precise monitoring of application availability. An SMA is a plug-in for an application-specific scenario. Off-the-shelf SMAs of different types are available for a wide range of applications.

3.3.2.1. Agent Architecture

SIC agents provide the capability to control a wide array of hardware and software resources. The agent abstraction makes it simple for a developer to support new and changing applications in the SIC control framework. The SIC agent framework is a set of common, predefined functions compiled into each agent. These functions include the ability to connect to the ShaoLin InfiniCluster Manager and to understand common configuration attributes. The agent framework frees the developer from writing the support functions required by the cluster, so that the developer can focus on controlling a specific resource type. For more information on developing agents in C and shell scripts, see the ShaoLin InfiniCluster Service Monitoring Agent Developer's Guide, which is included with the purchase of the ShaoLin InfiniCluster Service Monitoring Agent Development Kit.

3.3.2.1.1. Agent Operations

Agents carry out specific operations on resources on behalf of the Cluster Manager. The functions an agent performs are called entry points: code sections that carry out specific functions, such as start, stop, and status. Entry points can be compiled into the agent itself or can be implemented as individual Perl scripts. For details on any of these entry points, see the ShaoLin InfiniCluster Service Monitoring Agent Developer's Guide.
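The entry point concept can be illustrated with a small dispatch table. Python is used here only for illustration; real entry points are compiled into the agent or written as scripts, as described above. The class name, method names, and return values are hypothetical and do not reflect the actual SIC agent framework API.

```python
from typing import Callable, Dict


class SketchAgent:
    """Hypothetical agent that maps operation names to entry points."""

    def __init__(self) -> None:
        self.running = False
        # Entry points named in the text: start, stop, and status.
        self.entry_points: Dict[str, Callable[[], str]] = {
            "start": self._start,
            "stop": self._stop,
            "status": self._status,
        }

    def _start(self) -> str:
        self.running = True
        return "online"

    def _stop(self) -> str:
        self.running = False
        return "offline"

    def _status(self) -> str:
        return "online" if self.running else "offline"

    def handle(self, operation: str) -> str:
        """Run the entry point requested by the (hypothetical) manager."""
        if operation not in self.entry_points:
            raise ValueError(f"unknown entry point: {operation}")
        return self.entry_points[operation]()


agent = SketchAgent()
first_status = agent.handle("status")
agent.handle("start")
second_status = agent.handle("status")
```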

3.3.2.2. Agent Types

System Resource Agents

Provide control and monitoring for system resources, including IP addresses, file systems, volume managers, and RAID.

Application Agents

Provide monitoring for specific applications such as Oracle, DB2, and WebSphere.

Custom Agents

To broaden support for application availability, new Service Monitoring Agents are developed continuously. The SMA Development Kit is also available for creating custom agents that monitor, detect faults in, and recover virtually any application.

3.3.3. Cluster Infrastructure and Control

3.3.3.1. The ShaoLin General Parallel Cluster Infrastructure (SGPCI)

The SGPCI is kernel-level cluster membership and communication software that provides basic low-level cluster communication and control. It is an important component that allows ShaoLin InfiniCluster to be a crash-safe process by providing a crash-safe, reliable, high-performance cluster communication system. TCP is a reliable network protocol, but it can cause race conditions in the system when memory runs low. The SGPCI uses UDP on IP networks, or it can operate as a layer-2 protocol that deals directly with the data link layer of the network interfaces.

3.3.3.1.1. Membership

Cluster membership is the means of determining whether a node is online in a cluster. It is handled entirely by the SGPCI: every heartbeat channel going up or down, and every node joining or going down, is reported and controlled by the SGPCI. The SGPCI provides reliable, crash-safe, kernel-to-kernel cluster membership identification. Reliable cluster membership helps to prevent split-brain.

3.3.3.1.2. Heartbeat

ShaoLin InfiniCluster uses heartbeats as the communication channel between cluster members. A heartbeat is a message that a cluster member sends periodically to tell the other cluster members that it is alive. If no heartbeat message is received from a member for a period of time, that node is treated as dead, and the other cluster members take over the resource groups it owns. A heartbeat channel can be a serial channel or an Ethernet channel.
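The timeout rule described above can be sketched as follows. This is a simplified user-space illustration, not the kernel-level SGPCI mechanism: the class name, the 3-second timeout, and the use of explicit timestamps are assumptions made for the example.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen from each member."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # seconds of silence before a node is dead
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("node1", now=0.0)
monitor.record("node2", now=0.0)
monitor.record("node1", now=2.0)   # node1 keeps sending; node2 goes silent
suspects = monitor.dead_nodes(now=4.0)
```

Here node2 has been silent for longer than the timeout, so it is the only node reported, and its resource groups would be candidates for takeover.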

3.3.3.2. Load Balanced Communication

When multiple communication channels are defined between cluster nodes, all traffic between the nodes goes through the SGPCI network driver. The driver automatically distributes the traffic over these channels using a weighted round-robin algorithm, which ensures that every channel gets a chance to transmit data and receives a fair share of the load according to its available network bandwidth. This allows cluster communication to use the maximum available bandwidth.
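A weighted round-robin schedule can be sketched as below. The exact algorithm used by the SGPCI network driver is not documented here; this example uses the common "smooth" weighted round-robin variant, and the channel names and weights (taken to be proportional to bandwidth) are assumptions.

```python
from typing import Dict, List


def weighted_round_robin(weights: Dict[str, int], count: int) -> List[str]:
    """Return the channel chosen for each of `count` packets.

    Each round, every channel's credit grows by its weight; the channel
    with the most credit sends and then pays back the total weight.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total = sum(weights.values())
    credit = {channel: 0 for channel in weights}
    schedule = []
    for _ in range(count):
        for channel, weight in weights.items():
            credit[channel] += weight
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        schedule.append(chosen)
    return schedule


# Hypothetical channels: eth0 has three times the bandwidth of eth1.
schedule = weighted_round_robin({"eth0": 3, "eth1": 1}, count=8)
```

Over any full cycle of four packets, eth0 carries three and eth1 carries one, so both channels transmit and the load follows the weights.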

3.3.3.3. Reliable Data Communication

The SGPCI provides ShaoLin InfiniCluster with a reliable network communication channel. It supports strong 128-bit checksums for detecting data corruption, and a reliable transmission control protocol. When multiple communication channels are in use, failed channels are automatically excluded and data packets are automatically redirected to the remaining healthy communication paths.
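The two ideas above, checksum verification and exclusion of failed channels, can be sketched as follows. The checksum algorithm used by the SGPCI is not named in this guide; BLAKE2b truncated to 128 bits is used here purely as a stand-in, and the packet layout, function names, and channel names are assumptions.

```python
import hashlib
from typing import List, Set

CHECKSUM_BYTES = 16  # 128 bits


def make_packet(payload: bytes) -> bytes:
    """Prefix the payload with a 128-bit checksum (stand-in algorithm)."""
    digest = hashlib.blake2b(payload, digest_size=CHECKSUM_BYTES).digest()
    return digest + payload


def verify_packet(packet: bytes) -> bytes:
    """Return the payload, or raise ValueError if it was corrupted."""
    digest, payload = packet[:CHECKSUM_BYTES], packet[CHECKSUM_BYTES:]
    expected = hashlib.blake2b(payload, digest_size=CHECKSUM_BYTES).digest()
    if digest != expected:
        raise ValueError("checksum mismatch: packet corrupted")
    return payload


def healthy_channels(channels: List[str], failed: Set[str]) -> List[str]:
    """Exclude failed channels; traffic is redirected to the rest."""
    return [channel for channel in channels if channel not in failed]


packet = make_packet(b"cluster message")
payload = verify_packet(packet)
usable = healthy_channels(["eth0", "eth1", "serial0"], failed={"eth1"})
```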

3.3.3.4. Kernel Watch Dog and Kernel Kill

The SGPCI provides a kernel watchdog that automatically and reliably monitors system load capacity, and a feature called KernelKill. When a system becomes unresponsive because of errors and appears to hang, other nodes can still issue a KernelKill command remotely, which reliably restarts or stops the kernel of the unresponsive node to bring about a reliable failover.

3.3.4. Cluster Manager

The Cluster Manager is the core cluster engine. It provides cluster configuration; control of applications, system resources, and resource groups; and the core logic of the whole cluster management system.

3.3.4.1. Cluster Configuration Database (CCDB)

The Cluster Configuration Database (CCDB) is a cluster-wide, distributed journaling database system with strong 128-bit error correction, used for storing cluster configurations. Each cluster node holds a copy of the database, and the database instances synchronize automatically. When the cluster manager starts up, the cluster database is checked and is synchronized automatically if needed.

3.3.4.2. Cluster Administration Daemon (CAD)

The CAD is a daemon thread running inside the cluster manager. It accepts user administration commands and reports cluster events to user interfaces.

3.3.4.3. Group Communication Service (GCS)

The Group Communication Service (GCS) synchronizes resource group ownership information at run time across the failover domain. The GCS uses a distributed, deadlock-free protocol to synchronize run-time resource group ownership information and to decide whether a cluster node takes ownership of a resource group.

3.3.4.4. System Resource Management Service

The System Resource Management Service (SRMS) provides the interface to control and monitor resources in the resource groups. The SRMS communicates with the Service Monitoring Agents (SMA) and executes resource control operations.

3.3.4.5. SIGKILL and Cluster Manager

Never send SIGKILL to slicmgr. If slicmgr is killed with SIGKILL, it has no chance to clean up (take its resources offline), so resources can remain online afterwards. When another node sees that slicmgr on this node has gone offline, it tries to take over those resources, and a serious problem (split brain) occurs.
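The reason SIGKILL leaves no chance to clean up is that, on POSIX systems, a process cannot install a handler for it. The sketch below illustrates this with a generic process; it is not slicmgr code. The resource names are hypothetical, and the use of SIGTERM as the catchable shutdown signal is an assumption, since this guide does not state which signal slicmgr treats as a clean shutdown request.

```python
import signal

# Hypothetical resources that this process currently holds online.
online_resources = ["virtual_ip", "filesystem"]


def take_resources_offline(signum, frame):
    """Cleanup handler: runs only for signals that can be caught."""
    online_resources.clear()


def can_install_handler(sig) -> bool:
    """Report whether the OS lets a process handle this signal."""
    try:
        previous = signal.signal(sig, take_resources_offline)
    except (OSError, ValueError):
        return False  # e.g. SIGKILL: the kernel refuses any handler
    signal.signal(sig, previous)  # restore whatever was there before
    return True


# SIGTERM (assumed shutdown signal) can be handled, so cleanup can run.
sigterm_ok = can_install_handler(signal.SIGTERM)
# SIGKILL cannot be handled, so the process dies with resources online.
sigkill_ok = can_install_handler(signal.SIGKILL)
```

With a catchable signal the handler can take resources offline before the process exits; with SIGKILL the process ends immediately and anything it held stays online.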