Upgrading and maintaining RSF-1 cluster nodes

Introduction

This guide documents how to perform maintenance on RSF-1 cluster nodes with minimal downtime. Maintenance covers the following scenarios:

Upgrading the base OS.
Upgrading RSF-1 software.
General hardware maintenance.

The maintenance process can be broken down into the following steps:

1. Set services to manual on all nodes.
2. Move all services to one cluster node.
3. Perform maintenance on the node(s) not running services.
4. Check cluster health once maintenance is complete on a node.
5. Move services over to the upgraded node.
6. Repeat the process for the next node.
7. Final steps.

1. Set services to manual on all nodes.

Setting all services to manual is a safety measure that prevents unwanted failovers or migrations during this process. Note that this action will NOT stop any running services.

Using the Webapp
Select Dashboard on the main menu, then select ⋮ on the Services panel, and then select Set all services to manual.

Using the CLI


# /opt/HAC/RSF-1/bin/hacli service manual --name <servicename> --node <nodename>
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 0.032,
  "error": false,
  "output": "Putting <servicename> in manual mode on appliance <nodename>"
}

Note that when using the CLI, each service must be set to manual on every node. For example, if ServiceA is clustered on nodes NodeA and NodeB, issue the following CLI commands:


# /opt/HAC/RSF-1/bin/hacli service manual --name ServiceA --node NodeA
.
.
.
# /opt/HAC/RSF-1/bin/hacli service manual --name ServiceA --node NodeB
.
.
.
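
With several services and nodes, the per-node commands above can be generated with a nested loop. The following is a minimal sketch only: the function name, the space-separated list arguments, and the HACLI override variable are assumptions made for illustration. Only the `hacli service manual --name ... --node ...` form is taken from this guide.

```shell
# Sketch: put every listed service into manual mode on every listed node.
# HACLI can be overridden (for example HACLI=echo) to preview the commands
# without touching the cluster; that override is an assumption of this example.
HACLI="${HACLI:-/opt/HAC/RSF-1/bin/hacli}"

# set_all_manual "<services>" "<nodes>"   (both are space-separated lists)
set_all_manual() {
    for svc in $1; do
        for node in $2; do
            "$HACLI" service manual --name "$svc" --node "$node" || return 1
        done
    done
}

# Example with hypothetical names:
#   set_all_manual "ServiceA ServiceB" "NodeA NodeB"
```

Running it first with HACLI=echo prints the commands that would be issued, which is a cheap way to confirm that every service and node pair is covered.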

2. Move all services to one cluster node.

All running services should now be moved to a single cluster node so that the other node(s) in the cluster are free for maintenance.

Using the Webapp
From the main dashboard, select a service on the Services panel, then select ⋮ next to the running instance, and finally select Move <service> to <node>. This operation needs to be performed for each running service.

Using the CLI


# /opt/HAC/RSF-1/bin/hacli service move --name <servicename> --dest <nodename>
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 7.055,
  "error": false,
  "output": "Service <servicename> is now moving to node <nodename>"
}
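
Because the move has to be issued once per running service, it can be wrapped in a loop. This is a sketch under stated assumptions: the function name, its argument layout, and the HACLI override variable are hypothetical, and only the `hacli service move --name ... --dest ...` form comes from this guide.

```shell
# Sketch: move every listed service to a single destination node.
# HACLI can be overridden (for example HACLI=echo) for a dry run;
# that override is an assumption of this example.
HACLI="${HACLI:-/opt/HAC/RSF-1/bin/hacli}"

# move_all_to "<destination node>" "<services>"   (services space-separated)
move_all_to() {
    dest=$1
    for svc in $2; do
        "$HACLI" service move --name "$svc" --dest "$dest" || return 1
    done
}

# Example with hypothetical names:
#   move_all_to NodeA "ServiceA ServiceB"
```

The response shown above only reports that the service is now moving, so it is worth confirming that each service is actually running on the destination node before starting maintenance elsewhere.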

3. Perform maintenance on the node(s) not running services.

With all services running on a single node, maintenance may now be performed on the other node(s) in the cluster. Before performing maintenance, consider the following where applicable:

If possible, take a snapshot/backup of the complete system.
If applicable, make copies of all licenses.
When performing hardware upgrades or additions, verify compatibility and check for conflicts where possible.
Best practice is to perform a test reboot once maintenance is complete.

4. Check cluster health once maintenance is complete on a node.

Once maintenance has been performed on any cluster node, check that the node has successfully rejoined the cluster and is operating normally using the following checklist:

All heartbeats are up and running.
All cluster nodes can see each other and agree on the state of services.
On the upgraded node all services should be marked as manual and unblocked.
Clients of cluster services are still operating as normal (this is specific to the individual setup, for example NFS/SMB/iSCSI shares or application clients).

Using the Webapp
Navigate to Dashboard. Every item in the cluster health overview should be OK.

If RSF-1 has been upgraded, navigate to Help → About and check that the version displayed is correct.

Using the CLI
Interrogate the status of the cluster and check the health fields of the returned object (there should be no alerts and all fields should be marked OK).


# /opt/HAC/RSF-1/bin/hacli cluster info
{
  "timeout": 40,
  "errorMsg": "",
  "execTime": 0.053,
  "error": false,
  "output": {
    "bootstrap": 0,
    "cacheTime": 1,
    "clusterName": "example",
    "crc": "d9ac",
    "description": "No description given",
    "fcMonitoringEnabled": false,
    "health": {
      "alerts": [],
      "clusterHealth": "OK",
      "networkHeartbeatsHealth": "OK",
      "nodesHealth": "OK",
      "servicesHealth": "OK"
    },
    .
    .
    .
  }
}
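
The health check can also be scripted. The sketch below reads `hacli cluster info` output on standard input and succeeds only when the four health fields shown above are "OK" and the alerts list is empty. The function name is hypothetical, and the pattern matching is a simple stand-in for a real JSON parser, which would be more robust if one is available on the node.

```shell
# Sketch: succeed only if the health section of `hacli cluster info`
# output (read from stdin) reports "OK" everywhere and has no alerts.
# grep patterns are used here in place of a proper JSON parser.
health_ok() {
    info=$(cat)
    for field in clusterHealth networkHeartbeatsHealth nodesHealth servicesHealth; do
        printf '%s\n' "$info" |
            grep -Eq "\"$field\"[[:space:]]*:[[:space:]]*\"OK\"" || return 1
    done
    # An empty alerts list is expected to appear as "alerts": []
    printf '%s\n' "$info" | grep -Eq '"alerts"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}

# Example:
#   /opt/HAC/RSF-1/bin/hacli cluster info | health_ok && echo "cluster healthy"
```

A non-zero exit status means at least one field was missing, not "OK", or accompanied by alerts, in which case the full output should be inspected by hand.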

5. Move services over to the upgraded node.

Once the previous step has completed successfully, services can be moved to the newly upgraded node so that maintenance can be performed on the other node(s) in the cluster.

Using the Webapp

From the main dashboard, select a service on the Services panel, then select ⋮ next to the running instance, and finally select Move <service> to <node>. This operation needs to be performed for each running service.

Using the CLI
Again, this will need to be performed for each running service:


# /opt/HAC/RSF-1/bin/hacli service move --name <servicename> --dest <nodename>
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 7.051,
  "error": false,
  "output": "Service <servicename> is now moving to node <nodename>"
}
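
Every hacli response shown in this guide carries an "error" field, so a script can stop as soon as a call reports a problem. The wrapper below is a minimal sketch: the function name is hypothetical, and whether hacli also signals failure through its exit status is not stated in this guide, so the wrapper checks both.

```shell
# Sketch: run a command, print its response, and succeed only if the
# response contains "error": false. A non-zero exit status from the
# command itself is also treated as a failure.
run_checked() {
    response=$("$@") || return 1
    printf '%s\n' "$response"
    printf '%s\n' "$response" | grep -Eq '"error"[[:space:]]*:[[:space:]]*false'
}

# Example with hypothetical names:
#   run_checked /opt/HAC/RSF-1/bin/hacli service move --name ServiceA --dest NodeB \
#       || echo "move reported an error, stopping" >&2
```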

6. Repeat the process for the next node.

Once the running services have been migrated to the upgraded node and confirmed to be operating as expected, the other cluster node(s) can now be maintained.

7. Final steps.

Once all nodes in the cluster have been successfully upgraded, services can be migrated to their normal nodes. Once the migration is complete, a final check on the services should be performed from a client perspective.
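
Returning services to their normal nodes can be driven from a list of service and node pairs. This is a sketch under stated assumptions: the "service:node" pair format, the function name, and the HACLI override variable are hypothetical, and only the `hacli service move` form is taken from this guide.

```shell
# Sketch: move each service back to its usual node.
# Each argument is a hypothetical "service:node" pair.
# HACLI can be overridden (for example HACLI=echo) for a dry run.
HACLI="${HACLI:-/opt/HAC/RSF-1/bin/hacli}"

restore_services() {
    for pair in "$@"; do
        svc=${pair%%:*}    # text before the first colon
        node=${pair#*:}    # text after the first colon
        "$HACLI" service move --name "$svc" --dest "$node" || return 1
    done
}

# Example with hypothetical names:
#   restore_services ServiceA:NodeA ServiceB:NodeB
```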