Creating a ZFS HA Cluster using shared or shared-nothing storage
This guide walks through a basic setup of an RSF-1 ZFS HA cluster. Upon completion the following will be configured:
- A working Active-Active cluster with either shared or shared-nothing storage
- A clustered service sharing a ZFS pool (further services can be added as required)
- A virtual hostname by which clients are able to access the service
Introduction
RSF-1 supports both shared and shared-nothing storage clusters.
Shared Storage
A shared storage cluster utilises a common set of storage devices that are accessible to both nodes in the cluster (housed in a shared JBOD, for example). A ZFS pool is created using these devices and access to that pool is controlled by RSF-1.
Pool integrity is maintained by the cluster software using a combination of redundant heartbeating and PGR3 disk reservations to ensure that any pool in a shared storage cluster can only be accessed by a single node at any one time.
flowchart TD
SSa("Node A") & SSb("Node B") <-- SAS/FC etc. --> SSS[("Storage")]
Shared-Nothing
A shared-nothing cluster consists of two nodes, each with its own locally accessible ZFS storage pool residing on non-shared storage:
flowchart TD
SNa("Node A")<-->|SAS/FC etc.|SNSa
SNb("Node B")<-->|SAS/FC etc.|SNSb
SNSa[("Storage")]
SNSb[("Storage")]
Data is replicated between the nodes by an HA synchronisation process. Replication always runs from the active node to the passive node, where the active node is the one serving the pool to clients:
flowchart LR
SNa("Node A (active)<br />Pool-A")-->|HA Synchronisation|SNb
SNb("Node B (passive)<br />Pool-A")
Should a failover occur, the direction of synchronisation is effectively reversed:
flowchart RL
SNa("Node B (active)<br />Pool-A")-->|HA Synchronisation|SNb
SNb("Node A (passive)<br />Pool-A")
Before creating pools for shared-nothing clusters
- To be eligible for clustering, the storage pools must have the same name on each node in the cluster (see the example below)
- It is strongly recommended that the pools are of equal size; otherwise the smaller of the two risks running out of space during synchronisation
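For example, a pool named Pool-A could be created on each node from that node's own local disks (pool and device names are illustrative):
# On node-a, using its local disks
zpool create Pool-A mirror /dev/sdb /dev/sdc
# On node-b, using its own local disks of comparable size
zpool create Pool-A mirror /dev/sdb /dev/sdc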
Download cluster software
If you have not already done so, download and install the RSF-1 cluster software onto each cluster node. More information can be found here.
Initial connection and user creation
Before starting
Please make sure that any firewalls in the cluster environment have the following ports open before attempting configuration:
- 1195 (TCP & UDP)
- 4330 (TCP)
- 4331 (TCP)
- 8330 (TCP)
When setting up a shared-nothing cluster, both nodes require SSH access to each other without a password. This is needed for the replication of the ZFS pool.
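As a sketch, on systems using firewalld the ports can be opened, and passwordless SSH set up, along these lines (the root account is an assumption; use whichever account RSF-1 is configured to replicate with):
# Open the RSF-1 ports (firewalld example; repeat on both nodes)
firewall-cmd --permanent --add-port=1195/tcp --add-port=1195/udp
firewall-cmd --permanent --add-port=4330/tcp --add-port=4331/tcp --add-port=8330/tcp
firewall-cmd --reload

# Shared-nothing only: allow passwordless SSH in both directions
# On node-a:
ssh-keygen -t ed25519            # accept the defaults if no key exists yet
ssh-copy-id root@node-b          # example account
ssh root@node-b hostname         # should print node-b without a password prompt
# Repeat the key setup in the opposite direction on node-b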
To connect to the RSF-1 GUI, direct your web browser to:
https://<hostname>:8330
Next, create an admin user account for the GUI.
Enter the information in the provided fields and click the Submit button when ready:
Once you click the Submit button, the admin user account will be created and you will be redirected to the login screen. Log in with the username and password just created:
Once logged in, the main dashboard page is displayed:
Configuration and Licensing
Editing your /etc/hosts file
Before continuing, ensure the /etc/hosts file is configured correctly on both nodes. The node hostnames must not resolve to 127.0.0.1, and both nodes must be resolvable by name. Here is a correctly configured hosts file for two example nodes, node-a and node-b:
127.0.0.1 localhost
10.6.18.1 node-a
10.6.18.2 node-b
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
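Name resolution can then be verified from either node, for example:
# Each hostname should resolve to its real address, not 127.0.0.1
getent hosts node-a node-b
ping -c 1 node-b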
To begin configuration, click on the Create/Destroy option on the side-menu (or the shortcut on the panel shown when first logging in). The Cluster Create page scans for clusterable nodes (those running RSF-1 that are not yet part of a cluster) and presents them for selection:
Now enter the cluster name and description, and then select the type of cluster being created (either shared-storage or shared-nothing).
If setting up a shared-nothing cluster, an additional option to add a node manually is shown at the bottom of the page. This is because RSF-1 will detect nodes on the local network, but for shared-nothing clusters the partner node could be on a separate network or at a separate location, and therefore may not be detected automatically¹.
Trial Licenses
If any of the selected nodes have not been licensed, a panel is shown allowing 45-day trial licenses to be obtained:
Next, the RSF-1 End User License Agreement (EULA) will be displayed. Click accept to proceed:
Once the license keys have been successfully installed, click the Create Cluster button to initialize the cluster:
Creating a Pool in the WebApp
If a zpool has not already been created, one can be created via the WebApp. Click Volumes on the side menu, then +Create:
Enter the desired Pool Name and select a Pool Mode (jbod, raidz2 or mirror). Add your drives to the pool by selecting them in the list and choosing their role using the buttons at the bottom.
To configure multiple mirrors in a pool, select the first set of drives from the list and add them as data disks. Then select the next set of drives, and click data followed by New mirror:
Once configured, click Submit and your pool is created, ready to be clustered:
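The equivalent pool layout can also be created from the command line if preferred; a sketch with example device names for two mirrored pairs:
# A pool made of two mirrored vdevs (device names are examples)
zpool create pool1 mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
zpool status pool1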
Preparing Pools to Cluster
Pools must be imported on one of the nodes before they can be clustered. Check their status by selecting the Volumes option on the side menu.
Shared-nothing clusters
For a shared-nothing cluster, the pools will need to have the same name and be imported manually on each node.
In the above example, pool1 and pool2 are exported and snpool is imported. To import pool1, first select it:
Then select Actions, followed by Import Pool:
The status of the pool should now change to Imported and CLUSTERABLE:
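The same import can also be performed or checked from the command line on the node in question (pool name taken from this example):
# Import the pool and confirm it is online (for shared-nothing, run on each node)
zpool import pool1
zpool list pool1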
Unclusterable Pools
Should any issues be encountered when importing the pool, it will be marked as UNCLUSTERABLE. Check the RestAPI log (/opt/HAC/RSF-1/log/rest-operations.log) for details on why the import failed.
With a shared-nothing cluster, this may happen if the pools aren't imported on both nodes.
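For example, the most recent entries can be viewed with:
# Show the latest REST API operations recorded by RSF-1
tail -n 50 /opt/HAC/RSF-1/log/rest-operations.log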
The pool is now ready for clustering.
Clustering a Pool
Highlight the desired pool to be clustered (choose only pools marked CLUSTERABLE), then select Actions followed by Cluster this pool:
Fill out the description and select the preferred node for the service:
What is a preferred node?
When a service is started, RSF-1 will initially attempt to run it on its preferred node. Should that node be unavailable (the node is down, the service is in manual mode, etc.), the service will be started on the next available node.
With a shared-nothing pool, the GUIDs of each pool will be shown:
To add a virtual hostname to the service, click Add in the Virtual Hostname panel. Enter the IP address, and optionally a hostname, in the popup. For nodes with multiple network interfaces, use the drop-down lists to select which interface the virtual hostname should be assigned to. Click the next button to continue:
Finally, click the Create button:
The pool will now show as CLUSTERED:
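Once the service is running, the virtual hostname's address should be plumbed in on the active node's chosen interface; a quick check (the hostname below is a placeholder):
# The virtual IP should be listed against the selected interface on the active node
ip addr show
# Clients should be able to reach the service via the virtual hostname
ping -c 1 <virtual-hostname>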
View Cluster Status
To view the cluster status, click on the Dashboard option on the side-menu:
The dashboard shows the location of each service and the respective pool
states and failover modes (manual or automatic). The dashboard also allows
the operator to stop, start and move services in the cluster.
Select a pool, then click the ⋮ button on the right-hand side to see the available options:
Cluster Heartbeats
To view cluster heartbeat information, select the Heartbeats option on the left side-menu:
To add an additional network heartbeat to the cluster, select Add Network Heartbeat Pair.
In this example, an additional connection exists between the two nodes, with the hostnames mgub01-priv and mgub02-priv respectively. These hostnames are then used when configuring the additional heartbeat:
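The private hostnames need to be resolvable on both nodes, for instance via additional /etc/hosts entries (the addresses shown are examples):
# Example /etc/hosts entries for the dedicated heartbeat link
192.168.100.1 mgub01-priv
192.168.100.2 mgub02-priv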
Click Submit to add the heartbeat.
The new heartbeat will now be displayed on the Heartbeats status page:
This completes basic cluster configuration.
¹ RSF-1 uses broadcast packets to detect cluster nodes on the local network. Broadcast packets are usually blocked from traversing other networks, and therefore cluster node discovery is usually limited to the local network only.