RSF-1 ZFS Cluster Software Configuration | High Availability

Configuration Guide

Introduction

This guide assumes a cluster and its services have already been created and configured; for details on creating clusters please see the following guides:


Terminology

Services

In an RSF-1 cluster a service refers to a ZFS pool that is managed by the cluster. The cluster may have one or more services under its control, i.e. multiple pools. Furthermore, an individual service may consist of more than one pool - referred to as a pool group - where actions performed on that service are performed on all pools in the group.

A service instance is the combination of a service and a cluster node on which that service is eligible to run. For example, in a 2-node cluster each service will be configured with two available instances - one on each node in the cluster. Only one instance of a service will be active at any one time.

Modes (automatic/manual)

Each service instance has a mode setting of either automatic or manual. The mode of a service is specific to each node in the cluster, so a service can be manual on one node and automatic on another. The modes have the following meanings:

AUTOMATIC

Automatic mode means the service instance will be automatically started when all of the following requirements are satisfied:

  • The service instance is in the stopped state
  • The service instance is not blocked
  • No other instance of this service is in an active state
MANUAL

Manual mode means the service instance will never be started automatically on that node.

State (running/stopped etc)

A service instance in the cluster will always be in a specific state. These states are divided into two main groups, active states and inactive states1. Some states within these groups are transitional; for example, a starting state will transition to a running state once the startup steps for that service have completed successfully, and similarly a stopping state will transition to a stopped state once all the shutdown steps have completed successfully (note that this stopping ==> stopped change also moves the service instance from the active state group to the inactive state group).

Active States

When the service instance is in an active state, it will be utilising the resources of that service (e.g. an imported ZFS pool, a plumbed-in VIP, etc.). In this state the service is considered up and running and will not be started on any other node in the cluster until it transitions to an inactive state; for example, a service that is STOPPING on a node is still in an active state and cannot yet be started on any other node in the cluster - see below for the definition of inactive states.

The following table describes all the active states.

Active State
Description
STARTING The service is in the process of starting on this node. Service start scripts are currently running - when they complete successfully the service instance will transition to the RUNNING state.
RUNNING The service is running on this node and only this node. All service resources have been brought online. For ZFS clusters this means the main ZFS pool and any additional pools have been imported, any VIPs have been plumbed in and any configured logical units have been brought online.
STOPPING The service is in the process of stopping on this node. Service stop scripts are currently running - when they complete successfully the service instance will transition to the STOPPED state.
PANICKING While the service was in an active state on this node, it was seen in an active state on another node. Panic scripts are running and when they are finished, the service instance will transition to PANICKED.
PANICKED While the service was in an active state on this node, it was seen in an active state on another node. Panic scripts have been run.
ABORTING Service start scripts failed to complete successfully. Abort scripts are running (these are the same as service stop scripts). When abort scripts complete successfully the service instance will transition to the BROKEN_SAFE state (an inactive state). If any of the abort scripts fail to run successfully then the service transitions to a BROKEN_UNSAFE state and manual intervention is required.
BROKEN_UNSAFE The service has transitioned to a broken state because service stop or abort scripts failed to run successfully. Some or all service resources are likely to be online so it is not safe for the cluster to start another instance of this service on another node.

This state can be caused by one of two circumstances:
  • The service failed to stop - for example, a zpool imported as part of the service startup failed to export during shutdown, or the cluster was unable to unplumb a VIP associated with the service, etc.
  • The service failed to start and abort scripts were run in order to undo any actions performed during service startup (for example, if a zpool was imported during the start phase then the abort scripts will attempt to export that pool). However, during the abort process one of the abort actions failed and therefore the cluster was unable to shut the service down cleanly.
Inactive States

When a service instance is in an inactive state, no service resources are online. That means it is safe for another instance of the service to be started elsewhere in the cluster.

The following table describes all the inactive states.

Inactive State
Description
STOPPED The service is stopped on this node. No service resources are online.
BROKEN_SAFE This state can be the result of either of the following circumstances:
  • The service failed to start on this node but had not yet brought any service resources online. It transitioned directly to BROKEN_SAFE when it failed.
  • The service failed to start after having brought some resources online. Abort scripts were run to take the resources back offline and those abort scripts finished successfully.

Blocked (blocked/unblocked)

The service blocked state is similar to the service mode (AUTOMATIC/MANUAL) except that instead of being set by the user, it is controlled automatically by the cluster's monitoring features.

For example, if network monitoring is enabled then the cluster constantly checks the network connectivity of any interfaces on which VIPs are plumbed in. If one of those interfaces becomes unavailable (link down, cable unplugged, switch dies, etc.) then the cluster will automatically transition that service instance to blocked.

If a service instance becomes blocked while it is running, the cluster will stop that instance so it can be started on another node, provided there is another instance of that service in the cluster that is UNBLOCKED, AUTOMATIC and STOPPED; otherwise no action will be taken.

Also note that a service does not have to be running on a node for that service instance to become blocked - if a monitored resource such as a network interface becomes unavailable then the cluster will set that node's service instance to a blocked state, thus blocking that node from starting the service. Should the resource become available again, the cluster will clear the blocked state.

The following table describes all the blocked states.

Blocked State
Description
BLOCKED The cluster's monitoring has detected a problem that affects this service instance. This service instance will not start until the problem is resolved, even if the service is in automatic mode.
UNBLOCKED The service instance is free to start as long as it is in automatic mode.

Dashboard

The Dashboard is the initial landing page when connecting to the webapp once a cluster has been created. It provides a quick overview of the current status of the cluster and allows you to perform operations such as stopping, starting and moving services between nodes:

dashboard-main-window

The dashboard is made up of three main sections along with a navigation panel on the left hand side:

  • The status panel, located at the top of the page, providing an instant view of the overall health of the cluster with node, service and heartbeat summary status.
  • The nodes panel, detailing each node's availability in the cluster along with its IP address and heartbeat status.
  • The services panel, detailing the services configured in the cluster, which node they are running on, if any, and any associated VIPs.

Clicking on the icon for an individual node or service brings up a context sensitive menu, described in the following sections.

Nodes panel

The nodes panel shows the status of each node in the cluster:

dashboard-node-panel

Clicking on a node opens a side menu that allows control of services known to that node. In the example above, clicking on the icon for node-a would bring up the following menu:

dashboard-node-popup

Available actions can then be viewed by clicking on the button in the right hand column for an individual service:

dashboard-node-popup-node-menu

Alternatively, the button on the Clustered Services row brings up a menu that performs actions on all services on that node:

dashboard-node-popup-multi-menu

Services Panel

The services panel shows the status of each service in the cluster:

dashboard-services-panel

Clicking on a service opens up a side menu that allows control of that service in the cluster. In the example above clicking on the icon for pool1 would bring up the following menu:

dashboard-service-popup

Available actions can then be viewed by clicking on the button in the right hand column for an individual service:

dashboard-service-popup-service-menu

New Services

When a service is added to an RSF-1 High Availability cluster, its state will initially be set to stopped / automatic and the cluster will start the service on the service's preferred node.


Clustering a Docker Container

These steps show the process of clustering a Docker container. The container will be created using a standard Docker compose.yaml file.

  1. Navigate to HA-Cluster -> Docker in the webapp:

    Docker Image 1

  2. Click Cluster a Docker application to get to the creation/addition page and fill in the fields

    Available options:

    • Select HA Service - Select the service/pool to associate the container to in the event of a failover
    • Container Description - Optional description of the container
    • Location of compose.yaml file within selected service - The path in the selected pool/service to save the compose.yaml
    • Contents of compose.yaml file - Enter the contents of the compose.yaml file for the container
    • An example compose.yaml:
    services:
      apache:
        image: httpd:latest
        container_name: my-apache-app
        ports:
          - 8080:80
        volumes:
          - ./website:/usr/local/apache2/htdocs
        # Quoted so YAML does not parse it as a boolean; RSF-1 manages restarts
        restart: "no"
    

    Warning

    When adding your content, make sure to set restart: "no" (quoted, so it is not interpreted as a YAML boolean) for your service configurations. RSF-1 will manage the restart of clustered containers in the event of a failover.

    Docker Image 2

    Docker Image 3

  3. When finished click Create.

    Docker Image 4

  4. By default the container will remain stopped until started. Click the Start button to spin up the container (optional command-line checks are sketched after these steps).

    Docker Image 5
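
The steps above can optionally be verified from the command line on the node currently running the service. This is a minimal sketch, assuming the Docker Compose v2 CLI, a hypothetical compose.yaml location of /pool1/docker/compose.yaml and the example container name shown earlier:

# Validate the compose file and print the resolved configuration
# (reports YAML/schema errors without starting any containers)
docker compose -f /pool1/docker/compose.yaml config

# After clicking Start, confirm the container is up
docker ps --filter name=my-apache-app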


Heartbeats

In the cluster, heartbeats perform the following roles:

  • Continually monitor the other nodes in the cluster, ensuring they are active and available.
  • Communicate cluster and service status to the other nodes in the cluster. Status information includes the mode and state of every service on that node (manual/automatic, running/stopped, etc.), along with any services that are currently blocked.
  • Carry a checksum of the active cluster configuration on that node.

Configuration checksums

The configuration checksums must match on all cluster nodes to ensure the validity of the cluster; should a mismatch be detected then the cluster will lock the current state of all services (active or not) until the mismatch is resolved. This safety feature protects against unexpected behaviour as a result of an unsynchronised configuration.

The cluster supports two types of heartbeats:

  • Network heartbeats
  • Disk heartbeats

Heartbeats are unidirectional; therefore, for each heartbeat configured there will be two channels (one to send and one to receive).

The same information and structures are transmitted over each type of heartbeat. The cluster supports multiple heartbeats of each type. When the cluster is first created a network heartbeat is automatically configured between cluster nodes using the node hostnames as the endpoints. Disk heartbeats are automatically configured when a service is created and under normal circumstances require no user intervention.

It is recommended practice to configure network heartbeats across any additional network interfaces. For example, if the hostnames are on a 10.x.x.x network and an additional private network exists with 192.x.x.x addresses, then an additional heartbeat can be configured on that private network. Using the following example hosts file, an additional network heartbeat can be configured using the node-a-priv and node-b-priv addresses as endpoints:

10.0.0.1 node-a
10.0.0.2 node-b
192.168.72.1 node-a-priv
192.168.72.2 node-b-priv

By specifying the endpoint using the address of an additional interface the cluster will automatically route heartbeat packets down the correct network for that interface.
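
Before adding the heartbeat it can be worth confirming that each node can reach the other over the additional network; a minimal check, assuming the example addresses above:

# From node-a, confirm connectivity to node-b's private interface
ping -c 3 node-b-priv

# And from node-b back to node-a
ping -c 3 node-a-priv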

To view the cluster heartbeats navigate to HA-Cluster -> Heartbeats on the left side-menu:

heartbeats-main-window

Adding a Network Heartbeat

To add an additional network heartbeat to the cluster, select Add Network Heartbeat Pair. In this example an additional physical network connection exists between the two nodes. The endpoints for this additional network are given the names node-a-priv and node-b-priv respectively. These hostnames are then used when configuring the additional heartbeat:

heartbeats-add-network

Click Submit to add the heartbeat. The new heartbeat will now be displayed on the Heartbeats status page:

heartbeat-main-window-additional-net

Removing a Network Heartbeat

To remove a network heartbeat select the heartbeat using the slider on the left hand side of the table and click the remove selected button:

heartbeat-main-window-remove-net

Finally, confirm the action:

heartbeat-main-window-remove-confirm-net

Disk heartbeats

Under normal circumstances it should not be necessary to add or remove disk heartbeats as this is handled automatically by the cluster.


Unix Users

Creating Users

Creating Unix users in the WebApp will create the user across all cluster nodes using the same credentials (username, UID and GID).

  1. In the WebApp, navigate to System -> UNIX Users, and click +Add:

    Users Image 1

  2. Enter the Username and Password, and provide any of the additional information if required:

    • List of Groups - Add the user to any available groups (optional)
    • UID/GID - Specify the User ID and Group ID of the user (optional - if unspecified the next available UID/GID will be used).
    • Add user to sudo group - This user will be able to issue commands as a different user (requires sudo package to be installed).
    • Enable SMB support for user - Adds this user to the valid Samba users.
    • Home Directory - Specify location for the user home directory (optional)
    • Shell - Specify the default shell for the user (optional)

    Users Image 2

  3. When done click SAVE. Once saved, the user will be created on all nodes in the cluster:

    Users Image 3

    Warning

    If the user name or UID specified already exists on any node in the cluster then the user add operation will fail with the message "Error creating user clusterwide..."
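
Once saved, it is possible to confirm that the credentials match on every node; a minimal check, assuming a hypothetical username of jsmith:

# Run on each cluster node - the UID, GID and group list should be identical
id jsmith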

Modifying Users

To modify a user, click on the pencil icon on the left hand side of the user list table:

Users Image 4

Deleting Users

To delete a user from all cluster nodes click the trash can icon and then confirm the deletion:

Users Image 5

Note

Local users (users that exist on one node only) can only be modified and deleted by logging into the WebApp on the node where the user exists.


Datasets

Creating Datasets

ZFS uses datasets to organize and store data. A ZFS dataset is similar to a mounted filesystem and has its own set of properties, including a quota that limits the amount of data that can be stored. Datasets are organized in the ZFS hierarchy and can have different mountpoints. More specifically it is a generic name for the following ZFS components: clones, file systems, snapshots, and volumes.
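
As a quick illustration of this hierarchy, the datasets under a pool can be listed from the command line; a minimal sketch, assuming a hypothetical pool named pool1:

# List the pool and all datasets beneath it, with space usage and mountpoints
zfs list -r -o name,used,avail,mountpoint pool1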

The following steps will show the process of creating a dataset within clustered and non-clustered pools.

Note

Datasets can only be created or edited on the node where the service is running.

  1. To create a dataset navigate to ZFS -> Datasets and click CREATE DATASET

    Datasets Image 1

  2. Select the Parent Pool and enter a Dataset Name and set any options required:

    Datasets Image 2

    Available options:

    Option
    Description
    Mountpoint path Alternative path to mount the dataset or pool. Do not change the mountpoint of a clustered pool; doing so will break automatic failover.
    Compression Enable compression or select an alternative compression type (lz4/zstd)
    • off - No compression.
    • on - The current default compression algorithm will be used (does not select a fixed compression type; as new compression algorithms are added to ZFS and enabled on a pool, the default compression algorithm may change).
    • lz4 - A high-performance compression and decompression algorithm that also achieves a moderately higher compression ratio than the older lzjb (the original compression algorithm).
    • zstd - Provides high compression ratios with good performance, and is often preferred over lz4 when a higher compression ratio is desired.
    Dataset Quota Set a limit on the amount of disk space a file system can use.
    Reservation Size Guarantee a specified amount of disk space is available to a file system.
    ACL Inherit Determine the behavior of ACL inheritance (i.e. how ACLs are inherited when files and directories are created). The following options are available:
    • discard - No ACL entries are inherited. The file or directory is created according to the client and protocol being used.
    • noallow - Only inheritable ACL entries specifying deny permissions are inherited.
    • restricted - Removes the write_acl and write_owner permissions when the ACL entry is inherited, but otherwise leaves inheritable ACL entries untouched. This is the default.
    • passthrough - All inheritable ACL entries are inherited. The passthrough mode is typically used to cause all data files to be created with an identical mode in a directory tree. An administrator sets up ACL inheritance so that all files are created with a mode, such as 0664 or 0666.
    • passthrough-x - Same as passthrough except that the owner, group, and everyone ACL entries inherit the execute permission only if the file creation mode also requests the execute bit. The passthrough setting works as expected for data files, but you might want to optionally include the execute bit from the file creation mode into the inherited ACL. One example is an output file that is generated from tools, such as cc or gcc. If the inherited ACL does not include the execute bit, then the output executable from the compiler won't be executable until you use chmod(1) to change the file's permissions.
      ACL Mode Modify ACL behavior whenever a file or directory's mode is modified by the chmod command or when a file is initially created.
      • discard - All ACL entries are removed except for the entries needed to define the mode of the file or directory.
      • groupmask - User or group ACL permissions are reduced so that they are no greater than the group permission bits, unless it is a user entry that has the same UID as the owner of the file or directory. Then, the ACL permissions are reduced so that they are no greater than owner permission bits.
      • passthrough - During a chmod operation, ACEs other than owner@, group@, or everyone@ are not modified in any way. ACEs with owner@, group@, or everyone@ are disabled to set the file mode as requested by the chmod operation.
      Extended Attributes Controls whether extended attributes are enabled for this file system. Two styles of extended attributes are supported: either directory-based or system-attribute-based.
      • off - Extended attributes are disabled
      • on - Extended attributes are enabled; the default value of on enables directory-based extended attributes.
      • sa - System based attributes
      • dir - Directory based attributes
      Enable NFS Share Enable NFS sharing of the dataset via ZFS. Do not enable if managing shares via the WebApp.
      Enable SMB Share Enable SMB sharing of the dataset via ZFS. Do not enable if managing shares via the WebApp.
      Update Access Time Controls whether the access time for files is updated on read.
      What is an ACL

      An ACL is a list of user permissions for a file, folder, or other data object. The entries in an ACL specify which users and groups can access something and what actions they may perform.

      What are Extended Attributes

      Extended file attributes are file system features that enable additional attributes to be associated with computer files as metadata not interpreted by the filesystem, whereas regular attributes have a purpose strictly defined by the filesystem.

      In ZFS, directory-based extended attributes (dir) impose no practical limit on either the size or number of attributes that can be set on a file, although under Linux the getxattr(2) and setxattr(2) system calls limit the maximum size to 64K. This is the most compatible style of extended attribute and is supported by all ZFS implementations.

      With system extended attributes (sa) the key advantage is improved performance. Storing extended attributes as system attributes significantly decreases the amount of disk I/O required. Up to 64K of data may be stored per-file in the space reserved for system attributes. If there is not enough space available for an extended attribute then it will be automatically written as a directory-based xattr. System-attribute-based extended attributes are not accessible on platforms which do not support the xattr=sa feature. OpenZFS supports xattr=sa on both FreeBSD and Linux.

    • Click SUBMIT to create. The Dataset is now created in the pool.

      Datasets Image 3
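
    For reference, a dataset with similar properties can also be created directly from the command line on the node where the service is running; a minimal sketch, assuming a hypothetical pool pool1 and dataset name data (the property names are standard ZFS, the values are illustrative):

    # Create the dataset with lz4 compression, a 100G quota and system-attribute xattrs
    zfs create -o compression=lz4 -o quota=100G -o xattr=sa pool1/data

    # Confirm the properties took effect
    zfs get compression,quota,xattr pool1/data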

    Modifying Datasets

    To modify an existing dataset, click the DS pencil button to the right of the dataset:

    Datasets Image 4

    Deleting datasets

    To delete a dataset, click the DS bin button to the right of the dataset and click REMOVE DATASET. You will be prompted to confirm the delete:

    Datasets Image 5


    NFS shares

    Enabling clustered NFS

    By default RSF-1 does not handle NFS shares - the contents of the /etc/exports file are left to be managed by the system administrator manually on each node in the cluster. To enable the management of the exports file from the webapp and synchronise it across all cluster nodes, navigate to Shares -> NFS and click ENABLE NFS SHARE HANDLING:

    NFS Image 1

    Once enabled the shares table will be shown:

    NFS Image 2

    Before creating new shares the option to import the existing /etc/exports file is available (this option is disabled once any new shares are added via the webapp):

    Clustering an NFS share

    1. Navigate to Shares -> NFS and click +Add on the NFS table to fill in the required info. The available options are:

      • Description - Description of the Share (optional)
      • Path - Path of the directory/dataset to share - for example /pool1/nfs
      • Export Options - For a detailed description of the available options click the SHOW NFS OPTIONS EXAMPLES button.

      NFS Image 3

    2. Click to add the share:

      NFS Image 4

      The share will now be available and clustered.

    FSID setting for failover

    NFS version 4 identifies each file system it exports using a file system UUID or the device number of the device holding the file system. NFS clients use this identifier to ensure consistency of mounted file systems; if this identifier changes then the client considers the mount stale and typically reports "Stale NFS file handle", meaning manual intervention is required.

    In an HA environment there is no guarantee that these identifiers will be the same on failover to another node (it may for example have a different device numbering). To alleviate this problem each exported file system should be assigned a unique identifier (starting at 1 - see the note below on the root setting) using the NFS fsid= option, for example:

    /tank      10.10.23.4(fsid=1)
    /sales     10.01.23.5(fsid=2,sync,wdelay,no_subtree_check,ro,root_squash)
    /accounts  accounts.dept.foo.com(fsid=3,rw,no_root_squash)
    

    Here each exported file system has been assigned a unique fsid thereby ensuring that no matter which cluster node exports the filesystem it will always have a consistent identifier exposed to clients.

    For NFSv4 the option fsid=0 or fsid=root is reserved for the "root" export. When present, all other exported directories must be below it, for example:

    /srv/nfs       192.168.7.0/24(rw,fsid=root)
    /srv/nfs/data  192.168.7.0/24(fsid=1,sync,wdelay,no_subtree_check,ro,root_squash)
    

    As /srv/nfs is marked as the root export, the export /srv/nfs/data is mounted by clients as nfsserver:/data. For further details see the NFS manual page.
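
    From an NFSv4 client the mount is then made relative to the root export; a minimal sketch, assuming a Linux client and that the server is reachable as nfsserver:

    # Mounts the /srv/nfs/data export as /mnt/data on the client
    mount -t nfs -o vers=4 nfsserver:/data /mnt/data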

    Modifying an NFS Share

    To modify an NFS share, click the pencil icon to the left of the share:

    NFS Image 5

    When done, click to update the share.

    Deleting an NFS Share

    To delete an NFS share click the trash can icon and then confirm the deletion:

    NFS Image 6


    1. RSF-1 uses broadcast packets to detect cluster nodes on the local network. Broadcast packets are usually blocked from traversing other networks and therefore cluster node discovery is usually limited to the local network only. 

    2. A broken_safe state is considered a stopped state as, although the service was unable to start up successfully, it was able to free up all the resources during the shutdown/abort step (hence the safe state).