Replication

Understanding the data replication process

Shared nothing clusters replicate data from the active node to the passive node using ZFS snapshots. Snapshots are taken at regular intervals on the active node and are then transferred to the passive node using the ZFS send/receive protocol. Snapshots received on the passive node are then applied to the local pool to 'fast forward' it to the state of the pool on the active node at the time the snapshot was taken.
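As a rough illustration of the snapshot, send and receive steps, the sketch below builds the standard ZFS command lines involved. It only constructs argument lists and runs nothing. The function names, the dataset name and the snapshot names are hypothetical, and how the cluster software actually invokes ZFS or moves the stream between nodes is not described here, so treat this as an assumption rather than the product's method.

```python
from typing import Optional


def snapshot_cmd(dataset: str, name: str) -> list[str]:
    # Command to create a snapshot called `name` of `dataset`.
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset: str, name: str, previous: Optional[str] = None) -> list[str]:
    # Without an earlier snapshot to base on, send a full stream;
    # otherwise send only the changes between `previous` and `name`.
    if previous is None:
        return ["zfs", "send", f"{dataset}@{name}"]
    return ["zfs", "send", "-i", f"{dataset}@{previous}", f"{dataset}@{name}"]


def receive_cmd(dataset: str) -> list[str]:
    # Command run on the receiving node; it reads the stream from stdin.
    return ["zfs", "receive", dataset]


# Hypothetical names, for illustration only.
print(snapshot_cmd("tank/data", "snap-2"))
print(send_cmd("tank/data", "snap-2", previous="snap-1"))
print(receive_cmd("tank/data"))
```

The output of the send command is a byte stream that is fed to the receive command on the other node; the transport used in between is deliberately left out of the sketch.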

Snapshot create interval

Operationally the active node is responsible for creating snapshots. The frequency at which snapshots are taken is controlled by the Active node snapshot interval setting under Settings->Shared Nothing, with the default value being every 15 minutes:

active-node-snapshot-interval

This interval represents the timeout the snapshot process is currently using; any change to this value will only be applied once the current timeout expires and the next snapshot is taken.

To clarify, if the snapshot interval is set to 15 minutes when a shared nothing service is created, then the first snapshot will be taken 15 minutes later. If the snapshot interval is modified during that period, the new setting will only come into effect once the current 15 minute timeout has expired and the next snapshot has been taken, at which point the new snapshot interval will be used.
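The timing behaviour described above can be modelled with a small sketch. The SnapshotTimer class and its method names are hypothetical and exist only to illustrate the rule that a changed interval takes effect after the next snapshot; they are not part of the product.

```python
class SnapshotTimer:
    """Hypothetical model of the snapshot timer, for illustration only."""

    def __init__(self, interval_minutes: int) -> None:
        self.configured = interval_minutes  # value currently held in the settings
        self.next_due = interval_minutes    # minutes after service creation

    def set_interval(self, interval_minutes: int) -> None:
        # Only the stored setting changes; the running timeout is untouched.
        self.configured = interval_minutes

    def take_snapshot(self) -> None:
        # Called when the current timeout expires. The next timeout is armed
        # with whatever interval is configured at that moment.
        self.next_due += self.configured


timer = SnapshotTimer(15)
timer.set_interval(5)    # setting changed before the first snapshot
first = timer.next_due   # 15: the first snapshot is still taken at minute 15
timer.take_snapshot()
second = timer.next_due  # 20: the new 5 minute interval applies from here on
print(first, second)
```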

Snapshot retention

The number of snapshots retained on the active node is controlled by the Snapshot retention setting under Settings->Shared Nothing:

snapshot-retention

This value specifies the number of snapshots that should be retained on the Active node, with the oldest snapshot being deleted once the maximum number of snapshots has been reached.
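A minimal sketch of this retention rule is shown below. It assumes that at most the configured number of snapshots is kept and that the oldest are dropped first; the function name and the snapshot names are hypothetical.

```python
from collections import deque


def record_snapshot(retained: deque, name: str, retention: int) -> list[str]:
    """Add a new snapshot name and return the names that should be deleted."""
    retained.append(name)
    expired = []
    # Drop from the left (oldest) until the retention count is respected.
    while len(retained) > retention:
        expired.append(retained.popleft())
    return expired


retained = deque()
for i in range(5):
    expired = record_snapshot(retained, f"snap-{i}", retention=3)

print(list(retained))  # ['snap-2', 'snap-3', 'snap-4']
print(expired)         # ['snap-1'], removed when snap-4 was added
```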

Snapshot pull interval

The passive node performs the task of transferring (and applying) snapshots from the active node. The frequency with which the active node is checked for new snapshots is controlled by the Passive node snapshot interval setting under Settings->Shared Nothing, the default value being every 3 minutes:

passive-node-snapshot-interval

For each cycle of the timer, the passive node interrogates the active node for a list of the snapshots it holds for clustered pools. This list is then compared to the snapshots held locally, with any missing snapshots being transferred and applied to the local pools to bring them in sync with the active node.
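The comparison made in each cycle can be sketched as a simple list difference. The function name and the snapshot names are hypothetical, and the sketch assumes the active node's list is ordered oldest first so that snapshots are applied in creation order.

```python
def missing_snapshots(active: list[str], passive: list[str]) -> list[str]:
    """Return snapshots held by the active node but not locally, oldest first."""
    held_locally = set(passive)
    return [name for name in active if name not in held_locally]


active = ["snap-3", "snap-4", "snap-5", "snap-6"]
passive = ["snap-2", "snap-3", "snap-4"]

print(missing_snapshots(active, passive))  # ['snap-5', 'snap-6']
```

An empty result means the passive node already holds every snapshot the active node has, so the cycle has nothing to transfer.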

There are a number of advantages to having the passive node keep track of snapshots:

The active node need not concern itself with the online state of the passive node.
The passive node is in the best position to decide which snapshots are required to synchronize a pool.
The interval at which the passive node checks for snapshots can be much shorter than the snapshot creation interval.
If the passive node becomes unavailable, upon recovery it can immediately start the process of pulling and applying missing snapshots.

This value should be left quite low; between 1 and 5 minutes is typically an acceptable setting. There are certain circumstances, however, where less frequent updates are desirable, for example to reduce the amount of burst traffic on the cluster network interconnect. In these cases a higher value can be configured.

Setting considerations

The values used for the snapshot settings directly impact how the cluster operates during normal running, so careful consideration should be given to the following points.

Rollback window

The rollback window is the period of time covered by the retained snapshots, that is, the snapshot interval multiplied by the retention count. For example, if the active node snapshot interval is set to 5 minutes and the retention count is set to 24 then the rollback window is two hours:

This setting provides a fine level of granularity when selecting a point in time to roll back to, but only a 2 hour window of available rollback points. Changing the snapshot interval to 30 minutes results in a reduction in snapshot granularity but an increase in the rollback window to 12 hours:

A much longer rollback window of 7 days can be achieved using a 2 hour interval with a retention count of 84:

Ultimately the values chosen will be influenced by the type of data held in the pools. For a fairly static use case (such as a web server with minimal changes), daily snapshots with a long retention period are appropriate, whereas a high level of activity (e.g. a database) would benefit from more frequent snapshots with a shorter retention period.
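When comparing candidate settings, the rollback window is simply the snapshot interval multiplied by the retention count. The helper below is a small sketch of that arithmetic; the function name is hypothetical.

```python
from datetime import timedelta


def rollback_window(interval_minutes: int, retention_count: int) -> timedelta:
    """Time span covered by the retained snapshots."""
    return timedelta(minutes=interval_minutes * retention_count)


print(rollback_window(5, 24))    # 2:00:00
print(rollback_window(120, 84))  # 7 days, 0:00:00
```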

Service synchronization

When a cluster is first created, pools on the passive node need to be synchronized with their counterparts on the active node to bring them in line with each other. This is known as bootstrapping the pools and involves copying all the data from pools on the active node over to the passive node. Once this has been accomplished, the normal process of pulling and applying snapshots proceeds.

Bootstrapping the pools is also necessary when a passive node has been unavailable for long enough that there are no longer any common snapshots between the two nodes. To understand how this situation can occur, consider a passive node that becomes unavailable while the following settings are in effect:

Snapshots on the active node are taken every 15 minutes.
Snapshot retention on the active node is set to 40 snapshots.

With these settings 4 snapshots are taken every hour. As the retention policy is 40 snapshots, after 10 hours every snapshot that existed when the passive node went offline has been deleted from the active node. At this point the crossover window has passed and the passive node no longer holds any snapshots in common with the active node.

The following diagram illustrates a scenario where the passive node has been unavailable from 00:00 and at 10:00 drops out of sync as the two snapshot windows diverge:

Once the passive node comes back online it compares its list of snapshots with that of the active node and will recognise there are no snapshots in common, which in turn will trigger a complete re-sync of the pool.
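The decision made when the passive node returns can be sketched as a check for common snapshots. The function names, the returned labels and the numbered snapshot names are hypothetical; the example reuses the 15 minute interval and retention count of 40 from above, so 40 missed snapshots correspond to 10 hours offline.

```python
def retained(latest_index: int, retention: int) -> list[str]:
    """Names of the snapshots still held after snapshot number latest_index."""
    first = max(0, latest_index - retention + 1)
    return [f"snap-{i}" for i in range(first, latest_index + 1)]


def sync_action(active: list[str], passive: list[str]) -> str:
    """Choose between pulling missing snapshots and a complete re-sync."""
    if set(active) & set(passive):
        return "incremental"  # a common snapshot exists to build on
    return "bootstrap"        # nothing in common: re-sync the whole pool


passive = retained(100, 40)  # passive node went offline after snapshot 100

print(sync_action(retained(139, 40), passive))  # incremental (snap-100 is shared)
print(sync_action(retained(140, 40), passive))  # bootstrap (snap-100 has expired)
```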

Pool re-syncing

A complete re-sync of a pool means all the data in the pool has to be transferred from the active node to the passive node in order to recreate it. Depending upon the size of the pool this transfer could take minutes, hours or even days. It is therefore important to strike the right balance between the snapshot creation and retention settings on the active node, as together they dictate how long the passive node can be down before a complete re-sync is required.

For comparison, when the two nodes are in sync, the passive node will slightly lag behind the active node as snapshots are pulled and applied:

Expiring snapshots on the passive node

During normal operation the passive node will remove copies of older snapshots that no longer exist on the active node. This is necessary to prevent snapshots accumulating unchecked on the passive node, which could ultimately consume all the available space in the pool and cause the synchronization process to fail.
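A minimal sketch of this clean-up, assuming it is driven by the same snapshot list the passive node obtains from the active node: any local snapshot that the active node no longer reports is a candidate for removal. The function name and snapshot names are hypothetical.

```python
def expired_on_passive(active: list[str], passive: list[str]) -> list[str]:
    """Return local snapshots that no longer exist on the active node."""
    on_active = set(active)
    return [name for name in passive if name not in on_active]


active = ["snap-3", "snap-4", "snap-5"]
passive = ["snap-1", "snap-2", "snap-3", "snap-4", "snap-5"]

print(expired_on_passive(active, passive))  # ['snap-1', 'snap-2']
```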