Bringing High Availability to ZFS Storage

What is ZFS?

ZFS Enterprise Features
  • Self Healing
  • End-to-end data integrity
  • Massively scalable
  • Storage Pooling
  • Combined file system and volume manager
  • RAID-Z
  • Replication
  • Deduplication
  • Compression
  • Unlimited snapshots
 

ZFS, and its open-source sibling OpenZFS, is an advanced storage platform that combines a file system with a logical volume manager, incorporating built-in enterprise features such as data protection, replication, deduplication, compression and unlimited snapshots. It is inherently massively scalable, allowing for file sizes of up to 16 exabytes and up to 256 quadrillion zettabytes of storage. Some of its key advanced features include storage pooling, RAID-Z, copy-on-write, end-to-end data integrity verification and automatic repair.

ZFS is a transactional, self-healing 128-bit file system that supports almost unlimited storage capacity. One of its unique features is that it calculates checksums of all data and metadata, allowing it to identify and self-repair data corruption.
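The self-healing behaviour described above can be sketched in outline. The Python below is purely illustrative (ZFS itself stores fletcher4 or SHA-256 checksums in each block's parent block pointer and heals from mirror or RAID-Z redundancy); it shows how a stored checksum lets a reader detect silent corruption in one replica and repair it from a good copy:

```python
import hashlib

def checksum(data: bytes) -> str:
    # Stand-in for the per-block checksum ZFS keeps in the block pointer.
    return hashlib.sha256(data).hexdigest()

def read_with_self_heal(copies: list, expected: str) -> bytes:
    """Return a verified copy of a block, repairing bad replicas in place."""
    good = next(bytes(c) for c in copies if checksum(bytes(c)) == expected)
    for c in copies:
        if checksum(bytes(c)) != expected:   # silent corruption detected
            c[:] = good                      # self-heal from the good replica
    return good

# Two mirrored copies of a block; one suffers silent bit rot.
block = b"important data"
expected = checksum(block)
mirror = [bytearray(block), bytearray(block)]
mirror[1][0] ^= 0xFF                         # flip bits in one copy

assert read_with_self_heal(mirror, expected) == block
assert bytes(mirror[1]) == block             # corrupted copy was repaired
```

The key point is that the checksum lives apart from the data it describes, so a replica cannot silently vouch for itself; this is what lets ZFS detect bit rot that hardware RAID would pass through unnoticed.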

By combining what have traditionally been separate storage components (data management, device management, virtualized volumes and file system management) into one, ZFS does not require hardware RAID controllers or battery-backed NVRAM to ensure data integrity in the event of physical component failure.

A Brief History of ZFS

Designed to overcome the limitations of general-purpose file systems, ZFS was originally conceived and created by Sun Microsystems in 2001 for their proprietary Solaris platform. In 2005, the source code was released as part of the open-source OpenSolaris platform. Following Sun’s acquisition by Oracle, ZFS development was placed back under a closed-source licence on the Solaris platform, and Oracle continues to sell its own proprietary ZFS-based storage appliances.

Since 2005, however, growing open-source communities have ported and improved ZFS on other Unix platforms, including the OpenSolaris-derived illumos, OpenIndiana and OmniOS operating systems, Linux, macOS and FreeBSD. Since 2013, ongoing ZFS development and releases have been coordinated by OpenZFS, an umbrella organisation whose team comprises individuals and companies that use, improve and promote the ZFS file system, including many commercial organisations that embed ZFS in their own products.

Why the Enterprise should consider OpenZFS

ZFS Storage Components
  • Commodity x86 server hardware
  • Any drive mix of HDD, SSD, AFA, NVMe
  • Choice of Solaris, illumos, OmniOS, Linux and FreeBSD Operating Systems
  • OpenZFS software

OpenZFS is a reliable and key component for enterprise storage as it:

  • Has been proven in demanding enterprise environments for nearly two decades
  • Scales to almost unlimited capacity
  • Is self-healing
  • Includes many advanced enterprise storage features
  • Has a mature and growing open-source community continuing developments
  • Has a wealth of commercial buy-in
  • Is very cost-effective compared to closed proprietary offerings
  • Is compatible with the latest and greatest storage technologies and components (e.g. flash, NVMe, Infiniband)
  • Is being deployed widely in all storage tiers across all vertical markets

In addition, OpenZFS can be deployed on a wide range of generic commodity hardware from a myriad of suppliers on the Operating System platform of your choice. Open source, open architecture, commodity hardware, hugely flexible topologies and many hardware options and vendors mean ZFS is enterprise-ready.

Bringing High Availability to ZFS Storage Appliances

Being more than just a file system and incorporating advanced logical volume management capability, ZFS has a number of revolutionary built-in features for which customers would normally pay large additional licence fees elsewhere: for example, thin provisioning, iSCSI and Fibre Channel support, NFS and CIFS, unlimited snapshots, and replication.

Whilst ZFS has built-in data verification and integrity checking, it is not a clustered file system: ZFS pools can only be served by a single server head at a time, a clear single point of failure. If the server fails, even though the data may be safe, the ZFS storage and associated file and block services become inaccessible.

For enterprise use, businesses demand high availability to ensure business continuity in the event of system breakdown or disaster. No matter how reliable the hardware, there is no guarantee of maintaining service availability when a single hardware component fails, whether a storage device or a server head.

High Availability considerations for ZFS
  • Not a clustered file system
  • No pool import protection
  • No inherent data fencing mechanisms
  • Varying ZFS pool import times
  • Managing ZFS cache devices
  • File & Block service failover time and timeout
  • Network connectivity and failover
  • Failover management framework

Adding a second server head to improve storage availability in the event of a single server failure can be achieved using simple active/passive high availability and failover technologies. However, bringing enterprise-grade high availability features to ZFS based storage appliances is significantly more complicated than simply failing over ZFS pools from one server head to another.

ZFS was designed with the foresight that physical disk drives gradually wear out over time, and it does an incredible job of protecting against data corruption through continuous integrity checking and automatic repair. It is, however, reasonably easy to inadvertently corrupt entire ZFS pools in clustered configurations if attempts are made to import a pool on two or more storage server nodes simultaneously.

Although ZFS pools can be made available to any number of other storage nodes, there is no concept in ZFS of a pool being accessed by more than one node at a time, and there is no inherent protection against catastrophic multi-import data corruption scenarios. Whilst storage high availability is vital to ensure services are always up, securing and protecting the data is, foremost, absolutely critical.

In addition to ensuring the availability of ZFS pools and data protection, bulletproof data fencing capabilities are needed to protect the physical data from potential split-brain scenarios. For example, sometimes a server, or network access, may freeze for a few moments and recover during or after a service failover, leading to the potentially disastrous scenario of storage corruption by data pools being written to by two or more servers at the same time.
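The fencing requirement described above can be sketched as an all-or-nothing exclusive reservation over a pool's drives. The Python class and names below are hypothetical, for illustration only; real fencing (including RSF-1's) operates at the device level, typically with SCSI persistent reservations, below ZFS itself:

```python
class DriveReservations:
    """Toy stand-in for exclusive device reservations on a pool's drives."""
    def __init__(self):
        self._owner = {}  # drive -> node currently holding the reservation

    def reserve(self, node: str, drives: list) -> bool:
        # All-or-nothing: a pool may only be imported if *every* drive
        # can be reserved exclusively by the importing node.
        if any(self._owner.get(d, node) != node for d in drives):
            return False
        for d in drives:
            self._owner[d] = node
        return True

    def release(self, node: str, drives: list) -> None:
        for d in drives:
            if self._owner.get(d) == node:
                del self._owner[d]

pool1_drives = ["c0t0d0", "c0t1d0"]          # hypothetical device names
res = DriveReservations()
assert res.reserve("nodeA", pool1_drives)    # node A fences the pool
assert not res.reserve("nodeB", pool1_drives)  # split-brain import blocked
res.release("nodeA", pool1_drives)
assert res.reserve("nodeB", pool1_drives)    # failover can now proceed
```

Because the reservation is enforced on the drives rather than negotiated between servers, it still protects the data even when the two servers can no longer see each other, which is exactly the split-brain scenario described above.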

It is also important to provide high availability and integrity of associated file and block services (e.g. NFS, SMB, iSCSI, ALUA), to maintain connections for virtualized environments, to ensure replication and snapshot capabilities and backup schedules services are maintained, and so on.

More complex storage topologies may also involve a large number of ZFS pools serviced across multiple servers meaning a simple active/passive (two node) configuration is not possible. Simple high availability solutions reliant on network heartbeats may not accommodate stretched storage topologies where servers physically reside in different locations or include a remote replicated third-node in a backup or disaster recovery site.

RSF-1 for ZFS
  • Multiple server, multiple pool failover support
  • File and block service failover
  • Multiple heartbeat mechanisms
  • Bulletproof data fencing
  • ZFS-specific fast failover
  • Stretch metro capability
  • API integration framework
  • GUI and CLI administration toolset
  • Proven 20+ year enterprise pedigree
  • Proven 10+ year ZFS support
  • Thousands of ZFS enterprise deployments
 

Bringing these enterprise-grade high availability features to a critical ZFS storage deployment also requires deploying a bulletproof and proven high availability solution; this is where High-Availability.com’s RSF-1 for ZFS comes in.

RSF-1 for ZFS allows multiple ZFS pools to be managed across multiple servers providing High Availability for both block and file services beyond a traditional two-node Active/Active or Active/Passive topology. With RSF-1 for ZFS Metro edition, highly available ZFS services can also span beyond the single data centre.

A typical 2-server RSF-1 for ZFS High Availability Topology

RSF-1 for ZFS storage is therefore more than a Disaster Recovery solution that fails back to a previous point-in-time snapshot: it provides real-time failover in the event of failure or disaster, where data is always up-to-date and failover is transparent to system users. It is also a sophisticated system administration tool that allows for controlled management of storage services to facilitate smooth and seamless system upgrades and maintenance.

Managed by a standalone GUI, a command line interface and a rich API, RSF-1 for ZFS can also be easily and seamlessly integrated into your own management and administration toolset.

At the heart of the RSF-1 for ZFS solution is a mature and stable enterprise class high availability product. It was the first commercial HA solution for Sun/Solaris environments and has a 20+ year track record in data centres worldwide providing high-availability assurance for some of the most demanding customer service availability needs. RSF-1 for ZFS has provided Enterprise-grade High-Availability ZFS Storage services to thousands of mission-critical deployments across all industries worldwide since 2009.


Support

If you have purchased RSF-1 from a reseller, distributor or integrator, you should raise a support ticket with that organisation in the first instance.

To raise a support call on any of the High-Availability.Com products, please ensure you have all required information to hand including the following:

  • Maintenance contract number
  • Software version number
  • Business name and location
  • Call back telephone number
  • Nature of the fault

You should call:

UK – 01625 527360       International +44 (0) 1625 527360

You should also follow up your call with an e-mail to:

support@High-Availability.Com

with a fuller description of the problem, and include as an attachment the output from High-Availability’s supplied diagnostic tool (/opt/HAC/bin/hacdiag).

RSF-1 for ZFS is a fully featured software-only middleware product that turns your Solaris, illumos, FreeBSD or Linux storage servers into highly available ZFS NAS cluster appliances, which can be installed and ready for enterprise storage use within minutes.

Example High Availability ZFS Topology

The following section describes how RSF-1 brings Highly Available storage services to ZFS in a two-server node, shared storage topology.

This example consists of a storage service with two storage servers (Node A and Node B) and shared storage made up of two ZFS pools (Pool1 and Pool2). The two storage nodes are interconnected with public network, private network and storage connectivity.

RSF-1 for ZFS installs on both servers and communicates via a number of heartbeat connections. Each heartbeat transmits RSF-1 state and control information describing each node’s view of the cluster. In this example, heartbeats are established via both private and public networks (TCP/IP) and via High-Availability’s unique stateful disk heartbeat mechanism. This mechanism ensures that in the event of total network failure, cluster control is maintained independently.

Any number of heartbeat connections (disk, network and serial connection) can be used in an RSF-1 cluster, and at least two different mechanisms are recommended. In this example, we are using two independent network and two independent disk heartbeat mechanisms.
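The heartbeat logic described above can be sketched as follows. This is an illustrative Python outline with hypothetical names, not RSF-1's implementation; the essential point it demonstrates is that a peer is only declared dead when *every* heartbeat channel has gone silent, so losing a single network path does not trigger a false failover:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before one channel is stale

class PeerMonitor:
    """Track last-seen times per heartbeat channel for one peer node."""
    def __init__(self, channels):
        now = time.monotonic()
        self.last_seen = {ch: now for ch in channels}

    def beat(self, channel):
        # Called whenever a heartbeat arrives on that channel.
        self.last_seen[channel] = time.monotonic()

    def is_dead(self, now=None):
        # Dead only if ALL channels (network, disk, serial) are stale.
        now = now if now is not None else time.monotonic()
        return all(now - t > HEARTBEAT_TIMEOUT
                   for t in self.last_seen.values())

mon = PeerMonitor(["net-public", "net-private", "disk-1", "disk-2"])
mon.beat("disk-1")                              # disk heartbeat still arriving
assert not mon.is_dead()                        # one live channel: peer alive
assert mon.is_dead(now=time.monotonic() + 60)   # all channels stale: failover
```

This is why mixing mechanisms matters: a total network failure leaves the disk heartbeats intact, so the cluster can still tell "peer is alive but unreachable" apart from "peer is down".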

In this simple two-node, two-pool topology example, we are going to deploy an Active/Active configuration where each of the two servers will manage ZFS services for one of the two ZFS pools. Note that Active/Active here refers to the fact that both servers can actively run ZFS pool services.

On start-up, each node in the RSF-1 cluster determines which services need to be started as defined by the RSF-1 configuration preferences and will assume a role of “master” or “standby” for each service depending on the state of each service at that time.

When RSF-1 has determined that a ZFS pool service is not already active, it initiates a countdown to become master for that service. Once that countdown has expired, it will initiate service startup, informing the rest of the cluster that it will be master for that service. If the service is already running elsewhere, it will become the standby server for that service.
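The per-service start-up decision described in the two paragraphs above reduces to a small state machine, sketched below in Python with illustrative names (not RSF-1's actual code or configuration):

```python
from enum import Enum

class Role(Enum):
    STANDBY = "standby"
    COUNTDOWN = "counting down"
    MASTER = "master"

def decide_role(service_active_elsewhere: bool,
                countdown_expired: bool) -> Role:
    """Start-up decision for one ZFS pool service on one node.
    If the service is already running on another node, become its standby;
    otherwise count down, and only claim mastership (and begin service
    start-up) once the countdown expires unchallenged."""
    if service_active_elsewhere:
        return Role.STANDBY
    if not countdown_expired:
        return Role.COUNTDOWN
    return Role.MASTER

assert decide_role(True, False) == Role.STANDBY
assert decide_role(False, False) == Role.COUNTDOWN
assert decide_role(False, True) == Role.MASTER
```

The countdown gives a preferred node time to claim a service first, which is how the configured node preference order is honoured without any node unilaterally grabbing a pool at boot.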

On service startup, the node begins by fencing the underlying storage to protect the ZFS pool data. It does this by locking access to the drives that make up the ZFS pool, ensuring that no other storage server can access them.

Once node A has protected and secured the underlying storage, it imports the ZFS pool, starts the associated ZFS services (file and/or block) and enables a Virtual IP interface for network access to ZFS Pool1.
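The strict ordering of this start-up sequence is the important detail: fencing must complete before the pool is imported, and the Virtual IP comes up last so clients only connect once services are ready. A minimal sketch (the step names and helper are hypothetical; on a real node each step maps to device locking, `zpool import`, service start and VIP plumbing):

```python
def start_pool_service(pool: str) -> list:
    """Ordered start-up sequence for one ZFS pool service."""
    steps = []
    steps.append(f"fence drives of {pool}")          # 1. lock out other nodes
    steps.append(f"zpool import {pool}")             # 2. import only once fenced
    steps.append(f"start file/block services for {pool}")  # 3. NFS/SMB/iSCSI
    steps.append(f"bring up VIP for {pool}")         # 4. clients reconnect last
    return steps

steps = start_pool_service("Pool1")
# Fencing strictly precedes import; the VIP is always the final step.
assert steps.index("fence drives of Pool1") < steps.index("zpool import Pool1")
assert steps[-1] == "bring up VIP for Pool1"
```

Running the sequence in the reverse order (VIP first, fence last) would invite exactly the multi-import corruption scenario described earlier.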

At the same time, node B executes the same process for ZFS Pool2. Each service has a defined node preference order and associated timeouts that must expire before a service is started. If, in this example, node B were not started within the ZFS Pool2 timeout after node A, node A would also assume control and start services for Pool2.

The ZFS services are now available to the rest of the network and each pool is accessible via the Virtual IPs. Each node continues to constantly monitor all other nodes in the cluster via all the available heartbeat channels.

In the event of a node failure (as determined by loss of heartbeats from all live mechanisms), the failover process is initiated. In this example, let’s assume that node A has crashed. Network access to ZFS Pool1 will hang momentarily during the failover process.

Node B begins the failover process by first breaking the low-level locks placed on the underlying ZFS Pool1 drives. It then imports the ZFS pool, starts the associated ZFS services and brings up the Virtual IP. After a short interruption while failover completes, network storage access continues. Although RSF-1 exploits a number of mechanisms for speeding up ZFS pool import, actual failover time will vary depending on the complexity of the ZFS pool structure, such as the number of drives, RAID levels, number of ZFS snapshots and so on.

Node B is now running ZFS services for both Pool1 and Pool2 with minimal disruption.

In the event that node A recovers during or after the failover process, RSF-1 triggers an immediate panic and shutdown, as the underlying storage devices have been reserved elsewhere. When node A is restarted (e.g. after repair and/or reboot), it will rejoin the cluster and act as standby server for both services. The system administrator can manually fail back ZFS Pool1 to node A at a convenient time.

Whilst the above example describes a simple two-node two-pool architecture, RSF-1 for ZFS supports multiple nodes, multiple pools and multiple VIPs to provide extremely flexible ZFS storage topologies.

Please view the short video below to see how RSF-1 for ZFS provides highly available ZFS services in more detail.

Contact us today for a free, no-obligation evaluation to see how RSF-1 for ZFS can work for you.

RESOURCES

RSF-1 HA Plugin ZFS Storage Cluster Concept Guide  (pdf)

RSF-1 for ZFS Washington University Case Study (pdf)

Contact

Get in touch…



High Availability
Pentland House,
Village Way,
Wilmslow,
Cheshire,
SK9 2GH
01625 527360
Monday to Friday, 9:00 am – 5:00 pm

About Us

High-Availability’s flagship High Availability Cluster product, RSF-1, has been a trusted commercial Unix failover technology since 1995 and has been widely deployed as the “HA plugin” of choice for many critical applications, databases and enterprise services across many Unix, Linux and BSD platform variants to provide enhanced enterprise-grade service availability.

Since 2009, we have worked extensively with the ZFS file-system in collaboration with the world’s leading Software Defined Storage vendors to provide the highest levels of data protection for critical storage system availability demands.

In addition to licensing its software products, High-Availability offers a comprehensive suite of Professional Services to assist its customers and resellers with both pre-sales and post-sales application and technical requirements.

High-Availability also operates a 24×7 support organization to help ensure that its customers’ mission-critical applications and services continue to be highly available.

For over twenty years, our technology has kept pace with evolving enterprise platforms, technologies and customer needs to ensure we provide the highest levels of service availability to enterprise customers.

History

UK-based High-Availability.com was formed in 1995 as an offshoot project of a Sun enterprise consulting business set up by Grenville Whelan and Paul Griffiths in 1994.

The original concept came from an extraordinary incident in Manchester, England that rendered the headquarters and IT infrastructure of a 100-year-old insurance customer inaccessible for a week. At the time, the founders were providing Sun enterprise consulting services to this client, who realised that they needed a comprehensive Disaster Recovery capability in the event of a similar future catastrophe, without which a prolonged outage of their IT infrastructure would have significantly threatened the business’s ability to survive.

As there were no appropriate commercial High Availability solutions available for Sun enterprise systems at that time, the founders conceived, designed and developed the first commercial High Availability product for Solaris, RSF-1. This very first version (providing failover services to SPARCcenter 2000 and 1000 clusters) ensured the client’s Ingres-based bespoke applications were Highly Available, with fully automated failover in the event of failure.

As more interest came from other enterprise Sun users, High-Availability.com was formed as a separate business. Product development and customer take-up increased, with many early implementations ordered on customers’ behalf by Sun Microsystems. During these early days, most RSF-1 implementations protected various bespoke and third-party applications based on Oracle, Ingres and Sybase databases on Sun SPARC Solaris enterprise systems. A Linux version was also released in 1996.

Customers liked the product’s flexible and uncomplicated yet powerful, effective and robust design, and that it “did exactly what it says on the tin”. Over the next few years, RSF-1 proved itself to be reliable, effective and bulletproof, trusted by a wide range of customers with mission-critical needs including banks, government, emergency services, manufacturing, education and retail.

Over time, more applications were integrated with RSF-1 and large notable rollout deployment successes included delivery of highly available application services to Checkpoint Firewall-1 on Solaris, Oracle DB on Red Flag Linux and MySQL on Linux.

Since 2009, we have worked extensively with the ZFS file-system and have partnered with a number of ZFS Open Storage vendors including Nexenta and Coraid.

Today, with over 23 years’ experience and thousands of proven mission-critical enterprise deployments globally, we continue to evolve and improve RSF-1 for ZFS. With releases on most Unix variants, including Solaris, OpenIndiana, illumos, OmniOS, FreeBSD and Linux, we have delivered over 4,000 RSF-1 for ZFS clusters on these platforms around the world through OEM partners, resellers and directly.

High-Availability for ZFS

RSF-1 brings advanced HA (High Availability) features to the ZFS file-system providing a more resilient and robust storage offering tolerant to system failures.

Learn More

High-availability for OEMs

RSF-1 has been licensed by many third-party vendors and incorporated into their own products and services, often embedded in turnkey appliances.

Learn More 

We take care of it…

How RSF-1 failover works

RSF-1 failover works by monitoring heartbeats between the servers. When all heartbeats from a node are lost, its ZFS pool services fail over automatically to a standby node.

Blog

TAKE A LOOK AT OUR LATEST NEWS: