Failover
Why is my failover taking a long time?
There are many factors that can influence failover time. If a failover is taking longer than expected, the rsfmon logs should be checked to find which part(s) of the failover is taking the time.
The main RSF-1 log file is /opt/HAC/RSF-1/log/rsfmon.log
. Each line in the log is timestamped, so it is possible to work out how much time each stage is taking. The log file will also contain a line telling you how long a service start or stop took in total:
[23440 Jan 31 17:18:50] [pool4 rsfexec] Total run time for service start: 9 seconds
Log file parser
Log files can be complex to read - especially when multiple services are starting or stopping simultaneously. To help with diagnosis, the tool /opt/HAC/RSF-1/bin/rsflog-summary.sh
can be used to provide a summary of the activity stored in the log files.
Here is an example of running the log summary tool:
# /opt/HAC/RSF-1/bin/rsflog-summary.sh -h
Usage: rsflog-summary.sh [-l|--log <logfile|logdir>] [service]...
Options:
-l --log <log>: Specify a log file or log directory to search.
If a file is given, only that file will be searched.
If a directory is given, that directory will be searched
for rsfmon.log and rsfmon.log.[0-9].
The default is to use the standard RSF-1 log directory
/opt/HAC/RSF-1/log
-h --help: Show this help text and exit
The config file /opt/HAC/RSF-1/etc/config will be scanned for services to search for
unless a list has been provided on the command line.
# /opt/HAC/RSF-1/bin/rsflog-summary.sh
Services to search for: cpw_pool pool4
Searching log files in /opt/HAC/RSF-1/log:
rsfmon.log.9 rsfmon.log.8 rsfmon.log.7 rsfmon.log.6 rsfmon.log.5 rsfmon.log.4 rsfmon.log.3 rsfmon.log.2 rsfmon.log.1 rsfmon.log.0 rsfmon.log
###############################################
Service startups and shutdowns for cpw_pool:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #1: Feb 04 10:59:25 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc run time: 5 seconds (place reservations)
Reservations taken in 4 seconds.
Script S20zfs run time: 3 seconds (import pool + LUs)
Zpool import completed status 0, in 3 seconds
Comstar mapping restored in 0 seconds
Script S21res_drives run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted run time: 1 seconds (event notify)
Script S99announce run time: 1 seconds (write to log file)
Service start scripts took 17 seconds
Plumb in VIP interface: 0 seconds
Total run time for service start: 17 seconds
Delay before service start - 6 seconds
Service startup #2: Feb 04 11:06:26 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce run time: 0 seconds (write to log file)
Reservations released in 0 seconds.
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc run time: 4 seconds (place reservations)
Reservations taken in 4 seconds.
Script S20zfs run time: 4 seconds (import pool + LUs)
Zpool import completed status 0, in 3 seconds
Comstar mapping restored in 0 seconds
Script S21res_drives run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted run time: 1 seconds (event notify)
Script S99announce run time: 0 seconds (write to log file)
Service start scripts took 16 seconds
Plumb in VIP interface: 0 seconds
Total run time for service start: 16 seconds
###############################################
Service startups and shutdowns for pool4:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Delay before service start - 18 seconds
Service startup #1: Jan 29 15:29:37 (rsfmon.log.3)
Fping test complete for vip4: 2 seconds
Script S01announce run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc run time: 5 seconds (place reservations)
Reservations taken in 4 seconds.
Script S20zfs run time: 3 seconds (import pool + LUs)
Zpool import completed status 0, in 2 seconds
Comstar mapping restored in 0 seconds
Script S21res_drives run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted run time: 1 seconds (event notify)
Script S99announce run time: 0 seconds (write to log file)
Service start scripts took 15 seconds
Plumb in VIP interface: 0 seconds
Total run time for service start: 15 seconds
Service shutdown #1: Jan 29 15:39:24 (rsfmon.log.3)
Script K01announce run time: 0 seconds (write to log file)
Script K02ApplianceStarted run time: 0 seconds
Script K02ApplianceStopping run time: 1 seconds (event notify)
Script K79res_drives run time: 0 seconds (nothing)
Script K80zfs run time: 1 seconds (export pool + LUs)
Comstar mapping removed in 0 seconds
Zpool export completed in 1 seconds
Script K85zfs_mhdc run time: 1 seconds (release reservations)
Reservations released in 0 seconds.
Script K86res_drives run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 0 seconds
Script K98ApplianceStopped run time: 1 seconds (event notify)
Script K99announce run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds
Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #2: Jan 31 17:18:39 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives run time: 0 seconds (create initial res_drives)
Script S15zfs_mhdc run time: 4 seconds (place reservations)
Reservations taken in 4 seconds.
Script S20zfs run time: 3 seconds (import pool + LUs)
Zpool import completed status 0, in 2 seconds
Comstar mapping restored in 0 seconds
Script S21res_drives run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted run time: 0 seconds (event notify)
Script S99announce run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds
Total run time for service start: 9 seconds
Service shutdown #2: Jan 31 17:18:56 (rsfmon.log)
Script K01announce run time: 1 seconds (write to log file)
Script K02ApplianceStarted run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives run time: 0 seconds (nothing)
Script K80zfs run time: 1 seconds (export pool + LUs)
Comstar mapping removed in 0 seconds
Zpool export completed in 0 seconds
Script K85zfs_mhdc run time: 1 seconds (release reservations)
Reservations released in 0 seconds.
Script K86res_drives run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped run time: 0 seconds (event notify)
Script K99announce run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds
Service startup #3: Jan 31 17:19:01 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives run time: 1 seconds (create initial res_drives)
Script S15zfs_mhdc run time: 4 seconds (place reservations)
Reservations taken in 4 seconds.
Script S20zfs run time: 2 seconds (import pool + LUs)
Zpool import completed status 0, in 2 seconds
Comstar mapping restored in 0 seconds
Script S21res_drives run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted run time: 1 seconds (event notify)
Script S99announce run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds
Total run time for service start: 9 seconds
Service shutdown #3: Jan 31 17:27:32 (rsfmon.log)
Script K01announce run time: 1 seconds (write to log file)
Script K02ApplianceStarted run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives run time: 0 seconds (nothing)
Script K80zfs run time: 1 seconds (export pool + LUs)
Comstar mapping removed in 0 seconds
Zpool export completed in 0 seconds
Script K85zfs_mhdc run time: 1 seconds (release reservations)
Reservations released in 0 seconds.
Script K86res_drives run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped run time: 0 seconds (event notify)
Script K99announce run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds