Failover
Why is my failover taking a long time?
There are many factors that can influence failover time. If a failover is taking longer than expected, the rsfmon logs should be checked to find which part(s) of the failover is taking the time.
The main RSF-1 log file is /opt/HAC/RSF-1/log/rsfmon.log
. Each line in the log is timestamped, so it is possible to work out how much time each stage is taking. The log file will also contain a line telling you how long a service start or stop took in total:
[23440 Jan 31 17:18:50] [pool4 rsfexec] Total run time for service start: 9 seconds
Log file parser
Log files can be complex to read - especially when multiple services are starting or stopping simultaneously. To help with diagnosis, the tool /opt/HAC/RSF-1/bin/rsflog-summary.sh
can be used to provide a summary of the activity stored in the log files.
Here is an example of running the log summary tool:
# /opt/HAC/RSF-1/bin/rsflog-summary.sh -h Usage: rsflog-summary.sh [-l|--log <logfile|logdir>] [service]... Options: -l --log <log>: Specify a log file or log directory to search. If a file is given, only that file will be searched. If a directory is given, that directory will be searched for rsfmon.log and rsfmon.log.[0-9]. The default is to use the standard RSF-1 log directory /opt/HAC/RSF-1/log -h --help: Show this help text and exit The config file /opt/HAC/RSF-1/etc/config will be scanned for services to search for unless a list has been provided on the command line. # /opt/HAC/RSF-1/bin/rsflog-summary.sh Services to search for: cpw_pool pool4 Searching log files in /opt/HAC/RSF-1/log: rsfmon.log.9 rsfmon.log.8 rsfmon.log.7 rsfmon.log.6 rsfmon.log.5 rsfmon.log.4 rsfmon.log.3 rsfmon.log.2 rsfmon.log.1 rsfmon.log.0 rsfmon.log ############################################### Service startups and shutdowns for cpw_pool: ############################################### Reading rsfmon.log.9... Reading rsfmon.log.8... Reading rsfmon.log.7... Reading rsfmon.log.6... Reading rsfmon.log.5... Reading rsfmon.log.4... Reading rsfmon.log.3... Reading rsfmon.log.2... Reading rsfmon.log.1... Reading rsfmon.log.0... Reading rsfmon.log... Delay before service start - 6 seconds Service startup #1: Feb 04 10:59:25 (rsfmon.log) Fping test complete for testvip: 2 seconds Script S01announce run time: 0 seconds (write to log file) Script S02ApplianceStarting run time: 0 seconds (event notify) Script S14res_drives run time: 6 seconds (create initial res_drives) Script S15zfs_mhdc run time: 5 seconds (place reservations) Reservations taken in 4 seconds. Script S20zfs run time: 3 seconds (import pool + LUs) Zpool import completed status 0, in 3 seconds Comstar mapping restored in 0 seconds Script S21res_drives run time: 1 seconds (refresh res_drives) Script S98ApplianceStarted run time: 1 seconds (event notify) Script S99announce run time: 1 seconds (write to log file) Service start scripts took 17 seconds Plumb in VIP interface: 0 seconds Total run time for service start: 17 seconds Delay before service start - 6 seconds Service startup #2: Feb 04 11:06:26 (rsfmon.log) Fping test complete for testvip: 2 seconds Script S01announce run time: 0 seconds (write to log file) Reservations released in 0 seconds. Script S02ApplianceStarting run time: 1 seconds (event notify) Script S14res_drives run time: 6 seconds (create initial res_drives) Script S15zfs_mhdc run time: 4 seconds (place reservations) Reservations taken in 4 seconds. Script S20zfs run time: 4 seconds (import pool + LUs) Zpool import completed status 0, in 3 seconds Comstar mapping restored in 0 seconds Script S21res_drives run time: 0 seconds (refresh res_drives) Script S98ApplianceStarted run time: 1 seconds (event notify) Script S99announce run time: 0 seconds (write to log file) Service start scripts took 16 seconds Plumb in VIP interface: 0 seconds Total run time for service start: 16 seconds ############################################### Service startups and shutdowns for pool4: ############################################### Reading rsfmon.log.9... Reading rsfmon.log.8... Reading rsfmon.log.7... Reading rsfmon.log.6... Reading rsfmon.log.5... Reading rsfmon.log.4... Reading rsfmon.log.3... Delay before service start - 18 seconds Service startup #1: Jan 29 15:29:37 (rsfmon.log.3) Fping test complete for vip4: 2 seconds Script S01announce run time: 0 seconds (write to log file) Script S02ApplianceStarting run time: 0 seconds (event notify) Script S14res_drives run time: 6 seconds (create initial res_drives) Script S15zfs_mhdc run time: 5 seconds (place reservations) Reservations taken in 4 seconds. Script S20zfs run time: 3 seconds (import pool + LUs) Zpool import completed status 0, in 2 seconds Comstar mapping restored in 0 seconds Script S21res_drives run time: 0 seconds (refresh res_drives) Script S98ApplianceStarted run time: 1 seconds (event notify) Script S99announce run time: 0 seconds (write to log file) Service start scripts took 15 seconds Plumb in VIP interface: 0 seconds Total run time for service start: 15 seconds Service shutdown #1: Jan 29 15:39:24 (rsfmon.log.3) Script K01announce run time: 0 seconds (write to log file) Script K02ApplianceStarted run time: 0 seconds Script K02ApplianceStopping run time: 1 seconds (event notify) Script K79res_drives run time: 0 seconds (nothing) Script K80zfs run time: 1 seconds (export pool + LUs) Comstar mapping removed in 0 seconds Zpool export completed in 1 seconds Script K85zfs_mhdc run time: 1 seconds (release reservations) Reservations released in 0 seconds. Script K86res_drives run time: 0 seconds (nothing) Script K98ApplianceStarting run time: 0 seconds Script K98ApplianceStopped run time: 1 seconds (event notify) Script K99announce run time: 0 seconds (write to log file) Service stop scripts took 4 seconds Total run time for service stop: 4 seconds Reading rsfmon.log.2... Reading rsfmon.log.1... Reading rsfmon.log.0... Reading rsfmon.log... Delay before service start - 6 seconds Service startup #2: Jan 31 17:18:39 (rsfmon.log) Fping test complete for vip4: 2 seconds Script S01announce run time: 0 seconds (write to log file) Script S02ApplianceStarting run time: 1 seconds (event notify) Script S14res_drives run time: 0 seconds (create initial res_drives) Script S15zfs_mhdc run time: 4 seconds (place reservations) Reservations taken in 4 seconds. Script S20zfs run time: 3 seconds (import pool + LUs) Zpool import completed status 0, in 2 seconds Comstar mapping restored in 0 seconds Script S21res_drives run time: 1 seconds (refresh res_drives) Script S98ApplianceStarted run time: 0 seconds (event notify) Script S99announce run time: 0 seconds (write to log file) Service start scripts took 9 seconds Plumb in VIP interface: 0 seconds Total run time for service start: 9 seconds Service shutdown #2: Jan 31 17:18:56 (rsfmon.log) Script K01announce run time: 1 seconds (write to log file) Script K02ApplianceStarted run time: 0 seconds Script K02ApplianceStopping run time: 0 seconds (event notify) Script K79res_drives run time: 0 seconds (nothing) Script K80zfs run time: 1 seconds (export pool + LUs) Comstar mapping removed in 0 seconds Zpool export completed in 0 seconds Script K85zfs_mhdc run time: 1 seconds (release reservations) Reservations released in 0 seconds. Script K86res_drives run time: 0 seconds (nothing) Script K98ApplianceStarting run time: 1 seconds Script K98ApplianceStopped run time: 0 seconds (event notify) Script K99announce run time: 0 seconds (write to log file) Service stop scripts took 4 seconds Total run time for service stop: 4 seconds Service startup #3: Jan 31 17:19:01 (rsfmon.log) Fping test complete for vip4: 2 seconds Script S01announce run time: 0 seconds (write to log file) Script S02ApplianceStarting run time: 0 seconds (event notify) Script S14res_drives run time: 1 seconds (create initial res_drives) Script S15zfs_mhdc run time: 4 seconds (place reservations) Reservations taken in 4 seconds. Script S20zfs run time: 2 seconds (import pool + LUs) Zpool import completed status 0, in 2 seconds Comstar mapping restored in 0 seconds Script S21res_drives run time: 1 seconds (refresh res_drives) Script S98ApplianceStarted run time: 1 seconds (event notify) Script S99announce run time: 0 seconds (write to log file) Service start scripts took 9 seconds Plumb in VIP interface: 0 seconds Total run time for service start: 9 seconds Service shutdown #3: Jan 31 17:27:32 (rsfmon.log) Script K01announce run time: 1 seconds (write to log file) Script K02ApplianceStarted run time: 0 seconds Script K02ApplianceStopping run time: 0 seconds (event notify) Script K79res_drives run time: 0 seconds (nothing) Script K80zfs run time: 1 seconds (export pool + LUs) Comstar mapping removed in 0 seconds Zpool export completed in 0 seconds Script K85zfs_mhdc run time: 1 seconds (release reservations) Reservations released in 0 seconds. Script K86res_drives run time: 0 seconds (nothing) Script K98ApplianceStarting run time: 1 seconds Script K98ApplianceStopped run time: 0 seconds (event notify) Script K99announce run time: 0 seconds (write to log file) Service stop scripts took 4 seconds Total run time for service stop: 4 seconds