Tanti Technology


Sunday 19 June 2011

Complete an HACMP failover test for AIX

• 1 Introduction
• 2 Steps
o 2.1 On the production system
o 2.2 Configuration changes made to the environment
o 2.3 Example
o 2.4 Important
• 3 Conclusion
************************************************
Introduction
Once I had ensured the system was configured correctly, and before I started testing the cluster failover itself, I performed a manual failover to verify that all the scripts that had been written worked according to the configured system's needs.

Steps
The following are the steps that you should take to do a manual failover test of your application; an illustrative command sketch follows each list of steps.
On the production system
1. Configure the service address on the primary adapter.
2. Varyon all shared volume groups.
3. Mount all shared filesystems.
4. Execute your application start script.
5. Test the application.
6. Execute your application stop script.
7. Unmount all shared file systems.
8. Varyoff all the shared volume groups.
9. Configure the production boot address on the primary adapter.
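As a sketch only, the production-side sequence might look like the following from the command line. The adapter (en0), addresses, volume group (app_vg), filesystem (/app), and script paths are placeholders for your own configuration, not values from this cluster:

  # 1. Put the service address on the primary adapter
  ifconfig en0 inet 192.168.10.10 netmask 255.255.255.0 up
  # 2-3. Vary on the shared volume group and mount its filesystem
  varyonvg app_vg
  mount /app
  # 4-5. Start the application, then test it by hand
  /usr/local/cluster/start_app.sh
  # 6-8. Stop the application and release the shared resources
  /usr/local/cluster/stop_app.sh
  umount /app
  varyoffvg app_vg
  # 9. Put the production boot address back on the primary adapter
  ifconfig en0 inet 192.168.10.1 netmask 255.255.255.0 up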
If the test is successful on the production server, you should now move to the backup server and proceed with the following steps:
1. Configure the production service address on the standby adapter.
2. Varyon all shared volume groups.
3. Mount all shared file systems.
4. Execute your application start script.
5. Test the application.
6. Execute your application stop script.
7. Unmount all shared file systems.
8. Varyoff all the shared volume groups.
9. Reset the standby address on the standby adapter.
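On the backup server the sketch is the same apart from the addressing; again, en1 and the addresses are placeholders:

  # 1. Put the production service address on the standby adapter
  ifconfig en1 inet 192.168.10.10 netmask 255.255.255.0 up
  # 2-8. Same varyonvg / mount / start / test / stop / umount / varyoffvg sequence as above
  # 9. Reset the standby address on the standby adapter
  ifconfig en1 inet 192.168.20.2 netmask 255.255.255.0 up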
This completed the manual failover of the cluster. Doing a manual failover test gave me more control over the failover steps. Whenever I noticed errors in the configuration, I corrected the failures by bringing down the cluster nodes and performing a synchronized verification test before proceeding to the next steps. The manual failover test therefore helps you troubleshoot any application problems before performing the automatic HACMP failover test.

Configuration changes made to the environment
1. Tune the system using I/O pacing.
I/O pacing is required for an HACMP cluster to behave correctly during large disk writes, and it is strongly recommended if you anticipate large blocks of disk writes on your HACMP cluster. These marks are, by default, set to zero (disabling I/O pacing) when AIX is installed. While the most efficient high- and low-water marks vary from system to system, an initial high-water mark of 33 and a low-water mark of 24 provide a good starting point.
These settings only slightly reduce write times and consistently generate correct failover behavior from HACMP for AIX. If a process tries to write to a file at the high-water mark, it must wait until enough I/O operations have finished to reach the low-water mark.
The correct way to configure the high-water and low-water marks is with the formula below. The default values for the disk-I/O pacing high-water and low-water marks (the maxpout and minpout parameters) may cause severe performance problems on Caché production systems; they can significantly hinder Caché write daemon performance by inappropriately putting the write daemon to sleep, causing prolonged write daemon cycles.
If you are using HACMP clusters, I/O pacing is automatically enabled. If your system is not part of an HACMP cluster, set both the high- (maxpout) and low- (minpout) water marks to 0 (zero) to disable I/O pacing.
View and change the current settings for the I/O pacing high-water and low-water marks by issuing the smitty chgsys command.
InterSystems currently recommends the following IBM calculation for determining the appropriate high-water mark:
• high-water mark = (4 * n) + 1, where n = the maximum number of spindles any one file (database, journal, or WIJ) spans across
• low-water mark = 50%-75% of the high-water mark
Example
For example, a CACHE.DAT database file is stored on a storage array, and the LUN (or file system) where it resides consists of 16 spindles/drives. Calculate:
• High-water mark = (4 * 16) + 1 = 65
• Low-water mark = between (0.50 * 65) and (0.75 * 65) = between 33 and 49
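To apply the calculated marks, you can use smitty chgsys as noted above, or set the sys0 attributes directly with chdev. A minimal sketch using the example values of 65 and 33 (the values come from the calculation above, not universal recommendations):

  lsattr -El sys0 -a maxpout -a minpout       # view the current high- and low-water marks
  chdev -l sys0 -a maxpout=65 -a minpout=33   # set the marks per the calculation above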
2. Increase the syncd frequency.
I edited the /sbin/rc.boot file to increase the syncd frequency from its default value of 60 seconds to either 30, 20, or 10 seconds, as sketched below. Increasing the frequency forces more frequent I/O flushes and reduces the likelihood of triggering the dead man switch due to heavy I/O traffic.
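For reference, the syncd invocation in /sbin/rc.boot typically looks like the line below (the exact wording may differ by AIX level); the numeric argument is the flush interval in seconds:

  # default entry in /sbin/rc.boot:
  nohup /usr/sbin/syncd 60 > /dev/null 2>&1 &
  # changed to flush every 10 seconds:
  nohup /usr/sbin/syncd 10 > /dev/null 2>&1 &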
3. Corrected the inittab entry on the b node.
4. Increased the swap space on both nodes to 2.5 GB.
Calculation used (current size before the change = 512):
• a node: vmstat average / 256 = 278212 / 256 = 1086.75
• b node: 331696 / 256 = 1295.63
(Dividing the page count by 256 converts 4 KB pages to MB.)
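To grow the paging space itself, the standard AIX commands apply; a sketch, where hd6 and the partition count are placeholders and the number of partitions needed depends on rootvg's physical partition size:

  lsps -a          # list paging spaces and their current sizes
  chps -s 8 hd6    # grow hd6 by 8 logical partitions (adjust for your PP size)
  lsps -a          # confirm the new size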
The reason for the increase was Dead Man Switch (DMS) errors.
If you see an increase in the following parameters, increase their values for better Caché performance. This will have to be monitored, since TNG is collecting stats on these boxes; I would appreciate a report from one week before and one week after the change to observe system performance, so that a course of action can then be decided as needed. When increasing these parameters from their default values, I recommend the following:
Important
Change both the current and the reboot values, and check the vmstat output regularly because I/O patterns may change over time (hours, days, or weeks).
1. Increase the current value by 50%.
2. Check the vmstat output.
3. Run vmstat twice, two minutes apart.
4. If the field is still increasing, increase again by the same amount; continue this step until the field stops increasing between vmstat reports.
• pending disk I/Os blocked with no pbuf
• paging space I/Os blocked with no psbuf
• filesystem I/Os blocked with no fsbuf
• client filesystem I/Os blocked with no fsbuf
• external pager filesystem I/Os blocked with no fsbuf
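These counters appear in the vmstat -v report; as a sketch, one way to take the two readings mentioned in step 3 above:

  vmstat -v | grep -i blocked    # snapshot the "blocked with no ...buf" counters
  sleep 120
  vmstat -v | grep -i blocked    # counters still rising between reports need larger values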
