TANTI TECHNOLOGIES: HACMP QUICK REFERENCE AND INTERVIEW PREPARATION

HACMP BASICS

HACMP CAN BE CONFIGURED IN 3 WAYS.

1. Rotating

2. Cascading

3. Mutual Failover

Cascading resource group:

Upon node failure, a cascading resource group falls over to the available node with the next priority in the node priority list. Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.

Cascading without fallback

With this option, whenever the primary node fails, the package fails over to the next available node in the list. When the primary node comes back online, the package does not fall back automatically; it has to be moved back to its home node manually at a convenient time.

Rotating resource group:

This is almost the same as cascading without fallback: when a package fails over to a standby node, it never falls back to the primary node automatically. It has to be moved manually at a convenient time.

Mutual takeover:

With the mutual takeover option, both nodes run in active-active mode. When a failover happens, the package on the failed node moves to the other active node and runs alongside the package already running there. Once the failed node comes back online, the package can be moved back to it manually.
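The takeover rule described above (fall over to the available node with the next priority) can be sketched as a small shell helper. This is only an illustration, not how HACMP itself selects a node: the function name next_node, the DOWN_NODES variable and the node names are all hypothetical.

```shell
# Hypothetical sketch: pick the takeover node for a resource group.
# $1            = the node that just failed
# remaining args = node priority list, highest priority first
# DOWN_NODES    = optional space-separated list of other unavailable nodes
next_node() {
    failed=$1
    shift
    for n in "$@"; do
        # Skip the failed node itself.
        [ "$n" = "$failed" ] && continue
        # Skip any node that is listed as unavailable.
        case " ${DOWN_NODES:-} " in
            *" $n "*) continue ;;
        esac
        echo "$n"
        return 0
    done
    # No node is left to take over the resource group.
    return 1
}
```

For example, next_node nodeA nodeA nodeB nodeC prints nodeB. Whether the group later returns to nodeA depends on the fallback policy (cascading falls back, cascading without fallback and rotating do not), which this sketch does not model.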

USEFUL HACMP COMMANDS

clstat - show cluster state and substate; needs clinfo.

cldump - SNMP-based tool to show cluster state

cldisp - similar to cldump, perl script to show cluster state.

cltopinfo - list the local view of the cluster topology.

clshowsrv -a - list the local view of the cluster subsystems.

clfindres (-s) - locate the resource groups and display status.

clRGinfo -v - locate the resource groups and display status.

clcycle - rotate some of the log files.

cl_ping - a cluster ping program with more arguments.

clrsh - cluster rsh program that takes cluster node names as arguments.

clgetactivenodes - which nodes are active?

get_local_nodename - what is the name of the local node?

clconfig - check the HACMP ODM.

clRGmove - online/offline or move resource groups.

cldare - sync/fix the cluster.

cllsgrp - list the resource groups.

clsnapshotinfo - create a large snapshot of the hacmp configuration.

cllscf - list the network configuration of an hacmp cluster.

clshowres - show the resource group configuration.

cllsif - show network interface information.

cllsres - show short resource group information.

lssrc -ls clstrmgrES - list the cluster manager state.

lssrc -ls topsvcs - show heartbeat information.

cllsnode - list a node centric overview of the hacmp configuration.

STEPS TO CONFIGURE HACMP:

1. Install the nodes, make sure the redundancy is maintained for power supplies, n/w and

fiber n/ws. Then Install AIX on the nodes.

2. Install all the HACMP filesets except HAview and HATivoli.

3. Install all the RSCT filesets from the AIX base CD. Make sure that the AIX, HACMP patches and server code are at the latest level (ideally recommended).

4. Check for fileset bos.clvm to be present on both the nodes. This is required to make the

VGs enhanced concurrent capable.

5. V.IMP: Reboot both the nodes after installing the HACMP filesets.

6. Configure shared storage on both the nodes. Also in case of a disk heartbeat, assign a

1GB shared storage LUN on both nodes.

7. Create the required VGs only on the first node. The VGs can be either normal VGs or

Enhanced concurrent VGs. Assign particular major number to each VGs while creating

the VGs. Record the major no. information.

To check the major number, use the command:

ls -lrt /dev | grep <vgname>

Mount automatically at system restart should be set to NO.

8. Vary on the VGs that were just created.

9. V.IMP: Create log LV on each VG first before creating any new LV. Give a unique

name to logLV.

Destroy the content of logLV by: logform /dev/loglvname

Repeat this step for all VGs that were created.

10. Create all the necessary LVs on each VG.

11. Create all the necessary file systems on each LV created. You can create mount points

as per the requirement of the customer.

Mount automatically at system restart should be set to NO.

12. umount all the filesystems and varyoff all the VGs.

13. Run chvg -an <vgname> for each VG, so that no VG is activated automatically at

system restart.

14. Go to node 2 and run cfgmgr -v so that the shared volumes are detected.

15. Import all the VGs on node 2.

Use smitty importvg and import each VG with the same major number as assigned on node 1.

16. Run chvg -an for all VGs on node 2.

17. V.IMP: Identify the boot1, boot2, service ip and persistent ip for both the nodes and make the entry

in the /etc/hosts.

18. Define cluster name.

19. Define the cluster nodes: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure an HACMP node -> Add a node to an HACMP cluster. Define both the nodes one after the other.

20. Discover HACMP config: This will import for both nodes all the node info, boot ips,

service ips from the /etc/hosts

smitty hacmp -> Extended configurations -> Discover hacmp related information

Step 21 Adding Communication interface

Add HACMP communication interfaces. (Ether interfaces.)

smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->

Configure HACMP networks -> Add a network to the HACMP cluster.

Select ether and Press enter.

Then select diskhb and Press enter. Diskhb is your non-tcpip heartbeat.

Step 22 Adding device for Disk Heart Beat

Include the interfaces/devices in the ether n/w and diskhb already defined.

smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->

Configure HACMP communication interfaces/devices -> Add communication Interfaces/devices.

Step 23 Adding boot IP & Disk heart beat information

Include all four boot IPs (2 for each node) in the ether network already defined. Then include the disk for heartbeat on both the nodes in the diskhb network already defined.

Step 24 Adding persistent IP

Add the persistent IPs:

smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->

Configure HACMP persistent nodes IP label/Addresses

Step 25 Adding Persistent IP labels

Add a persistent ip label for both nodes.

Step 26 Defining IP labels

Define the service IP labels for both nodes.

smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->

HACMP extended resource configuration -> Configure HACMP service IP label

Step 27 Adding Resource Group

Add Resource Groups:

smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->

HACMP extended resource group configuration

Continue similarly for all the resource groups.

The node selected first while defining the resource group will be the primary owner of

that resource group. The node after that is secondary node.

Make sure you set the primary node correctly for each resource group. Also set the failover/fallback policies as per the requirement of the setup.

Step 28 Setting attributes of Resource group

Set attributes of the resource groups already defined. Here you have to actually assign the resources to the resource groups.

smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->

Step 29 Adding IP label & RG owned by Node

Add the service IP label for the owner node and also the VGs owned by the owner node

of this resource group.

Step 30 & 31 Synchronize & start Cluster

Synchronize the cluster:

This will sync the info from one node to second node.

smitty cl_sync

That’s it. Now you are ready to start the cluster.

smitty clstart

You can start the cluster together on both nodes or start individually on each node.

Step 32 & 33 Check for cluster Stabilize & VG varied on

Wait for the cluster to stabilize. You can check when the cluster is up with the following

commands:

a. netstat -i

b. ifconfig -a : look out for the service IP. It will show on each node if the cluster is up.

Check whether the VGs under cluster’s RGs are varied-ON and the filesystems in the

VGs are mounted after the cluster start.

Here test1vg and test2vg are VGs which are varied-ON when the cluster is started and

Filesystems /test2 and /test3 are mounted when the cluster starts.

/test2 and /test3 are in test2vg which is part of the RG which is owned by this node.

34. Perform all the tests such as resource take-over, node failure, n/w failure and verify

the cluster before releasing the system to the customer.
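Steps 7 to 13 above (preparing the shared storage on the first node) can be sketched as a dry-run script. It only prints the commands unless DRY_RUN=0 is set. Everything specific here is an assumption for illustration: the function and variable names, the JFS2 types (the steps above do not say whether JFS or JFS2 is used), the log LV naming, and the choice to run chvg -a n while the VG is still varied on.

```shell
# Dry-run sketch of steps 7-13 on the first node (hypothetical names).
DRY_RUN=${DRY_RUN:-1}

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# $1 = VG name, $2 = major number, $3 = shared disk,
# $4 = data LV name, $5 = number of logical partitions, $6 = mount point
prepare_shared_vg() {
    vg=$1 major=$2 disk=$3 lv=$4 lps=$5 mnt=$6

    run mkvg -V "$major" -y "$vg" "$disk"        # step 7: fixed major number
    run varyonvg "$vg"                           # step 8
    run mklv -t jfs2log -y "${vg}_log" "$vg" 1   # step 9: log LV first, unique name
    run logform "/dev/${vg}_log"                 # step 9: asks for confirmation
    run mklv -t jfs2 -y "$lv" "$vg" "$lps"       # step 10
    run crfs -v jfs2 -d "$lv" -m "$mnt" -A no    # step 11: no mount at restart
    run chvg -a n "$vg"                          # step 13: no auto-varyon at restart
    run varyoffvg "$vg"                          # step 12: nothing was mounted here
}
```

Calling prepare_shared_vg test1vg 45 hdisk2 testlv 10 /test1 in the default dry-run mode prints the eight commands for review; the values are placeholders and should be replaced with the major number and disk recorded for the real cluster.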

Specifying the default gateway on a specific interface

When you're using HACMP, you usually have multiple network adapters installed and thus multiple network interfaces to deal with. If AIX configured the default gateway on the wrong interface (for example on your management interface instead of the boot interface), you might want to change this, so network traffic isn't sent over the management interface. Here's how you can do this:

First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start changing the network configuration.

Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway.

Now you need to determine where your current default gateway is configured. You can do this by

typing: lsattr -El inet0 and netstat -nr. The lsattr command will show you the current default gateway

route and the netstat command will show you the interface it is configured on. You can also check the

ODM: odmget -q"attribute=route" CuAt.

Now, delete the default gateway like this:

lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW

chdev -l inet0 -a delroute=${GW}

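You could now use the route command to specify the default gateway on a specific interface, like this:

route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX

This gives you a working default gateway, but the route command does not update the ODM, so the setting is lost at the next reboot. A better solution is to use the chdev command: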

chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]

This will set the default gateway to the first interface available.

To specify the interface use:

chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]

Substitute the correct interface for enX in the command above.

If you previously used the route add command, and after that you use chdev to enter the default

gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command.

Afterwards, check with lsattr -El inet0 and odmget -q"attribute=route" CuAt if the new default gateway

is properly configured. And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. Then start up HACMP again.
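The two chdev forms above differ only in the if,enX part, so they can be generated by one small helper. The sketch below only prints the command for review and never changes the system. The function names build_addroute and current_routes are hypothetical, and the addresses in the usage note are made up.

```shell
# Print (do not run) the chdev command for a persistent default gateway.
# $1 = gateway IP address, $2 = optional interface name (e.g. en1)
build_addroute() {
    gw=$1
    ifname=${2:-}
    if [ -n "$ifname" ]; then
        echo "chdev -l inet0 -a addroute=net,-hopcount,0,if,$ifname,,0,$gw"
    else
        echo "chdev -l inet0 -a addroute=net,-hopcount,0,,0,$gw"
    fi
}

# Read `lsattr -El inet0` output on stdin and print the route values,
# using the same awk filter as the delroute example above.
current_routes() {
    awk '$2 ~ /hopcount/ { print $2 }'
}
```

For instance, build_addroute 10.1.1.254 en1 prints the interface-specific form, and lsattr -El inet0 | current_routes lists the route values that would be passed to delroute.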

TESTING HACMP: HA FAILOVER SCENARIOS

1. Graceful

For graceful failover, you can run “smitty clstop” then select graceful option. This will not change

anything except stopping the cluster on that node.

Note: If you stop the cluster, check the status using lssrc -g cluster. Sometimes the clstrmgrES daemon will

take a long time to stop. DO NOT KILL THIS DAEMON. It will stop automatically after a while.

You can do this on both the nodes

2. Takeover

For takeover, run “smitty clstop” with the takeover option. This will stop the cluster on that node and the standby node will take over the package.

You can do this on both the nodes

3. Soft Package Failover

Run smitty cm_hacmp_resource_group_and_application_management_menu -> Move a Resource Group to Another Node -> select the package name and node name -> press Enter

This will move the package from that node to the node that you have selected in the above menu.

This method gives a lot of trouble in HA 4.5, whereas it runs well on HA 5.2 unless there are application startup issues.

You can do this on both the nodes

4. Failover Network Adapter(s):

For this type of testing, run “ifconfig enx down”; the package IP will then fail over to the primary adapter. You should not even notice an outage.

We can manually bring it back to the original adapter (ifconfig enx up), but it is better to reboot the server to bring the package back to the original node.

5. Hardware Failure (crash):

This is a standard type of testing: run the command “reboot -q”; the node will then go down without

stopping any apps and come up immediately. The package will fail over to the standby node with minimal OS downtime (even though HA failover is fast, some apps will take a long time to start).
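The note under the graceful stop above recommends checking lssrc -g cluster rather than killing clstrmgrES. A minimal sketch of that check is shown below. It assumes the usual lssrc layout with a header line followed by one line per subsystem whose last column is the status; the function name cluster_group_stopped is hypothetical.

```shell
# Read `lssrc -g cluster` output on stdin.
# Exit 0 only when every listed subsystem is reported as inoperative.
cluster_group_stopped() {
    awk 'NR > 1 && NF > 0 && $NF != "inoperative" { busy = 1 }
         END { exit (busy ? 1 : 0) }'
}
```

On a node this could be used as lssrc -g cluster | cluster_group_stopped && echo "cluster services are down", repeated after a pause while the daemon is still stopping.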

HACMP DISK HEARTBEAT BASICS

This example consists of a two-node cluster with shared ESS vpath devices. If more than two nodes exist in your cluster, you will need N non-IP heartbeat networks, where N represents the number of nodes in the cluster (i.e. a three-node cluster requires 3 non-IP heartbeat networks). This creates a heartbeat ring.

It’s worth noting that one should not confuse concurrent volume groups with concurrent resource groups. And note, there is a difference between concurrent volume groups and enhanced concurrent volume groups. A concurrent resource group is one which may be active on more than one node at a time. A concurrent volume group also shares the characteristic that it may be active on more than one node at a time. This is also true for an enhanced concurrent VG; however, in a non-concurrent resource group, the enhanced concurrent VG, while it may be active and not have a SCSI reserve residing on the disk, has its data normally accessed by only one system at a time.

Pre-Reqs

In this document, it is assumed that the shared storage devices are already made available and configured to AIX, and that the proper levels of RSCT and HACMP are already installed. Since enhanced-concurrent volume groups are used, it is also necessary to make sure that bos.clvm.enh is installed. This is not normally installed as part of a HACMP installation via the installp command.

Disk Heartbeat Details

This provides the ability to use existing shared disks, regardless of disk type, to provide a serial network like heartbeat path. A benefit of this is that one need not dedicate the integrated serial ports for HACMP heartbeats (if supported on the subject systems) or purchase an 8-port asynchronous adapter. This feature utilizes a special area on the disk previously reserved for “Concurrent Capable” volume groups (traditionally only for SSA disks). Since AIX 5.2 dropped support for the SSA concurrent volume groups, this area became available for other use. This also means that the disk chosen for serial heartbeat can be part of a data volume group. (Note Performance Concerns below)

The disk heart beating code went into the 2.2.1.30 version of RSCT. Some recommended APARs bring that to 2.2.1.31. If you've got that level installed, and HACMP 5.1, you can use disk heart beating. The relevant file to look for is /usr/sbin/rsct/bin/hats_diskhb_nim. Though it is supported mainly through

To use disk heartbeats, no node can issue a SCSI reserve for the disk. This is because both nodes using it for heart beating must be able to read and write to that disk. It is sufficient that the disk be in an enhanced concurrent volume group to meet this requirement. (It should also be possible to use a disk that is in no volume group for disk heart beating. RSCT certainly won't care; but HACMP SMIT panels may not be particularly helpful in setting this up.)

Configuring Disk Heartbeat

As mentioned previously, disk heartbeat utilizes enhanced-concurrent volume groups. If starting with a new configuration of disks, you will want to create enhanced-concurrent volume groups, either manually or by utilizing C-SPOC. The example here uses C-SPOC, which is the best practice.

If you plan to use an existing volume group for disk heartbeats that is not enhanced concurrent, you will have to convert it using the chvg command. We recommend that the VG be active on only one node, and that the application not be running, when making this change. Run chvg -C vgname to change the VG to enhanced concurrent mode. Vary it off, then run importvg -L vgname on the other node to make it aware that the VG is now enhanced concurrent.

To be able to use C-SPOC successfully, it is required that some basic IP based topology already exists, and that the storage devices have their PVIDs in both systems' ODMs. This can be verified by running lspv on each system. If a PVID does not exist on each system, it is necessary to run chdev -l <devicename> -a pv=yes on each system. This will allow C-SPOC to match up the device(s) as known shared storage devices. In this example, vpath0 on GT40 is the same virtual disk as vpath3 on SL55.

Use C-SPOC to create an Enhanced Concurrent volume group. In the following example, since vpath devices are being used, the following smit screen paths were used.

smitty cl_admin -> Go to HACMP Concurrent Logical Volume Management -> Concurrent Volume Groups -> Create a Concurrent Volume Group with Data Path Devices and press Enter

Choose the appropriate nodes, and then choose the appropriate shared storage devices based on PVIDs (vpath0 and vpath3 in this example). Choose a name for the VG (enhconcvg in this example) and the desired PP size, make sure that Enhanced Concurrent Mode is set to true, and press Enter. This will create the shared enhanced-concurrent VG needed for our disk heartbeat. It’s a good idea to verify via lspv once this has completed to make sure the device and VG are shown appropriately, as follows:

GT40#/ lspv

vpath0 000a7f5af78e0cf4 enhconcvg

SL55#/lspv

vpath3 000a7f5af78e0cf4 enhconcvg

Creating Disk Heartbeat Devices and Network

There are two different ways to do this. Since we have already created the enhanced concurrent VG, we can use the discovery method (1) and let HA find it for us. Or we can do this manually via the Pre-Defined Devices method (2). Following is an example of each.

1) Creating via Discover Method:

Enter smitty hacmp -> Extended Configuration -> Discover HACMP-related Information from Configured Nodes -> Press Enter

This will run automatically and create a clip_config file that contains the information it has discovered.

Once completed, go back to the Extended Configuration menu and choose:

Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Discovered Communication Interface and Devices -> Communication Devices -> Choose appropriate devices (ex. vpath0 and vpath3)

Select Point-to-Point Pair of Discovered Communication Devices to Add

Move cursor to desired item and press F7. Use arrow keys to scroll.

ONE OR MORE items can be selected.

Press Enter AFTER making all selections.

# Node Device Device Path Pvid

> nodeGT40 vpath0 /dev/vpath0 000a7f5af78

> nodeSL55 vpath3 /dev/vpath3 000a7f5af78

Note: Base HA 5.1 appears to have a problem when using the Discovered Devices method. If you get this error:

"ERROR: Invalid node name 000a7f5af78e0cf4"

then you will need APAR IY51594. Otherwise you will have to create the devices via the Pre-Defined Devices method.

Once corrected, this section will be completed

2) Creating via Pre-Defined Devices Method

When using this method, it is necessary to create a diskhb network first, then assign the disk-node pair

devices to the network. Create the diskhb network as follows:

smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP cluster -> choose diskhb -> Enter desired network name (ex. disknet1) -> press Enter

smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Pre-Defined Communication Interfaces and Devices -> Communication Devices -> Choose your diskhb Network Name

Add a Communication Device

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

* Device Name [GT40_hboverdisk]

* Network Type diskhb

* Network Name disknet1

* Device Path [/dev/vpath0]

* Node Name [GT40]

The Device Name is a unique name you can choose. It will show up in your topology under this name, much like serial heartbeat and ttys have in the past.

For the Device Path, you want to put in /dev/<device name>. Then choose the corresponding node for this device and device name (ex. GT40). Then press Enter.

You will repeat this process for the other node (ex. SL55) and the other device (vpath3). This will

complete both devices for the diskhb network.

Testing Disk Heartbeat Connectivity

Once the device and network definitions have been created, it is a good idea to test it and make sure

communications is working properly. If the volume group is varied on in normal mode on one of the

nodes, the test will probably not work.

/usr/sbin/rsct/bin/dhb_read is used to test the validity of a diskhb connection. The usage of dhb_read is

as follows:

dhb_read -p devicename //dump diskhb sector contents

dhb_read -p devicename -r //receive data over diskhb network

dhb_read -p devicename -t //transmit data over diskhb network

To test that disknet1, in the example configuration, can communicate from nodeB(ex. SL55) to nodeA

(ex. GT40), you would run the following commands:

On nodeA, enter:

dhb_read -p rvpath0 -r

On nodeB, enter:

dhb_read -p rvpath3 -t

Note: The device name is the raw device, as designated by the “r” preceding the device name.

If the link from nodeB to nodeA is operational, both nodes will display:

Link operating normally.

You can run this again and swap which node transmits and which one receives. To make the network

active, it is necessary to sync up the cluster. Since the volume group has not been added to the resource

group, we will sync up once instead of twice.
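Since dhb_read takes the raw device name (rvpath0 rather than /dev/vpath0), a small helper can derive it from the Device Path entered in SMIT and print the matching pair of test commands. This is a sketch only: the function names raw_dev and dhb_test_cmds are hypothetical, and the helper prints the commands instead of running them.

```shell
# Derive the raw device name from a device path: /dev/vpath0 -> rvpath0
raw_dev() {
    echo "r$(basename "$1")"
}

# Print the commands for one direction of the diskhb test.
# $1 = device path on the receiving node, $2 = device path on the transmitting node
dhb_test_cmds() {
    echo "receiver:    /usr/sbin/rsct/bin/dhb_read -p $(raw_dev "$1") -r"
    echo "transmitter: /usr/sbin/rsct/bin/dhb_read -p $(raw_dev "$2") -t"
}
```

With the example cluster, dhb_test_cmds /dev/vpath0 /dev/vpath3 prints the command to run on GT40 (receive) and the one to run on SL55 (transmit); swapping the two arguments gives the reverse direction.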

Add Shared Disk as a Shared Resource

In most cases you would have your diskhb device on a shared data vg. It is necessary to add that vg into

your resource group and synchronize the cluster.

smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.

Choose the appropriate resource group, enter the new vg (enhconcvg) into the volume group list and

press Enter.

Return to the top of the Extended Configuration menu and synchronize the cluster.

Monitor Disk Heartbeat

Once the cluster is up and running, you can monitor the activity of the disk (actually all) heartbeats via

lssrc -ls topsvcs. An example of the output follows:

Subsystem Group PID Status

topsvcs topsvcs 32108 active

Network Name Indx Defd Mbrs St Adapter ID Group ID

disknet1 [ 3] 2 2 S 255.255.10.0 255.255.10.1

disknet1 [ 3] rvpath3 0x86cd1b02 0x86cd1b4f

HB Interval = 2 secs. Sensitivity = 4 missed beats

Missed HBs: Total: 0 Current group: 0

Packets sent : 229 ICMP 0 Errors: 0 No mbuf: 0

Packets received: 217 ICMP 0 Dropped: 0

NIM's PID: 28724

Be aware that there is a grace period for heartbeats to start processing. This is normally around 60

seconds. So if you run this command quickly after starting the cluster, you may not see anything at all

until heartbeat processing is started after the grace period time has elapsed.
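To pull just the membership line for each network out of that output, a short awk filter is enough. The sketch below assumes the column layout shown in the sample above (network name, bracketed index, Defd, Mbrs, St); the function name hb_summary is hypothetical.

```shell
# Read `lssrc -ls topsvcs` output on stdin and print one summary line per
# network: name, number of defined adapters, current members and state.
hb_summary() {
    awk '
        { sub(/\[ *[0-9]+\]/, "") }              # drop the "[ 3]" index column
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {     # keep only the Defd/Mbrs line
            printf "%s defined=%s members=%s state=%s\n", $1, $2, $3, $4
        }'
}
```

Run as lssrc -ls topsvcs | hb_summary. A network whose members count is lower than its defined count, or whose state is not S, is worth a closer look; during the grace period mentioned above the filter may print nothing.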

HACMP LOG FILES

/usr/sbin/cluster/etc/rhosts --- to accept incoming communication from clcomdES (cluster

communication enhanced security)

/usr/es/sbin/cluster/etc/rhosts

Note: If there is an unresolvable label in the /usr/es/sbin/cluster/etc/rhosts file,

then all clcomdES connections from remote nodes will be denied.

cluster manager clstrmgrES

cluster lock Daemon (clockdES)

cluster multi peer extension communication daemon (clsmuxpdES)

The clcomdES daemon is used for cluster configuration operations such as cluster synchronisation, cluster management (C-SPOC) and dynamic reconfiguration (DARE) operations.

For clcomdES there should be at least 20 MB of free space in the /var file system.

/var/hacmp/clcomd/clcomd.log --it requires 2 MB

/var/hacmp/clcomd/clcomdiag.log --it requires 18MB

Additional 1 MB required for

/var/hacmp/odmcache directory

clverify.log is also present in the /var directory

/var/hacmp/clverify/current//* contains logs for the current execution of clverify

/var/hacmp/clverify/pass//* contains logs from the last passed verification

/var/hacmp/clverify/pass.prev//* contains log from the second last passed verification
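The 20 MB free-space requirement for clcomdES in /var can be checked before starting the cluster. The sketch below uses df -Pk, the POSIX output format in which the fourth column is the available space in KB; the function name has_free_mb is hypothetical.

```shell
# Succeed when the filesystem holding directory $1 has at least $2 MB free.
has_free_mb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}
```

For example, has_free_mb /var 20 || echo "less than 20 MB free in /var" flags a node that does not meet the requirement stated above.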

TANTI TECHNOLOGIES

Tanti Technology

Monday, 11 November 2013
