A few points about HACMP verification and synchronization, a topic I think a few people have doubts about.
Verifying and synchronizing your HACMP cluster assures you that all resources used by HACMP are configured appropriately and that rules regarding resource ownership and resource takeover are in agreement across all nodes. You should verify and synchronize your cluster configuration after making any change within a cluster, for example, any change to the hardware, operating system, node configuration, or cluster configuration.
Whenever you configure, reconfigure, or update a cluster, run the cluster verification procedure to ensure that all nodes agree on the cluster topology, network configuration, and the ownership and takeover of HACMP resources. If the verification succeeds, the configuration can be synchronized. Synchronization takes effect immediately on an active cluster: a dynamic reconfiguration event is run and the changes are committed to the active cluster.
Note:
If you are using the SMIT Initialization and Standard Configuration path, synchronization automatically follows a successful verification. If you are using the Extended Configuration path, you have more options for types of verification. If you are using the Problem Determination Tools path, you can choose whether or not to synchronize.
The verification log is written to /var/hacmp/clverify/clverify.log.
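For example, to watch verification progress as it is logged:

tail -f /var/hacmp/clverify/clverify.log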
Running Cluster Verification
After making a change to the cluster, you can perform cluster verification in several ways.
Automatic verification: you can have the cluster verified automatically:
- Each time you start cluster services on a node
- Each time a node rejoins the cluster
- Every 24 hours
By default, automatic verification is enabled and runs at midnight.
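The schedule for the every-24-hours run can usually be viewed and changed through SMIT; treat the menu path below as a sketch, since the exact wording varies by HACMP release:

smitty hacmp
# Problem Determination Tools
#   -> HACMP Verification
#     -> Automatic Cluster Configuration Monitoring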
Manual verification: using the SMIT interface, you can either verify the complete configuration or only the changes made since the last time the utility was run. Typically, you should run verification whenever you add or change anything in your cluster configuration. For detailed instructions, see Verifying the HACMP configuration using SMIT.
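For reference, the manual verification screen is reached through SMIT; menu wording may differ slightly between HACMP releases:

smitty hacmp
# Extended Configuration
#   -> Extended Verification and Synchronization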
Automatic Verification:
You can disable automatic verification during cluster startup under Extended Configuration > Extended Cluster Service Settings, but do not do so unless you have been advised to.
Understanding the Verification Process
The verification and synchronization process has three phases:
1. Verification
2. Snapshot (optional)
3. Synchronization
Phase one: Verification
During the verification process the default
system configuration directory (DCD) is compared
with the active configuration. On an inactive cluster
node, the verification process compares
the local DCD across all nodes. On an active
cluster node, verification propagates a copy of
the active configuration to the joining nodes.
If a node that was previously synchronized
has a DCD that does not match the ACD of an already
active cluster node, the ACD of an active node
is propagated to the joining node. This new information
does not replace the DCD of the joining nodes;
it is stored in a temporary directory for the purpose
of running verification against it.
HACMP displays progress indicators as the
verification is performed.
Note: When you attempt to start a node that has
an invalid cluster configuration, HACMP transfers a
valid configuration database data structure to
it, which may consume 1-2 MB of disk space. If the
verification phase fails, cluster services will
not start.
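If you want to inspect the two configuration databases directly, you can point odmget at the respective ODM directories; the paths below are the usual HACMP defaults, so verify them on your system:

ODMDIR=/etc/es/objrepos odmget HACMPcluster                          # DCD
ODMDIR=/usr/es/sbin/cluster/etc/objrepos/active odmget HACMPcluster  # ACD (running cluster only)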
Phase two: (Optional) Snapshot
A snapshot is taken only if a node that requests to start requires an updated configuration. During the
snapshot phase of verification, HACMP records
the current cluster configuration to a snapshot file
- for backup purposes. HACMP names this snapshot
file according to the date of the snapshot and the
name of the cluster. Only one snapshot is
created per day. If a snapshot file exists and its filename
contains the current date, it will not be
overwritten.
This snapshot is written to the
/usr/es/sbin/cluster/snapshots/ directory.
The snapshot filename uses the syntax MM-DD-YYYY-ClusterName-autosnap.odm. For example, a snapshot taken on April 2, 2006 on a cluster named hacluster01 would be named /usr/es/sbin/cluster/snapshots/04-02-2006-hacluster01-autosnap.odm.
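To confirm the snapshot was written, list the snapshot directory:

ls -l /usr/es/sbin/cluster/snapshots/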
Phase three: Synchronization
During the synchronization phase of verification, HACMP propagates information to all cluster nodes. For an inactive cluster node, the DCD is propagated to the DCD of the other nodes. For an active cluster node, the ACD is propagated to the DCD.
If the process succeeds, all nodes are
synchronized and cluster services start. If synchronization
fails, cluster services do not start and HACMP
issues an error.
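To confirm that cluster services actually started after a successful synchronization, query the cluster manager subsystem:

lssrc -ls clstrmgrES   # look for a stable cluster manager state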
Conditions that can trigger Corrective Action:
https://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.hacmp.admngd/ha_admin_trigger_corrective.htm
This topic discusses conditions that can trigger
a corrective action.
HACMP shared volume group time stamps are not
up-to-date on a node
If the shared volume group time stamp file does
not exist on a node, or the time stamp files do not match on all nodes, the
corrective action ensures that all nodes have the latest up-to-date VGDA time
stamp for the volume group and imports the volume group on all cluster nodes
where the shared volume group was out of sync with the latest volume group
changes. The corrective action ensures that volume groups whose definitions
have changed will be properly imported on a node that does not have the latest
definition.
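The manual equivalent of this corrective action is a learning import, which refreshes a node's copy of the volume group definition without exporting it; sharedvg and hdisk3 below are placeholders for your own names:

importvg -L sharedvg hdisk3   # re-learn the VG definition from the disk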
The /etc/hosts file on a node does not contain
all HACMP-managed IP addresses
If an IP label is missing, the corrective action
modifies the file to add the entry and saves a copy of the old version to
/etc/hosts.date. If a backup file already exists for that day, no additional
backups are made for that day.
Verification does the following:
If the /etc/hosts entry exists but is commented
out, verification adds a new entry; comment lines are ignored.
If the label specified in the HACMP
Configuration does not exist in /etc/hosts , but the IP address is defined in
/etc/hosts, the label is added to the existing /etc/hosts entry. If the label
is different between /etc/hosts and the HACMP configuration, then verification
reports a different error message; no corrective action is taken.
If the entry does not exist, meaning both the IP
address and the label are missing from /etc/hosts, then the entry is added.
This corrective action takes place on a node-by-node basis. If different nodes
report different IP labels for the same IP address, verification catches these cases
and reports an error. However, this error is unrelated to this corrective
action. Inconsistent definitions of an IP label defined to HACMP are not
corrected.
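You can make the same check by hand before verification runs; service_label below is a placeholder for one of your HACMP IP labels:

/usr/es/sbin/cluster/utilities/cllsif   # list the IP labels and addresses HACMP manages
grep -w service_label /etc/hosts        # confirm each label resolves locally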
SSA concurrent volume groups need unique SSA
node numbers
If verification finds that the SSA node numbers
are not unique, the corrective action changes the number of one of the nodes
where the number is not unique. See the Installation Guide for more information
on SSA configuration.
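As a sketch, the SSA node number is held on the ssar router device and can be checked and changed with lsattr and chdev; pick a value not used by any other node:

lsattr -El ssar -a node_number   # show this node's SSA node number
chdev -l ssar -a node_number=2   # example: set it to 2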
A file system is not created on a node, although
disks are available
If a file system has not been created on one of
the cluster nodes, but the volume group is available, the corrective action
creates the mount point and file system. The file system must be part of a
resource group for this action to take place. In addition, the following
conditions must be met:
This is a shared volume group.
The volume group must already exist on at least
one node.
One or more node(s) that participate in the
resource group where the file system is defined must already have the file system
created.
The file system must already exist within the
logical volume on the volume group in such a way that simply re-importing that
volume group would acquire the necessary file system information.
The mount point directory must already exist on
the node where the file system does not exist.
The corrective action handles only those mount
points that are on a shared volume group, such that exporting and re-importing
of the volume group will acquire the missing file systems available on that
volume group. The volume group is varied off on the remote node(s), or the
cluster is down and the volume group is then varied off if it is currently
varied on, prior to executing this corrective action.
If Mount All File Systems is specified in the
resource group, the node with the latest time stamp is used to compare the list
of file systems that exists on that node with other nodes in the cluster. If
any node is missing a file system, then HACMP imports the file system.
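A minimal sketch of checking these preconditions yourself, with sharedvg and /sharedfs as placeholder names:

lsvg -l sharedvg   # on a node that has the VG: the logical volume and file system must exist
ls -ld /sharedfs   # on the node missing the file system: the mount point must already exist
mkdir -p /sharedfs # create the mount point if it is missing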
Disks are available, but the volume group has
not been imported to a node
If the disks are available but the volume group
has not been imported to a node that participates in a resource group where the
volume group is defined, then the corrective action imports the volume group.
The corrective action gets the information
regarding the disks and the volume group major number from a node that already
has the volume group available. If the major number is unavailable on a node,
the next available number is used.
The corrective action is only performed under
the following conditions:
The cluster is down.
The volume group is varied off if it is
currently varied on.
The volume group is defined as a resource in a
resource group.
The major number and associated PVIDS for the
disks can be acquired from a cluster node that participates in the resource
group where the volume group is defined.
Note: This functionality will not turn off the
auto varyon flag if the volume group has the attribute set. A separate
corrective action handles auto varyon.
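Done by hand, the import looks roughly like this; sharedvg, the major number 100, and hdisk3 are placeholders for your own values:

lvlstmajor                           # list major numbers still free on this node
importvg -V 100 -y sharedvg hdisk3   # import using the major number the other nodes use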
Shared volume groups configured as part of an
HACMP resource group have their automatic varyon attribute set to Yes.
If verification finds that a shared volume group
inadvertently has the auto varyon attribute set to Yes on any node, the
corrective action automatically sets the attribute to No on that node.
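The manual equivalent, with sharedvg as a placeholder name:

lsvg sharedvg | grep AUTO   # "AUTO ON: yes" means the attribute is set
chvg -a n sharedvg          # turn automatic varyon off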
Required /etc/services entries are missing on a
node.
If a required entry is commented out, missing,
or invalid in /etc/services on a node, the corrective action adds it. Required
entries are:
Name Port Protocol
topsvcs 6178 udp
grpsvcs 6179 udp
clinfo_deadman 6176 udp
clcomd 6191 tcp
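A quick check that all four entries are present and uncommented:

egrep '^(topsvcs|grpsvcs|clinfo_deadman|clcomd)' /etc/services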
Required HACMP snmpd entries are missing on a
node
If a required entry is commented out, missing,
or invalid on a node, the corrective action adds it.
Note: The default version of the snmpd.conf file
for AIX® is snmpdv3.conf.
In /etc/snmpdv3.conf or /etc/snmpd.conf, the
required HACMP snmpd entry is:
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
# HACMP/ES for AIX clsmuxpd
In /etc/snmpd.peers, the required HACMP snmpd entry
is:
clsmuxpd 1.3.6.1.4.1.2.3.1.2.1.5
"clsmuxpd_password" # HACMP/ES for AIX clsmuxpd
If changes are required to the /etc/snmpd.peers
or snmpd[v3].conf file, HACMP creates a backup of the original file. A copy of
the pre-existing version is saved prior to making modifications in the file
/etc/snmpd.{peers | conf}.date. If a backup has already been made of the
original file, then no additional backups are made.
HACMP makes one backup per day for each snmpd
configuration file. As a result, running verification a number of times in one
day only produces one backup file for each file modified. If no configuration
files are changed, HACMP does not make a backup.
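To see which entries exist, and in which file, something like the following works; note that snmpd only rereads its configuration on restart:

grep clsmuxpd /etc/snmpdv3.conf /etc/snmpd.conf /etc/snmpd.peers 2>/dev/null
stopsrc -s snmpd && startsrc -s snmpd   # restart snmpd after any manual edits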
Required RSCT network options settings
HACMP requires that the nonlocsrcroute, ipsrcroutesend,
ipsrcrouterecv, and ipsrcrouteforward network options be set to 1; these are
set by RSCT's topsvcs startup script. The corrective action run on inactive
cluster nodes ensures these options are not disabled and are set correctly.
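You can confirm the four options on a node with:

no -a | egrep 'nonlocsrcroute|ipsrcroutesend|ipsrcrouterecv|ipsrcrouteforward'
# each should report a value of 1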
Required HACMP network options setting
The corrective action ensures that the value of
each of the following network options is consistent across all nodes in a
running cluster (out-of-sync setting on any node is corrected):
tcp_pmtu_discover
udp_pmtu_discover
ipignoreredirects
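To confirm the values are consistent, run the same check on every node and compare the output:

no -a | egrep 'tcp_pmtu_discover|udp_pmtu_discover|ipignoreredirects'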
Required routerevalidate network option setting
Changing hardware and IP addresses within HACMP
changes and deletes routes. Because AIX caches routes, setting the
routerevalidate network option is required as follows:
no -o routerevalidate=1
This setting ensures the maintenance of
communication between cluster nodes. Verification run with corrective action
automatically adjusts this setting for nodes in a running cluster.
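If you set this yourself on AIX 5.3 or later, the -p flag makes the setting persistent across reboots; on earlier levels, add the command to /etc/rc.net instead:

no -p -o routerevalidate=1   # set now and record it in /etc/tunables/nextboot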
Note: No corrective actions take place during a
dynamic reconfiguration event.
Corrective actions when using IPv6
If you configure an IPv6 address, the verification process can perform two more corrective actions:
Neighbor discovery (ND). Network interfaces must support this protocol, which is specific to IPv6. The underlying network interface card is checked for compatibility with ND, and the ND-related daemons will be started.
Configuration of link-local addresses (LL). A special link-local (LL) address is required for every network interface that will be used with IPv6 addresses. If an LL address is not present, the autoconf6 program is run to configure one.
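To check whether a link-local address is already present on an interface (en0 is a placeholder), and to configure IPv6 addresses manually:

ifconfig en0 | grep inet6   # a link-local address starts with fe80::
autoconf6                   # configures IPv6 link-local addresses (see its man page for per-interface options)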