Tanti Technology

Bangalore, Karnataka, India
Multi-platform UNIX systems consultant and administrator in mutualized and virtualized environments, with 4.5+ years of experience in AIX system administration. This site is meant to help system administrators in their day-to-day activities. Your comments on posts are welcome. This blog is all about the IBM AIX flavour of Unix; it is aimed at system admins who use AIX in their work life, and can also be used by newbies who want to get certified in AIX administration. The blog will be updated frequently to help system admins and other new learners. DISCLAIMER: The blog owner takes no responsibility of any kind for any data loss or damage caused by trying any command or method mentioned in this blog. You use the commands/methods/scripts at your own risk. If you find something useful, a comment would be appreciated to let other viewers know that the solution/method worked for you.

Sunday 19 June 2011

HACMP topology & useful commands

HACMP can be configured in 3 ways:

1. Rotating
2. Cascading
3. Mutual Failover

The cascading and rotating resource groups are the "classic", pre-HA 5.1 types. The new "custom" resource group type was introduced in HA 5.1.


Cascading resource group:
Upon node failure, a cascading resource group falls over to the available node with the next priority in the node priority list.
Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.

Cascading without fallback:
With this option, whenever the primary node fails the package fails over to the next available node in the list, but when the primary node comes back online the package does not fall back automatically. We need to move the package back to its home node at a convenient time, as shown below.
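For example, once the primary node has rejoined the cluster, the group can be moved back by hand with clRGmove (described later in this post); the resource group app_rg and node nodeA here are made-up names, purely illustrative:

# clRGmove -g app_rg -n nodeA -u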

Rotating resource group:
This is similar to cascading without fallback: whenever the package fails over to a standby node it never falls back to the primary node automatically; we need to move it back manually at our convenience.

Mutual takeover:
With the mutual takeover option, both nodes run in active-active mode. Whenever a failover happens, the package on the failed node moves to the other active node and runs alongside the package already there. Once the failed node comes back online, we can move its package back manually.

Useful HACMP commands

clstat - show cluster state and substate; needs clinfo.
cldump - SNMP-based tool to show cluster state
cldisp - similar to cldump, perl script to show cluster state.
cltopinfo - list the local view of the cluster topology.
clshowsrv -a - list the local view of the cluster subsystems.
clfindres (-s) - locate the resource groups and display status.
clRGinfo -v - locate the resource groups and display status.
clcycle - rotate some of the log files.
cl_ping - a cluster ping program with more arguments.
clrsh - cluster rsh program that takes cluster node names as arguments.
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the hacmp configuration.
cllscf - list the network configuration of an hacmp cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node centric overview of the hacmp configuration.
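As an illustration, clRGinfo/clfindres output looks roughly like this (group and node names are made up):

-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
app_rg         ONLINE                       nodea
               OFFLINE                      nodeb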

Specifying the default gateway on a specific interface in HACMP

When you're using HACMP, you usually have multiple network adapters installed and thus multiple network interfaces to deal with. If AIX configured the default gateway on the wrong interface (for example on your management interface instead of the boot interface), you might want to change this so network traffic isn't sent over the management interface. Here's how you can do this:

First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration.

Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway.

Now you need to determine where your current default gateway is configured. You can do this by typing: lsattr -El inet0 and netstat -nr. The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM: odmget -q"attribute=route" CuAt.

Now, delete the default gateway like this:
lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
chdev -l inet0 -a delroute=${GW}

If you now use the route command to specify the default gateway on a specific interface, like this:
route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
you will have a working entry for the default gateway. But the route command does not change anything in the ODM, so as soon as your system reboots the default gateway is gone again. Not a good idea.

A better solution is to use the chdev command:
chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available.

To specify the interface use:
chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above.

If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command.
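Putting it together, an illustrative sequence for moving the default gateway to en1 (the interface and gateway address are made up):

# route delete 0
# chdev -l inet0 -a addroute=net,-hopcount,0,if,en1,,0,192.168.1.254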

Afterwards, check with lsattr -El inet0 and odmget -q"attribute=route" CuAt whether the new default gateway is properly configured. And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check that the default gateway remains configured on the correct interface. And start HACMP again!

Steps 1 to 17 to configure HACMP
Steps to configure HACMP:

1. Install the nodes, making sure redundancy is maintained for power supplies, networks and
fibre networks. Then install AIX on the nodes.
2. Install all the HACMP filesets except HAview and HATivoli.
Install all the RSCT filesets from the AIX base CD.
3. Make sure that the AIX and HACMP patches and the server code are at the latest level (ideally
recommended).
4. Check for fileset bos.clvm to be present on both the nodes. This is required to make the
VGs enhanced concurrent capable.
5. V.IMP: Reboot both the nodes after installing the HACMP filesets.
6. Configure shared storage on both the nodes. Also in case of a disk heartbeat, assign a
1GB shared storage LUN on both nodes.
7. Create the required VGs only on the first node. The VGs can be either normal VGs or
enhanced concurrent VGs. Assign a particular major number to each VG while creating
the VGs, and record the major number information.
To check the major number, use the command:
ls -lrt /dev | grep <VG name>
Mount automatically at system restart should be set to NO.
8. Varyon the VGs that were just created.
9. V.IMP: Create a log LV on each VG first, before creating any other LV. Give a unique
name to each log LV.
Initialize the log LV with: logform /dev/loglvname
Repeat this step for all the VGs that were created.
10. Create all the necessary LVs on each VG.
11. Create all the necessary file systems on each LV created. You can create mount points
as per the requirements of the customer.
Mount automatically at system restart should be set to NO.
12. umount all the filesystems and varyoff all the VGs.

13. Run chvg -an <VG name> for each VG, so that all VGs are set to not be activated automatically at
system restart.
14. Go to node 2 and run cfgmgr -v to import the shared volumes.
15. Import all the VGs on node 2, each with the same major number as assigned on node 1
(use smitty importvg).
16. Run chvg -an for all VGs on node 2.
17. V.IMP: Identify the boot1, boot2, service IP and persistent IP for both nodes
and make the entries in /etc/hosts.
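As an illustrative walk-through of steps 7 and 14-16 (the VG name, major number and disk name are made up):

On node 1:
# lvlstmajor                      (list the free major numbers)
# mkvg -y datavg -V 101 hdisk2    (create the VG with major number 101)
# chvg -an datavg                 (no automatic varyon at system restart)

On node 2, after cfgmgr -v:
# importvg -V 101 -y datavg hdisk2
# chvg -an datavg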

HACMP cluster

SANDEEP TANTI


Table of Contents
1. Definitions
1.1. What is HACMP?
1.2. What is SNMP?
1.3. What are the ways of stopping/restarting?
2. Daemons
2.1. What are the HACMP daemons?
2.2. How to check the status of the HACMP daemons?
3. Logs
3.1. Where are the HACMP logs?
3.2. How to review error reports?
3.3. How to get error notifications?
3.4. How to see the events arriving at the cluster?
4. Identification
4.1. How to identify resource groups?
4.2. How to determine the resources of a group?
4.3. How to check the location of resources?
4.4. How to display information on all the nodes?
4.5. How to test a disk?
4.6. How to find the filesystems mounted with a volume group?
4.7. How to tell if the volume group sharedvg is currently online?
4.8. How to find the processes that occupy a shared filesystem?
4.9. Which commands show the HACMP configuration?
4.10. How to map between physical and logical disks in SSA?
4.11. How to identify the network?
5. Outages
5.1. How to crash a service (for testing)?
5.2. How to replace a disk?
5.3. How to change a network card?
6. Configuration
6.1. Filling in /etc/hosts?
6.2. How do I change the order of name resolution?
6.3. How to create shared disks?
6.4. How to create an alias address?

1. Definitions
1.1. What is HACMP?
HACMP is the abbreviation of High Availability Cluster Multi-Processing.
1.2. What is SNMP?
Simple Network Management Protocol.
1.3. What are the ways of stopping/restarting?
1. forced: HACMP stops, but the resources are kept as they are
2. graceful: HACMP stops, without failover
3. takeover: HACMP stops, and failover takes place


2. Daemons
2.1. What are the HACMP daemons?
1. cluster manager - clstrmgr
2. cluster SMUX peer daemon - clsmuxpd
3. cluster information services - clinfo
4. cluster lock manager - cllockd
5. cluster topology services daemon - topsvcs
6. cluster group services daemon - grpsvcs
7. global group services daemon - grpglsm
8. cluster event management daemon - emsvcs

2.2. How to check the status of the HACMP daemons?
# lssrc -g cluster
# lssrc -g topsvcs or lssrc -s topsvcs
# lssrc -g grpsvcs or lssrc -s grpsvcs
# lssrc -g grpglsm or lssrc -s grpglsm


3. Logs
3.1. Where are the HACMP logs?
1. /var/adm/cluster.log
2. /tmp/hacmp.out
3. /tmp/emuhacmp.out
4. AIX system error log
5. /usr/sbin/cluster/history/cluster.mmdd
6. /tmp/cm.log
7. /tmp/cspoc.log

3.2. How to review error reports?
# errpt
# errpt -a | grep msg

3.3. How to get error notifications?
# odmget errnotify | grep -i -p x25 > foo
# odmadd foo
after making the modifications in the file foo.
Note: to remove an entry, use odmdelete.
3.4. How to see the events arriving at the cluster?
# grep EVENT /tmp/hacmp.out
# cldiag


4. Identification
4.1. How to identify resource groups?
# clfindres
# clfindres -s

4.2. How to determine the resources of a group?
# clshowres

4.3. How to check the location of resources?
# netstat -i
# df
# lsvg -o

4.4. How to display information on all the nodes?
1. clstat for the console
2. xclstat for X11 mode

4.5. How to test a disk?
# dd if=/dev/pdisk0 of=/dev/null bs=4096
Then test the return code.
4.6. How to find the filesystems mounted with a volume group?
Use the script that is among the Matilda scripts.
4.7. How to tell if the volume group sharedvg is currently online?
# if lsvg -o | grep -qw sharedvg; then
echo sharedvg is online
else
echo sharedvg is offline
fi

4.8. How to find the processes that occupy a shared filesystem?
# fuser /dev/sharedlv (add -k to also kill them)

4.9. Which commands show the HACMP configuration?
1. lsattr -El ent0
2. lsattr -El scsi0
3. odmget CuAt | grep -p en0
4. lsdev -Cc disk
5. lsdev -Cc ssar
6. lsdev -Cc scsi
7. lspv
8. lsvg -o

4.10. How to map between physical and logical disks in SSA?
# ssaxlate -l hdisk5
# ssaxlate -l pdisk7

4.11. How to identify the network?
Determining the routes:
# netstat -rn
Determining the hardware addresses:
# netstat -in


5. Outages
5.1. How to crash a service (for testing)?
# cat /etc/file > /dev/kmem

5.2. How to replace a disk?
On the first node:
# rmdev -dl hdisk3
# cfgmgr -l scsi2
# exportvg sharelvg
# recreatevg -y sharelvg hdisk3
Attention: the names of the LVs and filesystems have changed. They must be renamed; then edit /etc/filesystems.
On the other node:
# rmdev -dl hdisk3
# cfgmgr -l scsi2
# importvg -y sharelvg hdisk3
# chdev -l hdisk3 -a pv=clear
The PVID goes to NONE status.
# chdev -l hdisk3 -a pv=yes
# recreatevg -y sharelvg hdisk3
5.3. How to change a network card?
# ifconfig en0 down
# rmdev -l en0
# rmdev -l ent0
# chdev -l ent0 -a media_speed=100_Full_Duplex
# mkdev -l ent0
# mkdev -l en0
# ifconfig en0 up
Attention: never use autonegotiation with HACMP.
6. Configuration
6.1. Filling in /etc/hosts?
/etc/hosts is ideal for configurations with more than 25 nodes. It must contain all the IP addresses of the nodes in the cluster.
6.2. How do I change the order of name resolution?
Edit /etc/netsvc.conf so it looks like this:
hosts = local, nis, bind
The order is of course the one you have chosen; here, local is first, nis second, and bind (DNS) last.
You can also remove a resolution method.
Attention: the NSORDER environment variable takes precedence over /etc/netsvc.conf.
For HACMP to take NIS and DNS into account, the corresponding services must be running.
# smit hacmp
then Cluster Configuration --> Cluster Resources --> Configure Run Time Parameters
6.3. How to create shared disks?
# mklv -t jfslog -y jfslog2 sharelvg 1
# logform /dev/jfslog2
# mklv -t jfs -y mylv sharelvg 1
# crfs -v jfs -d mylv -m /mydata -a log=/dev/jfslog2

6.4. How to create an alias address?
# ifconfig en0 alias 10.6.100.10
To avoid problems, be careful not to use the boot address.

HACMP Basics

SANDEEP TANTI

History
IBM's HACMP has existed for almost 15 years. It was not originally an IBM product: IBM bought it from CLAM, which was later renamed Availant and is now called LakeViewTech. Until August 2006, all development of HACMP was done by CLAM. Nowadays IBM does its own development of HACMP in Austin, Poughkeepsie and Bangalore.

IBM's high availability solution for AIX, High Availability Cluster Multi Processing (HACMP), consists of two components:

• High Availability: the process of ensuring an application is available for use through the use of duplicated and/or shared resources (eliminating Single Points of Failure, SPOFs)

• Cluster Multi-Processing: multiple applications running on the same nodes with shared or concurrent access to the data.

A high availability solution based on HACMP provides automated failure detection, diagnosis, application recovery and node reintegration. With an appropriate application, HACMP can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal scalability.

What needs to be protected? Ultimately, the goal of any IT solution in a critical environment is to provide continuous service and data protection.

High availability is just one building block in achieving the continuous-operation goal. It depends on the availability of the hardware, the software (OS and its components), the application and the network components.

The main objective of HACMP is to eliminate Single Points of Failure (SPOFs).

“…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs)…”


Eliminate Single Points of Failure (SPOFs)

Cluster object      Eliminated as a single point of failure by
Node                Using multiple nodes
Power source        Using multiple circuits or uninterruptible power supplies
Network/adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP subsystem    Using non-IP networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapters or multiple adapters
Disk                Using multiple disks with mirroring or RAID
Application         Adding a node for takeover; configuring an application monitor
Administrator       Adding a backup administrator or a very detailed operations guide
Site                Adding an additional site


Cluster Components

Here are the recommended practices for important cluster components.


Nodes

HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it
is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual
takeover"), the most reliable and available clusters have at least one standby node - one node that is normally
not running any applications, but is available to take them over in the event of a failure on an active
node.

Additionally, it is important to pay attention to environmental considerations. Nodes should not have a
common power supply - which may happen if they are placed in a single rack. Similarly, building a cluster
of nodes that are actually logical partitions (LPARs) with a single footprint is useful as a test cluster, but
should not be considered for availability of production applications.
Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters.
That is, twice as many slots as would be required for single node operation. This naturally suggests that
processors with small numbers of slots should be avoided. Use of nodes without redundant adapters
should not be considered best practice. Blades are an outstanding example of this. And, just as every cluster
resource should have a backup, the root volume group in each node should be mirrored, or be on a RAID device.
Nodes should also be chosen so that when the production applications are run at peak load, there are still
sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application
should be carefully benchmarked (preferably) or modeled (if benchmarking is not feasible) and nodes chosen
so that they will not exceed 85% busy, even under the heaviest expected load.
Note that the takeover node should be sized to accommodate all possible workloads: if there is a single
standby backing up multiple primaries, it must be capable of servicing multiple workloads. On hardware
that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to
a takeover node before applications are started. However, these resources must actually be available, or
acquirable through Capacity Upgrade on Demand. The worst case situation – e.g., all the applications on
a single node – must be understood and planned for.

Networks

HACMP is a network centric application. HACMP networks not only provide client access to the applications
but are used to detect and diagnose node, network and adapter failures. To do this, HACMP uses
RSCT which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information
on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate
recovery action. Being able to distinguish between certain failures, for example the failure of a network
and the failure of a node, requires a second network! Although this additional network can be “IP
based” it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition
there should be at least one, ideally two, non-IP networks. Failure to implement a non-IP network can potentially
lead to a Partitioned cluster, sometimes referred to as 'Split Brain' Syndrome. This situation can
occur if the IP network(s) between nodes becomes severed or in some cases congested. Since each node is
in fact, still very alive, HACMP would conclude the other nodes are down and initiate a takeover. After
takeover has occurred the application(s) potentially could be running simultaneously on both nodes. If the
shared disks are also online to both nodes, then the result could lead to data divergence (massive data corruption).
This is a situation which must be avoided at all costs.

The most convenient way of configuring non-IP networks is to use Disk Heartbeating as it removes the
problems of distance with rs232 serial networks. Disk heartbeat networks only require a small disk or
LUN. Be careful not to put application data on these disks. Although, it is possible to do so, you don't want
any conflict with the disk heartbeat mechanism!

Adapters

As stated above, each network defined to HACMP should have at least two adapters per node. While it is
possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group
must be moved to another node. AIX provides support for Etherchannel, a facility that can used to aggregate
adapters (increase bandwidth) and provide network resilience. Etherchannel is particularly useful for
fast responses to adapter / switch failures. This must be set up with some care in an HACMP cluster.
When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM
techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further
details.
Many System p TM servers contain built-in Ethernet adapters. If the nodes are physically close together, it
is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes
referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for
heart beating. Note that this is not a substitute for a non-IP network.
Some adapters provide multiple ports. One port on such an adapter should not be used to back up another
port on that adapter, since the adapter card itself is a common point of failure. The same thing is true
of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a
common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional
adapter in the node, with the two backing up each other.
Be aware of network detection settings for the cluster and consider tuning these values. In HACMP terms,
these are referred to as NIM values. There are four settings per network type which can be used : slow,
normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network
failure detection time would be approximately 20 seconds. With today's switched network technology this
is a large amount of time. By switching to a fast setting the detection time would be reduced by 50% (10
seconds) which in most cases would be more acceptable. Be careful however, when using custom settings,
as setting these values too low can cause false takeovers to occur. These settings can be viewed using a variety
of techniques including : lssrc –ls topsvcs command (from a node which is active) or odmget
HACMPnim |grep –p ether and smitty hacmp.

Applications
The most important part of making an application run well in an HACMP cluster is understanding the
application's requirements. This is particularly important when designing the Resource Group policy behavior
and dependencies. For high availability to be achieved, the application must have the ability to
stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to bond to a
particular OS characteristic such as a uname, serial number or IP address. In most situations, these problems
can be overcome. The vast majority of commercial software products which run under AIX are well
suited to be clustered with HACMP.

Application Data Location
Where should application binaries and configuration data reside? There are many arguments to this discussion.
Generally, keep all the application binaries and data, where possible, on the shared disk, as it is easy
to forget to update it on all cluster nodes when it changes. This can prevent the application from starting or
working correctly, when it is run on a backup node. However, the correct answer is not fixed. Many application
vendors have suggestions on how to set up the applications in a cluster, but these are recommendations.
Just when it seems to be clear cut as to how to implement an application, someone thinks of a new
set of circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This
behavior can be overcome, by bffcreate’ing the packages to disk and restoring them with the preview option.
This action will show the install paths, then symbolic links can be created prior to install which point
to the shared storage area. If the application is to be used on multiple nodes with different data or configuration,
then the application and configuration data would probably be on local disks and the data sets on
shared disk with application scripts altering the configuration files during fallover. Also, remember the
HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster.
This is particularly useful for applications which are installed locally.

Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should
correct any irregular conditions that may occur. The cluster manager spawns these scripts off in a separate
job in the background and carries on processing. Some things a start script should do are:
First, check that the application is not currently running! This is especially crucial for v5.4 users as
resource groups can be placed into an unmanaged state (forced down action, in previous versions).
Using the default startup options, HACMP will rerun the application start script which may cause
problems if the application is actually running. A simple and effective solution is to check the state
of the application on startup. If the application is found to be running just simply end the start script
with exit 0.
Verify the environment. Are all the disks, file systems, and IP labels available?
If different commands are to be run on different nodes, store the executing hostname in a variable.
Check the state of the data. Does it require recovery? Always assume the data is in an unknown state
since the conditions that occurred to cause the takeover cannot be assumed.
Are there prerequisite services that must be running? Is it feasible to start all prerequisite services
from within the start script? Is there an inter-resource group dependency or resource group sequencing
that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has
facilities to implement checks on resource group dependencies including collocation rules in
HACMP v5.3.
Finally, when the environment looks right, start the application. If the environment is not correct and
error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS,
SMTP traps etc) sent out via the network to the appropriate support administrators.
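Putting the points above together, a minimal ksh start-script skeleton might look like this; the application name, daemon path, shared VG name and mail recipient are all illustrative assumptions, not taken from this post:

#!/bin/ksh
# Illustrative HACMP application start-script skeleton (names are assumptions)
APP=myappd

# 1. Exit cleanly if the application is already running (see above)
if ps -ef | grep -w "$APP" | grep -v grep > /dev/null 2>&1 ; then
    exit 0
fi

# 2. Verify the environment: is the shared VG online?
if ! lsvg -o | grep -qw sharedvg ; then
    echo "sharedvg offline - cannot start $APP" | mail -s "HACMP start failed" root
    exit 1
fi

# 3. Node-specific behaviour, if different commands per node are needed
HOST=$(hostname)

# 4. Everything looks right: start the application in the background
/usr/local/bin/"$APP" &
exit 0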
Stop scripts are different from start scripts in that most applications have a documented start-up routine
and not necessarily a stop routine. The assumption is once the application is started why stop it? Relying
on a failure of a node to stop an application will be effective, but to use some of the more advanced features
of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are:


Be sure to terminate any child or spawned processes that may be using the disk resources. Consider
implementing child resource groups.
Verify that the application is stopped to the point that the file system is free to be unmounted. The
fuser command may be used to verify that the file system is free.
In some cases it may be necessary to double check that the application vendor’s stop script did actually
stop all the processes, and occasionally it may be necessary to forcibly terminate some processes.
Clearly the goal is to return the machine to the state it was in before the application start script was run.
Be sure to exit the stop script with a zero return code; failing to do so will stop cluster processing. * Note: This is not the case with start scripts!
Remember, most vendor stop/start scripts are not designed to be cluster proof! A useful tip is to have the stop
and start scripts verbosely output, using the same format, to the /tmp/hacmp.out file. This can be achieved
by including the following line in the header of the script: set -x && PS4="${0##*/}"'[$LINENO] '
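For example, placed at the top of a start script (illustrative), this prefixes every traced command in /tmp/hacmp.out with the script name and line number:

#!/bin/ksh
set -x && PS4="${0##*/}"'[$LINENO] '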


HACMP
HACMP Daemons
HACMP Log files
HACMP Startup and Shutdown
HACMP Version 5.x
What is new in HACMP 5.x
Cluster Communication Daemon
Heart Beating
Forced Varyon of Volume Groups
Custom Resource Group
Application Monitoring
Resource Group Tasks

HACMP Daemon

01. clstrmgr
02. clinfo
03. clsmuxpd
04. cllockd

HACMP Log files

/tmp/hacmp.out: It records the output generated by the event scripts as they execute. When checking the /tmp/hacmp.out file, search for EVENT FAILED messages. These messages indicate that a failure has occurred. Then, starting from the failure message, read back through the log file to determine exactly what went wrong.
The /tmp/hacmp.out file is a standard text file. The system creates a new hacmp.out log file every day and retains the last seven copies. Each copy is identified by a number appended to the file name. The most recent log file is named /tmp/hacmp.out; the oldest version of the file is named /tmp/hacmp.out.7
/usr/es/adm/cluster.log: It is the main HACMP log file. HACMP error messages and messages about HACMP-related events are appended to this log with the time and date at which they occurred
/usr/es/sbin/cluster/history/cluster.mmddyyyy: It contains time-stamped, formatted messages generated by HACMP scripts. The system creates a cluster history file whenever cluster events occur, identifying each file by the file name extension mmddyyyy, where mm indicates the month, dd indicates the day, and yyyy indicates the year.
/tmp/cspoc.log: It contains time-stamped, formatted messages generated by HACMP C-SPOC commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC command.
/tmp/emuhacmp.out: It records the output generated by the event emulator scripts as they
execute. The /tmp/emuhacmp.out file resides on the node from which the event emulator is
invoked.

HACMP Startup and shutdown

HACMP startup option:
Cluster to re-acquire resources: If cluster services were stopped with the forced option, HACMP expects all cluster resources on this node to be in the same state when cluster services are restarted. If you have changed the state of any resources while cluster services were forced down, you can use this option to have HACMP re-acquire resources during startup.
HACMP Shutdown Modes:
Graceful: The local machine shuts itself down gracefully. Remote machines interpret this as a graceful down and do not take over resources.
Takeover: The local machine shuts itself down gracefully. Remote machines interpret this as a non-graceful down and take over resources.
Forced: The local machine shuts down cluster services without releasing any resources. Remote machines do not take over any resources. This mode is useful for system maintenance.

HACMP 5.x

New in HACMP 5.1

• SMIT Standard and Extended configuration paths (procedures)
• Automated configuration discovery
• Custom resource groups
• Non IP networks based on heartbeating over disks
• Fast disk takeover
• Forced varyon of volume groups
• Heartbeating over IP aliases
• Heartbeating over disks
• Heartbeat monitoring of service IP addresses/labels on the takeover node
• Now there is only HACMP/ES, based on IBM Reliable Scalable Cluster Technology
• Improved security, by using cluster communication daemon
• Improved performance for cluster customization and synchronization
• GPFS integration
• Cluster verification enhancements
New in HACMP 5.2
• Custom only resource groups
• Cluster configuration auto correction
• Cluster file collections
• Automatic cluster verification
• Application startup monitoring and multiple application monitors
• Cluster lock manager dropped
• Resource Monitoring and Control (RMC) subsystem replaces Event Management

HACMP 5.3 Limits

• 32 nodes in a cluster
• 64 resource group in a cluster
• 256 IP addresses known to HACMP (service and boot IP labels)
• RSCT limit: 48 heartbeat rings

Cluster Communication Daemon

The Cluster Communication Daemon, clcomdES, provides secure remote command execution and HACMP ODM configuration file updates by using the principle of least privilege.
The cluster communication daemon (clcomdES) has the following characteristics:
• Since cluster communication does not require the standard AIX "r" commands, the dependency on the /.rhosts file has been removed. Thus, even in "standard" security mode, cluster security has been enhanced.
• Provides reliable caching mechanism for other node's ODM copies on the local node (the node from which the configuration changes and synchronization are performed).
• Limits the commands which can be executed as root on remote nodes (only the commands in /usr/es/sbin/cluster run as root).
• clcomdES is started from /etc/inittab and is managed by the system resource controller (SRC) subsystem.
• Provides its own heartbeat mechanism, and discovers active cluster nodes (even if cluster manager or RSCT is not running).
• Uses HACMP ODM classes and the /usr/es/sbin/cluster/rhosts file to determine legitimate partners.
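The rhosts file format is simply one address per line. An illustrative example (addresses made up):

# cat /usr/es/sbin/cluster/etc/rhosts
10.10.1.1
10.10.1.2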

Heartbeating

Starting with HACMP V5.1, heartbeating is exclusively based on RSCT topology services.
Heartbeat via disk (diskhb) is a new feature introduced in HACMP V5.1, intended to provide additional protection against cluster partitioning and a simplified non-IP network configuration. This type of network can use any type of shared disk storage (Fibre Channel, SCSI, or SSA), as long as the disk used for exchanging keep-alive (KA) messages is part of an AIX enhanced concurrent volume group. The disks used for heartbeat networks are not exclusively dedicated to this purpose; they can also be used to store application shared data.
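As a minimal sketch, a heartbeat LUN can be prepared by creating an enhanced concurrent-capable VG on it (the VG and disk names are made up; bos.clvm must be installed, as noted in the configuration steps above):

# mkvg -C -y hbvg hdisk10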

Forced varyon of volume groups

HACMP V5.1 provides a new facility, the forced varyon of a volume group option on a node. You should use a forced varyon option only for volume groups that have mirrored logical volumes, and use caution when using this facility to avoid creating a partitioned cluster.
When using the forced varyon of volume groups option in a takeover situation, HACMP first tries a normal varyonvg. If this attempt fails due to lack of quorum, HACMP checks the integrity of the data to ensure that there is at least one available copy of all data in the volume group before trying to force the volume group online. If there is, it runs varyonvg -f; if not, the volume group remains offline and the resource group goes into an error state.

Custom Resource groups

Startup preferences
• Online On Home Node Only: At node startup, the RG will only be brought online on the highest priority node. This behavior is equivalent to cascading RG behavior.
• Online On First Available Node: At node startup, the RG will be brought online on the first node activated. This behavior is equivalent to that of a rotating RG or a cascading RG with inactive takeover. If a settling time is configured, it will affect RGs with this behavior.
• Online On All Available Nodes: The RG should be online on all nodes in the RG. This behavior is equivalent to concurrent RG behavior. This startup preference will override certain fall-over and fall-back preferences.
Fallover preferences
• Fallover To Next Priority Node In The List: The RG will fall over to the next available node in the node list. This behavior is equivalent to that of cascading and rotating RGs.
• Fallover Using Dynamic Node Priority: The RG will fall over based on DNP calculations. The resource group must specify a DNP policy.
• Bring Offline (On Error Node Only): The RG will not fall over on error; it will simply be brought offline. This behavior is most appropriate for concurrent-like RGs.
The settling time specifies how long HACMP waits for a higher priority node (to join the cluster) to activate a custom resource group that is currently offline on that node. If you set the settling time, HACMP waits for the duration of the settling time interval to see if a higher priority node may join the cluster, rather than simply activating the resource group on the first possible node that reintegrates into the cluster.
Fallback preferences
• Fallback To Higher Priority Node: The RG will fall back to a higher priority node if one becomes available. This behavior is equivalent to cascading RG behavior. A fall-back timer will influence this behavior.
• Never Fallback: The resource group will stay where it is, even if a higher priority node comes online. This behavior is equivalent to rotating RG behavior.
A delayed fall-back timer lets a custom resource group fall back to its higher priority node at a specified time. This lets you plan for outages for maintenance associated with this resource group.
You can specify the following types of delayed fall-back timers for a custom resource group:
• Daily
• Weekly
• Monthly
• Yearly
• On a specific date

Application Monitoring

HACMP can also monitor applications in one of the following two ways:
• Application process monitoring: Detects the death of a process, using RSCT event management capability.
• Application custom monitoring: Monitors the health of an application based on a monitoring method (program or script) that you define.
When application monitoring is active, HACMP behaves as follows:
• For application process monitoring, a kernel hook informs the cluster manager that the monitored process has died, and HACMP starts the application recovery process.
For the recovery action to take place, you must provide methods to clean up and restart the application (the application server start/stop definitions may be used). HACMP tries to restart the application a specified number of times before sending a notification or actually moving the entire RG to a different node (the next node in the priority list).
• For custom application monitoring (custom method), you must provide the monitoring method (program or script), plus the cleanup and restart methods, used for performing periodic application tests.

Resource Group Tasks

To list the resource groups configured for a cluster
# cllsgrp
To list the details of a resource group
# clshowres
To bring RG1 offline on Node3
# clRGmove -g RG1 -n node3 -d   (-d for down)
To bring CrucialRG online on Node3
# clRGmove -g CrucialRG -n node3 -u
To check the current resource status
# clfindres or # clRGinfo
To find out the current cluster state and obtain information about the cluster
# cldump
Obtaining information via SNMP from Node: err3qci0...
_____________________________________________________________________________
Cluster Name: erpqa1
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
Node Name: err3qci0              State: UP
  Network Name: corp_ether_01    State: UP
    Address: 10.0.5.2        Label: r3qcibt1cp    State: UP
    Address: 10.0.6.2        Label: r3qcibt2cp    State: UP
    Address: 10.253.1.75     Label: sapr3qci      State: UP
  Network Name: prvt_ether_01    State: UP
    Address: 10.0.7.2        Label: r3qcibt1pt    State: UP
    Address: 10.0.8.2        Label: r3qcibt2pt    State: UP
    Address: 192.168.200.79  Label: psapr3qci     State: UP
  Network Name: ser_rs232_01     State:
Node Name: err3qdb0              State: UP
  Network Name: corp_ether_01    State: UP
    Address: 10.0.5.1        Label: r3qdbbt1cp    State: UP
    Address: 10.0.6.1        Label: r3qdbbt2cp    State: UP
    Address: 10.253.1.55     Label: sapr3qdb      State: UP
  Network Name: prvt_ether_01    State: UP
    Address: 10.0.7.1        Label: r3qdbbt1pt    State: UP
    Address: 10.0.8.1        Label: r3qdbbt2pt    State: UP
    Address: 192.168.200.8   Label: psapr3qdb     State: UP
  Network Name: ser_rs232_01     State: UP
    Address:                 Label: r3qdb_ser     State: UP

Cluster Name: erpqa1
Resource Group Name: SapCI_RG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
Node                         Group State
---------------------------- ---------------
err3qci0                     ONLINE
err3qdb0                     OFFLINE

Resource Group Name: OraDB_RG
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
Node                         Group State
---------------------------- ---------------
err3qdb0                     ONLINE
err3qci0                     OFFLINE

Synchronizing the VG info in HACMP if the cluster is already running:
01. On the system where the VG changes are made, break the reserve on the disks using the varyonvg command
# varyonvg -b -u <VG name>

02. Import the VG on the system where the VG info needs to be updated. Use the -n and -F flags so that the VG is not varied on

# importvg -V <major number> -y <VG name> -n -F <hdisk>

03. Varyon the VG without the SCSI reserves
# varyonvg -b -u <VG name>

04. Change the VG so that it is not varied on automatically
# chvg -an -Qy <VG name>


05. Varyoff the VG
# varyoffvg <VG name>

06. Put the SCSI reserves back on the primary server
# varyonvg <VG name>
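A concrete run of the six steps above, with made-up names (VG datavg, major number 101, disk hdisk4):

# varyonvg -b -u datavg                     (on the node where the change was made)
# importvg -V 101 -y datavg -n -F hdisk4    (on the node being updated)
# varyonvg -b -u datavg
# chvg -an -Qy datavg
# varyoffvg datavg
# varyonvg datavg                           (back on the primary, restoring the reserve)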


Some useful HACMP Commands
To list all the app servers configured including start and stop script
# cllsserv
OraDB_APP /usr/local/bin/dbstart /usr/local/bin/dbstop
SapCI_APP /usr/local/bin/sapstart /usr/local/bin/sapstop
To list the application monitoring configured on a cluster
# cllsappmon
OraDB_Mon user
SapCI_Mon user
To get the detailed information about application monitoring
# cllsappmon
# cllsappmon -h OraDB_Mon
#name type MONITOR_METHOD MONITOR_INTERVAL INVOCATION HUNG_MONITOR_SIGNAL
STABILIZATION_INTERVAL FAILURE_ACTION RESTART_COUNT RESTART_INTERVAL RESTART_METHOD
NOTIFY_METHOD CLEANUP_METHOD PROCESSES PROCESS_OWNER INSTANCE_COUNT RESOURCE_TO_MONITOR
OraDB_Mon user /usr/local/bin/dbmonitor 30 longrunning 9 180 fallover
1 600 /usr/local/bin/dbstart /usr/local/bin/dbstop
To clear the HACMP logs
# clclear


HACMP Upgrading options
01. Rolling Migration
02. Snapshot Migration
To apply the online worksheet
/usr/es/sbin/cluster/utilities/cl_opsconfig

HACMP Tips I - Files and Scripts

1. Where is the rhosts file located for HACMP ?

Location: /usr/es/sbin/cluster/etc/rhosts
Used By: clcomd daemon to validate the addresses of the incoming connections
Updated By:
It is updated automatically by the clcomd daemon during the first connection,
but we should update it manually when configuring the cluster on an unsecured network.

2. What happened to ~/.rhosts file in the current version of HACMP ?

~/.rhosts is only needed during the migration from pre-5.1 versions of hacmp.
Once migration is completed, we should remove the file if no other applications need rsh.
From HACMP V5.1, inter-node communication for cluster services is handled by clcomd daemon.

3. What is the entry added to /etc/inittab for IP Address Takeover ?

harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP network startup

4. What is the entry added to the /etc/inittab file for auto-start of HACMP ?
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init

5. What is the script used to start cluster services ?

/usr/es/sbin/cluster/etc/rc.cluster

6. rc.cluster calls a script internally to start the cluster services. What is that ?

/usr/es/sbin/cluster/utilities/clstart

7. What is the equivalent script for clstart in CSPOC ?

/usr/es/sbin/cluster/sbin/cl_clstart

8. What is the script used to stop cluster services ?

/usr/es/sbin/cluster/utilities/clstop

9. What is the equivalent script for clstop in CSPOC ?

/usr/es/sbin/cluster/sbin/cl_clstop

10. What happens when the clstrmgr daemon terminates abnormally ?

The /usr/es/sbin/cluster/utilities/clexit.rc script halts the system.
You can change the default behavior of the clexit.rc script by configuring
/usr/es/sbin/cluster/etc/hacmp.term

11. What script is invoked by the clinfo daemon in case of a network or node event ?

/usr/es/sbin/cluster/etc/clinfo.rc





HACMP Tips II - Utility Commands

The utility commands below are available under /usr/es/sbin/cluster/utilities.
If you need them, add this directory to your PATH variable.

1. To list cluster and node topology information :

# cltopinfo (or) cllscf

2. To show the config for the nodes :

# cltopinfo -n

3. To show all networks configured in the cluster :

# cltopinfo -w

4. To show resources defined for all groups :

# clshowres

5. To show resources defined for a selected group :

# clshowres -g

6. To list all resource groups :

# cllsgrp

7. To list all file systems :

# cllsfs

8. To list the service IPs configured for a node :

# cllsip nodename

9. To show the whole cluster configuration :

# cldump

10. To show adapter information :

# cllsif

11. To show network information :

# cllsnw

12. To show the status of resource groups :

# clfindres

13. To list all resources :

# cllsres

14. To list all tape resources :

# cllstape

15. To list all nodes in a cluster :

# cllsnode

16. To list all application servers along with their start and stop scripts :

# cllsserv

17. To list all logical volumes in a cluster :

# cllslv

18. To list all IP networks in a cluster :

# cllsipnw

19. To list all alive network interfaces :

# cllsaliveif
CSPOC commands are located under /usr/es/sbin/cluster/sbin. If you need, please add this directory to your PATH.

1. To create a user in a cluster :

# cl_mkuser

2. To change/set passwd for a user in a cluster :

# cl_chpasswd

3. To change a user's attribute in a cluster :

# cl_chuser

4. To remove a user in a cluster :

# cl_rmuser

5. To list users in a cluster :

# cl_lsuser

6. To create a group in a cluster :

# cl_mkgroup

7. To change attributes of a group :

# cl_chgroup

8. To remove a group in a cluster :

# cl_rmgroup

9. To create a shared VG in a cluster :

# cl_mkvg

10. To change the attributes of a shared VG :

# cl_chvg

11. To extend a VG (add a PV to a VG) :

# cl_extendvg

12. To reduce a VG (remove a PV from a VG) :

# cl_reducevg

13. To mirror a VG :

# cl_mirrorvg

14. To unmirror a VG :

# cl_unmirrorvg

15. To list VG's in a cluster :

# cl_lsvg

16. To sync a VG :

# cl_syncvg

17. To import a volume group :

# cl_importvg

18. To update a VG definition on a list of nodes :

# cl_updatevg

19. To activate/varyon a VG :

# cl_activate_vgs VG_name

20. To deactivate/varyoff a VG :

# cl_deactivate_vgs VG_name

21. To create a LV :

# cl_mklv

22. To change the attributes of a LV :

# cl_chlv

23. To list a LV :

# cl_lslv

24. To remove a LV :

# cl_rmlv

25. To make copies for a LV :

# cl_mklvcopy

26. To remove copies for a LV :

# cl_rmlvcopy

27. To extend a LV :

# cl_extendlv

28. To create a file system in a cluster :

# cl_crfs

29. To create a LV followed by a FS :

# cl_crlvfs

30. To change the attribute of a FS :

# cl_chfs

31. To list file systems :

# cl_lsfs

32. To remove a FS :

# cl_rmfs

33. To show JFS2 file systems with all attributes :

# cl_lsjfs2

34. To list JFS2 filesystems and their resource groups :

# cl_showfs2

35. To activate/mount a file system :

# cl_activate_fs /filesystem_mountpoint

36. To activate/mount a NFS file system :

# cl_activate_nfs retry NFS_Hostname /filesystem_mountpoint

37. To deactivate/unmount a file system :

# cl_deactivate_fs /filesystem_mountpoint

38. To deactivate/unmount a NFS file system :

# cl_deactivate_nfs /filesystem_mountpoint

39. To export(NFS) a file system :

# cl_export_fs hostname /filesystem_mountpoint

40. To list the process numbers using the NFS directory :

# cl_nfskill -u /nfs_mountpoint

41. To kill the processes using the NFS directory :

# cl_nfskill -k /nfs_mountpoint

Here are my Q&A

1. What are the different kinds of failures HACMP will
respond to ?
ANS:
a) Node Failure
b) Network Failure
c) Network Adapter Failure

For other failures, like disk or application failures, we have to configure
protection separately using LVM, application monitoring scripts, etc.

Be clear that HACMP is fault resilient, not fault tolerant
like mainframes. People can't go for mainframes because of their high cost;
that's the reason people go for HA clusters.


2. List some of Cluster Topology objects?
ANS:
a) Node
b) Network (IP and Non-IP)
c) Network Adapter
d) Physical Volumes


3. List some of Cluster Resources ?
ANS:
a) Application Server
b) Volume Groups
c) Logical Volumes
d) File Systems
e) Service IP Label/Addresses
f) Tape resources
g) Communication Links


4. Does HACMP detect VG mirror failures ? If not, how do we make the VG
redundant, or how do we find and sort out mirror failures ?
ANS: HACMP doesn't detect VG failures. This has to be implemented using
AIX LVM soft mirroring or on the SAN side.


5. List the steps required to configure a cluster ?
ANS:

a) Plan AIX, HACMP levels, Cluster configuration, network diagram,
etc..

b) Install AIX, fixes

c) Configure AIX
- Storage (Adapters, VG, LV, File Systems)
- Network (IP Interfaces, /etc/hosts, non-IP networks and devices)
- Application Start and stop scripts

d) Install the HACMP filesets and fixes on all the cluster nodes. Then
reboot all the nodes in the cluster

e) Configure HACMP Environment
- Topology (Cluster, node names, HACMP IP and non-ip networks)
- Resources (Application Server, Service Label, VG, File System, NFS)
- Resource Groups (Identify name, nodes, policies)

f) Synchronize and test the cluster

g) Tune the system and HACMP based on test result
- syncd frequency
- Basic VMM Tuning
- Failure detection rate
- I/O Pacing

h) Start HACMP Services
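Using commands that appear elsewhere in this post, steps f) and h) can be sketched as:

# cldare -rtV normal                    (synchronize and verify the cluster)
# /usr/es/sbin/cluster/etc/rc.cluster   (start cluster services)
# /usr/es/sbin/cluster/clstat           (watch the cluster state)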



6. List out some of the HACMP log files ?
ANS:
a) /usr/es/adm/cluster.log - Messages from scripts and daemons (Date
Time Node Subsystem PID Message)
b) /tmp/hacmp.out - Output from configuration, start and stop event
scripts
c) /usr/es/sbin/cluster/history/cluster.mmddyy
d) /tmp/clstrmgr.debug - Cluster manager activity
e) /tmp/clappmon..log - Application monitor logs
f) /var/ha/log/top*,/var/ha/log/grpsvcs* - RSCT Logs
g) /var/hacmp/clcomd/clcomd.log - communications daemon log
h) /var/hacmp/clverify - Previous successful and unsuccessful
verification attempts
i) /var/hacmp/log/cl_testtool.log - Cluster test tool logs


7. What are the 3 policies related to a resource group ?

ANS:
a) Start up - Online On Home Node Only, Online On First Available
Node, Online Using Node Distribution Policy, Online On All Available
Nodes.

b) Fallover - Fallover To Next Priority Node In The List, Fallover
Using Dynamic Node Priority, Bring Offline (On Error Node Only).

c) Fall back - Fallback To Higher Priority Node In The List, Never
Fallback


8. Expand the following :
ANS:
a) HACMP - High Availability Cluster Multi Proccessing
b) RG - Resource Group
c) C-SPOC - Cluster Single Point of Control
d) SPOF - Single Point of Failure
e) ODM - Object Data Manager
f) SRC - System Resource Controller
g) RSCT - Reliable Scalable Cluster Technology


9. How to list the info on heartbeat data ?
ANS:
# lssrc -ls topsvcs


10. How to list out the info on cluster manager and DNP Information ?
ANS: # lssrc -ls clstrmgrES


11. What is the HA daemon that gets started by /etc/inittab ?
ANS: clcomd gets started by the init process. It has an entry in
/etc/inittab


12. How will you start cluster services in a node? Give the command as
well as smitty fastpath.
ANS:
To Start Cluster Services:
Command: #/usr/es/sbin/cluster/etc/rc.cluster (check the options
available for this command)
Smitty Fast Path: clstart

To Stop Cluster Services:
Command: /usr/es/sbin/cluster/utilities/clstop
Smitty Fast Path: clstop


13. How many network adapters are required/recommended in a node
belonging to a cluster ?
ANS:
A minimum of 2 network adapters is required per node. This is required to
handle the network adapter failure event.


14. For a 2 node cluster (with 1 RG) with 2 N/W adapters for each
node, how many IP Label /Address are required. Give some example ?
ANS:
Let's consider a commonly used cluster configuration.

Cluster cluster_DB with 2 nodes, nodea and nodeb.
Nodea has 2 network adapters, with the nodea_boot and nodea_stdby
IP labels on en0 and en1 respectively.
And nodeb has 2 network adapters, with the nodeb_boot and nodeb_stdby
IP labels on en0 and en1 respectively.
This cluster has a VG and a service IP grouped in a resource group.
A minimum of 1 service IP is required for an RG.

When we start the RG on nodea, nodea_boot on en0 will be replaced/
aliased by the service IP.
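For illustration, the matching /etc/hosts entries could look like this (all addresses are made up):

10.10.1.1   nodea_boot
10.10.2.1   nodea_stdby
10.10.1.2   nodeb_boot
10.10.2.2   nodeb_stdby
10.10.1.10  db_svc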


15. How can we achieve non-ip network (for hearbeat) ?
ANS:
A non-IP network can be achieved through any of the following:
a) Serial/rs232 connection (using /dev/ttyx devices) - widely used in
old clusters
b) Disk-based heartbeat (over an ECM VG disk) - widely used in
recent clusters, because people want to eliminate those lengthy serial
cables
c) Target-mode SCSI - not widely used
d) Target-mode SSA - not widely used


16. What are the different ways to achieve IP Address
Takeover ?
ANS:
a) IP Address Takeover via IP Alias
b) IP Address Takeover via IP Replacement


17. Is a non-IP network required for a cluster? Say Yes/No and
justify your answer.
ANS: Yes, to avoid the split-brain problem.


18. How many service IP addresses can we have for a single resource
group ?
ANS: Not sure; have to check in the smitty screen.


19. What is the difference between a communication interface and a
communication device? Also list their usage.
ANS: The following lines should answer it:
/dev/en0 is a communication interface, whereas /dev/tty1 is a
communication device.
/dev/en0 is used for an IP network and /dev/tty1 is used for a non-IP
network; i.e. tty1 is used only for heartbeat.


20. A Persistent IP Label/Address is a floating IP Label. True/False?
Justify your answer.
ANS: False. It resides on a single node and doesn't move to another node.


21. Which of the following IP Label is stored in AIX ODM.

a) Service IP
b) Boot IP
c) Stand-by IP
d) Persistent IP
Ans: Only the Boot IP and Stand-by IP are stored in the AIX ODM.



22. If we use a SAN disk for a heartbeat, what type of VG it should
belong? Normal, Big, Scalable, Enhanced Concurrent Mode Vg ?
ANS: ECM (Enhanced Concurrent Mode) Volume Group


23. While stopping cluster services, what are the different types of
shutdown modes available? Justify.
ANS:
a) graceful
b) graceful with takeover
c) forced

24. How will you view the cluster status ?
ANS: #/usr/es/sbin/cluster/clstat


25. How to list out the RG Status?
ANS: #/usr/es/sbin/cluster/utilities/clRGinfo


26. What are the ways to eliminate Single Points of Failure ?
ANS:
a) Node : Using multiple nodes
b) Power Source : Using multiple circuits or uninterruptible power
supplies
c) Network Adapters : Using redundant network adapters
d) Network : Using multiple networks to connect nodes
e) TCP/IP Subsystem : Using non-IP networks to connect adjoining nodes
and clients
f) Disk Adapter : Using redundant disk adapter or multipath hardware
g) Disk : Using multiple disks with mirroring or raid
h) Application : Add node for takeover; configure application monitor
i) Administrator : Add backup or very detailed operations
guide
j) Site : Add additional site

Don't assume that HACMP will eliminate every SPOF. We have to plan to
eliminate all kinds of SPOFs, including UPS and air conditioning for the data center.


27. What is the max. # of nodes we can configure in a single cluster ?
ANS: Max. we can have 32 nodes in a cluster


28. What is the max. # of resoruce groups we can configure in a single
cluster ?
ANS: Max. we can have 64 resource groups in a cluster


29. What is the max. # of IP address can be known to a single
cluster ?
ANS: Max. 256 IP addresses/labels can be known to a cluster


30. Which of the following disk technologies are supported by HACMP ?
ANS:
a) SCSI
b) SSA
c) SAN


31. Which command list the cluster topology ?

ANS: /usr/es/sbin/cluster/utilities/cltopinfo
It's a command widely used by HACMP admins to view the cluster topology
configuration


32. Which command sync's the cluster ?
ANS: #cldare -rtV normal


33. What is the latest version of HACMP and what versions of AIX does it
support ?
ANS: HACMP 5.4 is the latest version of HACMP. This version is supported
only from AIX 5.2 onwards


34. How to test the disk heartbeat in a cluster ?

ANS:
To test the disk heartbeat link on nodes A and B, where hdisk1 is the
heartbeat path:
On Node A, #dhb_read -p hdisk1 -r
On Node B, #dhb_read -p hdisk1 -t

If the link is active, you see this message on both nodes:
Link operating normally.


35. List the daemons running for HA cluster.

ANS:
clcomd - started at boot through /etc/inittab
clstrmgrES - started during clstart
clsmuxpdES - started during clstart. This daemon is no longer present
from HACMP 5.3 onwards; its SNMP server functions are included in clstrmgrES
itself.
clinfoES - started during clstart

36. What is the command used to move RG online ?

ANS: cldare and clRGmove
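For example (a sketch: the RG name test_rg and node aix2 are assumed
values, and the flags are worth verifying on your HACMP level):
# clRGmove -g test_rg -n aix2 -m     (move the RG to node aix2)
# clRGmove -g test_rg -n aix2 -u     (bring the RG online on aix2)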
37. Does HACMP work on different operating systems?
Yes. HACMP is tightly integrated with the AIX 5L operating system and System p servers, allowing for a rich set of features which are not available with any other combination of operating system and hardware. HACMP V5 introduces support for the Linux operating system on POWER servers. HACMP for Linux supports a subset of the features available on AIX 5L; however, this multi-platform support provides a common availability infrastructure for your entire enterprise.

38. What applications work with HACMP?
All popular applications work with HACMP including DB2, Oracle, SAP, WebSphere, etc. HACMP provides Smart Assist agents to let you quickly and easily configure HACMP with specific applications. HACMP includes flexible configuration parameters that let you easily set it up for just about any application there is.

39. Does HACMP support dynamic LPAR, CUoD, On/Off CoD, or CBU?
HACMP supports Dynamic Logical Partitioning, Capacity Upgrade on Demand, On/Off Capacity on Demand and Capacity Backup Upgrade.

40. If a server has LPAR capability, can two or more LPARs be configured with unique instances of HACMP running on them without incurring additional license charges?
Yes. HACMP is a server product that has one charge unit: number of processors on which HACMP will be installed or run. Regardless of how many LPARs or instances of AIX 5L that run in the server, you are charged based on the number of active processors in the server that is running HACMP. Note that HACMP configurations containing multiple LPARs within a single server may represent a potential single point-of-failure. To avoid this, it is recommended that the backup for an LPAR be an LPAR on a different server or a standalone server.
41. Does HACMP support non-IBM hardware or operating systems?
Yes. HACMP for AIX 5L supports the hardware and operating systems as specified in the manual where HACMP V5.4 includes support for Red Hat and SUSE Linux.
HACMP - Configuration - Contd

~Go back to config HA commn Interfaces., --> Add commn Int., --> Add discovered --> select devices --> add devices

~Again Extended topology --> Add persistent IPs --> select node1
select the n/w and persistent IP.

~Lets go back to extended config --> Extended resource config --> config HA service IP label --> add the service IP (here prod_svc,dev_svc)

~Now lets go back to extended resource config --> HA extended RG config --> Add RG --> give the RG name (here test_rg).

~Now apply the service IPs and VGs we defined in the RG basket.

~Lets go back and select change/show attributes for RG --> select RG (here test_rg) -->select the service IP and VG in their respective fields.

Now lets verify our config.
HACMP - Configuration - Verification

~Go to Extended topology --> extended verification and synchronization --> select verify --> correct errors should be yes --> press enter

~After verification is successful continue with the synch.

~Then go to C-SPOC (smitty cl_admin) --> manage HA services --> start cluster services --> start clinfo daemon should be true (or we can enter smitty clstart).

~To check whether the cluster is running or not, check with lssrc -g cluster. When doing a failover, always watch /tmp/hacmp.out with
# tail -f /tmp/hacmp.out
HACMP - Disk HeartBeat

~cd to /usr/sbin/rsct/bin (not needed if path is already added)

~First execute # ./dhb_read -p hdisk2 -r on one node where hdisk2 is your heartbeat disk.

~Then execute # ./dhb_read -p hdisk2 -t on the other node.

~If they are working normally we should get a message Link operating normally.
HACMP Installation--Pre-Installation Tasks
~First do smitty tcpip and configure the communication interfaces en0 and en1 so that we can establish communication between the two nodes.

~Then update the /etc/hosts file with the non-service IPs,persistent IPs and service IPs.
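For example, the /etc/hosts entries might look like the sketch below
(every address is a made-up value; the persistent labels are assumed
names, the others appear later in this configuration):
192.168.10.1    aix1_nsvc1   # node 1 non-service (boot) IP
192.168.10.2    aix2_nsvc1   # node 2 non-service (boot) IP
192.168.10.11   aix1_per     # node 1 persistent IP
192.168.10.12   aix2_per     # node 2 persistent IP
192.168.10.21   prod_svc     # service IP
192.168.10.22   dev_svc      # service IP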

~Install bos.adt,bos.compat and any cluster.* filesets in AIX CD1. Also install rsct and bos.clvm filesets from CD3.
High Availability and Hardware Availability for HACMP
High availability is sometimes confused with simple hardware availability. Fault tolerant,redundant systems (such as RAID) and dynamic switching technologies (such as DLPAR)provide recovery of certain hardware failures, but do not provide the full scope of error detection and recovery required to keep a complex application highly available.
A modern, complex application requires access to all of these components:
• Nodes (CPU, memory)
• Network interfaces (including external devices in the network topology)
• Disk or storage devices.

Recent surveys of the causes of downtime show that actual hardware failures account for only
a small percentage of unplanned outages. Other contributing factors include:
• Operator errors
• Environmental problems
• Application and operating system errors.

Reliable and recoverable hardware simply cannot protect against failures of all these different
aspects of the configuration. Keeping these varied elements—and therefore the application—highly available requires:

• Thorough and complete planning of the physical and logical procedures for access and operation of the resources on which the application depends. These procedures help to avoid failures in the first place.
• A monitoring and recovery package that automates the detection and recovery from errors.
• A well-controlled process for maintaining the hardware and software aspects of the cluster configuration while keeping the application available.
HACMP - Configuration

~smitty hacmp --> extended configuration --> extended topology-->config a HACMP cluster --> Add/change/show a HA cluster

~Here give the cluster name (here test_cl).

~Press F3 to go back

~Again Extended topology --> config a HA node --> Add a node to a HA cluster

~Here give a node name (here aix1) and a communication path (here aix1_nsvc1).

~Similarly add another node.
Add a n/w config HA n/w Again Extended topology
add ether,rs232 and then SCSI one by one.

Discover devices....hacmp attempts to discover other devices based on the info provided by us.~Now go back to extended config

~Again Extended topology --> config HA commn., Interfaces --> Add a commn., Interface --> Add discovered --> commn Int., -->select All --> select en0,en1

Complete an HACMP failover test for AIX

Introduction
Once I had ensured the system was configured correctly, and before I started testing the cluster failover, I performed a manual failover to check that all the scripts that had been written work according to the configured system's needs.

Steps
The following are the steps that you should take to do a manual failover test of your application.
On the production system
1. Configure the service address on the primary adapter.
2. Varyon all shared volume groups.
3. Mount all shared filesystems.
4. Execute your application start script.
5. Test the application.
6. Execute your application stop script.
7. Unmount all shared file systems.
8. Varyoff all the shared volume groups.
9. Configure the production boot address on the primary adapter.
If the test is successful on the production server, you should now move to the backup server and proceed with the following steps:
1. Configure the production service address on the standby adapter.
2. Varyon all shared volume groups.
3. Mount all shared file systems.
4. Execute your application start script.
5. Test the application.
6. Execute your application stop script.
7. Unmount all shared file systems.
8. Varyoff all the shared volume groups.
9. Reset the standby address on the standby adapter.
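As a rough sketch of steps 1-4 on either node (the interface en1, the
address, the VG name datavg, the mount point /data and the script path
are all assumed values):
# ifconfig en1 alias 192.168.10.21 netmask 255.255.255.0   (service address)
# varyonvg datavg
# mount /data
# /usr/local/cluster/app_start.sh
Steps 6-9 reverse this with the stop script, umount, varyoffvg and
ifconfig en1 delete (address).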
I completed the manual failover of the cluster. Doing a manual failover test gave me more control over the failover steps. When I noticed errors in the configuration, I corrected the failures by bringing down the cluster nodes and performing a synchronized verification test before proceeding to the next steps. The manual failover test therefore helps you troubleshoot any application problems before performing the automatic HACMP failover test.

Configuration changes made to the environment
1. Tune the system using I/O pacing.
I/O pacing is required for an HACMP cluster to behave correctly during large disk writes, and it is strongly recommended if you anticipate large blocks of disk writes on your HACMP cluster. These marks are, by default, set to zero (disabling I/O pacing) when AIX is installed. While the most efficient high- and low-water marks vary from system to system, an initial high-water mark of 33 and a low-water mark of 24 provide a good starting point.
These settings only slightly reduce write times and consistently generate correct failover behavior from HACMP for AIX. If a process tries to write to a file at the high-water mark, it must wait until enough I/O operations have finished to reach the low-water mark.
The way to configure the high-water and low-water marks correctly is with the formula below. The default values for the disk-I/O pacing high-water and low-water marks (the maxpout and minpout parameters) may cause severe performance problems on Caché production systems, significantly hindering Caché write daemon performance by inappropriately putting the write daemon to sleep and causing prolonged write daemon cycles.
If you are using HACMP clusters, I/O pacing is automatically enabled. If your system is not part of an HACMP cluster, set both the high- (maxpout) and low- (minpout) water marks to 0 (zero) to disable I/O pacing.
View and change the current settings for the I/O pacing high-water and low-water marks by issuing the smitty chgsys command.
InterSystems currently recommends the following IBM calculation for determining the appropriate high-water mark:
• high-water mark = (4 * n) + 1
• where n = the maximum number of spindles any one file (database, journal, or WIJ) spans across. Set the low-water mark to 50%-75% of the high-water mark.
Example
For example, a CACHE.DAT database file is stored on a storage array, and the LUN (or file system) where it resides consists of 16 spindles/drives. Calculate:
• High-water mark = (4 * 16) + 1 = 65
• Low-water mark = between (0.50 * 65) and (0.75 * 65) = between 33 and 49
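The same marks can also be set without SMIT via chdev; maxpout and
minpout are standard sys0 attributes, and the values below are just the
ones from this example:
# chdev -l sys0 -a maxpout=65 -a minpout=33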
2. Increase the syncd frequency.
Edit the /sbin/rc.boot file to increase the syncd frequency from its default value of 60 seconds to 30, 20, or 10 seconds. Increasing the frequency forces more frequent I/O flushes and reduces the likelihood of triggering the dead man switch due to heavy I/O traffic.
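For a 10-second interval the syncd line in /sbin/rc.boot ends up looking
something like this (check the exact line on your own system before editing):
nohup /usr/sbin/syncd 10 > /dev/null 2>&1 &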
3. Corrected the inittab entry on the B node.
4. Increased swap space on both nodes to 2.5 GB.
Calculation used: current value before the change = 512 MB.
vmstat average / 256: A node: 278212 / 256 = 1086.75
                      B node: 331696 / 256 = 1295.63
Reason for the increase: (DMS) Dead Man Switch errors.
5. If you see an increase in the following parameters, increase the values for better Caché performance. This will have to be monitored, since TNG is collecting stats on these boxes. I would appreciate a report as of 1 week before and 1 week after to observe system performance; a course of action can then be decided as below if needed. I recommend the following when increasing these parameters from the default values:
Important
Change both the current and the reboot values, and check the vmstat output regularly because I/O patterns may change over time (hours, days, or weeks).
1. Increase the current value by 50%.
2. Check the vmstat output.
3. Run vmstat twice, two minutes apart.
4. If the field is still increasing, increase again by the same amount; continue this step until the field stops increasing between vmstat reports.
• pending disk I/Os blocked with no pbuf
• paging space I/Os blocked with no psbuf
• filesystem I/Os blocked with no fsbuf
• client filesystem I/Os blocked with no fsbuf
• external pager filesystem I/Os blocked with no fsbuf
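All five of these counters appear in the vmstat -v output, so they can be
pulled out in one go:
# vmstat -v | grep blocked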

Health Checking Commands

1.Userids that are defined as having Security & System Administrative Authority:
- Users with access to the 'root' user account
- Users in the 'system' or 'security' groups
Is there a process in place to verify that these users have a valid business need?
grep -i system /etc/group
grep -i security /etc/group
grep -i root /etc/group
2.Do only approved users have root authority?
lsuser -a sugroups ALL | grep "sugroups=suroot" | pg
3.Do only approved users have access to the system group?
lsuser -a groups ALL | grep system | pg
4. Are only approved users members of the security group?
lsuser -a groups ALL | grep security | pg
5. If Anonymous write is allowed, the -u option on /etc/ftpd must be used.
Is this true?
N/A - Anonymous not used.
6. Does only one uid of 0 exist? lsuser -a id ALL | grep "id=0" | pg
7. To check whether NFS is running on the server: ps -ef | grep nfsd ; after this we need to check with more /etc/exports
8. To check whether anonymous users are present in /etc/passwd?
grep -i anonymous /etc/passwd
9. To log in to the server with another id using ssh: ssh -i (identity file) (server name)
10. POP daemons must be configured to require users to authenticate. Confirm POP daemons not used: grep pop /etc/inetd.conf
11. NNTP must be configured to require authentication and identification of all users if any of the newsgroups on the server are classified confidential. Confirm NNTP not used:
grep nntp /etc/inetd.conf
12. Confirm TFTP disabled. grep tftp /etc/inetd.conf
13. NIS maps must not be used to store confidential data, including user passwords or other authentication credentials in any form. Confirm NIS maps not used.
lssrc -g yp & lslpp -l | grep -i nis
14. The following file may be world-writeable: socket (s) Is this the case?
find / -type s | more & then ls -ld filename
15. The following file may be world-writeable: named pipe (p) Is this the case?
find / -type p | more & then ls -ld filename
16. The following file may be world-writeable: block special file (b) Is this the case?
find / -type b | more & then ls -ld filename
17. The following file may be world-writeable: character special file (c) Is this the case?
find / -type c | more & then ls -ld filename
18. The following file may be world-writeable: symbolic links (l) Is this the case?
find / -type l | more & then ls -ld filename
19. To find if any users have non expiring passwords.
'lsuser -a maxage ALL | grep "maxage=0"'
20. Confirm that the Business Use Notice is displayed to users during the identification and authentication process ?
cat /etc/motd
21. The Common Desktop Environment must use dthello program to display the Business Use Notice. Is this the case?
ls -ld /usr/dt/bin/dthello
22. To get a list of users with UMASK not equal to x77
'lsuser -a umask ALL | grep -v "=77"'
23. For OSR file type *.o, the setting for other must be r-x or more stringent. Is this the case?
find / -name "*.o" -print
24. A password must be assigned to the 'root' userid. Is this the case?
grep root /etc/passwd
25. To check rlogin for root:
lsuser -a rlogin root
26. Does only one uid of 0 exist
lsuser -a id ALL | grep "id=0" | pg
27. Do only approved users have root authority
lsuser -a sugroups ALL | grep "sugroups=suroot" | pg
28. Do only approved users have access to the system group?
lsuser -a groups ALL | grep system | pg
29. Are only approved users members of the security group
lsuser -a groups ALL | grep security | pg
30.To check anonymous FTP
ftp 0 21 ; name - anonymous
(or)
grep anonymous /etc/passwd
31. For users' default home directories ($HOME) the permission setting is not defined.
Is this the case?
All IN and GB users' home directory permissions should be 700. Apart from normal users, we should not change database users' and application users' directories; we need to confirm with them first. ls -ld $HOME
32. .netrc files, file permissions must grant access only to the owner of the file.
Is this the case?
ls -lrt /.netrc
33. /.rhosts must have read and write access only by root. Is this the case?
ls -lrt /.rhosts

aix-cmd

AIX Command Crib Sheet
OS LEVEL : AIX
DATE : 20/6/2011
VERSION : aix 5.3
----------------------------------------------------------------------------
MISCELLANEOUS
----------------------------------------------------------------------------
http://www.rs6000.ibm.com/cgi-bin/ds_form Web based man pages
oslevel Returns operating system level
whence (program) Returns full path of program
whereis (program) Returns full path of program
what (program) Displays identifying info from the executable
like version number, when compiled.
lslpp -L all list all installed software
lslpp -L (program set name) Check if software installed
lslpp -f Lists filesets vs packages
lslpp -ha Lists installation history of filesets
instfix -ik (fix number eg IX66617) Checks if fix is installed
instfix -ik 4330-02_AIX_ML
compress -c file.txt > file.Z Create a compressed file.
uuencode (infile) (extract-file-name) > (output file)
Converts a binary file to an ASCII file for transfer by modem or email
uudecode (encoded file)
Extracts a binary file from encoded file and calls it the extract-file-name
examples :-
uuencode maymap maymap > maymap.enc
uudecode maymap.enc
od -c /tmp Displays contents of the /tmp directory file
ls -i Lists files with their inode numbers
echo * Lists files, can be used if ls is corrupt/missing
alog -o -t boot View the boot log
chtz (timezone eg GMT0BST) Changes the timezone in /etc/environment file
chlang (language eg En_GB) Changes the language in /etc/environment file
ar -v -t (archive file) List contents of an archive
ar -v -x (archive file) Extracts the archive
ar -v -t /usr/lib/libC-r.a Lists contents of the libC_r.a library
find /source -print | cpio -pdm /target
Copying directories using cpio, creates /target/source directory.
dump -nTv (binary executable) Displays the contents of an executable file
dump -c Displays string information
dump -o Displays object file headers
dump -l Displays line numbers
dump -s Displays the text section
snap -ao /dev/rmt0 Create a snapshot onto tape
snap -ad (directory) Create a snapshot into a named directory other
than the default (/tmp/ibmsupt)
/usr/dt/bin/dtconfig -d Disables desktop logins
/usr/dt/bin/dtconfig -e Enables desktop logins
/var/dt/Xpid PID of the dtlogin process
--------------------------------------------------------------------------------
TERMINALS
--------------------------------------------------------------------------------
tty Displays what the tty/pty number of the terminal is.
termdef reports the termtype setup in smit for the tty port
that termdef is run on.
chdev -l (device eg tty1) -a term=vt100 Sets tty to a vt100 terminal type
penable tty0 adds getty line into /etc/inittab for tty0 and starts getty
pdisable tty0 disables the getty line and disables getty
penable / pdisable -a option is for all
stty erase ^? Set backspace key for vt100 terminals
stty erase ^H Set backspace key for wyse50 terminals
lscons Displays the console device
chcons -a login=enable (device eg /dev/tty1) Changes the console device

Create ttys on ports 0 to 7 on adapter sa2 :-
for i in 0 1 2 3 4 5 6 7
do
mkdev -c tty -t tty -s rs232 -p sa2 -w$i -a login=enable -a term=vt100
done
portmir -t /dev/tty0 Mirror current terminal onto /dev/tty0
portmir -o Turns off port mirroring
--------------------------------------------------------------------------------
NETWORK
--------------------------------------------------------------------------------
host (ip or hostname) Resolves a hostname / ip address
hostname Displays hostname
hostname (hostname) Sets the hostname until next reboot
chdev -l (device name) -a hostname=(hostname) Changes hostname permanently
chdev -l inet0 -a hostname=thomas
ifconfig (device name) Displays network card settings
ifconfig (device name) up Turns on network card
ifconfig (device name) down Turns off network card
ifconfig (device name) detach Removes the network card from the
network interface list
ifconfig en0 inet 194.35.52.1 netmask 255.255.255.0 up
ifconfig lo0 alias 195.60.60.1 Create alias ip address for loopback
route (add/delete) (-net/-host) (destination) (gateway)
Adds or deletes routes to other networks or hosts, does not update
the ODM database and will be lost at reboot.
route add -net 194.60.89.0 194.60.90.4
lsattr -EHl inet0 Displays routes set in ODM and hostname
odmget -q "name=inet0" CuAt Displays routes set in ODM and hostname
refresh -s inetd Refresh inetd after changes to inetd.conf
kill -1 (inetd PID) Refresh inetd after changes to inetd.conf
netstat -i Displays interface statistics
entstat -d (ethernet adapter eg en0) Displays ethernet statistics
arp -a Displays ip to mac address table from arp cache
no -a Displays network options use -o to set individual options or
-d to set individual options to default.
no -o option=value (this value is reset at reboot)
no -o "ipforwarding=1"
traceroute (name or ipaddress) Displays all the hops from source to
destination supplied.
ping -R (name or ipaddress) Same as traceroute except repeats.
--------------------------------------------------------------------------------
N.F.S.
--------------------------------------------------------------------------------
exportfs Lists all exported filesystems
exportfs -a Exports all fs's in /etc/exports file
exportfs -u (filesystem) Un-exports a filesystem
mknfs Configures and starts NFS services
rmnfs Stops and un-configures NFS services
mknfsexp -d /directory Creates an NFS export directory
mknfsmnt Creates an NFS mount directory
mount hostname:/filesystem /mount-point Mount an NFS filesystem
nfso -a Display NFS Options
nfso -o option=value Set an NFS Option
nfso -o nfs_use_reserved_port=1
--------------------------------------------------------------------------------
BACKUPS
--------------------------------------------------------------------------------
MKSYSB
------
mkszfile -f Creates /image.data file (4.x onwards)
mkszfile -X Creates /fs.size file (3.x)
mksysb (device eg /dev/rmt0)
CPIO ARCHIVE
------------
find (filesystem) -print | cpio -ocv > (filename or device)
eg find ./usr/ -print | cpio -ocv > /dev/rmt0
CPIO RESTORE
------------
cpio -ict < (filename or device) | more      Lists archive
cpio -icdv < (filename or device)            Restores all
cpio -icdv < (filename or device) ("files or directories to restore")
eg cpio -icdv < /dev/rmt0 "tcpip/*"          Restore directory and contents
   cpio -icdv < /dev/rmt0 "*resolve.conf"    Restore a named file
TAR ARCHIVE
-----------
tar -cvf (filename or device) ("files or directories to archive")
eg tar -cvf /dev/rmt0 "/usr/*"
TAR RESTORE
-----------
tar -tvf (filename or device)                Lists archive
tar -xvf (filename or device)                Restore all
tar -xvf (filename or device) ("files or directories to restore")
   use -p option for restoring with original permissions
eg tar -xvf /dev/rmt0 "tcpip"                Restore directory and contents
   tar -xvf /dev/rmt0 "tcpip/resolve.conf"   Restore a named file
AIX ARCHIVE
-----------
find (filesystem) -print | backup -iqvf (filename or device)
   Backup by filename.
eg find /usr/ -print | backup -iqvf /dev/rmt0
backup -(backup level 0 to 9) -f (filename or device) ("filesystem")
   Backup by inode.
eg backup -0 -f /dev/rmt0 "/usr"
   -u option updates /etc/dumpdates file
AIX RESTORE
-----------
restore -qTvf (filename or device)           Lists archive
restore -qvxf (filename or device)           Restores all
restore -qvxf (filename or device) ("files or directories to restore")
   (use -d for restoring directories)
restore -qvxf /dev/rmt0.1 "./etc/passwd"     Restore /etc/passwd file
restore -s4 -qTvf /dev/rmt0.1                Lists contents of a mksysb tape
BACKUPS ACROSS A NETWORK
------------------------
To run the backup on a local machine (cpio) and back up onto the remote
machine's (remhost) tape drive (/dev/rmt0) :-
find /data -print | cpio -ocv | dd obs=32k | rsh remhost \
   "dd ibs=32k obs=64k of=/dev/rmt0"
To restore/read the backup (cpio) on the remote machine :-
dd ibs=64k if=/dev/rmt0 | cpio -icvt
To restore/read the backup (cpio) on the local machine from the remote
machine's (remhost) tape drive (/dev/rmt0) :-
rsh remhost "dd ibs=64k obs=32k if=/dev/rmt0" | dd ibs=32k | cpio -icvt
To run the backup (cpio) on a remote machine (remhost) and back up onto
the local machine's tape drive (/dev/rmt0) :-
rsh remhost "find /data -print | cpio -ocv | dd ibs=32k" \
   | dd ibs=32k obs=64k of=/dev/rmt0
--------------------------------------------------------------------------------
COPYING DISKETTES AND TAPES
--------------------------------------------------------------------------------
COPYING DISKETTES
-----------------
dd if=/dev/fd0 of=(filename) bs=36b
dd if=(filename) of=/dev/fd0 bs=36b conv=sync
or flcopy
COPYING TAPES
-------------
dd if=/dev/rmt0 of=(filename)
dd if=(filename) of=/dev/rmt0
or tcopy
--------------------------------------------------------------------------------
VI COMMANDS
--------------------------------------------------------------------------------
:g/xxx/s//yyy/   global change where xxx is to be changed to yyy
sed 's/(ctrl v ctrl m)//g' old.filename > new.filename
Strips out ^M characters from ascii files that have been transferred as binary.
To enter control characters type ctrl v then ctrl ? where ? is whatever
ctrl character you need.
--------------------------------------------------------------------------------
DEVICES
--------------------------------------------------------------------------------
lscfg lists all installed devices
lscfg -v lists all installed devices in detail
lscfg -vl (device name) lists device details
bootinfo -b reports last device the system booted from
bootinfo -k reports keyswitch position
1=secure, 2=service, 3=normal
bootinfo -r reports amount of memory (/ by 1024)
bootinfo -s (disk device) reports size of disk drive
bootinfo -T reports type of machine ie rspc
lsattr -El sys0 -a realmem reports amount of usable memory
mknod (device) c (major no) (minor no) Creates a /dev/ device file.
mknod /dev/null1 c 2 3
lsdev -C lists all customised devices ie installed
lsdev -P lists all pre-defined devices ie supported
lsdev -(C or P) -c (class) -t (type) -s (subtype)
chdev -l (device) -a (attribute)=(new value) Change a device attribute
chdev -l sys0 -a maxuproc=80
lsattr -EH -l (device) -D Lists the defaults in the pre-defined db
lsattr -EH -l sys0 -a modelname
rmdev -l (device) Change device state from available to defined
rmdev -l (device) -d Delete the device
rmdev -l (device) -SR S stops device, R unconfigures child devices
lsresource -l (device) Displays bus resource attributes of a device.
Power Management (PCI machines)
-------------------------------
pmctrl -a Displays the Power Management state
rmdev -l pmc0 Unconfigure Power Management
mkdev -l pmc0 Configure Power Management
--------------------------------------------------------------------------------
TAPE DRIVES
--------------------------------------------------------------------------------
rmt0.x where x = A + B + C
A = density 0 = high 4 = low
B = retension 0 = no 2 = yes
C = rewind on close 0 = yes 1 = no (eg rmt0.1 = no rewind)
tctl -f (tape device) fsf (No) Skips forward (No) tape markers
tctl -f (tape device) bsf (No) Skips back (No) tape markers
tctl -f (tape device) rewind Rewind the tape
tctl -f (tape device) offline Eject the tape
tctl -f (tape device) status Show status of tape drive
chdev -l rmt0 -a block_size=512 changes block size to 512 bytes
(4mm = 1024, 8mm = variable but
1024 recommended)
bootinfo -e answer of 1 = machine can boot from a tape drive
answer of 0 = machine CANNOT boot from tape drive
diag -c -d (tape device) Hardware reset a tape drive.
tapechk (No of files) Checks Number of files on tape.
< /dev/rmt0                          Rewinds the tape !!!
--------------------------------------------------------------------------------
PRINTERS / PRINT QUEUES
--------------------------------------------------------------------------------
splp (device)                        Displays/changes printer driver settings
splp /dev/lp0
export $LPDEST="pqname"              Set default printer queue for login session
lsvirprt                             Lists/changes virtual printer attributes.
rmvirprt -q queuename -d queuedevice Removes a virtual printer
qpri -#(job No) -a(new priority)     Change a queue job priority.
qhld -#(job No)                      Put a hold on a job
qhld -r #(job No)                    Release a held job
qchk -A                              Status of jobs in queues
lpstat
lpstat -p(queue)                     Status of jobs in a named queue
qcan -x (job No)                     Cancel a job from a queue
cancel (job No)
enq -U -P(queue)                     Enable a queue
enable (queue)
enq -D -P(queue)                     Disable a queue
disable (queue)
qmov -m(new queue) -#(job No)        Move a job to another queue
startsrc -s qdaemon                  Start qdaemon sub-system
lssrc -s qdaemon                     List status of qdaemon sub-system
stopsrc -s qdaemon                   Stop qdaemon sub-system
--------------------------------------------------------------------------------
FILE SYSTEMS
--------------------------------------------------------------------------------
Physical Volumes (PV's)
-----------------------
lspv                         Lists all physical volumes (hard disks)
lspv (pv)                    Lists the physical volume details
lspv -l (pv)                 Lists the logical volumes on the physical volume
lspv -p (pv)                 Lists the physical partition usage for that PV
chdev -l (pv) -a pv=yes      Makes a new hdisk a physical volume.
chpv -v r (pv)               Removes a disk from the system.
chpv -v a (pv)               Adds the removed disk back into the system.
chpv -a y (pv)               Changes pv allocatable state to YES
chpv -a n (pv)               Changes pv allocatable state to NO
migratepv (old pv) (new pv)  Moves all LV's from one PV to another PV;
                             both PV's must be in the same volume group.
Volume Groups (VG's)
--------------------
lsvg                         Lists all volume groups
lsvg (vg)                    Lists the volume group details
lsvg -l (vg)                 Lists all logical volumes in the volume group
lsvg -p (vg)                 Lists all physical volumes in the volume group
lsvg -o                      Lists all varied on volume groups
varyonvg (vg)                Vary On a volume group
varyonvg -f (vg)             Forces the varyon process
varyonvg -s (vg)             Vary on a VG in maintenance mode. LV commands
                             can be used on the VG, but LV's cannot be
                             opened for I/O.
varyoffvg (vg)               Vary Off a volume group
synclvodm (vg)               Tries to resync VGDA, LV control blocks and ODM.
mkvg -y(vg) -s(PP size) (pv) Create a volume group
eg mkvg -y datavg -s 4 hdisk1
reducevg -d (vg) (pv)        Removes a volume group
reducevg (vg) (PVID)         Removes the PVID disk reference from the VGDA
                             when a disk has vanished without the
                             reducevg (vg) (pv) command being run first.
extendvg (vg) (new pv)       Adds another PV into a VG.
exportvg (vg)                Exports the volume group, i.e. deletes it!
Note : Cannot export a VG if it has active paging space; turn off paging
and reboot before exporting the VG. Exporting removes entries from the
filesystems file but does not remove the mount points.
chvg -a y (vg)               Auto Vary On a volume group at system start.
lqueryvg -Atp (pv)           Details volume group info for the hard disk.
importvg -y (vg name) (pv)   Import a volume group from a disk.
importvg (pv)                Same as above but the VG will be called vg00 etc.
chvg -Q (y/n) (vg name)      Turns on/off Quorum checking on a vg.
Logical Volumes (LV's)
----------------------
lslv (lv)                    Lists the logical volume details
lslv -l (lv)                 Lists the physical volume which the LV is on
mklv (vg) (No of PP's) (pv name optional)  Create a logical volume
mklv -y (lv) (PP's) (pv name optional)     Creates a named logical volume
chlv -n (new lv) (old lv)    Rename a logical volume
extendlv (lv) (extra No of PP's)           Increase the size of an LV
rmlv (lv)                    Remove a logical volume
mklv/extendlv -a = PP allocation policy
   -am = middle   -ac = center   -ae = edge
   -aie = inner edge   -aim = inner middle
migratepv -l (lv) (old pv) (new pv)
   Move a logical volume between physical volumes. Both physical volumes
   must be in the same volume group !
mklv -y (lv) -t jfslog (vg) (No of PP's) (pv name optional)
   Creates a JFSlog logical volume.
logform (/dev/lv)            Initialises an LV for use as a JFSlog
getlvcb -AT (lv)             Displays Logical Volume Control Block information
File Systems (FS's)
-------------------
lsfs                         Lists all filesystems
lsfs -q (fs)                 Lists the file system details
mount                        Lists all the mounted filesystems
mount (fs or lv)             Mounts a named filesystem
mount -a                     Mounts all filesystems
mount all
mount -r -v cdrfs /dev/cd0 /cdrom   Mounts cd0 drive over /cdrom
crfs -v jfs -d(lv) -m(mount point) -A yes
   Will create a file system on the whole of the logical volume, adds an
   entry into /etc/filesystems and will create the mount point directory
   if it does not exist.
crfs -v jfs -g(vg) -m(mount point) -a size=(size of fs) -A yes
   Will create a logical volume on the volume group and create the file
   system on the logical volume, all at the size stated. Will add an
   entry into /etc/filesystems and will create the mount point directory
   if it does not exist.
chfs -A yes (fs)             Change file system to auto mount in /etc/filesystems
chfs -a size=(new fs size) (fs)     Change file system size
rmfs (fs)                    Removes the file system and will also remove the
                             LV if there are no other file systems on it.
defragfs -q (fs)             Reports the fragment status of the file system.
defragfs -r (fs)             Runs in report-only defrag mode (no action).
defragfs (fs)                Defragments a file system.
fsck (fs)                    Verify a file system; the file system must be
                             unmounted!
fsck (-y or -n) (fs)         Pre-answer questions either yes or no !
fsck -p (fs)                 Will restore the primary superblock from a
                             backup copy if the superblock is corrupt.
Mirroring
---------
mklv -y (lv) -c(copies 2 or 3) (vg) (No of PP's) (pv name optional)
   Creates a mirrored named logical volume.
mklvcopy -s n (lv) (copies 2 or 3) (pv)
   Creates a copy of a logical volume onto another physical volume.
   The physical volume MUST be in the same volume group as the original
   logical volume !
rmlvcopy (lv) (copies 1 or 2)       Removes logical volume copies.
rmlvcopy (lv) (copies 1 or 2) (pv)  From this pv only!
syncvg -p (pv)               Synchronize logical partition copies
syncvg -l (lv)
syncvg -v (vg)
mirrorvg (vg) (pv)           Mirrors all the logical volumes in a volume group
                             onto a new physical volume. The new physical
                             volume must already be part of the volume group.
--------------------------------------------------------------------------------
BOOT LOGICAL VOLUME (BLV)
--------------------------------------------------------------------------------
bootlist -m (normal or service) -o                  Displays bootlist
bootlist -m (normal or service) (list of devices)   Change bootlist
bootinfo -b                  Identifies the bootable disk
bootinfo -t                  Specifies type of boot
bosboot -a -d (/dev/pv)      Creates a complete boot image on a physical volume.
mkboot -c -d (/dev/pv)       Zeros out the boot records on the physical volume.
savebase -d (/dev/pv)        Saves customised ODM info onto the boot device.
--------------------------------------------------------------------------------
SYSTEM DUMP
--------------------------------------------------------------------------------
sysdumpdev -l                Lists current dump destination.
sysdumpdev -e                Estimates dump size of the current system in bytes.
sysdumpdev -L                Displays information about the previous dump.
sysdumpstart -p              Starts a dump and writes to the primary dump device.
sysdumpstart -s              Starts a dump and writes to the secondary dump device.
(An MCA machine can also dump if the key is in the service position and
the reset button is pressed.)
sysdumpdev -p (dump device) -P      Sets the default dump device, permanently
Analyse dump file :-
echo "stat\n status\n t -m" | crash /var/adm/ras/vmcore.0
--------------------------------------------------------------------------------
PAGING SPACE (PS's)
--------------------------------------------------------------------------------
lsps -a                      Lists out all paging space
lsps -s                      Displays total paging and total usage
lsps (ps)
mkps -s(No of 4M blocks) -n -a (vg)
mkps -s(No of 4M blocks) -n -a (vg) (pv)
   -n = don't activate/swapon now
   -a = activate/swapon at reboot
chps -a n (ps)               Turns off paging space.
chps -s(No of 4M blocks) (ps)       Increases paging space.
chlv -n (new name) (old name)       Change paging space name
rmps (ps)                    Remove paging space. The PS must have been
                             turned off and the system rebooted before it
                             can be removed.
Note : Need to change the swapon entry in the /sbin/rc.boot script if you
are changing the default paging space from /dev/hd6. You also need to run
"bosboot -a -d /dev/hdiskx" before the reboot.
/etc/swapspaces              File that lists all paging space devices that
                             are activated/swapon during reboot.
--------------------------------------------------------------------------------
SCHEDULING
--------------------------------------------------------------------------------
crontab -l                   List out crontab entries
crontab -e                   Edit crontab entries
crontab -l > (filename)      Output crontab entries to a file
crontab (filename) Enter a crontab from a file
crontab -r Removes all crontab entries
crontab -v Displays crontab submission time.
/var/adm/cron/cron.allow File containing users allowed crontab use.
/var/adm/cron/cron.deny File containing users denied crontab use.
/var/adm/cron/crontab Directory containing users crontab entries.
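As a reminder, a crontab entry has the form: minute hour day-of-month
month weekday command. For example (the script path is an assumed value):
0 2 * * 0 /usr/local/bin/weekly_backup.sh > /tmp/backup.log 2>&1
runs the script at 02:00 every Sunday.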
at (now + 2 minutes, 13:05, etc) {return} Schedule a job using at
Command or shell script {return}
{CTRL D}
at -l
atq Lists out jobs scheduled to run via at command
at -r (at job No)
atrm (at job No) Removes an at job scheduled to run.
/var/adm/cron/at.allow File containing users allowed at use.
/var/adm/cron/at.deny File containing users denied at use.
/var/adm/cron/atjobs Directory containing users at entries.
--------------------------------------------------------------------------------
SECURITY
--------------------------------------------------------------------------------
groups Lists out the groups that the user is a member of
setgroups Shows user and process groups
chmod abcd (filename) Changes files/directory permissions
Where a is (4 SUID) + (2 SGID) + (1 SVTX)
b is (4 read) + (2 write) + (1 execute) permissions for owner
c is (4 read) + (2 write) + (1 execute) permissions for group
d is (4 read) + (2 write) + (1 execute) permissions for others

-rwxrwxrwx -rwxrwxrwx -rwxrwxrwx
||| ||| |||
- - -
| | |
Owner Group Others
-rwSrwxrwx = SUID -rwxrwSrwx = SGID drwxrwxrwt = SVTX
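eg chmod 4755 (file) gives -rwsr-xr-x, ie SUID plus rwx for owner and
r-x for group and others.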

chown (new owner) (filename) Changes file/directory owners
chgrp (new group) (filename) Changes file/directory groups
chown (new owner).(new group) (filename) Do both !!!
umask Displays umask settings
umask abc Changes users umask settings
where ( 7 - a = new file read permissions)
( 7 - b = new file write permissions)
( 7 - c = new file execute permissions)
eg umask 022 = new file permissions of 755 = read write and execute for owner
read ----- and execute for group
read ----- and execute for other
mrgpwd > file.txt Creates a standard password file in file.txt
passwd Change current user password
pwdadm (username) Change a users password
pwdck -t ALL Verifies the correctness of local authentication
lsgroup ALL Lists all groups on the system
mkgroup (new group) Creates a group
chgroup (attribute) (group) Change a group attribute
rmgroup (group) Removes a group
--------------------------------------------------------------------------------
USERS
--------------------------------------------------------------------------------
passwd -f Change current users gecos (user description)
passwd -s Change current users shell
chfn (username) Changes users gecos
chsh (username) (shell) Changes users shell
env Displays values of environment variables
printenv
id Displays current user's uid and gid details
id (user) Displays user uid and gid details
whoami Displays current user details
who am i (or who -m)
who Displays details of all users currently logged in.
w
who -b Displays system reboot time
uptime Displays number of users logged in, time since last
reboot, and the machine load averages.
lslicense Displays number of current user licenses
chlicense -u (number) Changes the number of user licenses
lsuser ALL Lists all users details
lsuser (username) Lists details for user
lsuser -a(attribute) (username or ALL) Lists user attributes
lsuser -a home ALL
mkuser -a(attributes) (newuser) Add a new user
chuser (attributes) (user) Change a user
chuser login=false (user) Lock a user account
rmuser -p (user) Removes a user and all entries in security files
usrck -t ALL Checks all the user entries are okay.
fuser -u (logical volume) Displays processes using the files in that LV
lsattr -D -l sys0 -a maxuproc Displays max number of processes per user
chdev -l sys0 -a maxuproc=(number) Changes max number of processes per user
--------------------------------------------------------------------------------
REMOTE USERS
--------------------------------------------------------------------------------
ruser -a -f (user) Adds entry into /etc/ftpusers file
ruser -a -p (host) Adds entry into /etc/host.lpd file
ruser -a -r (host) Adds entry into /etc/hosts.equiv file
ruser -d -f (user) Deletes entry in /etc/ftpusers file
ruser -d -p (host) Deletes entry in /etc/host.lpd file
ruser -d -r (host) Deletes entry in /etc/hosts.equiv file
ruser -s -F Shows all entries in /etc/ftpusers file
ruser -s -P Shows all entries in /etc/host.lpd file
ruser -s -R Shows all entries in /etc/hosts.equiv file
ruser -X -F Deletes all entries in /etc/ftpusers file
ruser -X -P Deletes all entries in /etc/host.lpd file
ruser -X -R Deletes all entries in /etc/hosts.equiv file
--------------------------------------------------------------------------------
INITTAB
--------------------------------------------------------------------------------
telinit S Switches to single user mode.
telinit 2 Switches to multi user mode.
telinit q Re-examines /etc/inittab
lsitab -a Lists all entries in inittab
lsitab (ident eg tty1) Lists the tty1 entry in inittab
mkitab ("details") Creates a new inittab entry
chitab ("details") Amends an existing inittab entry
rmitab (ident eg tty1) Removes an inittab entry.
chitab "tty1:2:respawn:/usr/bin/getty /dev/tty1"
--------------------------------------------------------------------------------
ODM
--------------------------------------------------------------------------------
odmget -q "name=lp1" CuDv |more Gets lp1 info from pre-defined database.
odmget -q "name-lp1" CuAt |more Gets lp1 info from customised database.
odmdelete -o CuAt -q "name=lp1" Deletes lp1 info from customised db.
odmget -q "name=lp1" CuAt > lp1.CuAt Export ODM info to text file.
odmadd < lp1.CuAt Import ODM info from text file.
--------------------------------------------------------------------------------
ERROR LOGGING
--------------------------------------------------------------------------------
/usr/lib/errdemon -l Displays errorlog attributes.
/usr/lib/errdemon Starts error logging.
/usr/lib/errstop Stops error logging.
errpt Displays summary errorlog report.
errpt -a Displays detailed errorlog report.
errpt -j (identifier) Displays a single errorlog report.
Note : errorlog classes are H=Hardware S=Software O=Operator U=Undetermined
errclear (days) Deletes all error classes in the errorlog.
errclear -d (class) (days) Deletes all error class entries in errlog.
Note : The errclear command will delete all entries older than the number
of days specified in the days parameter. To delete ALL entries use 0.
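eg errclear 0 empties the whole errorlog;
   errclear -d H 30 deletes hardware entries older than 30 days.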
errlogger "message up to 230 chrs"
Enters an operator notification message into the errorlog.
--------------------------------------------------------------------------------
PERFORMANCE MONITORING
--------------------------------------------------------------------------------
vmstat (drive) (interval) (count) Reports virtual memory statistics.
vmstat hdisk0 5 20
vmstat -s Displays number of paging events since system start.
vmstat -f Displays number of forks since system start.
vmstat -i Displays number of interrupts by device since system start.
iostat (drive) (interval) (count) Reports i/o and cpu statistics.
iostat hdisk0 5 20
iostat -d (drive) (interval) (count) Limits report to drive statistics.
iostat -t (interval) (count) Limits report to tty statistics.
sar -u -P ALL 10 10 Displays %usr %sys %wio %idle for all processors
--------------------------------------------------------------------------------
DOS DISKETTES
--------------------------------------------------------------------------------
dosdir Reads directory listing of a diskette
dosdir (directory) Reads directory listing of a named directory
dosread -D/dev/fd0 C41.TXT c41.txt Gets C41.TXT from diskette drive fd0
dosread -D/dev/fd0 DIRECTORY/C41.TXT c41.txt
(-D option can be dropped if using fd0)
doswrite -D/dev/fd0 (unixfile) (dosfile) Writes a file to diskette
dosdel (dosfile) Deletes a dos file on diskette
dosformat Formats the diskette
--------------------------------------------------------------------------------
SENDMAIL
--------------------------------------------------------------------------------
sendmail -bi Creates new alias db from the /etc/aliases file.
newaliases
sendmail -bp Displays the contents of the mail queue
mailq
sendmail -q Processes the sendmail queue NOW
sendmail -bt -d0.4 < /dev/null
Prints out sendmail version, compile defines and system information
refresh -s sendmail Restart sendmail
kill -1 (sendmail PID)
--------------------------------------------------------------------------------
SP / PSSP
--------------------------------------------------------------------------------
dsh (command) Runs the command on all the nodes
Efence Displays which nodes are currently fenced
Efence (node number) Fences the node
Eunfence (node number) Unfences the node
Estart Starts the switch
spmon -q Starts SP monitor in gui
spmon -d -G Diag info, lists LED and switch info for all nodes
spmon -L frame1/node3 Displays LED for node 3 in frame 1
spmon -p off frame1/node3 Powers off the node
spmon -p on frame1/node3 Powers on the node
spled Displays all the nodes' LED's in an updating gui
s1term -w (frame number) (node number) Opens serial terminal (read and write)
s1term (frame number) (node number) Opens serial terminal (read only)
Example :-
s1term 1 1 Opens a serial terminal to console port on frame 1 node 1
which is read only. When rebooting a node use read only.
splstdata -e Lists site environment database information
-d Displays df command from each node
-n Lists node configuration
-h Displays lscfg command from each node
-s Lists switch node information
-b Lists boot/installation information
-a Lists LAN database information
-i Displays netstat -in command from each node