Tanti Technology

Bangalore, Karnataka, India
Multi-platform UNIX systems consultant and administrator in mutualized and virtualized environments, with 4.5+ years of experience in AIX system administration. This blog is all about the IBM AIX flavour of UNIX. It is meant to help system administrators who use AIX in their day-to-day work, as well as newcomers who want to get certified in AIX administration, and it will be updated frequently. Your comments on posts are welcome. DISCLAIMER: The blog owner takes no responsibility of any kind for any data loss or damage caused by trying any of the commands or methods mentioned in this blog. Use the commands, methods and scripts at your own risk. If you find something useful, a comment would be appreciated to let other readers know that the solution or method worked for you.

Monday, 21 October 2013

HACMP basic concepts

HACMP

HACMP stands for High Availability Cluster Multi-Processing.

HACMP provides two types of environment:

1. Serial (high availability): used to make an application highly available by means of shared disks and duplicated resources. Access to the data is serial, meaning only one node accesses it at a time.
2. Parallel (cluster multiprocessing): used when the application is online on several nodes and all of those nodes access the data concurrently. Failover is not required in this type of environment.




Why use HACMP?

1. To eliminate single points of failure.
2. To minimize planned and unplanned downtime.


HACMP keeps the application available and accessible even when there is a hardware, software or system management failure.


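What is a single point of failure?

A single point of failure is any component whose failure on its own makes the application unavailable, because nothing else can take over its role. Typical candidates are:

1. Node
2. Power source
3. Network adapter
4. Network
5. Disk adapter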


WHAT TYPES OF FAILURE DOES HACMP MONITOR?

1. Node and OS failure, protected against by using redundant nodes.
2. NIC failure, protected against by using redundant NICs.
3. Network failure, protected against by using redundant networks.

SHARED EXTERNAL DISK DRIVES:

A shared external disk drive is a disk connected to multiple nodes.

HACMP supports the following types of access to shared disks.

Non-concurrent access: only one node can access the data at a time (the serial environment). If the node that is using the disk fails, failover happens: another node becomes the active node and accesses the data.

Concurrent access: the shared disks are actively connected to more than one node and the data can be accessed simultaneously. This is the parallel environment, and no failover is needed.




An HACMP cluster consists of:

1. Topology (nodes, NICs, networks etc.)
2. Resources (the things that need to be made highly available)



HACMP Topology components:

1. nodes
2. networks
3. communication devices
4. communication interfaces


What does the term node refer to in HACMP?

A node is a standalone server that has the HACMP software installed and is a member of the cluster.


An HACMP cluster can be configured with up to 32 nodes.

Network

Two types of network are defined in HACMP:

1. IP network
2. Non-IP network

The IP network uses the TCP/IP protocol suite for communication between the nodes.

The non-IP network is used for monitoring the status of the cluster and is strongly recommended. If only an IP network is used and heartbeats over it stop, it is difficult for the active node to figure out whether the other node or the network has failed, and this can lead to cluster partitioning or node isolation. The non-IP network provides a second heartbeat path, which makes it possible to differentiate between a node failure and a network failure.
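To make the reasoning above concrete, here is a minimal Python sketch of how two heartbeat paths help tell a node failure from a network failure. It only illustrates the idea and is not how the HACMP cluster manager is actually implemented; the function name and the returned strings are my own assumptions.

```python
def classify_peer_failure(ip_heartbeat_ok: bool, non_ip_heartbeat_ok: bool) -> str:
    """Interpret the heartbeat status of one peer node, as seen from the local node.

    Hypothetical helper for illustration only, not part of HACMP.
    """
    if ip_heartbeat_ok and non_ip_heartbeat_ok:
        return "healthy"
    if non_ip_heartbeat_ok:
        # The peer still answers over the serial/disk path, so the node is alive
        # and only the IP network is broken.
        return "network failure"
    if ip_heartbeat_ok:
        # The peer answers over IP, so only the non-IP link is broken.
        return "non-IP link failure"
    # No heartbeat on either path: the peer node itself is most likely down.
    return "node failure"


print(classify_peer_failure(ip_heartbeat_ok=False, non_ip_heartbeat_ok=True))
```

With only the IP path, the first argument would be the only information available, and a dead network would look exactly like a dead node.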
 

Communication interface:

In an HACMP environment, a communication interface is a network interface that has an IP address and an IP label associated with it, on a node that uses the IP network for communication. For example:

 en0 (network interface)
       IP address (192.168.10.4)
       IP label (abhi_boot1)


Communication devices:

In an HACMP environment, the devices used to configure the non-IP network are called communication devices. A communication device can be an RS232 device (/dev/tty), a disk (/dev/hdisk#) etc.

Communication devices provide a point-to-point serial connection that is normally used for heartbeating.


What does resource mean in HACMP?

As per my understanding, HACMP is designed entirely to make applications highly available.

So the things required to make the application run on a node are considered HACMP resources.

Take an application as an example. What is required to make it highly available?

1. A way to control (start/stop) the application: when a hardware error occurs and the resources need to move from one node to another, HACMP must know how to start and stop the application. That is why we provide start/stop scripts.
2. A way to access the application: clients need an IP address to reach it, and that address must be able to move from one node to another.
3. Common storage: if one node is down, another node can still access the data.
4. Sometimes NFS is required.
 

Resources can be:

1. Application server
2. Service IP
3. Volume group
4. Filesystem
5. NFS mounts
6. NFS exports

Application server: the application that has to be made highly available, along with its start/stop scripts, is termed an application server in an HACMP environment.

Service IP/label: every application needs an IP address through which it is accessed, and that IP address should be highly available.

Volume group: if the application requires shared storage, that storage should come from this VG.

Filesystem: the filesystems required for making the application highly available.

NFS mounts: the NFS filesystems that must be mounted for the application to run.

NFS exports: the NFS filesystems that need to be exported for running the application.
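The resource types listed above can be pictured as a simple record. The Python sketch below is only a mental model, not an HACMP configuration format; every class name, field name, script path and label in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApplicationServer:
    name: str
    start_script: str  # script HACMP would run to start the application
    stop_script: str   # script HACMP would run to stop the application


@dataclass
class ResourceSet:
    application_server: ApplicationServer
    service_ip_label: str
    volume_groups: List[str] = field(default_factory=list)
    filesystems: List[str] = field(default_factory=list)
    nfs_mounts: List[str] = field(default_factory=list)
    nfs_exports: List[str] = field(default_factory=list)

    def missing_essentials(self) -> List[str]:
        """Return the names of the basic items that are still empty."""
        missing = []
        if not self.application_server.start_script:
            missing.append("start_script")
        if not self.application_server.stop_script:
            missing.append("stop_script")
        if not self.service_ip_label:
            missing.append("service_ip_label")
        return missing


# Illustrative values only; none of these names come from a real cluster.
app = ApplicationServer("demo_app", "/usr/local/ha/start_demo.sh", "/usr/local/ha/stop_demo.sh")
resources = ResourceSet(app, "demo_svc", volume_groups=["demovg"], filesystems=["/demo"])
print(resources.missing_essentials())
```

The check in missing_essentials mirrors the first two requirements discussed earlier: a way to start and stop the application, and an IP label to reach it.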




What is a resource group?

A resource group (RG) is a collection of all the resources that are needed to make an application highly available.

While defining a resource group, we need to specify the policies (startup/fallover/fallback) that control what happens to the resource group when the cluster starts and when a failure is detected.

The RG policies are:

1. Startup policy (used when the cluster starts up)
2. Fallover policy (when a node fails, it determines which node takes over the RG)
3. Fallback policy (when a higher priority node comes back up, it decides whether to move the RG to that node or not)

Startup policy:

Online on home node only: the resource group is brought online only on its home node (the highest priority node defined for the RG). If that node is unavailable, the RG will not start automatically.
Online on first available node: the resource group comes online on the first participating node that becomes available. It behaves like a rotating RG.
Online using distribution policy: only one RG is brought online per node. If a node already has an active RG, the RG goes to the next node that does not have one.
Online on all available nodes: this policy is used when all nodes need to access the same RG concurrently.
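The four startup policies can be summarised in a few lines of Python. This is a simplified sketch of the behaviour described above, not HACMP code: the policy strings, the function name and the idea of passing the nodes in the order they joined the cluster are all assumptions made for the example.

```python
from typing import Iterable, List


def startup_nodes(policy: str, priority_list: List[str], joined: List[str],
                  busy: Iterable[str] = ()) -> List[str]:
    """Return the node(s) on which the RG would be brought online.

    priority_list: nodes of the RG, highest priority (home node) first.
    joined:        nodes in the order they became available.
    busy:          nodes that already host another active RG.
    """
    candidates = [n for n in joined if n in priority_list]
    if policy == "home_node_only":
        home = priority_list[0]
        return [home] if home in candidates else []
    if policy == "first_available":
        return candidates[:1]
    if policy == "distribution":
        busy_set = set(busy)
        return [n for n in candidates if n not in busy_set][:1]
    if policy == "all_available":
        return candidates
    raise ValueError(f"unknown startup policy: {policy}")


nodes = ["nodeA", "nodeB", "nodeC"]   # nodeA is the home node
up = ["nodeB", "nodeC"]               # nodeA has not joined yet
print(startup_nodes("home_node_only", nodes, up))
print(startup_nodes("first_available", nodes, up))
```

In this example the home node is down, so the first policy starts nothing while the second one picks the first node that joined.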


Fallover/failover policy:

Fallover to next priority node in the list: at the time of failure, the RG is moved to the next highest priority node in the RG definition.
Fallover using dynamic node priority: with this option, the fallover target is chosen according to one of the following criteria:
    1. highest_mem_free
    2. highest_idle_cpu
    3. lowest_disk_busy
Bring offline (on error node only): this is used with concurrent RGs. Since the RG is active on all nodes concurrently, if one node fails the RG is brought offline on that node only, so the node can be maintained, while it stays online on the other nodes.
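Here is a small Python sketch of how the three fallover policies pick a target node. It is a simplification under my own assumptions: "next priority" is modelled as the highest priority surviving node, and the metric names (mem_free, idle_cpu, disk_busy) and policy strings are made up for the example rather than taken from HACMP.

```python
from typing import Dict, List, Optional, Set

# Criterion name -> (hypothetical metric key, function that picks the best node)
CRITERIA = {
    "highest_mem_free": ("mem_free", max),
    "highest_idle_cpu": ("idle_cpu", max),
    "lowest_disk_busy": ("disk_busy", min),
}


def fallover_target(policy: str, priority_list: List[str], failed_node: str,
                    available: Set[str],
                    metrics: Optional[Dict[str, Dict[str, float]]] = None,
                    criterion: Optional[str] = None) -> Optional[str]:
    """Return the node that takes over the RG, or None if it is not moved."""
    survivors = [n for n in priority_list if n != failed_node and n in available]
    if policy == "bring_offline" or not survivors:
        # Concurrent RG: it just goes offline on the failed node.
        return None
    if policy == "next_priority":
        return survivors[0]
    if policy == "dynamic_node_priority":
        if metrics is None or criterion not in CRITERIA:
            raise ValueError("dynamic node priority needs metrics and a known criterion")
        metric_name, pick = CRITERIA[criterion]
        return pick(survivors, key=lambda n: metrics[n][metric_name])
    raise ValueError(f"unknown fallover policy: {policy}")


rg_nodes = ["nodeA", "nodeB", "nodeC"]
alive = {"nodeB", "nodeC"}
load = {
    "nodeB": {"mem_free": 512, "idle_cpu": 80, "disk_busy": 40},
    "nodeC": {"mem_free": 2048, "idle_cpu": 30, "disk_busy": 10},
}
print(fallover_target("next_priority", rg_nodes, "nodeA", alive))
print(fallover_target("dynamic_node_priority", rg_nodes, "nodeA", alive, load, "highest_mem_free"))
```

The point of dynamic node priority is visible in the sample data: the static list would choose nodeB, while the memory criterion chooses nodeC.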

Fallback policy:

Fallback to higher priority node in the list: when a higher priority node comes back up, the RG is automatically moved to it.
Never fallback: the RG does not fall back to the higher priority node automatically; you need to move it manually.
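The difference between the two fallback policies can be shown with a short Python sketch. As before, this only illustrates the decision described above; the function name and policy strings are hypothetical and not part of HACMP.

```python
from typing import List


def node_after_rejoin(policy: str, priority_list: List[str],
                      current_node: str, rejoined_node: str) -> str:
    """Return the node that hosts the RG after rejoined_node comes back up.

    priority_list is ordered from highest to lowest priority.
    """
    if policy == "never_fallback":
        # The RG stays where it is until an administrator moves it.
        return current_node
    if policy == "fallback_to_higher_priority":
        rejoined_rank = priority_list.index(rejoined_node)
        current_rank = priority_list.index(current_node)
        # A lower index means a higher priority.
        return rejoined_node if rejoined_rank < current_rank else current_node
    raise ValueError(f"unknown fallback policy: {policy}")


order = ["nodeA", "nodeB", "nodeC"]
print(node_after_rejoin("fallback_to_higher_priority", order, "nodeB", "nodeA"))
print(node_after_rejoin("never_fallback", order, "nodeB", "nodeA"))
```

Never fallback avoids the second short outage that an automatic move back to the home node would cause, at the cost of a manual step later.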

