Tanti Technology

Bangalore, Karnataka, India
Multi-platform UNIX systems consultant and administrator in mutualized and virtualized environments, with 4.5+ years of experience in AIX system administration. This site aims to help system administrators in their day-to-day activities, and your comments on posts are welcome. The blog is all about the IBM AIX flavour of UNIX: it is written for system admins who use AIX in their work life, and also for newbies who want to get certified in AIX administration. It will be updated frequently to help system admins and other new learners. DISCLAIMER: the blog owner takes no responsibility of any kind for any data loss or damage caused by trying any command or method mentioned in this blog; you use the commands/methods/scripts at your own responsibility. If you find something useful, a comment would be appreciated to let other viewers know that the solution/method worked for you.

Monday 14 April 2014

Perform a NIM operation without booting the NIM client machine into SMS mode



If you are not sure which Ethernet adapter to select during a migration, or during any operation where you have to pick an adapter for the ping test or to start an installation over the network, you can use the procedure below.




1. Find out the gateway of the client server using the command below:

netstat -nr
Destination         Gateway
default             x.x.x.x   --> gateway of the client

2.  In "ifconfig -a" o/p, select the interface/adapter(entX) which has an IP, which is reachable from your laptop.(which you use to do network boot)

3. Set the bootlist so that after the reboot the server boots through the network:

bootlist -m normal entX gateway=<client gateway> bserver=<NIM server IP> client=<client IP>
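
For illustration, with hypothetical addresses (client 10.10.10.21, gateway 10.10.10.1, NIM master 10.10.20.5) on adapter ent0, the command would look like this:

# all IP addresses below are examples only
bootlist -m normal ent0 gateway=10.10.10.1 bserver=10.10.20.5 client=10.10.10.21

# verify what the next normal-mode boot will use
bootlist -m normal -o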

Advantages:

1. You will not end up shutting down the wrong server from the HMC GUI.
2. You can skip the SMS ping test, where people often enter wrong IP details.
3. It also helps if you have little experience with the SMS menus.

Server hung at LED B700F120



An AIX server was migrated from AIX 5.3 to AIX 6.1 on a 9117-MMB frame, and the server hung at LED B700F120.


B700F120

Explanation

Platform firmware detected an error

 

Response

The platform is unable to support the architecture options requested by the operating system via the ibm,client-architecture-support interface.

 

Solution

6100-01 is not supported on a 9117-MMB, so if you are installing the AIX operating system on a 9117-MMB frame, make sure you use one of these levels:

AIX 6.1 with the 6100-04 Technology Level and Service Pack 3, or later.
AIX 6.1 with the 6100-03 Technology Level and Service Pack 5, or later.
AIX 6.1 with the 6100-02 Technology Level and Service Pack 8, or later.

If you have used an lpp_source and SPOT at the 6100-01 level, update them to one of the TLs mentioned above and then perform the migration.

Doing so avoids the server hanging at this LED during the reboot after the OS migration.
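
As a rough sketch of how to check and fix this from the NIM master (the resource names lpp6104 and spot6104 and the source directory are hypothetical):

# check the current level of the SPOT (resource names are examples)
lsnim -l spot6104

# add the TL/SP filesets to the lpp_source, then update the SPOT from it
nim -o update -a packages=all -a source=/export/aix/6100-04-03 lpp6104
nim -o cust -a lpp_source=lpp6104 -a fixes=update_all spot6104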

How to list files in alphabetical order in a directory



We know how to list files sorted by time using the long-listing options, but how do you list them in alphabetical order?

Here is the simple command to do that :



cd to the directory and then:

#  ls -ltr | awk 'NF >= 9 { print $9 }' | sort -d

This lists the file names in alphabetical order (the NF test skips the "total" line that ls -l prints first).
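
A shorter pipeline gives the same listing, since ls -1 prints one name per line and sort -d sorts in dictionary order:

ls -1 | sort -d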

find oslevel of the clone rootvg



We all know how to find the oslevel of a running server, but finding the oslevel of a cloned rootvg that exists on a disk is just as simple.

Using the blvset command below, you can read the oslevel from the cloned VG/disk.





/usr/lpp/bosinst/blvset -d /dev/hdisk0  -g level

      where hdisk0 is part of the cloned rootvg, as shown below:

# lspv | grep rootvg
 hdisk0          00z301dce111df74                    altinst_rootvg
 hdisk1          00c503d45e50dr04                    rootvg          active
  • If you need a few more details about the cloned disk, the command below is useful. To determine the TL level, check the timestamp of when the clone was taken against when the TL or SP upgrades happened on the server, and deduce the TL level of the clone from that.
 /usr/lpp/bosinst/blvset -d /dev/hdisk0  -g menu
locale: C C C C C C
console:
blvname: hd5
targ_dev: /dev/hdisk0
root_fsattr: hd4:jfs2:hd8
timestamp: Fri Jun 28 12:57:06 2013
padstring: 7.1 pad string:@%$#~!~~!~#$%@
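
If more than one disk might hold a clone, a small loop (disk names are only examples) reports the level on each:

# print the boot-image oslevel stored on each candidate disk
for d in hdisk0 hdisk1
do
    echo "$d: `/usr/lpp/bosinst/blvset -d /dev/$d -g level`"
done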

DLPAR not working from the HMC



Usually, when DLPAR fails with the error below, the problem is with the RMC connection, which depends on the RSCT daemons.

Error :

"A RMC network connection to the partition is not present. Verify the network settings on the partition and the HMC. If you select OK you will have to restart the partition for the resource changes to take effect."




Solution :

The easiest fix to get DLPAR working again is to restart RMC along with the RSCT daemons using the three commands below, then wait about 5 minutes before testing the DLPAR capability again.

# /usr/sbin/rsct/bin/rmcctrl -z    (stop the RMC subsystem and resource managers)
# /usr/sbin/rsct/bin/rmcctrl -A    (add the RMC subsystem to /etc/inittab and start it)
# /usr/sbin/rsct/bin/rmcctrl -p    (enable remote client connections)


# lssrc -a | grep rsct
 
  - IBM.DRM should be in the active state
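
You can also query just that subsystem directly:

# show only the dynamic reconfiguration resource manager
lssrc -s IBM.DRM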

Go back to the HMC restricted shell command prompt:

# lspartition -dlpar

The partition should show the correct hostname and IP, Active <1>, and a DCaps value such as 0x3f. These values mean the partition is capable of a DLPAR operation.

If the above does not work even after waiting 5 to 10 minutes, execute the commands above followed by the recfgct command below.

 
/usr/sbin/rsct/install/bin/recfgct

Wait a while and try again. If the problem is with RMC/RSCT, this will fix it; otherwise the cause may be a firewall between your HMC and the server.

How to delete a file using inode


If a file name contains invalid characters, it can be difficult to delete the file by name, so the best option is to delete it by its inode number.

Below is the command for the same:

 # find /directory -inum [inode-number] -exec rm -i {} \;
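
A typical run (the directory and inode number are purely illustrative): first read the inode with ls -li, then pass it to find:

# list inodes; note the number shown next to the problem file (say 1234)
ls -li /directory

# delete by inode; rm -i asks for confirmation before removing
find /directory -inum 1234 -exec rm -i {} \;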

Script to perform FTP in background



File transfers from Fix Central down to your servers usually take a lot of time, and for the whole period you have to make sure your network session does not disconnect.

To avoid this problem, we can use a script that runs the transfer in the background, so there is no need to babysit the FTP session.


Here is the script:




#!/bin/sh
USERNAME="anonymous"
PASSWORD="anonymous"
SERVER="FTP SITE SERVER NAME"

# remote directory that holds the files to download
FILE="/ecc/hsb/H60987058"

# log in non-interactively; binary mode protects *.tar.gz files
# from ASCII conversion during the transfer
ftp -n -i "$SERVER" <<EOF
user $USERNAME $PASSWORD
binary
cd $FILE
mget *.*
quit
EOF


- Just put these lines in a file and execute it in the background as below.

Save the content to the file  --> auto-ftp.ksh
Command:   chmod +x auto-ftp.ksh && nohup ./auto-ftp.ksh > /tmp/ftp.out &

- Note: the command has to be executed from the directory where you want the downloaded files to land!

How to increase the queue depth on a VIO client


To change the queue_depth attribute on an hdisk device, the disk must first be free of I/O operations, so:



1. Stop I/O on the device: unmount all of its filesystems, then either
- varyoffvg <vgname>
- or rmdev -l hdiskX


2. Change the queue_depth:

- chdev -l hdiskX -a queue_depth=<value>

- then varyonvg <vgname> (or cfgmgr -l hdiskX) to bring the disk back online.

3. Alternatively, change only the ODM with the -P option and reboot:

- chdev -l hdiskX -a queue_depth=20 -P

     shutdown -Fr
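
Putting the steps together (hdisk2, datavg, and /data are hypothetical names):

# check the current value
lsattr -El hdisk2 -a queue_depth

# quiesce the disk, change the attribute, and bring everything back
umount /data
varyoffvg datavg
chdev -l hdisk2 -a queue_depth=20
varyonvg datavg
mount /data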

0516-404 allocp: This system cannot fulfill the allocation



Issue: volume group mirroring fails with an error.

Error :

0516-404 allocp: This system cannot fulfill the allocation request. There are not enough free partitions or not enough physical volumes to keep strictness and satisfy allocation requests. The command should be retried with different allocation characteristics.
0516-1517 mklvcopy: Failed to create a valid partition allocation.
0516-842 mklvcopy: Unable to make logical partition copies for logical volume.
0516-1199 mirrorvg: Failed to create logical partition copies for logical volume.
0516-1200 mirrorvg: Failed to mirror the volume group.



In such cases, the UPPER BOUND of the LVs has to be considered.

- Find it with lslv <lvname>: the UPPER BOUND value should be at least equal to the number of disks in the VG; if it is not, you get the error above.

#  lslv testlv
LOGICAL VOLUME:    testlv                 VOLUME GROUP:   testvg
LV IDENTIFIER:      00c502df00004c00000001233479719d.6    PERMISSION:     read/write
VG STATE:           active/complete          LV STATE:       opened/syncd
TYPE:               jfs2                      WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        128 megabyte(s)
COPIES:             1                         SCHED POLICY:   parallel
LPs:                296                         PPs:            296
STALE PPs:          0                       BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    128  --> Min = number of disks in the VG; Max depends on the VG type
MOUNT POINT:        /testfs             LABEL:          /testfs
DEVICE UID:         0                      DEVICE GID:     0
DEVICE PERMISSIONS: 432
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?:     NO
INFINITE RETRY:     no




Also, whenever you add disks to a VG, check the UPPER BOUND value of every LV in the volume group: it must be at least equal to the number of disks in the VG, and its maximum depends on the VG type (i.e. the maximum number of disks that can be part of that kind of VG).


To change the UPPER BOUND value of an LV:

chlv -u <new value> lvname

In simple words, as long as each LV's upper bound is at least equal to the number of disks, all the LV-based operating system commands (mklvcopy, mirrorvg, and so on) will work without any issues.
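
A quick sketch with hypothetical names: suppose testvg now holds 4 disks but testlv still has a lower upper bound; raise it before mirroring:

# example names; testvg is assumed to contain 4 physical volumes
lslv testlv | grep "UPPER BOUND"
# raise the upper bound to the number of disks, then retry the mirror
chlv -u 4 testlv
mirrorvg testvg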

Understanding the test (t) factor



Test factor

When adding any disk to an existing volume group on an AIX server (typically extending the VG), every newly added disk will host:

(size of disk in MB) / (PP size in MB) = number of PPs (desired)

If this number is greater than the 1016-per-disk limit, we will need to change the t-factor.


If the condition above is met and a disk would hold more than 1016 PPs, the test (t) factor comes into the picture, so we first need to work out the t-factor value to use.


Formula for calculating the factor in chvg -t:

factor * 1016 = desired number of PPs on the new disk
factor = number of PPs / 1016  --> always round this value up

Use the value obtained in the chvg command:

# chvg -t <factor> vgname

Example:


Consider a VG on an AIX server with the details below.

1. TOTAL PPs = 511
2. PP Size = 32 MB
3. Number of PVs = 1 and
4. The size of the disk that exists part of the volume group is 16384 MB.
5. Now I want to add a disk of size 32768 MB.

  • First, find how many PPs the new disk will host:
32768 / 32 = 1024 --> this is more than the 1016 limit.
  • Now calculate the test (t) factor:
        factor * 1016 = 1024
        factor = 1024 / 1016 = 1.007874..., which rounds up to 2.
So use the value 2 in the chvg command: chvg -t 2 vgname

Note: increasing the t-factor decreases the number of PVs you can have in the volume group.
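
Applied to the example above (datavg and hdisk3 are hypothetical names):

# allow up to 2 * 1016 = 2032 PPs per disk, then add the larger disk
chvg -t 2 datavg
extendvg datavg hdisk3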

PowerHA migration


A detailed PowerHA migration document.

If you have any problems or need help performing a PowerHA migration, refer to the procedure below, which clearly illustrates each step.

PROCEDURE :


Prework (at least 2 days before the actual migration):
(1) Check that the current version of HACMP is up and the cluster is stable:
# odmget HACMPcluster
# lssrc -ls clstrmgrES | grep state
(2) Verify the existing cluster and correct any errors found.
(3) Take a mksysb backup and also save a copy of the important files below (a copy sketch follows this prework list):
    OS files:
        /.rhosts
        /etc/hosts
        /etc/exports
        /etc/inittab
    Cluster files:
        /usr/es/sbin/cluster/netmon.cf
        /usr/es/sbin/cluster/etc/exports
        /usr/es/sbin/cluster/etc/rhosts
        /tmp/hacmp.log


 

(4) Take a cluster snapshot and save it locally in /tmp, with another copy in a safe place such as the NIM server.
(5) Clone the rootvg of each server onto a spare disk (map an extra LUN for the clone if none is present) to ensure a safe backout; if any issue comes up, the server can be booted from this disk. See the alt_disk_copy sketch below.
(6) Download all the prerequisite filesets for the HA migration.
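
For step (3), one simple way to stash the files while preserving their paths (the backup directory name is an example):

#!/bin/sh
# copy the pre-migration files into /tmp/ha_premig (path is an example)
BACKUP=/tmp/ha_premig
for f in /.rhosts /etc/hosts /etc/exports /etc/inittab \
         /usr/es/sbin/cluster/netmon.cf \
         /usr/es/sbin/cluster/etc/exports \
         /usr/es/sbin/cluster/etc/rhosts \
         /tmp/hacmp.log
do
    [ -f "$f" ] || continue        # skip files absent on this node
    dir=`dirname $f`
    mkdir -p "$BACKUP$dir"         # preserve the directory structure
    cp -p "$f" "$BACKUP$f"
done

For step (5), the usual tool is alt_disk_copy (hdisk1 is a hypothetical spare disk); the -B flag keeps the bootlist pointing at the current rootvg:

# clone the running rootvg to the spare disk without touching the bootlist
alt_disk_copy -B -d hdisk1
# the clone should now appear as altinst_rootvg
lspv | grep rootvg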
Migration Procedure
Let's assume one pair of participating cluster nodes, node1 and node2.
Step 1: Stop Cluster Services on a Node Hosting the Application (node1)
Stop cluster services on node1 using the graceful-with-takeover option (that is, stopping cluster services and moving the resource groups to the other node, node2):
1. Enter smit hacmp
2. Use the System Management (C-SPOC) > Manage HACMP Services > Stop Cluster Services SMIT menu to stop cluster services.
3. Select takeover for Shutdown mode.
4. Select local node only and press Enter.
The resource group containing the MQ application should now fall over to node2; verify this by:
1] running /usr/es/sbin/cluster/utilities/clRGinfo
2] executing the hostname command
3] checking with the application team that the application is running and all MQ channels are available
Step 2: Install the HACMP software
On node1, install the HA 6.1 filesets, which convert the previous HACMP configuration database (ODMs) to the new format. The installation process uses the cl_convert utility and creates the /tmp/clconvert.log file (any previously created version of the file is overwritten).
To install the HACMP software:
1. cd to the directory that contains the filesets and enter smit install
2. In SMIT, select Install and Update Software > Update Installed Software to Latest Level (Update All) and press Enter.
3. Enter the values for Preview only? and Accept new license agreements?; for all other fields, keep the defaults:
   Preview only? --> No
   Accept new license agreements? --> Yes
4. Press Enter.

Step 3: Start Cluster Services on the Upgraded Node
Start cluster services on node1.
To start cluster services on a single upgraded node:
1. Enter smit clstart
2. Enter the field values as follows and press Enter:
   Start now, on system restart or both --> now
   Start Cluster Services on these nodes --> local node (default)
   Manage Resource Groups Automatically/Manually --> Automatically
   (HACMP brings the resource groups online according to their configuration settings and the current cluster state, and starts monitoring the resource groups and applications for availability.)
   BROADCAST message at startup? --> false
   Startup Cluster Information Daemon? --> true
   Ignore verification errors? --> false
   Automatically correct errors found during cluster start? --> the value does not matter at this point, since it is a mixed-version cluster.
Note: verification is not supported on a mixed-version cluster. Run verification only when all nodes have been upgraded.
Step 4: Repeat the same steps on node2
Step 5: Verifying the Upgraded Cluster Definition
After the HACMP 6.1 software is installed on all of the nodes in the cluster and cluster services are restored, verify and synchronize the cluster configuration. Verification ensures that the cluster definition is the same on all nodes. You can verify and synchronize a cluster only when all nodes are running the same version of the software.
 
To verify the cluster:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Verification and Synchronization > Verify Changes only and press Enter.


Verifying Software Levels Installed Using AIX Commands
Run the commands lppchk -v and lppchk -c "cluster.*"
Both commands return nothing if the installation is OK.
Perform another failover to check resource group movement after the upgrade on both nodes.
How to Back Out if Anything Goes Wrong
Boot the server from the clone disk in case of any issue.
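
A minimal backout sketch, assuming the clone from prework step (5) lives on hdisk1:

# point the next boot at the clone disk and reboot
bootlist -m normal hdisk1
shutdown -Fr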