Tanti Technology

Bangalore, Karnataka, India
Multi-platform UNIX systems consultant and administrator in mutualized and virtualized environments. I have 4.5+ years of experience in AIX system administration. This site should help system administrators in their day-to-day activities. Your comments on posts are welcome. This blog is all about the IBM AIX flavour of UNIX. It is intended for system administrators who use AIX in their work, and also for newcomers who want to earn AIX administration certifications. It will be updated frequently to help system administrators and other new learners. DISCLAIMER: The blog owner takes no responsibility of any kind for any data loss or damage caused by trying any of the commands or methods mentioned in this blog. You use the commands, methods, and scripts at your own risk. If you find something useful, please leave a comment so other readers know that the solution or method worked for you.

Tuesday 19 November 2013

Recovering a Failed VIO Disk


Here is a recovery procedure for replacing a failed client disk on a Virtual I/O
Server. It assumes the client partitions have mirrored (virtual) disks. The
recovery involves both the VIO server and its client partitions. However,
it is non-disruptive for the client partitions (no downtime), and may be
non-disruptive on the VIO server (depending on the disk configuration). This
procedure does not apply to RAID 5 or SAN disk failures.

The test system had two VIO servers and an AIX client. The AIX client had two 
virtual disks (one disk from each VIO server). The two virtual disks 
were mirrored in the client using AIX's mirrorvg. (The procedure would be 
the same on a single VIO server with two disks.) 

The software levels were:


p520:    Firmware SF230_145
VIOS:    Version 1.2.0
Client:  AIX 5.3 ML3


We simulated the disk failure by removing the client LV on one VIO server. The
padmin commands used to simulate the failure were:


$ rmdev -dev vtscsi01       # remove the virtual SCSI target device for the LV (find it with lsmap -all)
$ rmlv -f aix_client_lv     # remove the client LV


This caused hdisk1 on the AIX client to go "missing", as shown by "lsvg -p rootvg".
Note that "lspv" will not show the disk failure; it only shows the disk status as of the last boot.
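Because lspv does not reflect the failure, a quick check is to filter the lsvg -p output for PVs in the "missing" state. The sketch below is a hypothetical shell function (missing_pvs is not an AIX command); it assumes the usual lsvg -p layout, with the PV name in the first column and the PV STATE in the second.

```shell
# Hypothetical helper (not part of AIX): read "lsvg -p <vg>" output on stdin
# and print the name of every physical volume whose PV STATE is "missing".
# Assumes column 1 is PV_NAME and column 2 is PV STATE; the "<vg>:" line and
# the header line never have "missing" in column 2, so they are skipped.
missing_pvs() {
    awk '$2 == "missing" { print $1 }'
}
```

On the client, run it as lsvg -p rootvg | missing_pvs; on the test system described above it should print hdisk1.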

The recovery steps included:

VIO Server 


Fix the disk failure, and restore the VIOS operating system (if necessary).

$ mklv -lv aix_client_lv rootvg 10G                 # recreate the client LV
$ mkvdev -vdev aix_client_lv -vadapter vhost1       # connect the client LV to the appropriate vhost
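If several client LVs have to be recreated, it can help to generate the padmin commands and review them before pasting them into the VIOS session. The function below is a hypothetical dry-run sketch (vios_restore_cmds is not a VIOS command); it only prints the two commands shown above for a given LV name, volume group, size, and vhost adapter, and assumes you run it in an ordinary POSIX shell rather than in the restricted padmin shell.

```shell
# Hypothetical dry-run helper: print (do not run) the padmin commands that
# recreate a client LV and map it to its virtual SCSI server adapter.
# Usage: vios_restore_cmds LV_NAME VG SIZE VHOST
vios_restore_cmds() {
    if [ "$#" -ne 4 ]; then
        echo "usage: vios_restore_cmds LV_NAME VG SIZE VHOST" >&2
        return 1
    fi
    printf 'mklv -lv %s %s %s\n' "$1" "$2" "$3"
    printf 'mkvdev -vdev %s -vadapter %s\n' "$1" "$4"
}
```

For the test system, vios_restore_cmds aix_client_lv rootvg 10G vhost1 prints the same mklv and mkvdev lines as the procedure. Printing instead of executing keeps a human review step in front of any change on the VIOS.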


AIX Client 


# cfgmgr                               # discover the new virtual hdisk2
# replacepv hdisk1 hdisk2              # rebuild the mirror copy on hdisk2
# bosboot -ad /dev/hdisk2              # add a boot image to hdisk2
# bootlist -m normal hdisk0 hdisk2     # add the new disk to the boot list
# rmdev -dl hdisk1                     # remove the failed hdisk1
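The client-side sequence depends on three disk names: the failed disk, the new disk found by cfgmgr, and the surviving mirror disk. The hypothetical function below (client_replace_cmds is not an AIX command) prints the remaining steps for those names, so run cfgmgr and lspv first to learn the new disk's name. It is a dry-run sketch that refuses to continue if the failed and new disk names are the same.

```shell
# Hypothetical dry-run helper: print the AIX client commands that replace a
# failed mirror disk, in the order used in this procedure.
# Usage: client_replace_cmds FAILED_DISK NEW_DISK SURVIVING_DISK
client_replace_cmds() {
    if [ "$#" -ne 3 ]; then
        echo "usage: client_replace_cmds FAILED NEW SURVIVING" >&2
        return 1
    fi
    if [ "$1" = "$2" ]; then
        echo "failed disk and new disk must be different" >&2
        return 1
    fi
    printf 'replacepv %s %s\n' "$1" "$2"              # rebuild the mirror on the new disk
    printf 'bosboot -ad /dev/%s\n' "$2"               # write a boot image to the new disk
    printf 'bootlist -m normal %s %s\n' "$3" "$2"     # surviving disk first, then new disk
    printf 'rmdev -dl %s\n' "$1"                      # delete the failed disk definition
}
```

For the test system, client_replace_cmds hdisk1 hdisk2 hdisk0 reproduces the commands listed above after cfgmgr.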


The "replacepv" command assigns hdisk2 to the volume group, rebuilds the mirror, and 
then removes hdisk1 from the volume group. 

As always, be sure to test this procedure before using it in production.
Virtual SCSI Server Adapter and Virtual Target Device

The mkvdev command will fail if the virtual target device (-dev) is given the same name as its backing device (-vdev):

$ mkvdev -vdev hdiskpower0 -vadapter vhost0 -dev hdiskpower0
Method error (/usr/lib/methods/define -g -d):
0514-013 Logical name is required.
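One way to avoid this error is to always derive the virtual target device name from the backing device name with a distinct prefix. The function below is a hypothetical sketch (mkvdev_cmd and the "vt_" prefix are illustrative assumptions, not a VIOS convention); it prints an mkvdev command and refuses to build one whose -dev name equals its -vdev name.

```shell
# Hypothetical helper: build an mkvdev command whose virtual target device
# name (-dev) differs from the backing device name (-vdev).
# Usage: mkvdev_cmd BACKING_DEVICE VHOST [VTD_NAME]
# The default "vt_" prefix is only an example naming convention.
mkvdev_cmd() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "usage: mkvdev_cmd BACKING_DEVICE VHOST [VTD_NAME]" >&2
        return 1
    fi
    vtd=${3:-vt_$1}                     # default VTD name if none is given
    if [ "$vtd" = "$1" ]; then
        echo "VTD name must differ from backing device $1" >&2
        return 1
    fi
    printf 'mkvdev -vdev %s -vadapter %s -dev %s\n' "$1" "$2" "$vtd"
}
```

For example, mkvdev_cmd hdiskpower0 vhost0 prints mkvdev -vdev hdiskpower0 -vadapter vhost0 -dev vt_hdiskpower0; check any generated name against your own naming standards before using it.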

The reserve attribute for an EMC device is named differently from the one for
ESS or FAStT storage devices: it is "reserve_lock".

Run the following command as padmin to check the value of the attribute:
$ lsdev -dev hdiskpower# -attr reserve_lock

Run the following command as padmin to change the value of the attribute:
$ chdev -dev hdiskpower# -attr reserve_lock=no
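When several hdiskpower devices are mapped to clients, the check and change commands above have to be repeated for each one. The hypothetical function below (reserve_lock_cmds is not a VIOS command) prints both padmin commands for every device name passed to it; it is a dry-run sketch, and the device names are whatever you supply.

```shell
# Hypothetical dry-run helper: for each EMC hdiskpower device named on the
# command line, print the padmin commands that show and then clear the
# reserve_lock attribute. Prints nothing when no devices are given.
reserve_lock_cmds() {
    for dev in "$@"; do
        printf 'lsdev -dev %s -attr reserve_lock\n' "$dev"
        printf 'chdev -dev %s -attr reserve_lock=no\n' "$dev"
    done
}
```

For example, reserve_lock_cmds hdiskpower0 hdiskpower1 prints four lines: an lsdev and a chdev command for each device.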

Also change the following Fibre Channel adapter attributes on each fscsi# device: set fc_err_recov to "fast_fail" and dyntrk to "yes".

$ chdev -dev fscsi# -attr fc_err_recov=fast_fail dyntrk=yes -perm

With fc_err_recov set to "fast_fail", if the Fibre Channel adapter driver detects
a link event, such as a lost link between a storage device and a switch, the
adapter fails any new I/O, and any retries of the failed I/Os, immediately until
the driver detects that the device has rejoined the fabric. This lets the
multipathing layer move I/O to another path quickly. The default setting for
this attribute is "delayed_fail".
Setting the dyntrk attribute to "yes" enables dynamic tracking, which lets AIX
tolerate cabling changes in the SAN (for example, a cable moved to a different
switch port).

The VIOS must be rebooted for the fscsi# attribute changes to take effect, because -perm only updates the device configuration database.
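A VIOS usually has more than one fscsi device, and every one of them needs the same change before the single reboot. The hypothetical function below (fscsi_cmds is not a VIOS command) prints the chdev command above for each fscsi device name you pass, followed by a reminder comment about the reboot. It is a dry-run sketch that only generates text.

```shell
# Hypothetical dry-run helper: print the padmin chdev command that sets
# fc_err_recov=fast_fail and dyntrk=yes on each fscsi device given, then a
# reminder that -perm changes only take effect after a VIOS reboot.
fscsi_cmds() {
    if [ "$#" -eq 0 ]; then
        echo "usage: fscsi_cmds fscsi0 [fscsi1 ...]" >&2
        return 1
    fi
    for dev in "$@"; do
        printf 'chdev -dev %s -attr fc_err_recov=fast_fail dyntrk=yes -perm\n' "$dev"
    done
    echo "# reboot the VIOS for these changes to take effect"
}
```

For example, fscsi_cmds fscsi0 fscsi1 prints one chdev line per adapter port and ends with the reboot reminder, which the shell treats as a comment if pasted.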
