After an AIX migration everything seems to be fine. However, «lppchk -v» shows an error like the one below:
# lppchk -v
lppchk: The following filesets need to be installed or corrected to bring
the system to a consistent state:
rsct.core.rmc v=2, r<5 (not installed; requisite fileset)
The error description does not help much: it does not show which fileset's dependencies actually violate the consistency of the package database. However, we can search the ODM for filesets with such a dependency:
# odmget product | fgrep -p 'rsct.core.rmc v=2 r<5'
product:
lpp_name = "sam.core.rte"
comp_id = ""
update = 0
cp_flag = 275
fesn = ""
name = "sam.core"
state = 5
ver = 3
rel = 2
mod = 0
fix = 0
ptf = ""
media = 3
sceded_by = ""
fixinfo = ""
prereq = "*prereq rsct.core.utils 2.4.13.1\n\
*prereq rsct.core.rmc v=2 r<5\n\
*prereq rsct.basic.rte 2.4.13.1\n\
"
description = "SA CHARM Runtime Commands"
supersedes = ""
Conclusion: The fileset «sam.core.rte» has such a dependency. If you run into such a problem, consider updating the fileset causing the error, or check whether the fileset is needed at all.
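If the fileset turns out to be unnecessary, it can be deinstalled with installp. A minimal sketch using the fileset from the example above (run the preview with -p first to see what would be removed):
# installp -pu sam.core
# installp -u sam.core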
Storage Management
1. Is it possible to increase the maximum number of PPs beyond 1016?
If you want to integrate a new and larger disk into an existing Volume Group you might run into problems with the maximum number of PPs on one Physical Volume. The reason is that when creating a new Volume Group the PP size is often set to the smallest possible value. The number of PPs per PV of a standard Volume Group is limited to 1016. What to do?
You can use chvg -t to increase the number of PPs by a factor of 2, 4, 16, or 32:
# chvg -t 2 rootvg
With the above command you increase the maximum number of PPs per PV in the rootvg to 2032. But be aware that you decrease the maximum number of PVs (hdisks) per VG by the same factor. In this example the rootvg cannot contain more than 16 PVs.
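To verify the new limits you can query the volume group with lsvg; the exact field names may vary slightly between AIX levels:
# lsvg rootvg | grep -i max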
2. How can I figure out if a fibre channel card is linked to a switch port?
Check the status of the FC SCSI I/O Controller Protocol Device:
The below example shows the status of the FC SCSI I/O Controller Protocol Device of the first fibre channel adapter if the system is not connected to the switch (cable is present, but switch port not configured) - attach: none, no SCSI ID:
# lsattr -El fscsi0
attach none How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
... and this is how it looks if the card is connected to the switch:
# lsattr -El fscsi1
attach switch How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x610100 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
... and this is how it looks if there is no cable to a switch at all:
# lsattr -El fscsi1
attach al How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x610100 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
al means Arbitrated Loop. You get this if there is no cable plugged into the fibre channel card. But you also get it if the system is directly attached to a storage box (e.g. FAStT). In the latter case there is nothing wrong if you see attach: al.
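To check the attach mode of all FC protocol devices at once, a small loop works (a sketch assuming the usual fscsiN naming):
# for f in $(lsdev -C | awk '/^fscsi/ { print $1 }'); do
>   echo "$f: $(lsattr -El $f -a attach -F value)"
> done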
3. How can I create a dummy disk to reserve an hdisk number?
Below you find a situation where the next LUN that is mapped to your system would get the hdisk number 0 (hdisk0):
# lsdev -Cc disk
hdisk1 Available 06-08-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 06-08-00-5,0 16 Bit LVD SCSI Disk Drive
To avoid this you could reserve hdisk0 for a dummy disk, e.g.:
# mkdev -l hdisk0 -c disk -t osdisk -s scsi -p scsi0 -w 0,10 -d
hdisk0 defined
Now we see hdisk0 as defined:
# lsdev -Cc disk
hdisk0 Defined 06-08-00-0,10 Other SCSI Disk Drive
hdisk1 Available 06-08-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 06-08-00-5,0 16 Bit LVD SCSI Disk Drive
... and the next LUN would be mapped to hdisk3.
Unfortunately this trick only works on systems with a SCSI controller assigned. With AIX 5.3 you still have the option to create a dummy SSA disk:
# mkdev -l hdisk0 -p ssar -t hdisk -w dummy
mkdev: 0514-519 The following device was not found in the customized
device configuration database:
name='ssar'
Don't be confused by the error - we have an hdisk0 now:
# lsdev -Cc disk
hdisk0 Defined SSA Logical Disk Drive
hdisk1 Available 06-08-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 06-08-00-5,0 16 Bit LVD SCSI Disk Drive
This complicated procedure is no longer needed since AIX 7.1 and AIX 6.1 TL6 - the new rendev command is available:
# lspv
hdisk0 00c8b12ce3c7d496 rootvg active
hdisk1 00c8b12cf28e737b None
# rendev -l hdisk1 hdisk99
# lspv
hdisk0 00c8b12ce3c7d496 rootvg active
hdisk99 00c8b12cf28e737b None
4. How can I directly read out the VGDA of a PV (hdisk)?
Information about VGs, LVs, filesystems, etc. is stored in the ODM. But this information is also written to the VGDA on the disks themselves. You can read it directly from a disk's VGDA with a command like this:
# lqueryvg -Atp hdisk100
You can use
# redefinevg -d hdisk100 myvg
to synchronize the ODM with the information from the VGDA. You can also synchronize the VGDA with the information stored in the ODM:
# synclvodm myvg
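Related: the PVID stored in a disk's header (at offset 0x80) can be dumped directly with lquerypv:
# lquerypv -h /dev/hdisk100 80 10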
5. How can I unlock a SAN disk?
Finally I got my LUN mapped to my system, but when I try to create my Volume Group with mkvg -f vpath100 all I get is an I/O error. What can I do?
Probably there is still a SAN lock on the disk. For vpath devices try to unlock it with:
# lquerypr -ch /dev/vpath100
and retry creating your Volume Group. If you use the newer sddpcm driver, the command to unlock would be:
# pcmquerypr -ch /dev/hdisk100
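Before clearing the reservation you may want to query its state first - a sketch, check the sddpcm documentation of your level for the exact flags:
# pcmquerypr -vh /dev/hdisk100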
6. How can I identify a generic SCSI disk for replacement?
To identify a SCSI disk (attached to a hot swap enclosure) with AIX you can use diag to make its LED blink:
# diag
Then select
> Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
> Hot Plug Task
> SCSI and SCSI RAID Hot Plug Manager
> Identify a Device Attached to a SCSI Hot Swap Enclosure Device
You see the following screen providing you with a list of hdisks. Select the one you need to identify:
IDENTIFY DEVICE ATTACHED TO SCSI HOT SWAP ENCLOSURE DEVICE
The following is a list of devices attached to SCSI Hot Swap Enclosure devices.
Selecting a slot will set the LED indicator to Identify.
Make selection, use Enter to continue.
ses2 U0.1-P1-I1/Z1-Af
slot 1 P1-I1/Z1-A8 hdisk2
slot 2 P1-I1/Z1-A9 hdisk3
slot 3 P1-I1/Z1-Aa hdisk4
slot 4 P1-I1/Z1-Ab hdisk5
slot 5 P1-I1/Z1-Ac hdisk6
slot 6 P1-I1/Z1-Ad hdisk7
slot 7 P1-I1/Z1-Ae hdisk8
ses3 U0.1-P1-I5/Z1-Af
slot 1 P1-I5/Z1-A0 hdisk9
slot 2 P1-I5/Z1-A1 hdisk10
slot 3 P1-I5/Z1-A2 hdisk11
slot 4 P1-I5/Z1-A3 hdisk12
slot 5 P1-I5/Z1-A4 hdisk13
slot 6 P1-I5/Z1-A5 hdisk14
slot 7 +------------------------------------------------------+
| |
| The LED should be in the Identify state for the |
| selected device. |
| |
| Use 'Enter' to put the device LED in the |
| Normal state and return to the previous menu. |
| |
| F3=Cancel F10=Exit Enter |
+------------------------------------------------------+
F1=Help F10=Exit
If you already removed the hdisk with the rmdev command you would still see the slot in the above screen but no device name.
7. How can I change the name of a tape device?
You can rename a tape device (i.e. rmtX or smcX) easily with chdev. For example, if you want to rename rmt0 to rmt201 just type:
# chdev -l rmt0 -a new_name=rmt201
rmt0 changed
Please note: It only works with tapes! This is because IBM defined a special attribute new_name in the ODM only for tape drives and media changers.
Update: AIX 7.1 and AIX 6.1 TL6 introduced a new command, rendev, that can be used to rename any device. The below command would rename ent0 to ent99:
# rendev -l ent0 -n ent99
8. How can I find all hdisks containing an AIX boot signature?
# ipl_varyon -i
PVNAME BOOT DEVICE PVID VOLUME GROUP ID
hdisk0 YES 00f64183e8ff11c50000000000000000 00f6418300004c00
hdisk1 NO 00f6418384f345d00000000000000000 00f6418300004c00
hdisk2 NO 00f6418384f346210000000000000000 00f6418300004c00
hdisk3 NO 00f6418384f3466c0000000000000000 00f6418300004c00
hdisk4 NO 00f6418384f346b00000000000000000 00f6418300004c00
hdisk5 NO 00f6418384f346f20000000000000000 00f6418300004c00
hdisk6 NO 00f6418384f44fca0000000000000000 00f6418300004c00
hdisk7 NO 00f6418384f450150000000000000000 00f6418300004c00
hdisk8 NO 00f6418384f450540000000000000000 00f6418300004c00
hdisk9 NO 00f6418384f4508f0000000000000000 00f6418300004c00
hdisk10 NO 00f6418384f450ca0000000000000000 00f6418300004c00
hdisk11 NO 00f6418384f347390000000000000000 00f6418300004c00
hdisk12 NO 00f6418384f450ff0000000000000000 00f6418300004c00
Conclusion: Only hdisk0 contains a boot signature.
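If a mirrored rootvg disk is missing the boot signature, you can recreate the boot image and extend the bootlist with the standard commands:
# bosboot -ad /dev/hdisk1
# bootlist -m normal hdisk0 hdisk1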
9. How can I see statistics of an HBA?
Use the fcstat command on the FC adapter:
# fcstat fcs0
The command gives a whole page of output, not shown here, with statistics similar to those of the entstat command. If you are only interested in the port speed, you could type:
# fcstat fcs0 | grep 'Port Speed'
Port Speed (supported): 8 GBIT
Port Speed (running): 4 GBIT
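A similar filter gives a quick look at the error counters (the exact counter names vary between adapter types):
# fcstat fcs0 | grep -i error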
10. How can I find WWPNs of FC adapters from the SMS menu?
It is possible to find the WWPNs at the Open Firmware prompt - at least on recent hardware. From the HMC, boot the LPAR into the Open Firmware prompt and issue the ioinfo command:
1 = SMS Menu 5 = Default Boot List
8 = Open Firmware Prompt 6 = Stored Boot List
Memory Keyboard Network SCSI Speaker
0 > ioinfo
!!! IOINFO: FOR IBM INTERNAL USE ONLY !!!
This tool gives you information about SCSI,IDE,SATA,SAS,and USB devices attached to the system
Select a tool from the following
1. SCSIINFO
2. IDEINFO
3. SATAINFO
4. SASINFO
5. USBINFO
6. FCINFO
7. VSCSIINFO
q - quit/exit
==> 6
FCINFO Main Menu
Select a FC Node from the following list:
# Location Code Pathname
---------------------------------------------------------------
1. U5877.001.0082113-P1-C10-T1 /pci@80000002000012b/fibre-channel@0
2. U5877.001.0082113-P1-C10-T2 /pci@80000002000012b/fibre-channel@0,1
3. U5877.001.0082924-P1-C10-T1 /pci@80000002000013b/fibre-channel@0
4. U5877.001.0082924-P1-C10-T2 /pci@80000002000013b/fibre-channel@0,1
q - Quit/Exit
==> 1
FC Node Menu
FC Node String: /pci@80000002000012b/fibre-channel@0
FC Node WorldWidePortName: 10000000c9d08fd0
-----------------------------------------------------------------
1. List Attached FC Devices
2. Select a FC Device
3. Enable/Disable FC Adapter Debug flags
q - Quit/Exit
Conclusion: The WWPN of the first port of the first FC adapter is 10000000c9d08fd0. On a running AIX system you would find the same information with:
# lscfg -vpl fcs0 | grep 'Network Address'
Network Address.............10000000C9D08FD0
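To list the WWPNs of all FC adapters in one go, a loop over the fcsN devices does the job (a sketch assuming the usual naming):
# for a in $(lsdev -C | awk '/^fcs[0-9]/ { print $1 }'); do
>   echo "$a: $(lscfg -vl $a | grep 'Network Address')"
> done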
11. How can I check what qdepth the kernel actually uses for a specific LUN?
Setting the qdepth with chdev is as easy as reading it back with lsattr:
# chdev -l hdisk100 -a queue_depth=8
hdisk100 changed
# lsattr -El hdisk100 -a queue_depth
queue_depth 8 Queue DEPTH True
It is not possible to change the qdepth as long as the hdisk is in use. But you can still change the value in the ODM only (chdev with -P) and wait for the next reboot for the change to apply. Here we have a problem, though: lsattr already shows the new value while the kernel still uses the old one.
# lsattr -El hdisk100 -a queue_depth
queue_depth 20 Queue DEPTH True
# chdev -l hdisk100 -a queue_depth=8 -P
hdisk100 changed
# lsattr -El hdisk100 -a queue_depth
queue_depth 8 Queue DEPTH True
But what qdepth does the kernel actually use? The only way to get the kernel's value is to use the kernel debugger:
# echo scsidisk hdisk100 | kdb | grep queue_depth
ushort queue_depth = 0x14;
What we see is the hex value of the qdepth. Use the below command to convert the value to decimal as it would be displayed by lsattr:
# printf "%d\n" 0x14
20
12. How can I increase a LUN on the fly?
Whenever the SAN admins increase a LUN I run cfgmgr, but my volume group does not recognize the new size. What to do?
Just run
# chvg -g myvg
and the additional size can be used. Doesn't work for the rootvg and HACMP though¹.
13. How can I set the number of logical partitions to be synchronized in parallel?
In normal operation the syncvg and varyonvg commands don't synchronize logical partitions in parallel, resulting in a very long synchronization time. This behaviour can be changed by setting the NUM_PARALLEL_LPS environment variable before running the synchronization commands:
# export NUM_PARALLEL_LPS=8
# varyonvg myvg
or
# export NUM_PARALLEL_LPS=8
# syncvg -v myvg
This way 8 logical partitions will be synchronized in parallel. Depending on the available CPU resources this can speed up the synchronization by nearly a factor of 8.
With the syncvg command the same effect can be achieved with the -P flag:
# syncvg -P 8 -v myvg
However, if you prefer to run varyonvg to synchronize logical partition mirrors, setting the NUM_PARALLEL_LPS variable is your only option.
14. How can I get rid of "ghost paths"?
It happens that a LUN is connected via two paths, but lspath shows both paths twice - once as Missing and another time as Enabled:
# lspath -l hdisk151
Missing hdisk151 fscsi0
Missing hdisk151 fscsi1
Enabled hdisk151 fscsi0
Enabled hdisk151 fscsi1
The reason is usually located somewhere in the SAN infrastructure - a new switch port, a replugged cable, etc. Anyway, how can I get rid of these "ghost paths" without affecting the good paths?
Not a big deal - every path to a LUN has its unique path ID:
# lspath -l hdisk151 -F "path_id:parent:path_status:status"
0:fscsi0:Missing:N/A
1:fscsi1:Missing:N/A
2:fscsi0:Available:Enabled
3:fscsi1:Available:Enabled
So all we have to do is remove the two paths with the IDs 0 and 1...
# rmpath -dl hdisk151 -i 0
paths Deleted
# rmpath -dl hdisk151 -i 1
paths Deleted
...and the "ghost paths" are gone:
# lspath -l hdisk151 -F "path_id:parent:path_status:status"
2:fscsi0:Available:Enabled
3:fscsi1:Available:Enabled
# lspath -l hdisk151
Enabled hdisk151 fscsi0
Enabled hdisk151 fscsi1
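If many LUNs are affected, the Missing paths can be removed in a loop based on their path IDs - a sketch in ksh; review the output of lspath before deleting anything:
# lspath -F "name:parent:path_id:path_status" | grep ':Missing$' | \
    while IFS=: read disk parent id state; do
      rmpath -dl $disk -i $id
    done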
15. How do I create a mapfile to create an exact copy of a Logical Volume?
Let's say hdisk100 is the disk holding the first and only copy of an LV called mylv and you want to create a second copy on hdisk101. The below command will do the trick:
# lslv -m mylv | awk '/hdisk/ { printf( "hdisk101:%d\n", $2 ) }' | tee mylv.map
hdisk101:1
hdisk101:2
hdisk101:3
hdisk101:4
If your LV is spread over multiple disks, sed is your friend:
# lslv -m mylv | awk '/hdisk/ { printf( "%s:%d\n", $3, $2 ) }' | sed -e 's/hdisk100\:/hdisk200\:/' -e 's/hdisk101\:/hdisk201\:/' | tee mylv.map
hdisk200:1
hdisk201:1
hdisk200:2
hdisk201:2
hdisk200:3
hdisk201:3
hdisk200:4
hdisk201:4
In the above example hdisk100 is going to be copied to hdisk200 and hdisk101 to hdisk201. To actually create the mirror run mklvcopy with the -m switch:
# mklvcopy -m mylv.map mylv 2
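Note that mklvcopy creates the new copy in a stale state unless you pass -k; bring it in sync afterwards with:
# syncvg -l mylv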
16. How can I change the status of a removed PV back to active?
After an I/O failure to a PV due to a down path or a system crash, a volume group may have a disk in a removed state:
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 removed 432 136 76..00..00..00..60
hdisk2 active 432 136 76..00..00..00..60
We can use chpv to change the status of the PV back to active:
# chpv -va hdisk1
# syncvg -P 4 -v rootvg
The switch '-P 4' to syncvg may be used to speed up the synchronization process by syncing 4 logical partitions in parallel.
Miscellaneous
1. How do I create users with long login names (more than 8 characters) under AIX 5.3?
Since AIX version 5.3 one can create users with login names longer than 8 characters. In order to create such a login name you first have to raise the limit on the login name length. This can be done with:
# chdev -l sys0 -a max_logname=13
The above example allows login names with up to 12 characters (the value of max_logname includes the terminating zero byte). The new limit becomes active at the next reboot.
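You can check the configured limit with lsattr; on a running system getconf should report the active value (a sketch):
# lsattr -El sys0 -a max_logname
# getconf LOGIN_NAME_MAX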
2. Can I use passwords with more than 8 (significant) characters?
AIX always accepts passwords with more than 8 characters, but in fact only the first 8 characters are significant. If you want passwords with more significant characters, the hash algorithm has to be changed in /etc/security/login.cfg:
usw:
shells = /bin/sh,/bin/bsh,/bin/csh,/bin/ksh,/bin/tsh,/bin/ksh93,/usr/bin/sh,/usr/bin/bsh,/usr/bin/csh,/usr/bin/ksh,/usr/bin/tsh,/usr/bin/ksh93
maxlogins = 32767
logintimeout = 60
maxroles = 8
auth_type = STD_AUTH
pwd_algorithm = ssha256
The last line changes the hash algorithm from crypt to ssha256. This algorithm allows passwords with up to 255 characters. Have a look at /etc/security/pwdalg.cfg to see which other algorithms are allowed.
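Instead of editing the file by hand, the same change can be made with chsec:
# chsec -f /etc/security/login.cfg -s usw -a pwd_algorithm=ssha256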
3. What are the correct settings for daylight saving time in Central Europe?
The timezone is set by the TZ environment variable. To set the timezone globally you have to change the TZ variable in /etc/environment. For the Central European countries (Brussels time) this variable should be set as follows:
TZ=CET-1CEST,M3.5.0/2:00,M10.5.0/3:00
All services that read the timezone have to be restarted (e.g. cron). A reboot - of course - will restart everything.
Please note that AIX's default time settings for Central Europe are not correct!
Beginning with AIX 7.1 and AIX 6.1 TL5, symbolic ("Olson") values for TZ are also respected. For the Netherlands you could set:
TZ=Europe/Amsterdam
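Since cron is started from /etc/inittab, init respawns it automatically after it is killed, so it picks up the new TZ without a reboot (a sketch; double-check the PID before killing):
# ps -e -o pid,comm | grep cron
# kill <PID of cron>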
4. Can I identify deleted files still opened by a process?
Just run fuser -V -d on the filesystem you want to check for deleted but still open files. This is an example for /tmp:
# fuser -V -d /tmp
/tmp:
inode=7 size=56 fd=2 512238
The PID points to the process which still has an open file descriptor to the deleted file:
# ps -fp 512238
USER PID PPID C STIME TTY TIME CMD
root 512238 1 0 Mar 20 - 3:29 /usr/sbin/rsct/bin/ctcasd
5. How can I figure out what values are known to device attributes?
In the following example output we want to change the attribute init_link of a fibre channel adapter:
# lsattr -El fcs0
bus_intr_lvl 121 Bus interrupt level False
bus_io_addr 0xbfc00 Bus I/O address False
bus_mem_addr 0xc0040000 Bus memory address False
init_link al INIT Link flags True
intr_priority 3 Interrupt priority False
lg_term_dma 0x800000 Long term DMA True
max_xfer_size 0x100000 Maximum Transfer Size True
num_cmd_elems 200 Maximum number of COMMANDS to queue to the adapter True
pref_alpa 0x1 Preferred AL_PA True
sw_fc_class 2 FC Class for Fabric True
True in the last column indicates that we can indeed change the value of this attribute¹. But what is a valid value? This can be figured out easily with the lsattr command:
# lsattr -Rl fcs0 -a init_link
al
pt2pt
Valid values are al and pt2pt. And that's how we could change it:
# chdev -l fcs0 -a init_link=pt2pt
fcs0 changed
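If the adapter is busy, chdev refuses the change; with -P the new value is only written to the ODM and becomes active at the next reboot (or after reconfiguring the adapter):
# chdev -l fcs0 -a init_link=pt2pt -P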
6. How can I mount an ISO image file?
With AIX 6.1 TL4 or newer you can use loopmount:
# ls -l *.iso
-rw-r--r-- 1 root system 43974656 Jan 13 17:05 dvd_aix_profilemanager.iso
# loopmount -i dvd_aix_profilemanager.iso -o "-V cdrfs -o ro" -m /mnt
# df /mnt
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/loop0 84812 0 100% 21203 100% /mnt
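To release the image afterwards, unmount it again; loopumount also takes care of the underlying loop device (a sketch, check the flags of your AIX level):
# loopumount -l loop0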
7. How can I fix a broken /dev/ipldevice?
I migrated the rootvg to a different disk. Now I get tons of errors when running any mirroring command. I know a reboot solves the problem, but can I fix it without a reboot?
The problem is that /dev/ipldevice points to the device the system was booted from. When you removed this device from the rootvg, /dev/ipldevice points to a non-existing device and you see error messages like these:
# unmirrorvg rootvg hdisk2
0516-1734 rmlvcopy: Warning, savebase failed. Please manually run 'savebase' before rebooting.
0516-1734 unmirrorvg: Warning, savebase failed. Please manually run 'savebase' before rebooting.
You can fix it by relinking /dev/ipldevice to the disk holding the BLV. If you have your rootvg mirrored, choose the first one:
# lslv -l hd5
hd5:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk16 001:000:000 0% 001:000:000:000:000
# cd /dev
# ls -l ipldevice
crw------- 2 root system 17, 2 Nov 18 2010 ipldevice
# rm -f ipldevice
# ln rhdisk16 ipldevice
# ls -l ipldevice rhdisk16
crw------- 2 root system 17, 16 Jun 25 10:58 ipldevice
crw------- 2 root system 17, 16 Jun 25 10:58 rhdisk16
# savebase
Please note that a hardlink is required.
8. How do I extend a dump device?
«sysdumpdev -e» estimates the size of the dump:
# sysdumpdev -e
0453-041 Estimated dump size in bytes: 547146956
and «sysdumpdev -l» shows the location of the dump device:
# sysdumpdev -l
primary /dev/hd7
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression ON
In our case it's hd7. The size of the dump device is the size of the underlying LV:
# lslv hd7 | egrep 'PP SIZE|LPs'
MAX LPs: 512 PP SIZE: 256 megabyte(s)
LPs: 2
PPs: 2
In our example we need a dump device of at least 547146956 bytes (= 522 MB), which is a bit more than what we have (2 * 256 MB = 512 MB). So we need to increase our dump device by 1 LP:
# extendlv hd7 1
# lslv hd7 | egrep 'PP SIZE|LPs'
MAX LPs: 512 PP SIZE: 256 megabyte(s)
LPs: 3
PPs: 3
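As a final check, re-run the estimate and make sure it still fits into the enlarged device; with 3 LPs of 256 MB we now have 768 MB:
# sysdumpdev -e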