A. AIX GENERAL TROUBLE SHOOTING
1) FILE SYSTEM SPACE USAGE
Check for disk space problems.
# df -I (Shows total, used, and free space in 512-byte blocks; inode columns are omitted)
Filesystem 512-blocks Used Free %Used Mounted on
/dev/hd4 17301504 5926488 11375016 35% /
/dev/hd2 10485760 4583816 5901944 44% /usr
# df -k (Checks for disk space usage in 1K blocks)
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 8650752 5687508 35% 39729 2% /
/dev/hd2 5242880 2950972 44% 35227 3% /usr
# df -g (Checks for disk space usage in gigabyte blocks)
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd4 8.25 5.42 35% 39729 2% /
/dev/hd2 5.00 2.81 44% 35227 3% /usr
# df -gP (POSIX view with different heading names)
Filesystem GB blocks Used Available Capacity Mounted on
/dev/hd4 8.25 2.83 5.42 35% /
/dev/hd2 5.00 2.19 2.81 44% /usr
Note that df -k and df -g list the disk space usage (%Used) as well as the inode usage (%Iused).
Pay close attention so that you do not confuse the two when checking file system space.
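The two percentage columns can be checked separately with a small filter. The following is a minimal sketch, not part of AIX: the function name fs_over and the threshold values are illustrative assumptions, and it expects df -k output in the seven-column layout shown above.

```shell
# fs_over THRESHOLD: read AIX "df -k" output on stdin and report every file
# system whose %Used or %Iused is at or above THRESHOLD.
# (Hypothetical helper; assumes the 7-column "df -k" layout shown above.)
fs_over() {
    awk -v limit="$1" '
        $1 == "Filesystem" { next }          # skip the header line
        {
            used = $4; iused = $6
            sub(/%/, "", used); sub(/%/, "", iused)
            if (used + 0 >= limit)  print $7 ": disk space " used "% used"
            if (iused + 0 >= limit) print $7 ": inodes " iused "% used"
        }'
}

# On a live system: df -k | fs_over 80
# Demonstration with the sample output from this section and a low threshold:
fs_over 40 <<'EOF'
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 8650752 5687508 35% 39729 2% /
/dev/hd2 5242880 2950972 44% 35227 3% /usr
EOF
```

Reporting the two conditions on separate lines makes it harder to mistake an inode shortage for a space shortage.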
Use lsps to check paging/swap space usage:
The lsps command displays the characteristics of paging spaces, such as paging space name, physical volume name, volume group name, size, percentage of the paging space used, status of space, and it shows if the paging space is set to automatic.
# lsps -a (Note that this system is paging quite a bit)
Page Space Physical Volume Volume Group Size %Used Active Auto Type
paging00 hdisk0 rootvg 10752MB 45 yes yes lv
hd6 hdisk1 rootvg 2560MB 45 yes yes lv
hd6 hdisk2 rootvg 8192MB 45 yes yes lv
or
# swap -s
allocated = 5505024 blocks used = 2458677 blocks free = 3046347 blocks
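The block counts from swap -s can be turned into a percentage for comparison with the %Used column of lsps. This is a rough sketch only: swap_pct is a hypothetical helper name, and it assumes the single-line output layout shown above.

```shell
# swap_pct: read AIX "swap -s" output on stdin and print the percentage of
# allocated paging space that is in use.
# (Hypothetical helper; assumes the layout
#  "allocated = N blocks used = N blocks free = N blocks".)
swap_pct() {
    awk '$1 == "allocated" && $5 == "used" { printf "%.1f\n", $7 * 100 / $3 }'
}

# On a live system: swap -s | swap_pct
# Demonstration with the sample line above:
swap_pct <<'EOF'
allocated = 5505024 blocks used = 2458677 blocks free = 3046347 blocks
EOF
```

For the sample line this gives roughly 45 percent, which agrees with the %Used values in the lsps -a listing.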
2) LOAD AVERAGE
# uptime
11:14AM up 10 days, 21:02, 2 users, load average: 0.05, 0.05, 0.03
*Note: The load average numbers give the average number of jobs/processes in the run queue over the last 1, 5, and 15 minutes. The lowest possible load average is zero. What counts as high depends on the number of CPUs: a load average at or below the number of logical CPUs is generally normal, while a sustained value well above it (for example, 3 or more on a single-CPU system) could indicate a problem that needs investigation.
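One way to put the load average in context is to divide it by the number of logical CPUs (the lcpu value printed in the vmstat header). The sketch below is illustrative only: load_per_cpu is a hypothetical helper, and the busy uptime line in the demonstration is made-up sample data, not output from a real system.

```shell
# load_per_cpu NCPU: read an "uptime" line on stdin and print the 1-, 5-, and
# 15-minute load averages divided by NCPU logical CPUs.
# (Hypothetical helper; NCPU must be supplied by the caller.)
load_per_cpu() {
    awk -v ncpu="$1" '
        /load average:/ {
            sub(/.*load average: */, "")     # keep only the three numbers
            n = split($0, avg, /, */)
            for (i = 1; i <= n; i++)
                printf "%s%.2f", (i > 1 ? " " : ""), avg[i] / ncpu
            printf "\n"
        }'
}

# On a live system with 8 logical CPUs: uptime | load_per_cpu 8
# Demonstration with a hypothetical busy system:
echo '11:14AM up 10 days, 21:02, 2 users, load average: 6.00, 4.00, 2.00' | load_per_cpu 8
```

Values persistently above 1.00 per CPU suggest that runnable work is queuing for processor time.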
B. SYSTEM PERFORMANCE
1) CPU AND MEMORY USAGE
The vmstat command reports statistics about kernel threads, virtual memory, disks, traps, and CPU activity.
*us = user time, sy = system time, id = CPU idle time, wa = CPU idle time during which the system had pending disk I/O requests (I/O wait).
# vmstat 5 5
System Configuration: lcpu=8 mem=16384MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
5 1 4818381 24300 0 2 2 636 859 0 2048 321280 9460 24 18 53 5
6 1 4817085 25591 0 0 0 0 0 0 1838 593223 4798 53 21 23 4
7 1 4811637 31031 0 0 0 0 0 0 1975 265643 4706 30 13 49 8
2 1 4813001 29650 0 0 0 0 0 0 1814 95041 7491 8 10 76 6
4 1 4818874 23769 0 0 0 0 0 0 1864 53014 4428 5 7 81 7
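To summarize a run such as the one above, the four CPU columns can be averaged across the samples. This is a sketch under stated assumptions: vmstat_cpu_avg is a hypothetical helper, and it assumes the 17-column layout shown here, in which us, sy, id, and wa are the last four columns (some configurations append further columns, which this does not handle). The first sample line of vmstat reports averages since boot, so you may prefer to drop it before averaging.

```shell
# vmstat_cpu_avg: read "vmstat INTERVAL COUNT" output on stdin and print the
# mean of the last four columns (us sy id wa) over all sample lines.
# (Hypothetical helper; assumes us/sy/id/wa are the final four columns.)
vmstat_cpu_avg() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 17 {        # sample lines start with a number
            us += $(NF-3); sy += $(NF-2); id += $(NF-1); wa += $NF; n++
        }
        END {
            if (n) printf "us=%.1f sy=%.1f id=%.1f wa=%.1f\n", us/n, sy/n, id/n, wa/n
        }'
}

# On a live system: vmstat 5 5 | vmstat_cpu_avg
# Demonstration with the five samples shown above:
vmstat_cpu_avg <<'EOF'
r b avm fre re pi po fr sr cy in sy cs us sy id wa
5 1 4818381 24300 0 2 2 636 859 0 2048 321280 9460 24 18 53 5
6 1 4817085 25591 0 0 0 0 0 0 1838 593223 4798 53 21 23 4
7 1 4811637 31031 0 0 0 0 0 0 1975 265643 4706 30 13 49 8
2 1 4813001 29650 0 0 0 0 0 0 1814 95041 7491 8 10 76 6
4 1 4818874 23769 0 0 0 0 0 0 1864 53014 4428 5 7 81 7
EOF
```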
A new I/O-oriented view using the -I option:
# vmstat -I 5 5
System Configuration: lcpu=8 mem=16384MB
kthr memory page faults cpu
-------- ----------- ------------------------ ------------ -----------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa
5 1 0 4809912 45680 574 203 2 2 636 860 2048 321270 9459 24 18 53 5
1 0 0 4820163 35346 12 152 0 0 0 0 2034 410525 5435 10 20 67 2
2 0 0 4816092 39388 4 57 0 0 0 0 1726 566771 62167 13 20 65 2
2 1 0 4821609 33799 11 216 0 0 0 0 2024 529518 21680 13 27 56 4
6 1 0 4815588 39806 1 43 0 0 0 0 1668 481025 4853 12 18 69 1
The iostat command reports CPU and I/O statistics.
# iostat (On large systems this output could be quite large)
System configuration: lcpu=2 disk=3
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.0 0.5 0.3 0.2 99.3 0.2
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk1 0.6 8.2 1.2 2030462 5660599
hdisk0 0.7 7.2 1.1 1116762 5660603
cd0 0.0 0.0 0.0 0 0
Note: %user shows the percentage of CPU utilization at the user level and %sys shows the percentage of the CPU utilization at the system level.
# sar 5 5
AIX jrspa22t 2 5 00283EDD4C00 07/26/06
System Configuration: lcpu=8
10:12:49 %usr %sys %wio %idle
10:12:54 22 13 2 64
10:12:59 53 4 1 42
10:13:04 52 9 1 38
10:13:09 52 3 1 44
10:13:14 39 8 2 52
Average 44 7 1 48
To monitor the usage of every CPU with sar:
# sar -P ALL 5 10
The topas command displays statistics of system activities and CPU usage. The refresh interval, in seconds, is set with the -i flag. To ensure the output is readable, set your terminal emulation to vt220 both before accessing the system and after logging on to it.
# topas -i5
The report from the topas command lists the CPU usage of the kernel, user, wait time, and system idle time. Below, it also lists processes, along with the PID, CPU usage, and owner that are currently running on the system.
[Figure: topas screen, with callouts marking the kernel, user, wait, and idle usage and the process ID, usage, and owner]
To monitor the busiest processes on a system using topas:
# topas -Pi5 (checks at a 5 second interval)
Topas Monitor for host: jrspa22t Interval: 5 Wed Jul 26 10:15:47 2006
DATA TEXT PAGE PGFAULTS
USER PID PPID PRI NI RES RES SPACE TIME CPU% I/O OTH COMMAND
root 258066 1 60 20 88 1 160 1966:11 2.9 0 25 syncd
patrol 7778462 1 75 30 14933 674 17910 1423:17 1.1 027708 PatrolAg lAg
root 8769704 1 62 20 6313 835 6313 9:29 1.1 6 400 bgsagent
root 172116 0 16 41 17 0 17 1090:36 0.5 0 0 wlmsched
bsomqp022642060 2191530 60 20 15194 11 24748 26:38 0.4 0 0 java
root 7958554 2969668 58 41 2790 19 2790 0:01 0.4 0 202 topas
root 1417340 1 1 41 400 245 838 1349:10 0.4 0 0 seosd
ncmsqp042674916 2822388 60 20 25443 17 40486 47:57 0.3 0 0 java
patrol 6553618 1 70 30 2754 674 4157 242:53 0.2 0 1721 PatrolAg t
ncmsqp023493972 3690670 60 20 17462 11 27864 29:37 0.2 0 0 java
Find the top 15 processes using memory on a system:
# svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd LPage
1589482 oracle 247739 5402 55835 109827 Y N N
2039974 oracle 221077 5402 56167 110311 Y N N
2129990 oracle 220953 5402 56091 110111 Y N N
1982638 oracle 220808 5402 55824 109858 Y N N
1396820 oracle 219414 5402 55839 109946 Y N N
2670812 oracle 219319 5402 55990 109938 Y N N
6779124 oracle 219285 5402 56034 109932 Y N N
2216084 oracle 219245 5402 55979 109899 Y N N
2912464 oracle 219239 5402 55926 109873 Y N N
2470110 oracle 219232 5402 55953 109874 Y N N
2572518 oracle 219002 5402 56018 109846 Y N N
2584744 oracle 218920 5402 56173 109915 Y N N
2211846 oracle 218883 5402 56245 109948 Y N N
6979770 oracle 200825 5402 56144 109830 Y N N
1790028 java 187476 5727 57630 198578 N Y N
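The Inuse, Pin, Pgsp, and Virtual columns are page counts rather than bytes. A minimal sketch of converting Inuse to megabytes follows; svmon_inuse_mb is a hypothetical helper, and the 4 KB page size is an assumption that should be confirmed for your system.

```shell
# svmon_inuse_mb: read the per-process summary lines shown above on stdin
# (Pid Command Inuse Pin Pgsp Virtual ...) and print Pid, Command, and the
# Inuse value converted to megabytes.
# (Hypothetical helper; ASSUMES Inuse is counted in 4 KB pages.)
svmon_inuse_mb() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
        printf "%s %s %.1f MB\n", $1, $2, $3 * 4 / 1024
    }'
}

# Demonstration with two of the lines shown above:
svmon_inuse_mb <<'EOF'
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd LPage
1589482 oracle 247739 5402 55835 109827 Y N N
1790028 java 187476 5727 57630 198578 N Y N
EOF
```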
Finding the size of a PID using ps:
# ps v 3375240
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
3375240 - A 42:25 10859 157132 106180 xx 39 44 0.0 1.0 /pac/nc
2) WHERE TO OBTAIN PERFPMR TO COLLECT PERFORMANCE DATA
If a server has a performance problem, IBM may request that you install perfpmr and collect performance data during a peak load period. IBM normally provides instructions on how to install.
You can obtain a copy of the perfpmr scripts from the following location:
ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr
You will need to get this while you are logged onto the server with the problem.
The IBM performance team has suggested the following changes be made to the script once it is downloaded and installed:
Please change the following lines in each of the stanzas in perfpmr.cfg:
trace.sh:
logsize = 402653184
kbufsize = 201326592
filemon.sh:
filemon_kbufsize = 201326592
filemon_time_seconds = 60
space_required = 83886080
3) LAN STATUS
The netstat command shows network status for each protocol or routing table. The -i flag may be used to check for collisions and I/O errors.
# netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 0.9.6b.3e.57.61 424536 0 239376 0 0
en0 1500 89.10.12 prl28284 424536 0 239376 0 0
en2 1500 link#3 0.9.6b.ce.54.cb 4297312 0 140332 2 0
en2 1500 55.10.32 breac01t-55 4297312 0 140332 2 0
lo0 16896 link#1 5254 0 6076 0 0
lo0 16896 127 loopback 5254 0 6076 0 0
lo0 16896 ::1 5254 0 6076 0 0
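Scanning the error and collision columns by eye is error-prone on systems with many interfaces. The filter below is a sketch only: net_errors is a hypothetical helper, and it assumes the column order shown above, counting fields from the right because the Address column is empty on some rows.

```shell
# net_errors: read "netstat -i" output on stdin and print each interface that
# has a nonzero Ierrs, Oerrs, or Coll counter, once per interface.
# (Hypothetical helper; assumes the last five columns are
#  Ipkts Ierrs Opkts Oerrs Coll as in the listing above.)
net_errors() {
    awk '
        $1 == "Name" { next }                         # skip the header line
        NF >= 8 && !seen[$1]++ {
            ierrs = $(NF-3); oerrs = $(NF-1); coll = $NF
            if (ierrs + oerrs + coll > 0)
                print $1 " Ierrs=" ierrs " Oerrs=" oerrs " Coll=" coll
        }'
}

# On a live system: netstat -i | net_errors
# Demonstration with lines from the sample output above:
net_errors <<'EOF'
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 0.9.6b.3e.57.61 424536 0 239376 0 0
en2 1500 link#3 0.9.6b.ce.54.cb 4297312 0 140332 2 0
en2 1500 55.10.32 breac01t-55 4297312 0 140332 2 0
lo0 16896 link#1 5254 0 6076 0 0
EOF
```

In the sample output only en2 is reported, because of its two output errors.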
Check the routing tables, with addresses shown numerically:
# netstat -rn
Routing tables
Destination Gateway Flags Refs Use If PMTU Exp Groups
Route Tree for Protocol Family 2 (Internet):
default 89.10.12.254 UGc 0 0 en0 - -
55.10.32.0 55.10.34.184 UHSb 0 0 en2 - - =>
55.10.32/22 55.10.34.184 U 0 138677 en2 - -
55.10.34.184 127.0.0.1 UGHS 0 1 lo0 - -
55.10.35.255 55.10.34.184 UHSb 0 4 en2 - -
89.10.5.135 89.10.12.254 UGHW 1 2163 en0 - -
# ifconfig -a
en0:flags=5e080863,80
en2:flags=5e080863,c0
lo0:flags=e08084b
inet6 ::1/0 tcp_sendspace 65536 tcp_recvspace 65536
4) HOW TO CHECK INTERFACE CARD SPEED, AUTO NEGOTIATION INFO.
# entstat -d ent4 | more
-------------------------------------------------------------
ETHERNET STATISTICS (ent4) :
Device Type: Gigabit Ethernet-SX PCI-X Adapter (14106802)
Hardware Address: 00:02:55:33:77:63
Elapsed Time: 11 days 9 hours 58 minutes 37 seconds
Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 17299124 Packets: 166277808
Bytes: 486040591195 Bytes: 38982878854
Interrupts: 0 Interrupts: 153893117
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 51
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 60 Broadcast Packets: 97825101
Multicast Packets: 1 Multicast Packets: 95415
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0
General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
PrivateSegment LargeSend DataRateSet
Gigabit Ethernet-SX PCI-X Adapter (14106802) Specific Statistics:
--------------------------------------------------------------------
Link Status : Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
PCI Mode: PCI-X (100-133)
PCI Bus Width: 64-bit
Latency Timer: 144
Cache Line Size: 128
Jumbo Frames: Disabled
TCP Segmentation Offload: Enabled
TCP Segmentation Offload Packets Transmitted: 14265351
TCP Segmentation Offload Packet Errors: 0
Transmit and Receive Flow Control Status: Enabled
XON Flow Control Packets Transmitted: 0
XON Flow Control Packets Received: 0
XOFF Flow Control Packets Transmitted: 0
XOFF Flow Control Packets Received: 0
Transmit and Receive Flow Control Threshold (High): 45056
Transmit and Receive Flow Control Threshold (Low): 24576
Transmit and Receive Storage Allocation (TX/RX): 16/48
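Since the speed and negotiation settings are buried near the end of a long report, it can help to pull out just those two lines. This is a small sketch: media_speed is a hypothetical helper, and it assumes the "Media Speed Selected" and "Media Speed Running" labels appear as in the report above (other adapter types may label them differently).

```shell
# media_speed: read "entstat -d" output on stdin and print the selected and
# running media speed on one line.
# (Hypothetical helper; assumes the two labels used in the report above.)
media_speed() {
    awk -F': *' '
        /Media Speed Selected/ { selected = $2 }
        /Media Speed Running/  { running = $2 }
        END { print "selected=" selected "; running=" running }'
}

# On a live system: entstat -d ent4 | media_speed
# Demonstration with lines from the report above:
media_speed <<'EOF'
Link Status : Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
EOF
```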
C. LAST REBOOT, RUN LEVEL, BOOT LOG, CONSOLE LOG
Check to see if the box has rebooted recently by running:
# who -b
A recent system reboot could explain alarms on the system. The reboot may have been scheduled or may have been caused by a system panic, hardware failure, or power failure. Further investigation should be done. Check the CAMCS logs to see if a system panic occurred or check cron to see if a reboot script was executed.
Check the system's current run level. AIX normally operates at run level 2. Other run levels are available, but are rarely used.
# who -r
To check for any configuration errors after a system reboot, run the following command to see the bootlog:
# alog -o -f /var/adm/ras/bootlog | more
The console log can be viewed using this command:
# alog -o -f /var/adm/ras/conslog | more
D. DISK DRIVE REPLACEMENT
DISK DRIVE PROCEDURES
The following commands are used to display devices on the system and their characteristics.
1) HARDWARE DEVICES
lsdev displays information about devices in the device configuration database.
Flags: -C lists information about a device that is in the Customized Devices object class.
-c specifies a device class name.
-H displays headers above the column output.
To list the disks that are in the Available state in the Customized Devices object class:
# lsdev -CH -c disk
name status location description
hdisk0 Available 1S-08-00-5,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive
To list all devices:
# lsdev -C -H | pg
name status location description
L2cache0 Available L2 Cache
aio0 Defined Asynchronous I/O (Legacy)
cd0 Available 1G-19-00 IDE DVD-ROM Drive
en0 Available 1L-08 Standard Ethernet Network Interface
en1 Defined 1c-08 Standard Ethernet Network Interface
en2 Available 1j-08 Standard Ethernet Network Interface
en3 Defined 1n-08 Standard Ethernet Network Interface
ent0 Available 1L-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent1 Available 1c-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
lspv provides information about the physical volumes known to the system: each physical disk name, its physical volume identifier (PVID), and its volume group.
# lspv
hdisk0 000c8edc02dccea9 rootvg active
hdisk1 000c8edc851ee972 rootvg active
# lspv hdisk0
PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg
PV IDENTIFIER: 000c8edc02dccea9 VG IDENTIFIER 000c8edc00004c00000000fc851ef361
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 7
TOTAL PPs: 542 (34688 megabytes) VG DESCRIPTORS: 1
FREE PPs: 86 (5504 megabytes) HOT SPARE: no
USED PPs: 456 (29184 megabytes)
FREE DISTRIBUTION: 25..60..00..00..01
USED DISTRIBUTION: 84..48..108..108..108
The -p flag will list all physical partitions of physical volume hdisk0.
# lspv -p hdisk0
hdisk0:
PP RANGE STATE REGION LV NAME TYPE MOUNT POINT
1-4 used outer edge hd5 boot N/A
5-29 free outer edge
30-109 used outer edge hd9var jfs /var
110-141 used outer middle hd6 paging N/A
142-201 free outer middle
202-217 used outer middle hd3 jfs /tmp
218-221 used center hd8 jfslog N/A
222-325 used center hd4 jfs /
326-381 used inner middle hd4 jfs /
382-433 used inner middle hd2 jfs /usr
434-541 used inner edge hd2 jfs /usr
542-542 free inner edge
Example of a problem on hdisk0:
# lspv hdisk0
PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg
PV IDENTIFIER: 000c8edc001363a5 VG IDENTIFIER 000c8edc00004c00000000fc851ef361
PV STATE: active
STALE PARTITIONS: 6 ALLOCATABLE: yes
(Note: a nonzero STALE PARTITIONS count means the mirror copies on this disk are out of sync, which usually points to a failing disk.)
PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 7
TOTAL PPs: 542 (34688 megabytes) VG DESCRIPTORS: 1
FREE PPs: 86 (5504 megabytes) HOT SPARE: no
USED PPs: 456 (29184 megabytes)
FREE DISTRIBUTION: 25..60..00..00..01 (FREE PPs = 86: 25+60+0+0+1)
USED DISTRIBUTION: 84..48..108..108..108 (USED PPs = 456: 84+48+108+108+108)
# lspv -p hdisk0
hdisk0:
PP RANGE STATE REGION LV NAME TYPE MOUNT POINT
1-4 used outer edge hd5 boot N/A
5-29 free outer edge
30-109 used outer edge hd9var jfs /var
110-141 used outer middle hd6 paging N/A
142-201 free outer middle
202-217 used outer middle hd3 jfs /tmp
218-218 *stale center hd8 jfslog N/A
219-221 used center hd8 jfslog N/A
222-222 *stale center hd4 jfs /
223-231 used center hd4 jfs /
232-232 *stale center hd4 jfs /
233-240 used center hd4 jfs /
241-241 *stale center hd4 jfs /
242-325 used center hd4 jfs /
326-381 used inner middle hd4 jfs /
382-382 *stale inner middle hd2 jfs /usr
383-400 used inner middle hd2 jfs /usr
401-401 *stale inner middle hd2 jfs /usr
402-433 used inner middle hd2 jfs /usr
434-541 used inner edge hd2 jfs /usr
542-542 free inner edge
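The stale entries in a listing like the one above can be totalled per logical volume to see which file systems are affected. The following is a sketch, not an AIX utility: stale_by_lv is a hypothetical helper, and it assumes the column layout shown above, in which the region is either the single word "center" or a two-word name such as "inner middle".

```shell
# stale_by_lv: read "lspv -p" output on stdin and print the number of stale
# physical partitions per logical volume, sorted by LV name.
# (Hypothetical helper; assumes the lspv -p column layout shown above.)
stale_by_lv() {
    awk '
        $2 ~ /stale/ {
            split($1, r, "-")                       # PP range "first-last"
            lv = ($3 == "center") ? $4 : $5         # region is one or two words
            count[lv] += r[2] - r[1] + 1
        }
        END { for (lv in count) print lv, count[lv] }' | sort
}

# On a live system: lspv -p hdisk0 | stale_by_lv
# Demonstration with the stale lines from the listing above:
stale_by_lv <<'EOF'
218-218 *stale center hd8 jfslog N/A
222-222 *stale center hd4 jfs /
232-232 *stale center hd4 jfs /
241-241 *stale center hd4 jfs /
382-382 *stale inner middle hd2 jfs /usr
401-401 *stale inner middle hd2 jfs /usr
EOF
```

For the listing above the per-LV counts are 2 for hd2, 3 for hd4, and 1 for hd8, which adds up to the 6 reported in the STALE PARTITIONS field.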
2) VOLUME GROUPS
To list the volume groups that are currently active on your system:
# lsvg -o
rootvg
List detailed information and status about the volume group.
# lsvg rootvg
VOLUME GROUP: rootvg VG IDENTIFIER: 000c8edc00004c00000000fc851ef361
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1084 (69376 megabytes)
MAX LVs: 256 FREE PPs: 108 (6912 megabytes)
LVs: 9 USED PPs: 976 (62464 megabytes)
OPEN LVs: 8 QUORUM: 1
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: yes
MAX PPs per PV: 1016 MAX PVs: 32
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
List the logical volumes in a volume group.
# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 42 84 3 open/syncd N/A
hd8 jfslog 1 2 2 open/syncd N/A
hd4 jfs 33 66 2 open/syncd /
hd2 jfs 20 40 2 open/syncd /usr
hd9var jfs 20 40 2 open/syncd /var
hd3 jfs 4 8 2 open/syncd /tmp
pac_lv1 jfs 1 2 2 open/syncd /pac
lvbto jfs 72 144 2 open/syncd /bto/sys
hd7 sysdump 18 18 1 open/syncd N/A
hd71 sysdump 18 18 1 open/syncd N/A
paging00 paging 42 84 2 open/syncd N/A
List the physical volume status within a volume group.
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk2 active 135 5 01..00..00..00..04
hdisk3 active 135 0 00..00..00..00..00
hdisk0 active 135 6 00..00..00..00..06
hdisk1 active 135 21 00..00..10..00..11
List attributes about a physical volume (disk):
# lsattr -El hdisk2
PCM PCM/friend/scsiscsd Path Control Module False
algorithm fail_over Algorithm True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_interval 0 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
max_transfer 0x40000 Maximum TRANSFER Size True
pvid 00283edd26fdf5680000000000000000 Physical volume identifier False
queue_depth 3 Queue DEPTH False
reserve_policy single_path Reserve Policy True
size_in_mb 36400 Size in Megabytes False
E. RUNNING SNAP
Note: You must have an open PMR with pSeries Support (IBM) before continuing. All references to the PMR number below will be in the format of “xxxxx.YYY” where “xxxxx” is the problem number and “YYY” is the branch number.
1) CALL IBM
To find the 4-digit machine type:
# uname -M
IBM,7029-6C3
Search the report for General Info and view the HW_MODEL field.
GENERAL INFO
====================================================================
GENERAL INFO: senthil : 0x590a0c1f : Fri 03-04-11 14:04:31 CST : 80.1
====================================================================
HOSTNAME: senthil
HOSTID: 0x590a0c1f
PRIM_IP_ADDRESS: x.x.x.x
HW_VENDOR: IBM
HW_MODEL: IBM,7029-6C3
OS_LEVEL: AIX 5.2
SYSTEM_MEMORY: 2048 Mb
DDSABLE: TRUE
DOMAIN: none
Follow the steps below to run “snap” and ftp the output to IBM:
2) HOW TO RUN SNAP COMMAND:
Using the "snap" command to gather information:
This is a powerful command for gathering a large amount of data on all types of machines. Following are some caveats with this command:
-- The "-b" flag gathers SSA information
-- The "-t" flag gathers the TCPIP information
-- The file created from the output is /tmp/ibmsupt/snap.pax.Z
To gather the basic information on a machine, such as error logs, configuration, and AIX driver levels, run:
# snap -r (this removes any prior snap data)
# snap -gc
NOTE: Depending on the number of SSA drives, this could take anywhere from a few minutes to 2 hours, so be careful.
To gather the SSA info, use: # snap -gbc
To gather the SSA and TCPIP info, use: # snap -gtbc
To gather all system configuration information: # snap -ac
Example of output:
bos62833[root]: snap -r
Nothing to clean up
bos62833[root]: snap -gbc
Checking space requirement for general information.......................................................................................................................................................................................................................................................................................................................................................... done.
..Checking space requirement for ssa information.......... done.
Checking for enough free space in filesystem... done.
********Checking and initializing directory structure
Creating /tmp/ibmsupt directory tree... done.
Creating /tmp/ibmsupt/ssa directory tree... done.
Creating /tmp/ibmsupt/general directory tree... done.
Creating /tmp/ibmsupt/general/diagnostics directory tree... done.
Creating /tmp/ibmsupt/testcase directory tree... done.
Creating /tmp/ibmsupt/other directory tree... done.
********Finished setting up directory /tmp/ibmsupt
Gathering general system information.......................................................................................................................................................................................................................................................................................................................................................... done.
Gathering scanout information..done.
Gathering ssa system information.......... done.
Creating compressed pax file...
Starting pax/compress process... Please wait... done.
-rw------- 1 0 0 834911 Feb 8 00:08 snap.pax.Z
Note: additional flags to be used for specific data.
IBM support may request that additional options be used with the snap command. From “man snap”, these are the available flags:
-a Gathers all system configuration information. This option requires approximately 8MB of temporary disk space.
-A Gathers asynchronous (TTY) information.
-b Gathers SSA information.
-c Creates a compressed pax image (snap.pax.Z file) of all files in the /tmp/ibmsupt directory tree or other named output directory.
-D Gathers dump and /unix information. The primary dump device is used.
Notes:
* If bosboot -k was used to specify the running kernel to be other than /unix, the incorrect kernel is gathered. Make sure that /unix is, or is linked to, the kernel in use when the dump was taken.
* If the dump file is copied to the host machine, the snap command does not collect the dump image in the /tmp/ibmsupt/dump directory. Instead, it creates a link in the dump directory to the actual dump image.
-d Dir Identifies the optional snap command output directory (/tmp/ibmsupt is the default).
-f Gathers file system information.
-g Gathers the output of the lslpp -hBc command, which is required to recreate exact operating system environments. Writes output to the /tmp/ibmsupt/general/lslpp.hBc file.
Also collects general system information and writes the output to the /tmp/ibmsupt/general/general.snap file.
-G Includes predefined Object Data Manager (ODM) files in general information collected with the -g flag.
-i Gathers installation debug vital product data (VPD) information.
-k Gathers kernel information
-l Gathers programming language information.
-L Gathers LVM information.
-n Gathers Network File System (NFS) information.
-N Suppresses the check for free space.
-o OutputDevice Copies the compressed image onto diskette or tape.
-p Gathers printer information.
-r Removes snap command output from the /tmp/ibmsupt directory.
-s Gathers Systems Network Architecture (SNA) information.
-S Includes security files in general information collected with the -g flag.
-t Gathers Transmission Control Protocol/Internet Protocol (TCP/IP) information.
-T Gathers all the log files for a multicpu trace. Only the base file, trcfile, is captured with the -g flag.
-v Component Displays the output of the commands executed by the snap command. Use this flag to view the specified name or group of files.
Note: Press the Ctrl-C key sequence to interrupt the snap command. A prompt will return with the following options: press the Enter key to return to current operation; press the S key to stop the current operation; press the Q key to quit the snap command completely.
-w Gathers WLM information
3) CHECK THE CURRENT MAINTENANCE LEVEL OF YOUR SYSTEM:
# oslevel
5.2.0.0
To determine the highest recommended maintenance level reached for the current version of AIX on the system, type:
# oslevel -r
5200-03
Beginning in 2006, IBM changed the AIX terminology from “Maintenance Level (ML)” to “Technology Level (TL)” and “Service Pack (SP)”. The command below will provide you with the TL and SP information:
# oslevel -s
5200-08-01
This can be broken down as follows:
AIX Version: 5.2
Technology Level: 8
Service Pack: 1
For more detailed information on these topics, please refer to The IBM AIX 5L Service Strategy and Best Practices document.
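The breakdown above can be done mechanically when checking many hosts. A minimal sketch follows; parse_oslevel is a hypothetical helper name, and it reads only the first three dash-separated fields of the oslevel -s string, ignoring any further fields that may follow them.

```shell
# parse_oslevel LEVEL: split an "oslevel -s" string such as 5200-08-01 into
# AIX version, Technology Level, and Service Pack.
# (Hypothetical helper; uses only the first three dash-separated fields.)
parse_oslevel() {
    echo "$1" | awk -F- '{
        printf "AIX %s.%s TL %d SP %d\n",
               substr($1, 1, 1), substr($1, 2, 1), $2 + 0, $3 + 0
    }'
}

# On a live system: parse_oslevel "$(oslevel -s)"
parse_oslevel 5200-08-01    # prints: AIX 5.2 TL 8 SP 1
```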
4) CHECK DUMP SIZE
Identify the dump device settings. Note that a dump is written to either the primary or the secondary device; it will not spill over to the secondary if the primary fills:
# sysdumpdev -l
primary /dev/hd7
secondary /dev/hd71
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression OFF
Display statistical info about the most recent dump:
# sysdumpdev -L
0453-039
Device name: /dev/hd7
Major device number: 10
Minor device number: 8
Size: 23327232 bytes
Uncompressed Size: 191149876 bytes
Date/Time: Fri Feb 11 10:50:40 CST 2005
Dump status: 0
dump completed successfully
Estimate the size of the dump (in bytes) for the currently running system:
# sysdumpdev -e
0453-041 Estimated dump size in bytes: 4280287232
To identify how much space is allocated to the dump device:
# lslv hd7
LOGICAL VOLUME: hd7 VOLUME GROUP: rootvg
LV IDENTIFIER: 00283edd00004c00000001024cb1a4c3.10 PERMISSION: read/write
VG STATE: active/complete LV STATE: opened/syncd
TYPE: sysdump WRITE VERIFY: off
MAX LPs: 512 PP SIZE: 256 megabyte(s)
COPIES: 1 SCHED POLICY: parallel
LPs: 18 PPs: 18
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: minimum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 32
MOUNT POINT: N/A LABEL: None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
Dump Space Size (hd7) = PPs x PP SIZE
Dump Space Size (hd7) = 18 X 256 megabytes = 4608 megabytes
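The point of this calculation is to confirm that the dump device can hold the estimated dump. The comparison can be sketched as follows; dump_fits is a hypothetical helper, and the three arguments are the values read by hand from sysdumpdev -e (bytes) and lslv (PPs and PP SIZE in megabytes).

```shell
# dump_fits EST_BYTES PPS PP_SIZE_MB: compare the estimated dump size with the
# capacity of the dump logical volume (PPs x PP SIZE).
# (Hypothetical helper; arguments are copied from sysdumpdev -e and lslv.)
dump_fits() {
    awk -v est="$1" -v pps="$2" -v pp="$3" 'BEGIN {
        cap = pps * pp * 1024 * 1024                 # capacity in bytes
        printf "estimate=%.0f MB capacity=%.0f MB: %s\n",
               est / 1048576, pps * pp, (est <= cap ? "fits" : "too small")
    }'
}

# Values from the sysdumpdev -e and lslv hd7 output above:
dump_fits 4280287232 18 256
```

With the figures from this section the estimate is 4082 megabytes against 4608 megabytes of dump space, so the dump fits with limited headroom.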
5) CREATE A DUMP FILE
Look at the dump size and then run df -I or df -k to find a file system with enough space for packaging. Once a file system has been found, you may proceed with creating a dump file to ftp to IBM.
# snap -gfkDN (This command can be run from any directory.)
# cd /tmp/ibmsupt/dump
# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)
# cd /tmp/ibmsupt
# snap -c
Ftp the file to IBM.
If there is no room in /tmp, then run:
# snap -gfkDNd
# cd /
# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)
# snap -cd /
This will create a snap.pax.Z file in the /tmp/ibmsupt directory. The file will need to be renamed to pmr#.branch#.snap.pax.Z.
# mv snap.pax.Z
F. SHUTDOWN
The shutdown command halts the operating system. Only a user with root user authority can run this command. Do not attempt to restart the system or turn off the system before the shutdown completion message is displayed; otherwise, file system damage can result.
Make sure you are on the correct server prior to entering shutdown command:
Enter: hostname
To shutdown and restart the system:
# shutdown -Fr
Other flags that could be used with the shutdown command are:
-h Halts the operating system completely.
-m Brings the system down to maintenance (single user) mode.
-d Brings the system down from a distributed mode to a multiuser mode.
-i Interactive mode. Displays interactive messages to guide the user through the shutdown.
The last command can be used to help determine when the system was last shut down.
# last shutdown
shutdown tty0 Feb 11 14:05
shutdown tty0 Feb 10 20:23
shutdown pts/1 Feb 04 07:08
G. HARDWARE ASSISTANCE
HOW TO RUN DIAGNOSTICS
The diag command is menu driven and is used to run diagnostics for a suspected problem.
# diag
Press
Select Diagnostic Routines.
Select Problem Determination.
This instructs the diag command to test the system and analyze the error log.
You may run diagnostics on a particular device by using the -d flag.
# diag -d (device name)
Display previous diagnostic results.
# cd /usr/lpp/diagnostics/bin
# ./diagrpt -o
Display all diagnostic result files logged since the date specified (mmddyy):
# /usr/lpp/diagnostics/bin/diagrpt -s 030705
This will list results logged since March 7, 2005.
Diagnostic result files are stored in the /etc/lpp/diagnostics/data directory.
FINDING SYSTEM CONFIGURATION INFORMATION
Total physical memory in the system (reported in KB):
# bootinfo -r
Total number of processors in the system:
# lsdev -Cc processor (this will list each processor)
Display configuration, diagnostic, and vital product data (VPD) about the system:
# lscfg -vp | more
H. LOGS
The first place you should go when troubleshooting problems in AIX is the error report (errpt).
First run errpt without any options to get an overview of current errors:
# errpt | more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
B6048838 0725140606 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0725133506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0725122506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0724140106 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0721033906 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR
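When the summary runs to many screens, counting the entries per identifier and resource shows at a glance what is failing most often. This is a sketch only: errpt_summary is a hypothetical helper, and it assumes the six-column summary layout shown above.

```shell
# errpt_summary: read the "errpt" summary listing on stdin and print the
# number of entries per IDENTIFIER and RESOURCE_NAME, most frequent first.
# (Hypothetical helper; assumes the column layout shown above.)
errpt_summary() {
    awk '$1 != "IDENTIFIER" && NF >= 6 { count[$1 " " $5]++ }
         END { for (k in count) print count[k], k }' | sort -k1,1nr -k2
}

# On a live system: errpt | errpt_summary
# Demonstration with a few lines from the listing above:
errpt_summary <<'EOF'
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
B6048838 0725140606 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0725133506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR
EOF
```

The identifier in the first column of each summary line is the value to pass to errpt -aj for the detailed report.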
To get the specifics associated with the IDENTIFIER:
# errpt -aj B6048838 | more
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: B6048838
Date/Time: Tue Jul 25 14:06:04 EDT
Sequence Number: 113629
Machine Id: 00283E9D4C00
Node Id: jrspa13t
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
7540756
FILE SYSTEM SERIAL NUMBER
44
INODE NUMBER
1474687
PROCESSOR ID
16
CORE FILE NAME
/pac/brsmdp07/bea/app/user_projects/domains/collections/core
PROGRAM NAME
java
ADDITIONAL INFORMATION
abort E8
??
Symptom Data
REPORTABLE
You can display the errors logged during the last day by specifying a start date in your search.
# date
Wed Feb 23 14:57:39 CST 2005
# errpt -a -s 0222145605 | more
-a displays information in a detailed format
-s displays all records posted on or after the StartDate
Example: errpt -a -s mmddhhmmyy (month, day, hour, minute, and year), set to the current time minus 24 hours
I. INSTALLED SOFTWARE INSTALLATION INFO
How to determine the maintenance level of software:
# lslpp -l | more (This will list every fileset on the system)
# lslpp -l
# lslpp -L | grep
# lslpp -h