TANTI TECHNOLOGIES: 02/18/13

A. AIX GENERAL TROUBLESHOOTING

1) FILE SYSTEM SPACE USAGE

Check for disk space problems.

# df -I (Shows total, used, and free space in 512-byte blocks)

Filesystem 512-blocks Used Free %Used Mounted on

/dev/hd4 17301504 5926488 11375016 35% /

/dev/hd2 10485760 4583816 5901944 44% /usr

# df -k (Checks for disk space usage in 1K blocks)

Filesystem 1024-blocks Free %Used Iused %Iused Mounted on

/dev/hd4 8650752 5687508 35% 39729 2% /

/dev/hd2 5242880 2950972 44% 35227 3% /usr

# df -g (Checks for disk space usage in gigabyte blocks)

Filesystem GB blocks Free %Used Iused %Iused Mounted on

/dev/hd4 8.25 5.42 35% 39729 2% /

/dev/hd2 5.00 2.81 44% 35227 3% /usr

# df -gP (POSIX view with different heading names)

Filesystem GB blocks Used Available Capacity Mounted on

/dev/hd4 8.25 2.83 5.42 35% /

/dev/hd2 5.00 2.19 2.81 44% /usr

Note that df -k and df -g list disk usage (%Used) as well as inode usage (%Iused).

Be sure to pay close attention and try not to get the two confused when checking file system space.
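
To keep the two percentages apart, a small filter can report them under separate labels. The following is only a sketch: the function name check_df and the threshold argument are made up for illustration, and the column positions assume the df -k layout shown above.

```shell
# Sketch: flag file systems whose block usage or inode usage reaches a limit.
# Reads `df -k` style output (AIX column layout shown above) on stdin.
# check_df is a hypothetical helper name, not an AIX command.
check_df() {
    limit=${1:-80}
    awk -v limit="$limit" '
        NR > 1 {
            used = $4; iused = $6          # %Used and %Iused columns
            sub(/%/, "", used); sub(/%/, "", iused)
            if (used + 0 >= limit)  printf "%s: %d%% of blocks used\n", $7, used
            if (iused + 0 >= limit) printf "%s: %d%% of inodes used\n", $7, iused
        }'
}

# Sample data copied from the df -k output above.
sample='Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 8650752 5687508 35% 39729 2% /
/dev/hd2 5242880 2950972 44% 35227 3% /usr'

printf '%s\n' "$sample" | check_df 40
```

On a live system the same function would be fed from the command itself, for example: df -k | check_df 80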

Use lsps to check paging/swap space usage:

The lsps command displays the characteristics of paging spaces, such as paging space name, physical volume name, volume group name, size, percentage of the paging space used, status of space, and it shows if the paging space is set to automatic.

# lsps -a (Note that this system is paging quite a bit)

Page Space Physical Volume Volume Group Size %Used Active Auto Type

paging00 hdisk0 rootvg 10752MB 45 yes yes lv

hd6 hdisk1 rootvg 2560MB 45 yes yes lv

hd6 hdisk2 rootvg 8192MB 45 yes yes lv

or

# swap -s

allocated = 5505024 blocks used = 2458677 blocks free = 3046347 blocks
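
The swap -s output gives raw block counts rather than a percentage. As a rough sketch (swap_pct is a hypothetical helper name, and the field layout is assumed to match the line above), the used fraction can be derived like this:

```shell
# Sketch: turn `swap -s` output into a percentage of paging space used.
# swap_pct is a hypothetical helper name; it reads the swap -s line on stdin.
swap_pct() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "allocated") alloc = $(i + 2)   # number after "allocated ="
            if ($i == "used")      used  = $(i + 2)   # number after "used ="
        }
        if (alloc + 0 > 0) printf "%.1f\n", used * 100 / alloc
    }'
}

# Sample line copied from the swap -s output above.
sample='allocated = 5505024 blocks used = 2458677 blocks free = 3046347 blocks'

printf '%s\n' "$sample" | swap_pct
```

For the sample line this works out to a little under 45 percent, in line with the %Used column reported by lsps -a.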

2) LOAD AVERAGE

# uptime

11:14AM up 10 days, 21:02, 2 users, load average: 0.05, 0.05, 0.03

*Note: The load average numbers give the average number of jobs/processes in the run queue over the last 1, 5, and 15 minutes. The lowest possible load average is zero. A load average of one or two is typical. A sustained load average of 3 or above could indicate a problem, but judge it against the number of processors: a run queue of 3 is far more serious on a single-CPU system than on one with 8 logical CPUs.
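
One way to put the load average in context is to divide it by the number of logical CPUs. This is a minimal sketch, not a standard tool: load_per_cpu is a made-up name, the CPU count is passed in by hand (8 is an assumption taken from the lcpu=8 shown in the vmstat examples later in this document), and the uptime format is assumed to match the line above.

```shell
# Sketch: express the 1-minute load average per logical CPU.
# load_per_cpu is a hypothetical helper; $1 = number of logical CPUs,
# stdin = one line of uptime output.
load_per_cpu() {
    ncpu=$1
    sed 's/.*load average: *//' |
        awk -F', *' -v n="$ncpu" '{ printf "%.3f\n", $1 / n }'
}

# Sample line copied from the uptime output above.
sample='11:14AM up 10 days, 21:02, 2 users, load average: 0.05, 0.05, 0.03'

printf '%s\n' "$sample" | load_per_cpu 8
```

A value that stays well above 1.0 per CPU suggests more runnable work than the processors can absorb.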

B. SYSTEM PERFORMANCE

1) CPU AND MEMORY USAGE

The vmstat command reports statistics about kernel threads, virtual memory, disks, traps, and CPU activity.

*us = user time, sy = system time, id = CPU idle time, wa = CPU idle time during which the system had outstanding disk or NFS I/O requests.

# vmstat 5 5

System Configuration: lcpu=8 mem=16384MB

kthr memory page faults cpu

----- ----------- ------------------------ ------------ -----------

r b avm fre re pi po fr sr cy in sy cs us sy id wa

5 1 4818381 24300 0 2 2 636 859 0 2048 321280 9460 24 18 53 5

6 1 4817085 25591 0 0 0 0 0 0 1838 593223 4798 53 21 23 4

7 1 4811637 31031 0 0 0 0 0 0 1975 265643 4706 30 13 49 8

2 1 4813001 29650 0 0 0 0 0 0 1814 95041 7491 8 10 76 6

4 1 4818874 23769 0 0 0 0 0 0 1864 53014 4428 5 7 81 7
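
When several samples are collected it can help to average the CPU columns. The sketch below assumes the 17-column layout shown above (us, sy, id, wa in columns 14 to 17); vmstat_cpu_avg is a hypothetical helper name.

```shell
# Sketch: average the us/sy/id/wa columns over all vmstat sample rows.
# vmstat_cpu_avg is a hypothetical helper; it reads vmstat output on stdin
# and assumes the 17-column layout shown above.
vmstat_cpu_avg() {
    awk 'NF == 17 && $1 ~ /^[0-9]+$/ {
             us += $14; sy += $15; id += $16; wa += $17; n++
         }
         END {
             if (n) printf "us=%.1f sy=%.1f id=%.1f wa=%.1f\n", us/n, sy/n, id/n, wa/n
         }'
}

# Sample rows copied from the vmstat 5 5 output above.
sample='r b avm fre re pi po fr sr cy in sy cs us sy id wa
5 1 4818381 24300 0 2 2 636 859 0 2048 321280 9460 24 18 53 5
6 1 4817085 25591 0 0 0 0 0 0 1838 593223 4798 53 21 23 4
7 1 4811637 31031 0 0 0 0 0 0 1975 265643 4706 30 13 49 8
2 1 4813001 29650 0 0 0 0 0 0 1814 95041 7491 8 10 76 6
4 1 4818874 23769 0 0 0 0 0 0 1864 53014 4428 5 7 81 7'

printf '%s\n' "$sample" | vmstat_cpu_avg
```

On a live system: vmstat 5 5 | vmstat_cpu_avg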

A new I/O oriented view using the -I option:

# vmstat -I 5 5

System Configuration: lcpu=8 mem=16384MB

kthr memory page faults cpu

-------- ----------- ------------------------ ------------ -----------

r b p avm fre fi fo pi po fr sr in sy cs us sy id wa

5 1 0 4809912 45680 574 203 2 2 636 860 2048 321270 9459 24 18 53 5

1 0 0 4820163 35346 12 152 0 0 0 0 2034 410525 5435 10 20 67 2

2 0 0 4816092 39388 4 57 0 0 0 0 1726 566771 62167 13 20 65 2

2 1 0 4821609 33799 11 216 0 0 0 0 2024 529518 21680 13 27 56 4

6 1 0 4815588 39806 1 43 0 0 0 0 1668 481025 4853 12 18 69 1

The iostat command reports CPU and I/O statistics.

# iostat (On large systems this output could be quite large)

System configuration: lcpu=2 disk=3

tty: tin tout avg-cpu: % user % sys % idle % iowait

0.0 0.5 0.3 0.2 99.3 0.2

Disks: % tm_act Kbps tps Kb_read Kb_wrtn

hdisk1 0.6 8.2 1.2 2030462 5660599

hdisk0 0.7 7.2 1.1 1116762 5660603

cd0 0.0 0.0 0.0 0 0

Note: %user shows the percentage of CPU utilization at the user level and %sys shows the percentage of the CPU utilization at the system level.

# sar 5 5

AIX jrspa22t 2 5 00283EDD4C00 07/26/06

System Configuration: lcpu=8

10:12:49 %usr %sys %wio %idle

10:12:54 22 13 2 64

10:12:59 53 4 1 42

10:13:04 52 9 1 38

10:13:09 52 3 1 44

10:13:14 39 8 2 52

Average 44 7 1 48

To monitor CPU usage for every processor with sar:

# sar -P ALL 5 10

The topas command displays statistics of system activities and CPU usage. This output may be viewed in intervals of seconds using the -i flag. To ensure output is in a readable format, set your terminal emulation to vt220 prior to accessing the system as well as after logging onto the system.

# topas -i5

The report from the topas command lists the CPU usage of the kernel, user, wait time, and system idle time. Below, it also lists processes, along with the PID, CPU usage, and owner that are currently running on the system.

To monitor the busiest processes on a system using topas:

# topas -Pi5 (checks at a 5 second interval)

Topas Monitor for host: jrspa22t Interval: 5 Wed Jul 26 10:15:47 2006

DATA TEXT PAGE PGFAULTS

USER PID PPID PRI NI RES RES SPACE TIME CPU% I/O OTH COMMAND

root 258066 1 60 20 88 1 160 1966:11 2.9 0 25 syncd

patrol 7778462 1 75 30 14933 674 17910 1423:17 1.1 027708 PatrolAg lAg

root 8769704 1 62 20 6313 835 6313 9:29 1.1 6 400 bgsagent

root 172116 0 16 41 17 0 17 1090:36 0.5 0 0 wlmsched

bsomqp022642060 2191530 60 20 15194 11 24748 26:38 0.4 0 0 java

root 7958554 2969668 58 41 2790 19 2790 0:01 0.4 0 202 topas

root 1417340 1 1 41 400 245 838 1349:10 0.4 0 0 seosd

ncmsqp042674916 2822388 60 20 25443 17 40486 47:57 0.3 0 0 java

patrol 6553618 1 70 30 2754 674 4157 242:53 0.2 0 1721 PatrolAg t

ncmsqp023493972 3690670 60 20 17462 11 27864 29:37 0.2 0 0 java

Find the top 15 processes using memory on a system:

# svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'

-------------------------------------------------------------------------------

Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd LPage

1589482 oracle 247739 5402 55835 109827 Y N N

2039974 oracle 221077 5402 56167 110311 Y N N

2129990 oracle 220953 5402 56091 110111 Y N N

1982638 oracle 220808 5402 55824 109858 Y N N

1396820 oracle 219414 5402 55839 109946 Y N N

2670812 oracle 219319 5402 55990 109938 Y N N

6779124 oracle 219285 5402 56034 109932 Y N N

2216084 oracle 219245 5402 55979 109899 Y N N

2912464 oracle 219239 5402 55926 109873 Y N N

2470110 oracle 219232 5402 55953 109874 Y N N

2572518 oracle 219002 5402 56018 109846 Y N N

2584744 oracle 218920 5402 56173 109915 Y N N

2211846 oracle 218883 5402 56245 109948 Y N N

6979770 oracle 200825 5402 56144 109830 Y N N

1790028 java 187476 5727 57630 198578 N Y N

Finding the size of a PID using ps:

# ps v 3375240

PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND

3375240 - A 42:25 10859 157132 106180 xx 39 44 0.0 1.0 /pac/nc

2) WHERE TO OBTAIN PERFPMR TO COLLECT PERFORMANCE DATA

If a server has a performance problem, IBM may request that you install perfpmr and collect performance data during a peak load period. IBM normally provides instructions on how to install.

You can obtain a copy of the perfpmr scripts from the following location:

ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr

You will need to get this while you are logged onto the server with the problem.

The IBM performance team has suggested the following changes be made to the script once it is downloaded and installed:

Please change the following lines in each of the stanzas in perfpmr.cfg:

trace.sh:

logsize = 402653184

kbufsize = 201326592

filemon.sh:

filemon_kbufsize = 201326592

filemon_time_seconds = 60

space_required = 83886080

3) LAN STATUS

The netstat command shows network status for each protocol or routing table. The -i flag may be used to determine collisions and I/O errors.

# netstat -i

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

en0 1500 link#2 0.9.6b.3e.57.61 424536 0 239376 0 0

en0 1500 89.10.12 prl28284 424536 0 239376 0 0

en2 1500 link#3 0.9.6b.ce.54.cb 4297312 0 140332 2 0

en2 1500 55.10.32 breac01t-55 4297312 0 140332 2 0

lo0 16896 link#1 5254 0 6076 0 0

lo0 16896 127 loopback 5254 0 6076 0 0

lo0 16896 ::1 5254 0 6076 0 0
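
Scanning the Ierrs and Oerrs columns by eye is error-prone on systems with many interfaces. As a sketch (net_errors is a made-up helper name), the error columns can be picked out by counting from the right-hand side, since the Address column is sometimes empty:

```shell
# Sketch: list interfaces that show input or output errors in `netstat -i`.
# net_errors is a hypothetical helper; columns are counted from the right
# because the Address column is empty on some rows.
net_errors() {
    awk 'NR > 1 && NF >= 8 {
             ierrs = $(NF - 3) + 0; oerrs = $(NF - 1) + 0
             if ((ierrs > 0 || oerrs > 0) && !seen[$1]++)
                 printf "%s ierrs=%d oerrs=%d\n", $1, ierrs, oerrs
         }'
}

# Sample data copied from the netstat -i output above.
sample='Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 0.9.6b.3e.57.61 424536 0 239376 0 0
en0 1500 89.10.12 prl28284 424536 0 239376 0 0
en2 1500 link#3 0.9.6b.ce.54.cb 4297312 0 140332 2 0
en2 1500 55.10.32 breac01t-55 4297312 0 140332 2 0
lo0 16896 link#1 5254 0 6076 0 0
lo0 16896 127 loopback 5254 0 6076 0 0
lo0 16896 ::1 5254 0 6076 0 0'

printf '%s\n' "$sample" | net_errors
```

In the sample, only en2 is reported, because of its two output errors.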

Check routing tables with network addresses

# netstat -rn

Routing tables

Destination Gateway Flags Refs Use If PMTU Exp Groups

Route Tree for Protocol Family 2 (Internet):

default 89.10.12.254 UGc 0 0 en0 - -

55.10.32.0 55.10.34.184 UHSb 0 0 en2 - - =>

55.10.32/22 55.10.34.184 U 0 138677 en2 - -

55.10.34.184 127.0.0.1 UGHS 0 1 lo0 - -

55.10.35.255 55.10.34.184 UHSb 0 4 en2 - -

89.10.5.135 89.10.12.254 UGHW 1 2163 en0 - -

# ifconfig -a

en0:flags=5e080863,80 inet 89.10.12.31 netmask 0xffffff00 broadcast 89.10.12.255

en2:flags=5e080863,c0 inet 55.10.34.184 netmask 0xfffffc00 broadcast 55.10.35.255

lo0:flags=e08084b inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255

inet6 ::1/0 tcp_sendspace 65536 tcp_recvspace 65536

4) HOW TO CHECK INTERFACE CARD SPEED, AUTO NEGOTIATION INFO.

# entstat -d ent4 | more

-------------------------------------------------------------

ETHERNET STATISTICS (ent4) :

Device Type: Gigabit Ethernet-SX PCI-X Adapter (14106802)

Hardware Address: 00:02:55:33:77:63

Elapsed Time: 11 days 9 hours 58 minutes 37 seconds

Transmit Statistics: Receive Statistics:

-------------------- -------------------

Packets: 17299124 Packets: 166277808

Bytes: 486040591195 Bytes: 38982878854

Interrupts: 0 Interrupts: 153893117

Transmit Errors: 0 Receive Errors: 0

Packets Dropped: 0 Packets Dropped: 0

Bad Packets: 0

Max Packets on S/W Transmit Queue: 51

S/W Transmit Queue Overflow: 0

Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 60 Broadcast Packets: 97825101

Multicast Packets: 1 Multicast Packets: 95415

No Carrier Sense: 0 CRC Errors: 0

DMA Underrun: 0 DMA Overrun: 0

Lost CTS Errors: 0 Alignment Errors: 0

Max Collision Errors: 0 No Resource Errors: 0

Late Collision Errors: 0 Receive Collision Errors: 0

Deferred: 0 Packet Too Short Errors: 0

SQE Test: 0 Packet Too Long Errors: 0

Timeout Errors: 0 Packets Discarded by Adapter: 0

Single Collision Count: 0 Receiver Start Count: 0

Multiple Collision Count: 0

Current HW Transmit Queue Length: 0

General Statistics:

-------------------

No mbuf Errors: 0

Adapter Reset Count: 0

Adapter Data Rate: 2000

Driver Flags: Up Broadcast Running

Simplex 64BitSupport ChecksumOffload

PrivateSegment LargeSend DataRateSet

Gigabit Ethernet-SX PCI-X Adapter (14106802) Specific Statistics:

--------------------------------------------------------------------

Link Status : Up

Media Speed Selected: Auto negotiation

Media Speed Running: 1000 Mbps Full Duplex

PCI Mode: PCI-X (100-133)

PCI Bus Width: 64-bit

Latency Timer: 144

Cache Line Size: 128

Jumbo Frames: Disabled

TCP Segmentation Offload: Enabled

TCP Segmentation Offload Packets Transmitted: 14265351

TCP Segmentation Offload Packet Errors: 0

Transmit and Receive Flow Control Status: Enabled

XON Flow Control Packets Transmitted: 0

XON Flow Control Packets Received: 0

XOFF Flow Control Packets Transmitted: 0

XOFF Flow Control Packets Received: 0

Transmit and Receive Flow Control Threshold (High): 45056

Transmit and Receive Flow Control Threshold (Low): 24576

Transmit and Receive Storage Allocation (TX/RX): 16/48
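
Only two lines of this long report answer the speed and negotiation question. A minimal sketch for pulling them out follows; media_speed is a hypothetical helper name, and the label text is assumed to match the report above.

```shell
# Sketch: extract the selected and running media speed from entstat -d output.
# media_speed is a hypothetical helper; it reads the report on stdin.
media_speed() {
    awk -F': *' '
        /^Media Speed Selected/ { sel = $2 }
        /^Media Speed Running/  { run = $2 }
        END {
            printf "selected=%s running=%s\n", sel, run
            if (run !~ /Full Duplex/) print "WARNING: link is not running full duplex"
        }'
}

# Sample lines copied from the entstat -d ent4 output above.
sample='Link Status : Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
PCI Mode: PCI-X (100-133)'

printf '%s\n' "$sample" | media_speed
```

On a live system: entstat -d ent4 | media_speed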

C. LAST REBOOT, RUN LEVEL, BOOT LOG, CONSOLE LOG

Check to see if the box has rebooted recently by running:

# who -b

A recent system reboot could explain alarms on the system. The reboot may have been scheduled or may have been caused by a system panic, hardware failure, or power failure. Further investigation should be done. Check the CAMCS logs to see if a system panic occurred or check cron to see if a reboot script was executed.

Check for the system’s current run level. Please note that AIX operates at Run Level 2. Other Run Levels are available, but are rarely used.

# who -r

To check for any configuration errors after a system reboot, run the following command to see the bootlog:

# alog -o -f /var/adm/ras/bootlog | more

The console log can be viewed using this command:

# alog -o -f /var/adm/ras/conslog | more

D. DISK DRIVE REPLACEMENT

DISK DRIVE PROCEDURES

The following commands are used to display devices on the system and their characteristics.

1) HARDWARE DEVICES

lsdev displays information about devices in the device configuration database.

Flags: -C lists information about a device that is in the Customized Devices object class.

-c specifies a device class name.

-H displays headers above the column output.

To list the disks in the Customized Devices object class along with their state:

# lsdev -CH -c disk

name status location description

hdisk0 Available 1S-08-00-5,0 16 Bit LVD SCSI Disk Drive

hdisk1 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive

To list all devices:

# lsdev -C -H | pg

name status location description

L2cache0 Available L2 Cache

aio0 Defined Asynchronous I/O (Legacy)

cd0 Available 1G-19-00 IDE DVD-ROM Drive

en0 Available 1L-08 Standard Ethernet Network Interface

en1 Defined 1c-08 Standard Ethernet Network Interface

en2 Available 1j-08 Standard Ethernet Network Interface

en3 Defined 1n-08 Standard Ethernet Network Interface

ent0 Available 1L-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)

ent1 Available 1c-08 10/100 Mbps Ethernet PCI Adapter II (1410ff01)

lspv lists the physical volumes known to the system, showing each physical disk name, its physical volume identifier (PVID), and the volume group it belongs to.

# lspv

hdisk0 000c8edc02dccea9 rootvg active

hdisk1 000c8edc851ee972 rootvg active

# lspv hdisk0

PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg

PV IDENTIFIER: 000c8edc02dccea9 VG IDENTIFIER 000c8edc00004c00000000fc851ef361

PV STATE: active

STALE PARTITIONS: 0 ALLOCATABLE: yes

PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 7

TOTAL PPs: 542 (34688 megabytes) VG DESCRIPTORS: 1

FREE PPs: 86 (5504 megabytes) HOT SPARE: no

USED PPs: 456 (29184 megabytes)

FREE DISTRIBUTION: 25..60..00..00..01

USED DISTRIBUTION: 84..48..108..108..108

The -p flag will list all physical partitions of physical volume hdisk0.

# lspv -p hdisk0

hdisk0:

PP RANGE STATE REGION LV NAME TYPE MOUNT POINT

1-4 used outer edge hd5 boot N/A

5-29 free outer edge

30-109 used outer edge hd9var jfs /var

110-141 used outer middle hd6 paging N/A

142-201 free outer middle

202-217 used outer middle hd3 jfs /tmp

218-221 used center hd8 jfslog N/A

222-325 used center hd4 jfs /

326-381 used inner middle hd4 jfs /

382-433 used inner middle hd2 jfs /usr

434-541 used inner edge hd2 jfs /usr

542-542 free inner edge

Example of a problem on hdisk0.

# lspv hdisk0

PHYSICAL VOLUME: hdisk0 VOLUME GROUP: rootvg

PV IDENTIFIER: 000c8edc001363a5 VG IDENTIFIER 000c8edc00004c00000000fc851ef361

PV STATE: active

STALE PARTITIONS: 6 ALLOCATABLE: yes

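Note: STALE PARTITIONS is nonzero. The mirror copies on this disk are out of date, which usually means the disk is failing or was unavailable.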

PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 7

TOTAL PPs: 542 (34688 megabytes) VG DESCRIPTORS: 1

FREE PPs: 86 (5504 megabytes) HOT SPARE: no

USED PPs: 456 (29184 megabytes)

FREE DISTRIBUTION: 25..60..00..00..01 (FREE PPs = 25+60+0+0+1 = 86)

USED DISTRIBUTION: 84..48..108..108..108 (USED PPs = 84+48+108+108+108 = 456)

# lspv -p hdisk0

hdisk0:

PP RANGE STATE REGION LV NAME TYPE MOUNT POINT

1-4 used outer edge hd5 boot N/A

5-29 free outer edge

30-109 used outer edge hd9var jfs /var

110-141 used outer middle hd6 paging N/A

142-201 free outer middle

202-217 used outer middle hd3 jfs /tmp

218-218 *stale center hd8 jfslog N/A

219-221 used center hd8 jfslog N/A

222-222 *stale center hd4 jfs /

223-231 used center hd4 jfs /

232-232 *stale center hd4 jfs /

233-240 used center hd4 jfs /

241-241 *stale center hd4 jfs /

242-325 used center hd4 jfs /

326-381 used inner middle hd4 jfs /

382-382 *stale inner middle hd2 jfs /usr

383-400 used inner middle hd2 jfs /usr

401-401 *stale inner middle hd2 jfs /usr

402-433 used inner middle hd2 jfs /usr

434-541 used inner edge hd2 jfs /usr

542-542 free inner edge
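
The stale ranges in the partition map should add up to the STALE PARTITIONS count from lspv. The sketch below checks that by summing the ranges; count_stale is a made-up helper name, and it assumes stale rows are marked in the STATE column as in the listing above.

```shell
# Sketch: count stale physical partitions in `lspv -p` output.
# count_stale is a hypothetical helper; it assumes the STATE column
# (second field) contains the word "stale" for stale ranges.
count_stale() {
    awk '$2 ~ /stale/ {
             split($1, r, "-")            # PP RANGE is first-last
             total += r[2] - r[1] + 1
         }
         END { print total + 0 }'
}

# Sample rows copied from the lspv -p hdisk0 output above.
sample='hdisk0:
PP RANGE STATE REGION LV NAME TYPE MOUNT POINT
202-217 used outer middle hd3 jfs /tmp
218-218 *stale center hd8 jfslog N/A
219-221 used center hd8 jfslog N/A
222-222 *stale center hd4 jfs /
232-232 *stale center hd4 jfs /
241-241 *stale center hd4 jfs /
382-382 *stale inner middle hd2 jfs /usr
401-401 *stale inner middle hd2 jfs /usr'

printf '%s\n' "$sample" | count_stale
```

For the sample this yields 6, matching the STALE PARTITIONS: 6 line in the lspv hdisk0 output.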

2) VOLUME GROUPS

To list volume groups that are currently active on your system, type:

# lsvg -o

rootvg

List detailed information and status about the volume group.

# lsvg rootvg

VOLUME GROUP: rootvg VG IDENTIFIER: 000c8edc00004c00000000fc851ef361

VG STATE: active PP SIZE: 64 megabyte(s)

VG PERMISSION: read/write TOTAL PPs: 1084 (69376 megabytes)

MAX LVs: 256 FREE PPs: 108 (6912 megabytes)

LVs: 9 USED PPs: 976 (62464 megabytes)

OPEN LVs: 8 QUORUM: 1

TOTAL PVs: 2 VG DESCRIPTORS: 3

STALE PVs: 0 STALE PPs: 0

ACTIVE PVs: 2 AUTO ON: yes

MAX PPs per PV: 1016 MAX PVs: 32

LTG size: 128 kilobyte(s) AUTO SYNC: no

HOT SPARE: no BB POLICY: relocatable

List the logical volumes in a volume group.

# lsvg -l rootvg

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 2 2 closed/syncd N/A

hd6 paging 42 84 3 open/syncd N/A

hd8 jfslog 1 2 2 open/syncd N/A

hd4 jfs 33 66 2 open/syncd /

hd2 jfs 20 40 2 open/syncd /usr

hd9var jfs 20 40 2 open/syncd /var

hd3 jfs 4 8 2 open/syncd /tmp

pac_lv1 jfs 1 2 2 open/syncd /pac

lvbto jfs 72 144 2 open/syncd /bto/sys

hd7 sysdump 18 18 1 open/syncd N/A

hd71 sysdump 18 18 1 open/syncd N/A

paging00 paging 42 84 2 open/syncd N/A
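
In this listing, PPs is the number of LPs multiplied by the number of copies, so a logical volume whose PPs equals its LPs has only one copy. As a sketch (unmirrored_lvs is a hypothetical helper name), those volumes can be picked out like this:

```shell
# Sketch: list logical volumes with a single copy (PPs equal to LPs)
# from `lsvg -l` output. unmirrored_lvs is a hypothetical helper name.
unmirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $3 == $4 { print $1 }'
}

# Sample data copied from the lsvg -l rootvg output above.
sample='rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 42 84 3 open/syncd N/A
hd8 jfslog 1 2 2 open/syncd N/A
hd4 jfs 33 66 2 open/syncd /
hd2 jfs 20 40 2 open/syncd /usr
hd9var jfs 20 40 2 open/syncd /var
hd3 jfs 4 8 2 open/syncd /tmp
pac_lv1 jfs 1 2 2 open/syncd /pac
lvbto jfs 72 144 2 open/syncd /bto/sys
hd7 sysdump 18 18 1 open/syncd N/A
hd71 sysdump 18 18 1 open/syncd N/A
paging00 paging 42 84 2 open/syncd N/A'

printf '%s\n' "$sample" | unmirrored_lvs
```

In the sample only the two dump devices, hd7 and hd71, are unmirrored.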

List the physical volume status within a volume group.

# lsvg -p rootvg

rootvg:

PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION

hdisk2 active 135 5 01..00..00..00..04

hdisk3 active 135 0 00..00..00..00..00

hdisk0 active 135 6 00..00..00..00..06

hdisk1 active 135 21 00..00..10..00..11

List attributes about a physical volume (disk):

# lsattr -El hdisk2

PCM PCM/friend/scsiscsd Path Control Module False

algorithm fail_over Algorithm True

dist_err_pcnt 0 Distributed Error Percentage True

dist_tw_width 50 Distributed Error Sample Time True

hcheck_interval 0 Health Check Interval True

hcheck_mode nonactive Health Check Mode True

max_transfer 0x40000 Maximum TRANSFER Size True

pvid 00283edd26fdf5680000000000000000 Physical volume identifier False

queue_depth 3 Queue DEPTH False

reserve_policy single_path Reserve Policy True

size_in_mb 36400 Size in Megabytes False

E. Running SNAP

Note: You must have an open PMR with pSeries Support (IBM) before continuing. All references to the PMR number below will be in the format of “xxxxx.YYY” where “xxxxx” is the problem number and “YYY” is the branch number.

1) CALL IBM

To find the 4-digit machine type:

# uname -M

IBM,7029-6C3

Search the report for General Info and view the HW_MODEL field.

GENERAL INFO

====================================================================

GENERAL INFO: senthil : 0x590a0c1f : Fri 03-04-11 14:04:31 CST : 80.1

====================================================================

HOSTNAME: senthil

HOSTID: 0x590a0c1f

PRIM_IP_ADDRESS: x.x.x.x

HW_VENDOR: IBM

HW_MODEL: IBM,7029-6C3

OS_LEVEL: AIX 5.2

SYSTEM_MEMORY: 2048 Mb

DDSABLE: TRUE

DOMAIN: none

Follow the steps below to run “snap” and ftp the output to IBM:

2) HOW TO RUN SNAP COMMAND:

Using the "snap" command to gather information:

This is a powerful command to gather lots of data on all types of machines. The following are some caveats with this command:

-- The "-b" flag gathers SSA information

-- The "-t" flag gathers the TCPIP information

-- The file created from the output is /tmp/ibmsupt/snap.pax.Z

To gather the basic information on a machine, such as error logs, configuration, and AIX driver levels, run:

# snap -r (this removes any prior snap data)

# snap -gc

NOTE: Depending on the number of SSA drives, this could take anywhere from a few minutes to 2 hours, so be careful.

To gather the SSA info, use: # snap -gbc

To gather the SSA and TCPIP info, use: # snap -gtbc

To gather all system configuration information: # snap -ac

Example of output:

bos62833[root]: snap -r

Nothing to clean up

bos62833[root]: snap -gbc

Checking space requirement for general information.......... done.

..Checking space requirement for ssa information.......... done.

Checking for enough free space in filesystem... done.

********Checking and initializing directory structure

Creating /tmp/ibmsupt directory tree... done.

Creating /tmp/ibmsupt/ssa directory tree... done.

Creating /tmp/ibmsupt/general directory tree... done.

Creating /tmp/ibmsupt/general/diagnostics directory tree... done.

Creating /tmp/ibmsupt/testcase directory tree... done.

Creating /tmp/ibmsupt/other directory tree... done.

********Finished setting up directory /tmp/ibmsupt

Gathering general system information.......... done.

Gathering scanout information..done.

Gathering ssa system information.......... done.

Creating compressed pax file...

Starting pax/compress process... Please wait... done.

-rw------- 1 0 0 834911 Feb 8 00:08 snap.pax.Z

Note: additional flags to be used for specific data.

IBM support may request additional options to be executed with the snap command. From “man snap”, these are the different Flags:

-a Gathers all system configuration information. This option requires approximately 8MB of temporary disk space.

-A Gathers asynchronous (TTY) information.

-b Gathers SSA information.

-c Creates a compressed pax image (snap.pax.Z file) of all files in the /tmp/ibmsupt directory tree or other named output directory.

-D Gathers dump and /unix information. The primary dump device is used.

Notes:

* If bosboot -k was used to specify the running kernel to be other than /unix, the incorrect kernel is gathered. Make sure that /unix is, or is linked to, the kernel in use when the dump was taken.

If the dump file is copied to the host machine, the snap command does not collect the dump image in the /tmp/ibmsupt/dump directory. Instead, it creates a link in the dump directory to the actual dump image.

-d Dir Identifies the optional snap command output directory (/tmp/ibmsupt is the default).

-f Gathers file system information.

-g Gathers the output of the lslpp -hBc command, which is required to recreate exact operating system environments. Writes output to the /tmp/ibmsupt/general/lslpp.hBc file.

Also collects general system information and writes the output to the /tmp/ibmsupt/general/general.snap file.

-G Includes predefined Object Data Manager (ODM) files in general information collected with the -g flag.

-i Gathers installation debug vital product data (VPD) information.

-k Gathers kernel information

-l Gathers programming language information.

-L Gathers LVM information.

-n Gathers Network File System (NFS) information.

-N Suppresses the check for free space.

-o OutputDevice Copies the compressed image onto diskette or tape.

-p Gathers printer information.

-r Removes snap command output from the /tmp/ibmsupt directory.

-s Gathers Systems Network Architecture (SNA) information.

-S Includes security files in general information collected with the -g flag.

-t Gathers Transmission Control Protocol/Internet Protocol (TCP/IP) information.

-T Gathers all the log files for a multicpu trace. Only the base file, trcfile, is captured with the -g flag.

-v Component Displays the output of the commands executed by the snap command. Use this flag to view the specified name or group of files.

Note: Press the Ctrl-C key sequence to interrupt the snap command. A prompt will return with the following options: press the Enter key to return to current operation; press the S key to stop the current operation; press the Q key to quit the snap command completely.

-w Gathers WLM information

3) CHECK THE CURRENT MAINTENANCE LEVEL OF YOUR SYSTEM:

# oslevel

5.2.0.0

To determine the highest recommended maintenance level reached for the current version of AIX on the system, type:

# oslevel -r

5200-03

Beginning in 2006, IBM AIX changed from “Maintenance Level (ML)” to “Technology Level (TL)” and “Service Pack (SP)” terminology. The command below will provide you with TL and SP information:

# oslevel -s

5200-08-01

This can be broken down as follows:

AIX Version: 5.2

Technology Level: 8

Service Pack: 1
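
The breakdown above can be scripted when the level has to be recorded for many servers. This is only a sketch: parse_oslevel is a made-up helper name, and it assumes the first two digits of the string are the version and release, as in the 5200 example.

```shell
# Sketch: split an `oslevel -s` string into version, TL, and SP.
# parse_oslevel is a hypothetical helper; $1 is a string such as 5200-08-01.
# Any fields after the third are ignored.
parse_oslevel() {
    old_ifs=$IFS
    IFS=-
    set -- $1                      # split on "-" into $1 $2 $3 ...
    IFS=$old_ifs
    base=$1; tl=$2; sp=$3
    major=$(printf '%s' "$base" | cut -c1)   # assumed: first digit = version
    minor=$(printf '%s' "$base" | cut -c2)   # assumed: second digit = release
    echo "version=$major.$minor tl=${tl#0} sp=${sp#0}"
}

parse_oslevel 5200-08-01
```

On a live system: parse_oslevel "$(oslevel -s)"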

For more detailed information on these topics, please refer to The IBM AIX 5L Service Strategy and Best Practices document.

4) CHECK DUMP SIZE

Identify the dump space settings. Note that a dump is written to either the primary or the secondary device; it will not spill over to the secondary device if the primary fills:

# sysdumpdev -l

primary /dev/hd7

secondary /dev/hd71

copy directory /var/adm/ras

forced copy flag TRUE

always allow dump TRUE

dump compression OFF

Display statistical info about the most recent dump:

# sysdumpdev -L

0453-039

Device name: /dev/hd7

Major device number: 10

Minor device number: 8

Size: 23327232 bytes

Uncompressed Size: 191149876 bytes

Date/Time: Fri Feb 11 10:50:40 CST 2005

Dump status: 0

dump completed successfully

Estimate the size of the dump (in bytes) for the currently running system:

# sysdumpdev -e

0453-041 Estimated dump size in bytes: 4280287232

To identify how much space is allocated to the dump device:

# lslv hd7

LOGICAL VOLUME: hd7 VOLUME GROUP: rootvg

LV IDENTIFIER: 00283edd00004c00000001024cb1a4c3.10 PERMISSION: read/write

VG STATE: active/complete LV STATE: opened/syncd

TYPE: sysdump WRITE VERIFY: off

MAX LPs: 512 PP SIZE: 256 megabyte(s)

COPIES: 1 SCHED POLICY: parallel

LPs: 18 PPs: 18

STALE PPs: 0 BB POLICY: relocatable

INTER-POLICY: minimum RELOCATABLE: yes

INTRA-POLICY: middle UPPER BOUND: 32

MOUNT POINT: N/A LABEL: None

MIRROR WRITE CONSISTENCY: on/ACTIVE

EACH LP COPY ON A SEPARATE PV ?: yes

Serialize IO ?: NO

Dump Space Size (hd7) = PPs x PP SIZE

Dump Space Size (hd7) = 18 X 256 megabytes = 4608 megabytes
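
The estimate from sysdumpdev -e is in bytes while the device size is in megabytes, so the comparison is easy to get wrong. A rough sketch of the arithmetic follows; dump_fits is a hypothetical helper name and the three values are typed in by hand from the outputs above.

```shell
# Sketch: compare the estimated dump size with the dump device size.
# dump_fits is a hypothetical helper.
# $1 = estimated dump size in bytes (sysdumpdev -e)
# $2 = PPs of the dump logical volume, $3 = PP size in MB (lslv hd7)
dump_fits() {
    awk -v est="$1" -v pps="$2" -v ppsize="$3" 'BEGIN {
        est_mb = est / 1048576           # bytes to megabytes
        dev_mb = pps * ppsize            # Dump Space Size = PPs x PP SIZE
        verdict = (est_mb <= dev_mb) ? "OK" : "TOO SMALL"
        printf "estimate=%d MB device=%d MB %s\n", est_mb, dev_mb, verdict
    }'
}

# Values taken from the sysdumpdev -e and lslv hd7 outputs above.
dump_fits 4280287232 18 256
```

With the sample values the estimate (4082 MB) fits within the 4608 MB device, though without much headroom.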

5) CREATE A DUMP FILE

Look at the dump size and then execute df -I or df -k to find a file system with enough space to proceed to packaging. Once a file system has been found, you may proceed with creating a dump file to ftp to IBM.

# snap -gfkDN (This command can be run from any directory.)

# cd /tmp/ibmsupt/dump

# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)

# cd /tmp/ibmsupt

# snap -c

Ftp file to IBM.

If there is no room in /tmp, run snap with the -d flag and an output directory on a file system that has enough space:

# snap -gfkDNd

# cd //ibmsupt/dump

# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)

# snap -cd //ibmsupt

This will create a snap.pax.Z file in the /tmp/ibmsupt directory. The file will need to be renamed to pmr#.branch#.snap.pax.Z.

# mv snap.pax.Z

F. SHUTDOWN

The shutdown command halts the operating system. Only a user with root user authority can run this command. Do not attempt to restart the system or turn off the system before the shutdown completion message is displayed; otherwise, file system damage can result.

Make sure you are on the correct server before entering the shutdown command:

# hostname

To shut down and restart the system:

# shutdown -Fr

Other flags that could be used with the shutdown command are:

-h Halts the operating system completely.

-m Brings the system down to maintenance (single user) mode.

-d Brings the system down from a distributed mode to a multiuser mode.

-i Interactive mode. Displays interactive messages to guide the user through the shutdown.

The last command can be used to help determine when the system was last shut down.

# last shutdown

shutdown tty0 Feb 11 14:05

shutdown tty0 Feb 10 20:23

shutdown pts/1 Feb 04 07:08

G. HARDWARE ASSISTANCE

HOW TO RUN DIAGNOSTICS

The diag command is menu driven and is used to run diagnostics for a suspected problem.

# diag

Press Enter to advance past the information screen.

Select Diagnostic Routines.

Select Problem Determination.

This instructs the diag command to test the system and analyze the error log.

You may run a diagnosis on a particular device by using the -d flag.

# diag -d (device name)

Display previous diagnostic results.

# cd /usr/lpp/diagnostics/bin

# ./diagrpt -o

Display all diagnostic result files logged since the date specified (mmddyy).

# /usr/lpp/diagnostics/bin/diagrpt -s 030705

This will list results logged since March 7, 2005.

Diagnostic result files are stored in /etc/lpp/diagnostics/data directory.

FINDING SYSTEM CONFIGURATION INFORMATION

Total physical memory in system

# bootinfo -r

Total number of processors in system

# lsdev -Cc processor (this will list each processor)

Display configuration, diagnostic, vital product data about system

# lscfg -vp | more

H. LOGS

The first place you should go when troubleshooting problems in AIX is the error report (errpt).

First run errpt without any options to get an overview of current errors:

# errpt | more

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

B6048838 0725140606 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED

B6048838 0725133506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED

B6048838 0725122506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED

B6048838 0724140106 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED

B6048838 0721033906 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED

B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR

B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR

B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR

B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR
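
When the summary is long, counting entries per IDENTIFIER shows which errors dominate before drilling into details. This is a sketch only; errpt_summary is a made-up helper name and it assumes the column layout shown above.

```shell
# Sketch: count errpt summary entries per IDENTIFIER, most frequent first.
# errpt_summary is a hypothetical helper; it reads errpt output on stdin.
errpt_summary() {
    awk 'NR > 1 { count[$1]++ }
         END { for (id in count) print count[id], id }' | sort -rn
}

# Sample data copied from the errpt output above.
sample='IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
B6048838 0725140606 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0725133506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0725122506 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0724140106 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0721033906 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1356 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR
B6267342 0721032506 P H hdisk1355 DISK OPERATION ERROR'

printf '%s\n' "$sample" | errpt_summary
```

On a live system: errpt | errpt_summary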

To get the specifics associated with the IDENTIFIER:

# errpt -aj B6048838 | more

---------------------------------------------------------------------------

LABEL: CORE_DUMP

IDENTIFIER: B6048838

Date/Time: Tue Jul 25 14:06:04 EDT

Sequence Number: 113629

Machine Id: 00283E9D4C00

Node Id: jrspa13t

Class: S

Type: PERM

Resource Name: SYSPROC

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes

SOFTWARE PROGRAM

User Causes

USER GENERATED SIGNAL

Recommended Actions

CORRECT THEN RETRY

Failure Causes

SOFTWARE PROGRAM

Recommended Actions

RERUN THE APPLICATION PROGRAM

IF PROBLEM PERSISTS THEN DO THE FOLLOWING

CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

SIGNAL NUMBER

6

USER'S PROCESS ID:

7540756

FILE SYSTEM SERIAL NUMBER

44

INODE NUMBER

1474687

PROCESSOR ID

16

CORE FILE NAME

/pac/brsmdp07/bea/app/user_projects/domains/collections/core

PROGRAM NAME

java

ADDITIONAL INFORMATION

abort E8

??

Symptom Data

REPORTABLE

You can display errors that were encountered during the last day by specifying a date in your search.

# date

Wed Feb 23 14:57:39 CST 2005

# errpt -a -s 0222145605 | more

-a display information in a detailed format

-s display all records posted after the StartDate

Example: errpt -a -s mmddhhmmyy, where the StartDate is the month, day, hour, minute, and two-digit year of the time 24 hours ago.

I. Installed Software Installation Info

How to determine the maintenance level of software:

# lslpp -l | more (This will list every fileset on the system)

# lslpp -l (Followed by a fileset name, lists the state of that fileset)

# lslpp -L | grep (Followed by a search pattern; an easy way to get basic version info)

# lslpp -h (Followed by a fileset name, displays when that fileset was installed)
