When a system hangs
Processes requesting additional memory are killed once the system runs low on paging space. The system appears hung as new processes and telnet connections are terminated. Error messages such as Not enough memory or Fork function failed are generated. There are three ways to resolve this situation.
1. Add paging space. To learn how much paging space is "enough", run the lsps -s command regularly to get a feel for the %Used figure. As a guideline, a system at its maximum workload should have no more than 80% of its paging space in use.
Example output of the command lsps -s looks like the following:
Total Paging Space   Percent Used
      200MB               51%
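The 80% guideline above can be checked mechanically. The following is a minimal sketch, assuming the two-line lsps -s layout shown in the sample (a header line, then the size and the percentage); the function names paging_used_pct and check_paging are made up for this illustration.

```shell
# Print the %Used figure from `lsps -s` output read on stdin.
# Assumes the sample layout: a header line, then "<size>MB <pct>%".
paging_used_pct() {
    awk 'NR == 2 { sub(/%/, "", $2); print $2 }'
}

# Compare %Used against a limit (default 80, the guideline in the text).
# Returns 0 when within the limit, 1 when above it, 2 on unparsable input.
check_paging() {
    limit=${1:-80}
    pct=$(paging_used_pct)
    case $pct in
        ''|*[!0-9]*) echo "could not parse lsps -s output" >&2; return 2 ;;
    esac
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: paging space ${pct}% used (limit ${limit}%)"
        return 1
    fi
    echo "OK: paging space ${pct}% used"
}

# On AIX this would be run as:  lsps -s | check_paging
```

Run from cron at intervals, a check like this gives the same "feel" for %Used without having to remember to run lsps -s by hand.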
2. Systems often have plenty of paging space (sometimes 3-4 times RAM) and can still run out. This could be due to a memory leak. The question then is which process is causing the memory leak.
Discussed below are ways to find out what process is causing the memory leak and the tools used to accomplish this task.
a. The command ps vg provides useful information. In this case the data in the column labeled SIZE is needed. The SIZE column reports virtual memory (paging space) usage on a per-process basis, in 1KB units.
Sample output from ps vg | pg looks like the following:
  PID    TTY STAT      TIME  PGIN  SIZE   RSS   LIM  TSIZ   TRS  %CPU  %MEM COMMAND
    0      - A        87:42     6    20     8    xx     0     0   0.1   0.0 swapper
    1      - A       191:58    94   240   240    xx    25    28   0.3   0.0 /etc/init
  516      - A     70228:47     0    16    20    xx     0     0  97.0   0.0 kproc
  774      - A         5:53     1    24    28    xx     0     0   0.0   0.0 kproc
 1032      - A        28:40     0    56    56    xx     0     0   0.0   0.0 kproc
 1866      - A         0:00     0    24    20    xx     0     0   0.0   0.0 kproc
 2174  pts/1 A         2:55    31   420   544 32768   260   164   0.0   1.0 aixterm
 2454      - A         1:32    62   272   224    xx    96    60   0.0   0.0 /usr/dt/b
Collect ps vg output at several points during the period in which %Used from lsps -s grows toward 99%, then examine the SIZE column for large numerical increases. A leaking process shows extraordinarily large increases in the amount of paging space it uses between successive ps vg readings.
b. One could write a Korn shell script to collect this data and to do the comparison.
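A minimal sketch of such a comparison follows, written in POSIX shell with awk so that it should also run under ksh. It assumes two snapshots saved to files with ps vg > file, in the column layout of the sample above (PID in column 1, SIZE in column 6, COMMAND in column 13). The function name size_growth and the snapshot file names are hypothetical.

```shell
# Usage: size_growth OLD_SNAPSHOT NEW_SNAPSHOT
# Prints "GROWTH_KB PID COMMAND" for every process whose SIZE increased
# between the two snapshots, largest increase first.
size_growth() {
    awk '
        FNR == 1  { next }                 # skip the header line of each file
        NR == FNR { old[$1] = $6; next }   # first file: remember SIZE per PID
        ($1 in old) && ($6 + 0) > (old[$1] + 0) {
            print $6 - old[$1], $1, $13    # second file: report the growth
        }
    ' "$1" "$2" | sort -rn
}

# Collecting snapshots on AIX might look like:
#   ps vg > /tmp/psvg.1 ; sleep 600 ; ps vg > /tmp/psvg.2
#   size_growth /tmp/psvg.1 /tmp/psvg.2 | head
```

Processes that started or exited between the two readings are ignored here, since only PIDs present in both snapshots can be compared.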
c. Another tool that can be used to track a memory leak is svmon.
NOTE: The fileset perfagent.tools must be installed in order to use svmon (and other commands, such as tprof, netpmon, and filemon). To check whether it is installed, enter: lslpp -l perfagent.tools.
If you are at AIX Version 4.3.0 or higher, this fileset can be found on the AIX Base Operating System media on Volume 2 of the CD set.
As root, enter the following command:
svmon -Pu | more
This will list the top memory consumers in decreasing order, the first process being the largest consumer. The rest of the report shows memory and paging space usage for each segment of each process.
Sample output looks like the following:
  Pid  Command  Inuse  Pin  Pgspace
13794  dtwm      1603    1      449
Pid: 13794
Command: dtwm
Segid  Type  Description          Inuse  Pin  Pgspace  Address Range
  b23  pers  /dev/hd2:24849           2    0        0  0..1
 14a5  pers  /dev/hd2:24842           0    0        0  0..2
 6179  work  lib data               131    0       98  0..891
 280a  work  shared library text   1101    0       10  0..65535
  181  work  private                287    1      341  0..310:65277..65535
 57d5  pers  code,/dev/hd2:61722     82    0        0  0..135
In each process report, find the segment whose Type is work and whose Description is private, and check how many 4KB (4096-byte) pages it uses under the Pgspace column. This is the minimum number of working pages this segment is using in all of virtual memory. A Pgspace number that grows but never decreases may indicate a memory leak.
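Picking out the work/private line by eye becomes tedious across many processes. The sketch below pulls that line out of each process report and converts its Pgspace page count to kilobytes. It assumes the report layout of the sample above ("Pid:" and "Command:" lines followed by segment rows with Pgspace in the sixth column); svmon output differs between AIX levels, so the field positions may need adjusting. The function name private_pgspace is hypothetical.

```shell
# Read `svmon -Pu` output on stdin and print, for each process,
# "PID COMMAND PGSPACE_PAGES PGSPACE_KB" for its work/private segment.
# Pages are 4KB each, as stated in the text.
private_pgspace() {
    awk '
        $1 == "Pid:"     { pid = $2 }      # start of a process report
        $1 == "Command:" { cmd = $2 }
        $2 == "work" && $3 == "private" { print pid, cmd, $6, $6 * 4 }
    '
}

# On AIX, as root, this would be run as:  svmon -Pu | private_pgspace
```

For the dtwm sample above, the private segment holds 341 pages of paging space, which is 1364KB. Saving this output at intervals and comparing the figures shows whether the number only ever grows.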
3. The system may be reaching its Maximum number of PROCESSES allowed per user, or maxuproc. Depending on what maxuproc is set to (default is 40), if a user has already forked a number of processes equal to maxuproc, the system will not allow that user to fork any more processes.
The maxuproc parameter can be increased via SMIT. Enter SMIT and proceed in sequence through the panels System Environments and then Change / Show Characteristics of the Operating System. The first line on this last screen is maxuproc. Increasing this number by a conservative increment (50-100 at a time) allows users to fork more processes, thus avoiding any Out of memory or Cannot fork messages.
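Before raising maxuproc, it can help to see which users are actually close to the limit. The following sketch counts processes per user from ps -ef output, assuming the user name is in the first column and the first line is a header. The function name procs_per_user is hypothetical.

```shell
# Read `ps -ef` output on stdin and print "USER COUNT" for each user
# owning at least MIN processes (first argument, default 1), busiest first.
procs_per_user() {
    awk -v min="${1:-1}" '
        NR > 1 { n[$1]++ }                 # skip the header, count by user
        END {
            for (u in n)
                if (n[u] >= min + 0) print u, n[u]
        }
    ' | sort -k2,2nr -k1,1
}

# Example: list users with 35 or more processes (close to a limit of 40)
#   ps -ef | procs_per_user 35
```

On AIX, the current limit can be displayed with lsattr -E -l sys0 -a maxuproc, and the command-line equivalent of the SMIT panel is chdev -l sys0 -a maxuproc=N, run as root with N set to the new value.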