Technote (troubleshooting)
Problem (Abstract)
Determining memory leaks
Symptom
The AIX system appears to hang, or the error "0403-031 Fork function failed" is generated.
Cause
Processes requesting additional memory are killed once the system runs low on paging space. The system appears hung because new processes and telnet connections are terminated, and errors such as 'Not enough memory' or 'Fork function failed' are generated.
Environment
AIX
Diagnosing the problem
The AIX commands lsps, ps, and svmon, described in the steps below, are used to monitor processes' real memory and paging space usage.
When a system hangs, there are three ways to resolve this situation.
1. Add additional paging space. To judge how much paging space is "enough", run the lsps -s command regularly and watch the Percent Used value. A system at its maximum workload should have no more than 80% of its paging space in use. Example output of lsps -s looks like the following:
# lsps -s
Total Paging Space   Percent Used
      5376MB               1%
#
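The 80% guideline above can be checked with a short shell function. The following is only a minimal sketch, assuming lsps -s prints the two-column layout shown in the sample output. The name check_paging_space is a hypothetical helper, not an AIX command.

```shell
# Hypothetical helper: reads `lsps -s` output on stdin and reports whether
# Percent Used exceeds a threshold (default 80, the guideline quoted above).
# Assumes the data line ends with a percentage such as "1%".
check_paging_space() {
    threshold=${1:-80}
    awk -v limit="$threshold" '
        # The header line ends in "Used", so only the data line matches.
        $NF ~ /^[0-9]+%$/ {
            used = $NF
            sub(/%/, "", used)
            if (used + 0 > limit + 0)
                printf "WARNING: paging space %d%% used (limit %d%%)\n", used, limit
            else
                printf "OK: paging space %d%% used (limit %d%%)\n", used, limit
        }
    '
}

# On AIX:  lsps -s | check_paging_space 80
```

Run from cron or a loop, a check like this gives the "feel" for Percent Used over time without reading the raw output by hand.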
2. A system can have plenty of paging space (sometimes 3-4 times RAM) and still run out. This can be due to a memory leak, in which case the question is which process is leaking. The methods and tools below help identify that process.
a. The command ps vg provides useful information. In this case the data in the column labeled SIZE is needed. The SIZE column reports virtual memory (paging space) usage on a per-process basis, in 1KB units. Sample output from ps vg | pg looks like the following:
Collect ps vg output at several points during the period in which Percent Used from lsps -s grows toward 99%. Then examine the output for large numerical increases in the SIZE column. A leaking process shows extraordinarily large increases in the amount of paging space it uses across the successive ps vg readings.
b. One could write a Korn shell script to collect this data and do the comparison.
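As one possible shape for such a script, the sketch below compares two saved ps vg snapshots and lists the processes whose SIZE grew. It is an illustration, not a tested AIX tool. It assumes each snapshot starts with a header line containing the column names PID and SIZE and that every column before COMMAND is non-empty. The function name ps_size_growth and the snapshot file names are hypothetical.

```shell
# Hypothetical helper: compare two saved `ps vg` snapshots.
# Usage:  ps_size_growth OLD_SNAPSHOT NEW_SNAPSHOT
# Output: PID, old SIZE, new SIZE, growth (1KB units), largest growth first.
ps_size_growth() {
    awk '
        # Header line of each file: find which columns hold PID and SIZE.
        FNR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID")  pidcol = i
                if ($i == "SIZE") sizecol = i
            }
            next
        }
        # First file: remember each SIZE by PID.
        NR == FNR { old[$pidcol] = $sizecol; next }
        # Second file: print PIDs present in both whose SIZE increased.
        ($pidcol in old) && ($sizecol + 0 > old[$pidcol] + 0) {
            print $pidcol, old[$pidcol], $sizecol, $sizecol - old[$pidcol]
        }
    ' "$1" "$2" | sort -k4,4nr
}

# Collecting snapshots on AIX (file names are examples only):
#   ps vg > /tmp/psvg.1 ; sleep 600 ; ps vg > /tmp/psvg.2
#   ps_size_growth /tmp/psvg.1 /tmp/psvg.2
```

Looking up the columns by header name rather than by fixed position keeps the sketch independent of the exact column order. Processes that started or exited between the two snapshots are ignored.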
c. Another tool that can be used to track a memory leak is svmon.
NOTE: The fileset bos.perf.tools (on AIX 5.2 and above) must be installed in order to use svmon. To check to see if this fileset is installed, enter:
lslpp -l bos.perf.tools
As root, enter the following command:
svmon -Pu | more
This will list the top memory consumers in decreasing order, the first process being the largest consumer. The rest of the report shows memory and paging space usage for each segment of each process. Sample output looks like the following:
In each process report, find items in the Type column identified as "work", and in the Description column identified as "process private", and check how many 4KB (4096-byte) pages are used under the Pgspace column. This is the minimum number of working pages this segment is using in all of virtual memory. A Pgspace number that grows but never decreases may indicate a memory leak.
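The "grows but never decreases" test can be expressed as a small filter. The sketch below assumes the Pgspace values for one segment have already been copied out of successive svmon reports by hand, one value per line, oldest first. It does not parse svmon output itself, and pgspace_trend is a hypothetical name.

```shell
# Hypothetical helper: reads Pgspace samples (4KB pages, one per line,
# oldest first) on stdin. Reports a possible leak only if the value never
# decreased between samples and ended higher than it started.
pgspace_trend() {
    awk '
        NF == 0 { next }                         # skip blank lines
        n == 0  { first = $1 }                   # remember the first sample
        n > 0 && $1 + 0 < prev + 0 { dropped = 1 }
        { prev = $1; n++ }
        END {
            if (n < 2) { print "need at least two samples"; exit }
            if (!dropped && prev + 0 > first + 0)
                printf "possible leak: grew %d pages (%d KB) over %d samples\n", prev - first, (prev - first) * 4, n
            else
                print "no steady growth"
        }
    '
}
```

For example, the samples 1000, 1200, 1200, 1500 are reported as a possible leak of 500 pages (2000 KB), while 1000, 1200, 900, 1300 are not, because the value dropped at one point.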
*Note: To truncate 'svmon -Pu' output (to make it more user friendly):
- On AIX 5.2.0 and AIX 5.3.0: svmon -Pu | grep -p Pid | grep -v Pin | grep -v "^-" | sort +3
- On AIX 6.1.0: svmon -P -O summary=basic,unit=MB
d. Another method that can be used to identify a memory leak is to compare 'svmon -Pu' and 'svmon -Sg' output:
- Run the following commands:
svmon -Pu > /chdev -l sys0 -a maxuproc=
The first line on this last screen is maxuproc. Increasing this number by a conservative increment (50-100 at a time) allows users to fork more processes, thus avoiding any "Out of memory" or "Cannot fork" messages.
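The conservative-increment advice can be sketched as a helper that only prints the chdev command for review and does not run it. This is an illustration under assumptions: it expects the current value in the form printed by lsattr -El sys0 -a maxuproc (attribute name followed by its value), and suggest_maxuproc is a hypothetical name.

```shell
# Hypothetical helper: reads `lsattr -El sys0 -a maxuproc` output on stdin
# and prints (does not execute) the chdev command that raises maxuproc by
# the given increment (default 50, the low end of the range suggested above).
suggest_maxuproc() {
    step=${1:-50}
    awk -v step="$step" '
        # Expected input line: "maxuproc <value> <description> ..."
        $1 == "maxuproc" {
            printf "chdev -l sys0 -a maxuproc=%d\n", $2 + step
        }
    '
}

# On AIX, as root:  lsattr -El sys0 -a maxuproc | suggest_maxuproc 50
```

Printing the command instead of running it leaves the decision to change the system setting with the administrator.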