DLPAR (Dynamic Logical Partitioning) problems
DLPAR lets users dynamically add, remove, or modify LPAR resources such as memory, CPUs, or I/O devices.
The most common problem with DLPAR operations is related to RMC (Resource Monitoring and Control). Because DLPAR relies on an RMC connection between the HMC and the LPARs, make sure that the public network interface of your HMC is properly configured and that the HMC can reach your LPARs over the network (the HMC connection to the FSP (Flexible Service Processor) of the managed systems is not enough). If there are any firewalls between the HMC and the LPARs, check that port 657 is open for both UDP and TCP in both directions. Also make sure that RMC connections are allowed on the public interface of the HMC:
HMC Management --> Change Network Settings --> LAN adapters (choose the public one) --> Firewall settings
To start troubleshooting any DLPAR problem, log in to the HMC restricted shell and check the output of the following command:
# lspartition -dlpar
The output lists the logical partitions that are ready for DLPAR operations. If there is no output at all, either no LPAR can communicate with the HMC over the network, or there is a problem with the HMC itself. If you suspect the HMC, try rebooting it. Experience shows that many strange HMC problems can be solved by a simple reboot.
When RMC communication is working, the lspartition -dlpar output for the LPAR of interest should look similar to this:
<#1> Partition:<1, hostname, 10.30.23.15>
Active:<1>, OS:<...>, DCaps:<0x4ebf>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<384>
Check that the DCaps value is higher than 0x0 and that the Active value is higher than 0. If either is not, perform the next steps on the LPAR on which you are trying to run DLPAR operations.
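The check described above can be automated when many partitions are listed. The following is only a minimal sketch: the function name dlpar_ready_report is hypothetical, and it assumes that each partition is reported as a "Partition:<...>" line followed by a line containing "Active:<...>" and "DCaps:<...>", as in the sample output. It reads the saved output on standard input, so it can run on any workstation that has awk.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM tool): reads `lspartition -dlpar` output
# on stdin and prints one summary line per partition.
# A partition counts as ready when Active > 0 and DCaps is a non-zero hex value.
dlpar_ready_report() {
    awk '
        /Partition:</ {
            # Keep only the text between "Partition:<" and the first ">".
            part = $0
            sub(/^.*Partition:</, "", part)
            sub(/>.*$/, "", part)
        }
        /Active:</ {
            active = $0
            sub(/^.*Active:</, "", active)
            sub(/>.*$/, "", active)

            # DCaps may be absent if the line is truncated.
            dcaps = "unknown"
            if ($0 ~ /DCaps:</) {
                dcaps = $0
                sub(/^.*DCaps:</, "", dcaps)
                sub(/>.*$/, "", dcaps)
            }

            ok = (active + 0 > 0 && dcaps ~ /^0x[0-9a-fA-F]+$/ && dcaps !~ /^0x0+$/)
            status = ok ? "OK" : "NOT_READY"
            printf "%s: Active=%s DCaps=%s %s\n", part, active, dcaps, status
        }
    '
}
```

For example, assuming the output was saved to a file named dlpar.txt (a placeholder name), run: dlpar_ready_report < dlpar.txt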
Check the RMC connection to HMC using the following command:
# lsrsrc IBM.ManagementServer
If you are using AIX 7.1, use the following command instead:
# lsrsrc IBM.MCP
You should see something like this:
Resource Persistent Attributes for IBM.MCP
resource 1:
MNName = "10.30.23.15"
NodeID = 18194515442147552355
KeyToken = "hmc.localdomain"
IPAddresses = {"10.30.23.15"}
ConnectivityNames = {"10.30.23.15"}
HMCName = "7042CR4*XXXXXXX"
HMCIPAddr = "10.30.23.10"
HMCAddIPs = "192.168.128.1"
HMCAddIPv6s = ""
ActivePeerDomain = ""
NodeNameList = {"Test"}
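To confirm that the HMC address recorded by RMC is the one you expect, and to reuse it in a connectivity test, the HMCIPAddr value can be pulled out of this output. This is a small sketch, assuming the IBM.MCP output format shown above; the function name hmc_addresses is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints every HMCIPAddr value found in
# `lsrsrc IBM.MCP` output read from stdin (one address per line).
hmc_addresses() {
    # Split on double quotes so that field 2 is the quoted value.
    # The pattern does not match HMCAddIPs or HMCAddIPv6s.
    awk -F'"' '/HMCIPAddr[ \t]*=/ { print $2 }'
}
```

On the LPAR, it could be combined with ping, for example: lsrsrc IBM.MCP | hmc_addresses | while read -r ip; do ping -c 2 "$ip"; done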
If you can see information about the HMC, it’s a good sign; if not, check the status of the main daemon IBM.DRM used for dynamic logical partitioning:
# lssrc -g rsct_rm
Subsystem Group PID Status
IBM.ServiceRM rsct_rm inoperative
IBM.DRM rsct_rm inoperative
IBM.ERRM rsct_rm inoperative
IBM.AuditRM rsct_rm inoperative
IBM.MgmtDomainRM rsct_rm inoperative
IBM.HostRM rsct_rm inoperative
If it is in the inoperative state, you can restart it with the following commands (note that sometimes, especially on AIX 7, this daemon is not active all the time but only when needed):
# /usr/sbin/rsct/bin/rmcctrl -z --> stops the RMC subsystem and all resource managers; the command does not return until the RMC subsystem and the resource managers have actually stopped
# /usr/sbin/rsct/bin/rmcctrl -A --> adds the RMC subsystem to the subsystem object class and starts the RMC subsystem
# /usr/sbin/rsct/bin/rmcctrl -p --> enables remote client connections
Check its status again:
# lssrc -a | grep rsct
ctrmc rsct 5374166 active
IBM.HostRM rsct_rm 14156014 active
IBM.ServiceRM rsct_rm 5439716 active
IBM.MgmtDomainRM rsct_rm 9240692 active
IBM.DRM rsct_rm 17301604 active
ctcas rsct inoperative
IBM.ERRM rsct_rm inoperative
IBM.AuditRM rsct_rm inoperative
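When this check has to be repeated, for example after each restart attempt, the status column can be extracted for a single subsystem. The sketch below assumes the lssrc output layout shown above, where the status is always the last field on the line; the function name rsct_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `lssrc` output on stdin and prints the status
# (last field) of the subsystem named in $1, or "missing" if it is not listed.
rsct_status() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }
        END        { if (!found) print "missing" }
    '
}
```

For example: lssrc -a | rsct_status IBM.DRM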
If the above does not change the output of lspartition -dlpar, you can try to reconfigure RMC with the recfgct command. Basically, this command recreates the RMC connection.
Before using the recfgct command, make sure that your server is not part of a CSM or GPFS cluster, because in that case the command could cause you more trouble than a non-working DLPAR.
The full path of the recfgct command is:
# /usr/sbin/rsct/install/bin/recfgct
Wait 5 to 10 minutes and check the RMC daemon again:
# lssrc -g rsct_rm
....
# lsrsrc IBM.ManagementServer
or
# lsrsrc IBM.MCP
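Instead of waiting a fixed time, the check can be polled until it succeeds. This is only a sketch of a generic retry loop; the function name wait_for and the attempt and interval values are assumptions, not part of RSCT.

```shell
#!/bin/sh
# Hypothetical helper: runs a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, or 1 if every
# attempt fails. POSIX sh has no local variables, so attempts, interval and i
# are global to the calling shell.
wait_for() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, to poll every 30 seconds for up to 10 minutes until the HMC shows up in the IBM.MCP resource: wait_for 20 30 sh -c 'lsrsrc IBM.MCP 2>/dev/null | grep -q HMCIPAddr'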
Note:
When you change system resources dynamically, do not forget to modify your LPAR profile accordingly: at the next boot, resources are assigned according to the profile, which is not affected by DLPAR operations.