Tanti Technology

Bangalore, Karnataka, India
Multi-platform UNIX systems consultant and administrator in mutualized and virtualized environments. I have 4.5+ years of experience in AIX system administration. This site is meant to help system administrators in their day-to-day activities. Your comments on posts are welcome. This blog is all about the IBM AIX flavour of UNIX. It is intended for system admins who use AIX in their work, and for newcomers who want to get certified in AIX administration. The blog will be updated frequently to help system admins and other new learners. DISCLAIMER: Please note that the blog owner takes no responsibility of any kind for any data loss or damage caused by trying any of the commands or methods mentioned in this blog. You use the commands, methods, and scripts at your own risk. If you find something useful, a comment would be appreciated to let other readers know that the solution or method worked for you.

Friday 18 October 2019

DLPAR (Dynamic Logical Partitioning) problems

DLPAR provides users the ability to dynamically add, remove or modify LPAR resources such as memory, CPU, or I/O devices.

The most common problem with DLPAR operations is related to RMC (Resource Monitoring and Control). Since the DLPAR function relies on an RMC connection between the HMC and the LPARs, ensure that the public network interface of your HMC is properly configured and that the HMC can reach your LPARs over the network (the HMC connection to the FSP (Flexible Service Processor) of the managed systems is not enough). If there are any firewalls between the HMC and the LPARs, check that port 657 UDP/TCP is open in both directions. Ensure that the RMC connection is allowed on the public interface of the HMC as well:

HMC Management --> Change Network Settings --> LAN adapters (choose the public one) --> Firewall settings

To start troubleshooting any problem with Dynamic Logical Partitioning, log in to the HMC restricted shell and check the output of the command:

# lspartition -dlpar
The output lists the logical partitions that are ready for DLPAR operations. If there is no output at all, either no LPAR can communicate with the HMC over the network, or there is a problem with the HMC itself. If you suspect the HMC, try rebooting it. Experience shows that many strange HMC problems can be solved just by a reboot.

When RMC communication is working, the output of lspartition -dlpar for the LPAR of interest should look similar to this:

<#1> Partition:<1, hostname, 10.30.23.15>
Active:<1>, OS:, DCaps:<0x4ebf>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<384>
Check that the DCaps value is higher than 0x0 and the Active value is higher than 0. If either is not, perform the following steps on the LPAR on which you are trying to run DLPAR operations.
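The Active/DCaps check described above can be sketched as a small shell filter. This is only an illustration: the function name check_dlpar_caps is made up for this example, and it assumes the lspartition -dlpar output has been captured somewhere awk is available (for example on a workstation, via ssh to the HMC), since the HMC restricted shell itself may not allow it.

```shell
# check_dlpar_caps (hypothetical helper): reads `lspartition -dlpar` output
# on stdin and prints "ready" or "not-ready" for every line that carries
# both an Active:<n> and a DCaps:<0x...> field.
check_dlpar_caps() {
    awk '
        /Active:<[0-9]+>/ && /DCaps:<0x[0-9a-fA-F]+>/ {
            active = $0
            sub(/.*Active:</, "", active); sub(/>.*/, "", active)
            dcaps = $0
            sub(/.*DCaps:<0x/, "", dcaps); sub(/>.*/, "", dcaps)
            # DCaps is hexadecimal: it is non-zero if any digit is not 0.
            if (active + 0 > 0 && dcaps ~ /[1-9a-fA-F]/)
                print "ready"
            else
                print "not-ready"
        }'
}
```

A possible use, assuming the output was saved to a file called lspartition.out (a name chosen for this example): check_dlpar_caps < lspartition.out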

Check the RMC connection to HMC using the following command:

# lsrsrc IBM.ManagementServer
If you are using AIX 7.1, use the following command instead:

# lsrsrc IBM.MCP
You should see something like this:

Resource Persistent Attributes for IBM.MCP
resource 1:
        MNName            = "10.30.23.15"
        NodeID            = 18194515442147552355
        KeyToken          = "hmc.localdomain"
        IPAddresses       = {"10.30.23.15"}
        ConnectivityNames = {"10.30.23.15"}
        HMCName           = "7042CR4*XXXXXXX"
        HMCIPAddr         = "10.30.23.10"
        HMCAddIPs         = "192.168.128.1"
        HMCAddIPv6s       = ""
        ActivePeerDomain  = ""
        NodeNameList      = {"Test"}
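To compare the HMC address seen by the LPAR with the one you expect, the HMCIPAddr attribute can be pulled out of this listing. The sketch below is an assumption-laden helper (the name hmc_addrs is invented here) and only handles the IBM.MCP layout shown above, not the IBM.ManagementServer one.

```shell
# hmc_addrs (hypothetical helper): reads `lsrsrc IBM.MCP` output on stdin
# and prints the value of every HMCIPAddr attribute, one per line.
hmc_addrs() {
    sed -n 's/^[[:space:]]*HMCIPAddr[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p'
}
```

Typical use on the LPAR would be: lsrsrc IBM.MCP | hmc_addrs. An empty result means no HMC is registered in the listing.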
If you can see information about the HMC, it’s a good sign; if not, check the status of the main daemon IBM.DRM used for dynamic logical partitioning:

# lssrc -g rsct_rm
Subsystem         Group            PID          Status
 IBM.ServiceRM    rsct_rm                       inoperative
 IBM.DRM          rsct_rm                       inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
 IBM.MgmtDomainRM rsct_rm                       inoperative
 IBM.HostRM       rsct_rm                       inoperative
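When this check is scripted, the status column for a single subsystem can be extracted from the lssrc listing. The following is a minimal sketch; subsys_status is a name made up for this example, and it assumes the column layout shown above, where the status is the last field on the line.

```shell
# subsys_status (hypothetical helper): reads lssrc output on stdin and
# prints the status (last column) of the subsystem named in $1.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}
```

For example: lssrc -g rsct_rm | subsys_status IBM.DRM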

If it is in the inoperative state, you can restart it with the following commands (note that sometimes, especially in AIX 7, this daemon is not active all the time but only when needed):

# /usr/sbin/rsct/bin/rmcctrl -z
--> Stops the RMC subsystem and all resource managers; the command does not return until the RMC subsystem and the resource managers have actually stopped.
# /usr/sbin/rsct/bin/rmcctrl -A
--> Adds the RMC subsystem to the subsystem object class and starts the RMC subsystem.
# /usr/sbin/rsct/bin/rmcctrl -p
--> Enables remote client connections.
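The three rmcctrl calls above can be wrapped in one function so that they always run in the same order and stop at the first failure. This is a sketch only: restart_rmc and the DRY_RUN variable are conventions invented for this example, not part of RSCT.

```shell
# Path of rmcctrl as given in this post.
RMCCTRL=/usr/sbin/rsct/bin/rmcctrl

# restart_rmc (hypothetical helper): runs rmcctrl with -z, -A and -p in
# that order and returns 1 as soon as one call fails.
# With DRY_RUN=1 the commands are only printed, not executed.
restart_rmc() {
    for flag in -z -A -p; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$RMCCTRL $flag"
        else
            "$RMCCTRL" "$flag" || return 1
        fi
    done
}
```

Setting DRY_RUN=1 first is a cheap way to review what would be run before doing it as root on the LPAR.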

Check its status again:

# lssrc -a | grep rsct
 ctrmc            rsct             5374166      active
 IBM.HostRM       rsct_rm          14156014     active
 IBM.ServiceRM    rsct_rm          5439716      active
 IBM.MgmtDomainRM rsct_rm          9240692      active
 IBM.DRM          rsct_rm          17301604     active
 ctcas            rsct                          inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative


If the above does not change the output of lspartition -dlpar, you can try to reconfigure RMC with the recfgct command. Basically, this command recreates the RMC connection.


Before using the recfgct command, make sure that your server is not part of a CSM or GPFS cluster, because running it there could cause you more trouble than a non-working DLPAR.

The full path of the recfgct command is:

# /usr/sbin/rsct/install/bin/recfgct
Wait 5 to 10 minutes and check the RMC daemon again:

# lssrc -g rsct_rm
....
# lsrsrc IBM.ManagementServer
or

# lsrsrc IBM.MCP
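Instead of waiting a fixed time and rechecking by hand, the check can be repeated in a loop until it succeeds. The helper below is a generic sketch (retry_until is a name chosen for this example); the number of tries and the delay are arbitrary and should be adapted.

```shell
# retry_until TRIES DELAY CMD... (hypothetical helper): runs CMD until it
# succeeds, at most TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        # Do not sleep after the last failed attempt.
        if [ "$n" -le "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, to poll for up to about 10 minutes until the HMC shows up in the IBM.MCP listing: retry_until 20 30 sh -c 'lsrsrc IBM.MCP | grep HMCIPAddr'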





Note:

When you change system resources dynamically, do not forget to modify your LPAR profile accordingly: at the next boot, system resources will be assigned according to the profile, which is not affected by DLPAR operations.