
Using the Simplified Remote Restart capability on Power8 Scale Out Servers

A few weeks ago I had to work on simplified remote restart. Because of some political decisions in my company I'm not lucky enough yet to have access to any E880 or E870; we just have a few scale-out machines to play with (S814). For some critical applications we need to be able to restart a virtual machine on another system if the system hosting it has failed (hardware problem). A couple of months ago we decided not to use remote restart because it required a reserved storage pool device, and this mandatory storage made it too hard to manage. We now have enough P8 boxes to try and understand the new version of remote restart, called simplified remote restart, which does not need any reserved storage pool device. For those who want to understand what remote restart is, I strongly recommend my previous blog post about remote restart on two P7 boxes: Configuration of a remote restart partition. For the others, here is what I learned about the simplified version of this awesome feature.

Please keep in mind that the FSP of the machine must be up to perform a simplified remote restart operation. This means that if, for instance, you lose one of your datacenters or the link between your two datacenters, you cannot use simplified remote restart to restart your partitions on the main/backup site. Simplified remote restart only protects you against a hardware failure of your machine. Maybe this will change in the near future, but for the moment it is the most important thing to understand about simplified remote restart.

Updating to the latest version of firmware

I was very surprised when I got my Power8 machines. After deploying these boxes I decided to give simplified remote restart a try, but it was just not possible: ever since the Power8 Scale Out servers were released, they had NOT been simplified remote restart capable. The release of the SV830 firmware now enables simplified remote restart on Power8 Scale Out machines. Please note that there is nothing about it in the patch note, so chmod666.org is the only place where you can get this information :-). Here is the patch note: here. One last word: you will find on the internet that you need Power8 to use simplified remote restart. That is only partially true. YOU NEED A P8 MACHINE WITH AT LEAST AN 820 FIRMWARE.

The first thing to do is to update your firmware to the SV830 version (on both systems participating in the simplified remote restart operation):

# updlic -o u -t sys -l latest -m p814-1 -r mountpoint -d /home/hscroot/SV830_048 -v
[..]
# lslic -m p814-1 -F activated_spname,installed_level,ecnumber
FW830.00,48,01SV830
# lslic -m p814-2 -F activated_spname,installed_level,ecnumber
FW830.00,48,01SV830
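
The check above can be scripted. The following is a minimal sketch, not something shipped with the HMC: the function name fw_meets_minimum is made up for this example, and it assumes the ecnumber always has the "01SV<release>" form seen in the output above. It reads one line in the lslic -F format used above and succeeds only if the level is at least SV830_048.

```shell
# Hypothetical helper: reads one "activated_spname,installed_level,ecnumber"
# line on stdin and returns 0 if the firmware is at least SV830_048.
# Assumption: ecnumber looks like "01SV830", as in the lslic output above.
fw_meets_minimum() {
    IFS=, read -r _spname level ecnumber || return 2
    release=${ecnumber#01SV}                  # "01SV830" -> "830"
    # Refuse anything that is not purely numeric instead of guessing.
    case $release in ''|*[!0-9]*) return 2 ;; esac
    case $level in ''|*[!0-9]*) return 2 ;; esac
    # Newer release, or release 830 with service pack level 48 or above.
    [ "$release" -gt 830 ] || { [ "$release" -eq 830 ] && [ "$level" -ge 48 ]; }
}
```

On the HMC it could be fed like this: lslic -m p814-1 -F activated_spname,installed_level,ecnumber | fw_meets_minimum && echo "firmware OK".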

You can check the firmware version directly from the Hardware Management Console or in the ASMI:

[Image: fw1]

[Image: fw3]

After the firmware upgrade, verify that the Simplified Remote Restart capability is now set to true.

[Image: fw2]

# lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable
p720-1,0
p814-1,1
p720-2,0
p814-2,1
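
If you manage many systems, the same listing can be filtered to keep only the capable ones. This is a small sketch; the function name srr_capable_systems is invented for the example and it only relies on the two-column "name,capable" format shown above.

```shell
# Hypothetical helper: reads "name,capable" lines on stdin (the lssyscfg -r sys
# format shown above) and prints the names of the systems whose
# simplified remote restart capability is 1.
srr_capable_systems() {
    awk -F, '$2 == 1 { print $1 }'
}
```

Usage on the HMC: lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable | srr_capable_systems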

Prerequisites

These prerequisites are true ONLY for Scale out systems:

  • To update to the firmware SV830_048 you need the latest Hardware Management Console release which is v8r8.3.0 plus MH01514 PTF.
  • Obviously on Scale out system SV830_048 is the minimum firmware requirement.
  • Minimum level of Virtual I/O Servers is 2.2.3.4 (for both source and destination systems).
  • PowerVM enterprise. (to be confirmed)
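
The Virtual I/O Server level can be compared against the 2.2.3.4 minimum with a small dotted-version comparison. This is only a sketch: version_ge is a name made up for this example, and it assumes the level is a purely numeric dotted string such as the one printed by ioslevel on the Virtual I/O Server.

```shell
# Hypothetical helper: version_ge HAVE WANT
# Returns 0 if the dotted numeric version HAVE is greater than or equal to WANT.
version_ge() {
    have=$1 want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*} w=${want%%.*}           # first remaining field of each
        [ "${h:-0}" -gt "${w:-0}" ] && return 0
        [ "${h:-0}" -lt "${w:-0}" ] && return 1
        # Fields are equal: drop them and compare the next ones.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}
```

For example, on each Virtual I/O Server: version_ge "$(ioslevel)" 2.2.3.4 || echo "VIOS level too low".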

Enabling simplified remote restart of an existing partition

You probably want to enable simplified remote restart after an LPM migration/evacuation. After migrating your virtual machine(s) to a Power8 system with the simplified remote restart capability, you have to enable this capability on each virtual machine. This can only be done while the partition is shut down, so you first have to stop the virtual machines (after the live partition mobility move) before you can enable SRR. There is no way to do it without rebooting the virtual machine:

  • List the partitions currently defined on the system and check which ones are “simplified remote restart capable” (here only one is simplified remote restart capable):
# lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable
vios1,0
vios2,0
lpar1,1
lpar2,0
lpar3,0
lpar4,0
lpar5,0
lpar6,0
lpar7,0
  • For each lpar that is not simplified remote restart capable, change the simplified_remote_restart_capable attribute using the chsyscfg command. Please note that you can’t do this using the Hardware Management Console GUI: in the latest v8r8.3.0, when you try to enable it the GUI tells you that you need a reserved storage device, which is required by the original remote restart capability and not by the simplified version. You have to use the command line! (check the screenshots below)
  • You can’t change this attribute while the machine is running:
    [Image: gui_change_to_srr]

  • You can’t do it with the GUI after the machine is shut down either:
    [Image: gui_change_to_srr2]

    [Image: gui_change_to_srr3]

  • The only way to enable this attribute is to use the Hardware Management Console command line (note in the output below that running lpars cannot be changed):
    # for i in lpar2 lpar3 lpar4 lpar5 lpar6 lpar7 ; do chsyscfg -r lpar -m p824-2 -i "name=$i,simplified_remote_restart_capable=1" ; done
    An error occurred while changing the partition named lpar6.
    HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
    An error occurred while changing the partition named lpar7.
    HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
    # lssyscfg -r lpar -m p824-1 -F name,simplified_remote_restart_capable,lpar_env | grep -v vioserver
    lpar1,1,aixlinux
    lpar2,1,aixlinux
    lpar3,1,aixlinux
    lpar4,1,aixlinux
    lpar5,1,aixlinux
    lpar6,0,aixlinux
    lpar7,0,aixlinux
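
To avoid the HSCLA9F8 errors, the loop can be restricted to partitions that are already shut down. The sketch below is a dry run that only prints the chsyscfg commands. The function name srr_enable_cmds is invented for this example, and it assumes that a shut down partition reports the state "Not Activated" and that the listing is requested with -F name,state,simplified_remote_restart_capable,lpar_env.

```shell
# Hypothetical dry-run helper: srr_enable_cmds MANAGED_SYSTEM
# Reads "name,state,simplified_remote_restart_capable,lpar_env" lines on stdin
# and prints one chsyscfg command per partition that can be changed right now.
# Assumption: a shut down partition has the state "Not Activated".
srr_enable_cmds() {
    awk -F, -v ms="$1" '
        $4 == "vioserver"     { next }   # leave the Virtual I/O Servers alone
        $3 == 1               { next }   # already simplified remote restart capable
        $2 != "Not Activated" { printf "# skipped %s (state: %s)\n", $1, $2; next }
        { printf "chsyscfg -r lpar -m %s -i \"name=%s,simplified_remote_restart_capable=1\"\n", ms, $1 }
    '
}
```

Usage on the HMC: lssyscfg -r lpar -m p814-1 -F name,state,simplified_remote_restart_capable,lpar_env | srr_enable_cmds p814-1. Review the printed commands before running them.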
    

    Remote restarting

    If you try to do a live partition mobility operation back to a P7 box, or to a P8 box without the simplified remote restart capability, it will not be possible. Enabling simplified remote restart forces the virtual machine to stay on P8 boxes with the simplified remote restart capability. This is one of the reasons why most customers are not enabling it:

    # migrlpar -o v -m p814-1 -t p720-1 -p lpar2
    Errors:
    HSCLB909 This operation is not allowed because managed system p720-1 does not support PowerVM Simplified Partition Remote Restart.
    

    [Image: lpm_not_capable_anymore]

    On the Hardware Management Console you can see that the virtual machine is simplified remote restart capable by checking its properties:

    [Image: gui_change_to_srr4]

    You can now try to remote restart your virtual machines to another server. As always, the state of the source server has to be different from Operating (Power Off, Error, Error - Dump in progress, Initializing). As always, my advice is to validate before restarting:

    # rrstartlpar -o validate -m p824-1 -t p824-2 -p lpar1
    # echo $?
    0
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    
    # lssyscfg -r sys -F name,state
    p824-2,Operating
    p824-1,Power Off
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
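
The HSCLA9CE error above can be avoided by testing the state of the source system first. Here is a minimal sketch built only on the list of states given above; the function name source_state_allows_restart is made up, and the exact spelling of the dump state on your HMC is an assumption, which is why it is matched loosely.

```shell
# Hypothetical helper: source_state_allows_restart "STATE"
# Returns 0 only for the source system states listed above.
# Assumption: the dump state reads something like "Error - Dump in progress".
source_state_allows_restart() {
    case $1 in
        "Power Off"|"Error"|Error*Dump*|"Initializing") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: source_state_allows_restart "$(lssyscfg -r sys -m p824-1 -F state)" && rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1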
    

    After a remote restart operation the partition boots automatically. In most cases you can check in the errpt that the partition ID has changed (proving that you are on another machine):

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A6DF45AA   0618170615 I O RMCdaemon      The daemon is started.
    1BA7DF4E   0618170615 P S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    D872C399   0618170615 I O sys0           Partition ID changed and devices recreat
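
This check can be automated after a restart. The sketch below simply looks for the D872C399 identifier seen in the output above; the function name partition_id_changed is invented, and it assumes this identifier is the same on your AIX level.

```shell
# Hypothetical helper: reads errpt output on stdin and returns 0 if an entry
# with the identifier D872C399 (taken from the sample output above) is present.
partition_id_changed() {
    awk '$1 == "D872C399" { found = 1 } END { exit !found }'
}
```

Usage inside the restarted partition: errpt | partition_id_changed && echo "partition ID changed"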
    

    Be very careful with the ghostdev attribute of sys0. Every remote restarted VM needs to have ghostdev set to 0 to avoid an ODM wipe (if you remote restart an lpar with ghostdev set to 1 you will lose all ODM customization):

    # lsattr -El sys0 -a ghostdev
    ghostdev 0 Recreate ODM devices on system change / modify PVID True
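
It is worth checking this on every partition before you need a remote restart. The following sketch parses the lsattr output shown above; the function name check_ghostdev is made up for this example, and the suggested fix uses the standard chdev syntax.

```shell
# Hypothetical helper: reads the output of "lsattr -El sys0 -a ghostdev" on
# stdin, prints a short verdict and returns 1 if ghostdev is not 0.
check_ghostdev() {
    value=$(awk '$1 == "ghostdev" { print $2 }')   # second column is the value
    if [ "$value" = "0" ]; then
        echo "ghostdev is 0: ODM customization is kept after a remote restart"
    else
        echo "ghostdev is '$value': set it to 0 with: chdev -l sys0 -a ghostdev=0"
        return 1
    fi
}
```

Usage inside the partition: lsattr -El sys0 -a ghostdev | check_ghostdev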
    

    Once the source machine is up and running again, you have to clean up the old definition of the remote restarted lpar by launching a cleanup operation. This will wipe the old lpar definition:

    # rrstartlpar -o cleanup -m p814-1 -p lpar1
    

    The RRmonitor (modified version)

    IBM delivers a script called rrMonitor which watches the state of a Power System and, if the system is in a particular state, restarts a specific virtual machine. As delivered, this script is not really usable because it has to be executed directly on the HMC (you need a pesh password to put the script on the HMC) and it only checks one particular virtual machine. I modified this script to ssh to the HMC and to check every lpar on the machine, not just one in particular. You can download my modified version here: rrMonitor. Here is what the script is doing:

    • Checking the state of the source machine.
    • If it is not “Operating”, the script searches for every remote restartable lpar on the machine.
    • The script launches remote restart operations to restart all these partitions.
    • The script tells the user which command to run to clean up the old lpar once the source machine is running again.
    # ./rrMonitor p814-1 p814-2 all 60 myhmc
    Getting remote restartable lpars
    lpar1 is rr simplified capable
    lpar1 rr status is Remote Restartable
    lpar2 is rr simplified capable
    lpar2 rr status is Remote Restartable
    lpar3 is rr simplified capable
    lpar3 rr status is Remote Restartable
    lpar4 is rr simplified capable
    lpar4 rr status is Remote Restartable
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Power Off In Progress
    Checking for source server state....
    Source server state is Power Off
    It's time to remote restart
    Remote restarting lpar1
    Remote restarting lpar2
    Remote restarting lpar3
    Remote restarting lpar4
    Thu Jun 18 20:20:40 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar1 in the destination server p814-2
    Thu Jun 18 20:23:12 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar1 -o cleanup
    Thu Jun 18 20:23:12 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar2 in the destination server p814-2
    Thu Jun 18 20:25:42 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar2 -o cleanup
    Thu Jun 18 20:25:42 CEST 2015
    [..]
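
For readers who cannot download the script, the logic described above can be sketched as follows. This is NOT the actual rrMonitor code, only an illustration of the idea: rr_monitor and run_hmc are names invented for this example, the hscroot user and the myhmc host are assumptions, and the set of states that trigger the restart is the one listed in the "Remote restarting" part.

```shell
# Hypothetical sketch of the monitoring logic, not the real rrMonitor script.
# run_hmc is the only place that talks to the HMC; redefine it to test offline.
run_hmc() { ssh "hscroot@${HMC:-myhmc}" "$@"; }

rr_monitor() {  # usage: rr_monitor SOURCE TARGET INTERVAL_SECONDS
    src=$1 tgt=$2 interval=$3
    # Collect the simplified remote restart capable lpars up front.
    lpars=$(run_hmc lssyscfg -r lpar -m "$src" -F name,simplified_remote_restart_capable |
            awk -F, '$2 == 1 { print $1 }')
    # Poll the source system until it reaches a state that allows a restart.
    while :; do
        state=$(run_hmc lssyscfg -r sys -m "$src" -F state)
        echo "Source server state is $state"
        case $state in
            "Power Off"|Error|Error*Dump*|Initializing) break ;;
        esac
        sleep "$interval"
    done
    # Restart every candidate on the target and remind the user of the cleanup.
    for lpar in $lpars; do
        echo "Remote restarting $lpar"
        run_hmc rrstartlpar -o restart -m "$src" -t "$tgt" -p "$lpar"
        echo "Cleanup later with: rrstartlpar -o cleanup -m $src -p $lpar"
    done
}
```

A call such as rr_monitor p814-1 p814-2 60 would mirror the run shown above. Unlike the real script, this sketch does no validation and no error handling, so treat it as a starting point only.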
    

    Conclusion

    As you can see, the simplified version of the remote restart feature is simpler than the normal one. My advice is to create all your lpars with the simplified remote restart attribute. It’s that easy :). If you plan to LPM back to a P6 or P7 box, don’t use simplified remote restart. I think this functionality will become more popular when all the old P7 and P6 boxes have been replaced by P8. As always, I hope it helps.

    Here are a couple of links to great documentation about Simplified Remote Restart:

    • Simplified Remote Restart Whitepaper: here
    • Original rrMonitor: here
    • Materials about the latest HMC release and a couple of videos related to Simplified Remote Restart: here
