Cisco UCS B-Series CPU Upgrades – WILL_BOOT_FAULT

UCS is different from other server platforms, which sometimes makes simple maintenance tasks less straightforward than you’d think. We had a truckload of CPU upgrades last week. A regular server admin would think: “Hey, I just take this old CPU out and put this new CPU in and Bob’s my uncle!” Well, UCS might have a surprise for you.

One of my colleagues (a ‘traditional’ server guy) handled this replacement and was confronted with an error message when reinserting the blades: WILL_BOOT_FAULT (awesome description once again, Cisco). Taking the error message at face value, they went looking for boot issues (boot policy, Fibre Channel zoning, LUN masking, LAN boot, etc.).

Hardware vendors usually supply a manual with your part. But who reads the manual of a memory DIMM to learn how to properly insert it, right? Well, it turns out it pays to briefly scan the Cisco UCS manuals for procedures that are not traditional. In this case the manual pointed to a procedure that fixes this error: force a firmware activation (on the blade’s board controller, as in the commands that follow) and then reset the CIMC.

Open up an SSH session to your UCSM and apply this:

UCS-A# scope server 1/1      (chassis 1, blade 1)
UCS-A /chassis/server # scope boardcontroller
UCS-A /chassis/server/boardcontroller # show image      (look for the latest version; 11.0 at the time of writing)
UCS-A /chassis/server/boardcontroller # activate firmware 11.0 force
UCS-A /chassis/server/boardcontroller* # commit-buffer

Watch the FSM after this; the server will report when it’s done synchronising. There’s a small chance the server will already report “OK” at this point, but reset the CIMC anyway, just to be sure. Reset the CIMC using this procedure:

UCS-A /chassis/server/boardcontroller # exit
UCS-A /chassis/server # scope cimc
UCS-A /chassis/server/cimc # reset
UCS-A /chassis/server/cimc* # commit-buffer

(Or, if you prefer, use the GUI: Server context -> Recover Server -> Reset CIMC)

After a few minutes, your server will be able to boot again.
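
If you have a whole batch of blades to do, typing the same lines over and over gets old quickly. Here is a minimal sketch in Python that only prints the two command sequences from this post for each blade, so you can paste them into your SSH session. It does not connect to UCSM. The function names are made up for this example, and 11.0 is just the version from this post; use whatever show image reports on your system. Each sequence assumes you start from the top-level UCS-A# prompt.

```python
def activation_commands(chassis, blade, version):
    """CLI lines that force-activate the board controller firmware on one blade."""
    return [
        f"scope server {chassis}/{blade}",
        "scope boardcontroller",
        f"activate firmware {version} force",
        "commit-buffer",
    ]


def cimc_reset_commands(chassis, blade):
    """CLI lines that reset the CIMC on one blade."""
    return [
        f"scope server {chassis}/{blade}",
        "scope cimc",
        "reset",
        "commit-buffer",
    ]


def print_plan(blades, version):
    """Print both sequences per blade; blades is a list of (chassis, blade) tuples."""
    for chassis, blade in blades:
        print(f"--- blade {chassis}/{blade}: force firmware activation ---")
        print("\n".join(activation_commands(chassis, blade, version)))
        # Wait for the FSM to finish before pasting the next part.
        print(f"--- blade {chassis}/{blade}: reset CIMC (after the FSM is done) ---")
        print("\n".join(cimc_reset_commands(chassis, blade)))
        print()


if __name__ == "__main__":
    # Example values only: chassis 1, blades 1 and 2, version taken from this post.
    print_plan([(1, 1), (1, 2)], "11.0")
```

The two sequences are kept apart on purpose: the firmware activation needs to finish (watch the FSM) before you reset the CIMC, so this is meant for copy and paste, not for piping blindly into a session.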




3 Comments

  1. Darren Finch

    June 5, 2017 at 20:23

    Found I needed to do the above and update the BIOS for this to work – took ages to complete discovery and association but the blade came back eventually.

    Thanks for this, it helped

  2. Thanks, it helped!

  3. Wow, thank you. We just had to replace CPUs in a couple of our blades to match up with other blades in the cluster and started to think we had bad CPUs. Had already looked at firmware and saw the blade was running version 16, so assumed it was really good to go with the E5 v2’s. I had thought I should try to force firmware updates to it from the GUI but it didn’t seem to take. Really glad it was so straightforward from the terminal!

    Thanks again!


© 2024 Lostdomain
