UCS is different from other server platforms, which sometimes makes simple maintenance tasks less straightforward than you’d think. We had a truckload of CPU upgrades last week. A regular server admin would think: “Hey, I just take this old CPU out and put this new CPU in, and Bob’s your uncle!” Well, UCS might have a surprise for you.
One of my colleagues (a ‘traditional’ server guy) handled this replacement and was confronted with an error message when reinserting the blades: WILL_BOOT_FAULT (awesome description once again, Cisco). Going by the error message itself, they went looking for boot issues (boot policy, Fibre Channel zoning, LUN masking, LAN boot, etc.).
Hardware vendors usually supply a manual with each part. But who reads the manual for a memory DIMM to learn how to insert it properly, right? Well, it turns out it pays to at least skim the Cisco UCS manuals for procedures that differ from the traditional ones. In this case the manual pointed to a procedure that clears this error message: a forced firmware sync to the CIMC, followed by a CIMC reset.
Open an SSH session to your UCSM and run the following:
UCS-A # scope server 1/1 (chassis 1 blade 1)
UCS-A /chassis/server # scope boardcontroller
UCS-A /chassis/server/boardcontroller # show image (look for the latest, currently: 11.0)
UCS-A /chassis/server/boardcontroller # activate firmware 11.0 force
UCS-A /chassis/server/boardcontroller* # commit-buffer
Watch the FSM after this; the server will report when it’s done synchronising. There’s a small chance the server will already report “OK” at this point, but reset the CIMC anyway, just to be sure. Reset the CIMC using this procedure:
UCS-A /chassis/server/boardcontroller # exit
UCS-A /chassis/server # scope cimc
UCS-A /chassis/server/cimc # reset
UCS-A /chassis/server/cimc* # commit-buffer
(Or, if you prefer, use the GUI: Server context -> Recover Server -> Reset CIMC.)
After a few minutes, your server will be able to boot again.
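If you have a whole truckload of blades to fix, typing this by hand gets old fast. Below is a minimal sketch in Python that only builds the command text for a list of blades, so you can review it and paste it into your SSH session. It does not talk to UCSM at all. The function names are mine (hypothetical), the version string must come from your own `show image` output, and the `top` command used to return to the root scope between blades is an assumption on my part; check it on your UCSM release before relying on it.

```python
# Sketch only: builds UCSM CLI text for review and pasting, never connects.
# Function names are hypothetical helpers, not part of any Cisco tooling.

def firmware_sync_commands(chassis, blade, version):
    """CLI lines that force-activate the board controller firmware."""
    return [
        f"scope server {chassis}/{blade}",
        "scope boardcontroller",
        f"activate firmware {version} force",
        "commit-buffer",
        "top",  # assumption: 'top' returns to the root scope
    ]

def cimc_reset_commands(chassis, blade):
    """CLI lines that reset the CIMC of one blade."""
    return [
        f"scope server {chassis}/{blade}",
        "scope cimc",
        "reset",
        "commit-buffer",
        "top",  # assumption: 'top' returns to the root scope
    ]

def build_phases(blades, version):
    """Return (sync_text, reset_text) for a list of (chassis, blade) pairs.

    The phases stay separate on purpose: you still have to watch the
    FSM finish after the first one before pasting the second.
    """
    sync, reset = [], []
    for chassis, blade in blades:
        sync += firmware_sync_commands(chassis, blade, version)
        reset += cimc_reset_commands(chassis, blade)
    return "\n".join(sync), "\n".join(reset)

if __name__ == "__main__":
    # Example blades and version; take the real version from 'show image'.
    sync_text, reset_text = build_phases([(1, 1), (1, 2)], "11.0")
    print("Phase 1 (firmware sync):")
    print(sync_text)
    print()
    print("Phase 2 (CIMC reset, once the FSM is done):")
    print(reset_text)
```

Keeping the two phases as separate blocks of text is deliberate: pasting everything in one go would skip the wait on the FSM between the firmware sync and the CIMC reset.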