May 4th, 2017 08:00

Any issues with XtremIO upgrades from 4.0.15-20 to 4.0.15-24?

I am asking because I want to know whether anyone else has experienced a disruptive upgrade specifically from 4.0.15-20 to 4.0.15-24.

I am trying to figure out if I should be asking the community before proceeding with upgrades. 

I have a support case open, and I am sure Dell EMC will soon have answers on what caused the issue, but I would like to avoid anything further that affects production.

June 27th, 2017 07:00

The XtremIO upgrade went well last week. All AIX servers involved stayed up: a successful NDU!

The fix is below.

In the Customer Upgrade Preparation Guide - XtremIO, EMC added the following line:

* If using AIX, please review KB491002 prior to NDU.

KB491002 references the IBM fix:

* Apply the IBM Authorized Program Analysis Report (APAR) mentioned in IBM IV84862 - Improve Handling of Aborted Commands on the host side.

Note that the fix has been rolled into a service pack.

My AIX systems that made it through the successful NDU were running the following AIX OS level:

# oslevel -s
7100-04-03-1642
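If you want to script a comparison against the level reported above, here is a minimal Python sketch. It parses `oslevel -s` output into release, technology level (TL), service pack (SP), and trailing build fields, and checks whether a host is at or above 7100-04-03-1642 within the same release. Treating that level as a baseline is an assumption: it is only the level of the hosts that survived the NDU in this thread, not a documented minimum, and being at or above it does not by itself prove the APAR is installed. All function names here are hypothetical.

```python
import re
import subprocess  # only needed if you call current_oslevel() on an AIX host

# Level reported in this thread by hosts that survived the NDU.
# Using it as a baseline is an assumption, not a documented minimum.
BASELINE = "7100-04-03-1642"

# oslevel -s format: release-TL-SP-build, e.g. 7100-04-03-1642
_OSLEVEL_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(\d{4})$")


def parse_oslevel(text):
    """Split `oslevel -s` output into a tuple of ints (release, TL, SP, build)."""
    match = _OSLEVEL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected oslevel -s output: {text!r}")
    return tuple(int(part) for part in match.groups())


def at_or_above(level, baseline=BASELINE):
    """True if the TL and SP are at least the baseline's, within the same release.

    The trailing build field is ignored. A different release raises ValueError,
    because this baseline says nothing about other releases.
    """
    rel, tl, sp, _ = parse_oslevel(level)
    b_rel, b_tl, b_sp, _ = parse_oslevel(baseline)
    if rel != b_rel:
        raise ValueError("different AIX release; compare against a baseline for that release")
    return (tl, sp) >= (b_tl, b_sp)


def current_oslevel():
    """Run `oslevel -s` on the local AIX host and return its output."""
    result = subprocess.run(["oslevel", "-s"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

On an AIX host, `at_or_above(current_oslevel())` gives a quick yes/no. Still confirm the APAR directly before an NDU.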


May 8th, 2017 08:00

4.0.15-20 to 4.0.15-24 is an NDU (non-disruptive upgrade) process. Dell EMC Support will run pre-upgrade checks in your environment to make sure we are aware of any potential issues before the NDU process starts.

May 8th, 2017 09:00

Hi,

We have done this upgrade (same versions), because there was an advisory recommending it. We didn't face any issues. Maybe EMC can suggest more on this.

May 8th, 2017 09:00

Thank you, the above is great news, glad to hear your upgrade went smoothly.

May 8th, 2017 09:00


Avi, thank you for your comment. Yes, my earlier updates were NDUs, but the 4.0.15-20 to 4.0.15-24 upgrade was NOT an NDU.

EMC did all the pre-checking and everything passed, but a production database crashed during the update.

I will update this post when I get the RCA results, because many things can cause issues during an update.

In my case the previous update was done less than 90 days earlier without issues.

I believe my hosts were all configured correctly, which is why I was asking the community whether anyone else has experienced issues with this specific version jump from 4.0.15-20 to 4.0.15-24.

I was very surprised the upgrade had issues because the previous upgrade had no issues at all.


May 12th, 2017 00:00

4.0.15-20 to 4.0.15-24 is about as simple an upgrade as they come.  There are no "firmware" changes in this version, so there's no need to reboot any of the storage controllers - just a quick blip as we reload the new XIOS code, and that's it.

There is one additional step that the person carrying out the upgrade performs because there is no reboot, but it is completely transparent.

There's certainly no expectation of any problems for this (or any other) upgrade. Most of the time, when we see issues during upgrades, it's down to things like multipathing or timeouts not being set correctly on the host, but I'm sure support will work with you to find out exactly what went wrong in this case and get it fixed. We're actually working on a set of scripts that will validate the host-side configuration before an upgrade (or at any other time) to help avoid such issues. The one for VMware is in final testing, and (physical) Windows and Linux will follow shortly.

May 12th, 2017 09:00

Hi Mdeitrick  and Scotthoward,  thank you for your input. 

EMC Support suggested I open a support case with the switch vendor. I have done so, and that investigation is under way. No root cause has been identified yet, but we are still digging.

May 17th, 2017 07:00

Just an update: the switch vendor did not find any FC switch issues shortly before, during, or after the upgrade.


May 17th, 2017 12:00

mdeitrick,

EMC support just requested that I have the vendor check for flapping on the port. No flapping was detected.

Regarding SR #:

Service Request Number: 07017682
Former Service Request Number: 85818360

May 24th, 2017 06:00

I am still waiting for the official RCA. The preliminary information provided referenced two possible reasons for the outage.

The first is an AIX fix, which is public knowledge; I just did not know about it.

Make sure your AIX systems have IBM APAR IV87492 ("IMPROVE HANDLING OF ABORTED COMMANDS APPLIES TO AIX 7100-04") installed.
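To confirm the APAR directly rather than inferring it from the OS level, AIX's `instfix -ik <APAR>` reports whether all filesets for a fix are installed. Below is a minimal Python sketch that wraps that check. It assumes `instfix` prints the line `All filesets for <APAR> were found.` when the fix is complete, and it treats any other output as not installed. The helper names are hypothetical. The thread mentions both IV84862 and IV87492, so pass whichever APAR applies to your technology level.

```python
import subprocess


def apar_fully_installed(instfix_output, apar):
    """Interpret `instfix -ik <APAR>` output.

    Assumes a complete fix is reported as 'All filesets for <APAR> were found.'
    on its own line. Anything else ('Not all filesets ...', 'There was no data ...')
    is treated as not installed.
    """
    expected = f"All filesets for {apar} were found."
    return any(line.strip() == expected for line in instfix_output.splitlines())


def check_apar(apar):
    """Run instfix on the local AIX host and interpret its output.

    The exit status is not checked; only the printed message is used.
    """
    result = subprocess.run(["instfix", "-ik", apar], capture_output=True, text=True)
    return apar_fully_installed(result.stdout, apar)
```

For example, on a 7100-04 host, `check_apar("IV87492")` returns True only when instfix reports every fileset for that APAR as present.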

The other possibility EMC referenced was an XtremIO bug. Since the preliminary information has "EMC confidential" all over it, I will let someone from EMC disclose the bug and its details.


June 20th, 2017 07:00

EMC's final RCA indicates that the issue was the missing AIX patch. I have another XtremIO upgrade scheduled this week, and it is a bigger code jump, from 4.0.2-80 to 4.0.15-24. I have applied the AIX patch to the AIX servers.
