February 10th, 2017 02:00

Upgrade from 1.32.2 to 2.0 -- Never ending rebuild

We have four nodes in our ScaleIO 1.3.2 deployment (Server1-Primary, Server2-Secondary, Server3-TB, Server4). They are physical Windows servers running Windows 2012 as the base OS. We started an NDU from 1.3.2 to 2.0, and everything started wonderfully. When starting the upgrade, we checked the option to reboot the server. So far so good: Server1 upgraded itself, automatically switched MDM ownership to Server2, and proceeded to reboot. Once Server1 came back online, Server2 (now the primary MDM) started a rebuild. This seemed normal. However, the rebuild took forever and never finished (we let it run for 24 hours): the GB remaining kept going down but would then go right back up. Taking some advice from this forum, we changed MDM ownership back to Server1, thinking that would "reset" the rebuild. Server1 has now been running the rebuild for 24 hours too.

Strangely -- there is no alert for "degraded", and the capacity donut isn't showing any degraded capacity. Yet even with no alert and no degraded capacity, the ScaleIO deployment is STILL rebuilding endlessly. Everything is still running on the storage -- so that is good, but we cannot continue the upgrade until the rebuild finishes. Any advice?

Capture7.PNG.png

February 10th, 2017 08:00

In your screenshot, I see that a rebalance is also trying to run, but at 0.0 KB/s. Have you seen the rebalance make any progress at all, or is it constantly sitting at 0 KB/s as well?

For this, if you can, please open a case with support so they can help diagnose what is going on here. Thanks

February 10th, 2017 08:00

This is our development lab cluster -- so it is running on the FREE license (i.e. no support). We did the same upgrade in our production environment, and of course it completed with no issues whatsoever.

February 10th, 2017 08:00

Yes I have seen it running, and I have even seen the rebalance FINISH.    The rebuild is weird too -- it always shows 1.8-2.0 TB "left" -- BUT

1. Sometimes there is NO alert indicating degraded.  Sometimes there is.

2. Sometimes the "capacity" donut shows perfect -- sometimes it shows a sliver of orange degraded.
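The pattern described in this thread (the remaining amount drops, then jumps back up) can be told apart from a slow but healthy rebuild by logging the "remaining" figure at intervals. The following is only a minimal sketch: the readings are hypothetical values copied by hand from the GUI, and the function names and the 100 GB threshold are assumptions for illustration, not anything ScaleIO provides.

```python
from typing import Sequence


def count_rebounds(remaining_gb: Sequence[float]) -> int:
    """Count how often the remaining amount went UP between consecutive samples."""
    return sum(1 for prev, cur in zip(remaining_gb, remaining_gb[1:]) if cur > prev)


def looks_stalled(remaining_gb: Sequence[float],
                  min_net_progress_gb: float = 100.0) -> bool:
    """True if the rebuild rebounded at least once and made little net progress.

    min_net_progress_gb is an arbitrary illustrative threshold.
    """
    if len(remaining_gb) < 2:
        return False  # not enough data to judge
    net_progress = remaining_gb[0] - remaining_gb[-1]
    return count_rebounds(remaining_gb) > 0 and net_progress < min_net_progress_gb


# Hypothetical readings (GB remaining), oldest first.
samples = [2000.0, 1900.0, 1850.0, 2000.0, 1920.0, 1800.0, 1990.0]
print("rebounds:", count_rebounds(samples))
print("stalled:", looks_stalled(samples))
```

A rebuild that only ever shrinks, however slowly, is not flagged; only one that keeps climbing back toward its starting point is.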

February 10th, 2017 09:00

See -- Rebuild still running.  And now the donut shows some degraded capacity.   This will clear after a while and the alert will clear, but the rebuild will never finish.

Capture9.PNG.png

February 11th, 2017 16:00

Just open the case with a low severity and specify that this is your test lab. Run a complete getinfo from the installation manager/gateway and attach the output to the ticket. I would love to take a look at it.

February 22nd, 2017 06:00

Hi, we have a similar issue with a never-ending rebuild.

Try disabling checksum protection on all your storage pools. In the ScaleIO GUI: Backend -> (order by storage pool) -> select Configure Use Checksum, then disable checksum usage.

February 23rd, 2017 05:00

We were able to resolve the issue. It turned out that we were using the Install Website and had failed to deploy/install OpenSSL as a prerequisite step. The upgrade to 2.0 got only so far and then failed, because OpenSSL wasn't there and the Install Website couldn't communicate with the other nodes.

The solution was to manually finish the upgrade by installing the remaining software on the rest of the nodes, and then using the CLI to verify and complete the upgrade. Once we did this -- the rebuild finished normally.

P.S. In all of this -- ScaleIO never went down and never stopped working.   I love this software!
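Since the root cause here was a missing OpenSSL prerequisite, a quick pre-flight check before starting an upgrade may save some grief. This is only a rough sketch under a big assumption: that finding an `openssl` executable on the PATH is a reasonable proxy for the prerequisite being installed. The Install Website may look for OpenSSL differently, so treat a "found" result as a hint, not proof. The function names are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def find_openssl() -> Optional[str]:
    """Return the full path of the openssl executable on PATH, or None."""
    return shutil.which("openssl")


def openssl_version(path: str) -> str:
    """Run 'openssl version' and return its output as a stripped string."""
    result = subprocess.run([path, "version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


path = find_openssl()
if path is None:
    print("openssl not found on PATH -- install it before upgrading")
else:
    try:
        print("found", path, "->", openssl_version(path))
    except (subprocess.CalledProcessError, OSError) as exc:
        print("openssl found at", path, "but could not be run:", exc)
```

Running this on the server that hosts the Install Website before kicking off the upgrade would at least surface an obviously missing install.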
