
RecoverPoint & RecoverPoint for Virtual Machines: Consistency Groups Swapping Between RPAs Due to Replication Process Crashes or Reboot Regulation


Symptoms



RecoverPoint and RecoverPoint for Virtual Machines Consistency Groups (CGs) can enter a state in which they swap between primary RecoverPoint Appliances (RPAs) because of repeated replication process crashes. If the crashes are frequent enough, the RPAs can enter reboot regulation and detach from the RecoverPoint cluster, so that an RPA shows as down in the GUI.

In the RPA replication logs, the DistributorPhase1 process reports a low-credit status (not enough available memory), which results in an assertion failure:

201X/XX/XX 05:46:00.708 - #2 - 27302/26963 - MemoryManager: viscus on assert
...
>> 1158731105004683264 :phase1#1 (groupTaskID=(sessionID=1407996198,replicationLinkID=  (kVolSlot=XXXXXXXXX,srcCopyID=GlobalCopy(SiteUID(0xXXXXXX) 0)  ,destCopyID=GlobalCopy(SiteUID(0xXXXXXXXXXXXXXXXX) 0) )),gridCopyID=0) using 0 credit 12463 min 512  max 13056 counter 269585 bound 282048 overld 275816 reachBound 0 standalone
...
201X/XX/XX 05:46:00.713 - #2 - 27091/26963 - RemoteLogSender: got event (uniqueId=0, eventTime=1555998360713692),  EventID_KBOX_ASSERTION_FAILED(3031), SiteUID(0xXXXXXXXXXXXXXXX), seDetails=Sender=replication, Topic=DistributorGroupHandler,  msg=Assertion failed: isPhase1CacheMemorySufficient(m_phase1SubConsumer)  Line XXXX File DistributorGroupHandlerPhase1.cc PID: XXXXX Info: regular phase1 cache memory not sufficient
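To check whether a collected replication log contains this signature, a short script can search for the assertion event and pull out the numeric counters printed on the phase1 line. The following is a minimal sketch: the function names are hypothetical, the patterns are derived only from the sample lines above, and other RecoverPoint versions may format these lines differently. The sketch extracts the counters as printed and does not interpret what each one means.

```python
import re

# Patterns derived only from the sample log lines in this article (an assumption);
# they are not taken from any RecoverPoint documentation.
ASSERT_RE = re.compile(
    r"EventID_KBOX_ASSERTION_FAILED\((?P<event_id>\d+)\).*?"
    r"msg=Assertion failed: (?P<expr>\S+)"
)
COUNTER_RE = re.compile(
    r"using\s+(?P<using>\d+)\s+credit\s+(?P<credit>\d+)\s+min\s+(?P<min>\d+)\s+"
    r"max\s+(?P<max>\d+)\s+counter\s+(?P<counter>\d+)\s+bound\s+(?P<bound>\d+)"
)


def find_phase1_assertions(lines):
    """Return (line_number, event_id, expression) for each Phase1 memory assertion."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ASSERT_RE.search(line)
        if match and "isPhase1CacheMemorySufficient" in match.group("expr"):
            hits.append((number, int(match.group("event_id")), match.group("expr")))
    return hits


def parse_phase1_counters(line):
    """Return the numeric counters printed on a phase1 line as a dict, or None."""
    match = COUNTER_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

Pass the functions the lines of a log file, for example `find_phase1_assertions(open(path).read().splitlines())`, where `path` is wherever the replication log was saved.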

Cause

When I/O arrives at the replica copy at a high rate (for example, during an initialization with extremely fast primary storage), the Distributor's Phase1 memory allocation, which is used for moving I/O between the RPAs and the journal, can reach 100% while additional I/O requests are still waiting in the queue. This can cause a race condition between freeing the utilized memory and requesting memory for the queued requests. When this occurs, the RPA's replication process can crash. During the first-time initialization of a CG, this can lead to reboot regulation, because the same I/O rate resumes after every process crash.
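The ordering dependency behind such a race can be illustrated with a toy model. This is only a sketch under stated assumptions: `CreditPool` and `run_interleaving` are hypothetical names, the capacity and request sizes are arbitrary, and nothing here reflects RecoverPoint's actual memory manager. It shows that when a pool is fully utilized, a queued request succeeds or fails depending solely on whether the free completes before the request runs.

```python
class CreditPool:
    """Toy fixed-size pool (hypothetical; not RecoverPoint's memory manager)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def available(self):
        return self.capacity - self.used

    def allocate(self, amount):
        # Stands in for the failed sufficiency assertion in the log excerpt.
        if amount > self.available():
            raise MemoryError("pool memory not sufficient")
        self.used += amount

    def free(self, amount):
        self.used = max(0, self.used - amount)


def run_interleaving(free_first):
    """Replay one free and one queued request in the chosen order."""
    pool = CreditPool(capacity=100)
    pool.allocate(100)  # the pool starts at 100% utilization
    try:
        if free_first:
            pool.free(40)      # memory is released first...
            pool.allocate(40)  # ...so the queued request fits
        else:
            pool.allocate(40)  # the request runs before the free completes
            pool.free(40)
        return "ok"
    except MemoryError:
        return "crash"
```

In this model, `run_interleaving(True)` returns `"ok"` and `run_interleaving(False)` returns `"crash"`. Lowering the incoming I/O rate, as the workaround suggests, reduces how often the pool sits at full utilization with requests queued behind it.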

Resolution

Workaround:
1. Enable I/O Throttling, set to either Low or High, on the affected array(s) to limit how fast RecoverPoint reads I/O from the production array(s).
2. Initialize CGs sequentially, starting no more than one or two at a time, to limit reads from the production array(s).
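The second step can be sketched as a simple batching loop. This is a minimal illustration, not a RecoverPoint interface: `start_init` and `wait_until_done` are hypothetical callbacks that the operator would supply (for example, wrappers around whatever management tooling is in use), and the article does not name any API or CLI command for this.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def initialize_sequentially(cg_names, start_init, wait_until_done, max_concurrent=2):
    """Initialize CGs in small batches so only a few read from production at once.

    start_init and wait_until_done are hypothetical callbacks supplied by the
    caller; each receives a single CG name.
    """
    for group in batches(list(cg_names), max_concurrent):
        for name in group:
            start_init(name)
        # Do not start the next batch until every CG in this one has finished.
        for name in group:
            wait_until_done(name)
```

With `max_concurrent=1` the CGs are initialized strictly one at a time, which is the most conservative reading of the workaround.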

Resolution:

This issue is addressed in RecoverPoint for Virtual Machines 5.1 and later.
This issue is not yet addressed in RecoverPoint Classic. Dell EMC Engineering is investigating, and a permanent fix is in progress. Contact the Dell EMC Customer Support Center or your service representative for assistance, and reference this solution ID.

Affected Products

RecoverPoint

Products

RecoverPoint, RecoverPoint for Virtual Machines
Article Properties
Article Number: 000168745
Article Type: Solution
Last Modified: 20 Nov 2020
Version:  2