RecoverPoint for Virtual Machines: Consistency Groups in Error State

Summary: In RecoverPoint for Virtual Machines, Consistency Groups remain in Error state or cycle through pause/init/error loops.

Symptoms



Consistency Groups are in Error state, or cycle through pause/init/error loops.

Sample state from get_group_state command:
  CG:
    Enabled: YES
    Transfer source: Source
    Copy:
      Target:
        Enabled: YES
        Regulation Status: REGULATED
        Active primary RPA: RPA 1
        Journal: LONG RESYNC
        Storage access: NO ACCESS
        Max journal size: 1.09 TB
      Source:
        Enabled: YES
        Active primary RPA: RPA 1
        Storage access: DIRECT ACCESS (marking data)
    Link:
      Source->Target:
        Data Transfer: ERROR
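
When the state output is saved to a text file, the links in error can be picked out with a short script. The following is a minimal Python sketch, not a RecoverPoint tool: the function name links_in_error is hypothetical, and it assumes the output has the same single-group layout as the sample above (a "Link:" section with one "Data Transfer" field per link).

```python
def links_in_error(state_text):
    """Return the names of links whose 'Data Transfer' field is ERROR.

    Assumes the layout shown in the sample: a 'Link:' section containing
    one or more '<name>:' lines, each followed by 'Data Transfer: <state>'.
    """
    failed = []
    current_link = None
    in_link_section = False
    for raw in state_text.splitlines():
        line = raw.strip()
        if line == "Link:":
            in_link_section = True
            continue
        if not in_link_section or not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            # A line such as 'Source->Target:' names the link that follows.
            current_link = key
        elif key == "Data Transfer" and value == "ERROR":
            failed.append(current_link)
    return failed


sample = """\
  CG:
    Enabled: YES
    Transfer source: Source
    Link:
      Source->Target:
        Data Transfer: ERROR
"""
print(links_in_error(sample))
```

A non-empty result only confirms the symptom. It does not identify the cause.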

Events in GUI or CLI get_events_log:

  Time:                 Mon Jan 16 15:29:30 2017
  Topic:                GROUP
  Scope:                DETAILED
  Level:                ERROR
  Event ID:             4009
  Cluster:              Target_Site_vRPA
  Global links:         None
  Groups:               [CG, CG_Copy]
  Links:                [CG, CG_Prod->CG_Copy]
  Summary:              Pausing data transfer for group
  Details:              Reason=distributor error.

  Time:                 Mon Jan 16 15:31:03 2017
  Topic:                GROUP
  Scope:                DETAILED
  Level:                WARNING
  Event ID:             4001
  Cluster:              Source_Site
  Global links:         None
  Groups:               [CG, CG_Prod]
  Summary:              Minor problem in group capabilities
  Details:              Copies are linked.
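
To filter a saved copy of the events log for ERROR-level events, a sketch along these lines may help. It assumes the events are separated by blank lines and that each field is a "Key: value" line, as in the excerpt above. The function names parse_events and error_events are hypothetical and are not part of the RecoverPoint CLI.

```python
import re


def parse_events(log_text):
    """Split a saved events log into one dict per event."""
    events = []
    # Events are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", log_text.strip()):
        event = {}
        for line in block.splitlines():
            # Split on the first colon only, so timestamps stay intact.
            key, sep, value = line.partition(":")
            if sep:
                event[key.strip()] = value.strip()
        if event:
            events.append(event)
    return events


def error_events(events):
    """Keep only the events whose Level field is ERROR."""
    return [e for e in events if e.get("Level") == "ERROR"]


sample = """
  Time:                 Mon Jan 16 15:29:30 2017
  Level:                ERROR
  Event ID:             4009
  Summary:              Pausing data transfer for group
  Details:              Reason=distributor error.

  Time:                 Mon Jan 16 15:31:03 2017
  Level:                WARNING
  Event ID:             4001
  Summary:              Minor problem in group capabilities
  Details:              Copies are linked.
"""
for event in error_events(parse_events(sample)):
    print(event["Event ID"], event["Summary"], event["Details"])
```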

RPA1:
Marking (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = Yes - side defined as source. Site=Source_Site.
Source backlog mirroring (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = NOT NEEDED
Transfer (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = NO - can't maintain history - not paused on snapshot and  user volume problem - Volume issue. Site=Target_Site, RPA1, Device=[CG, CG_Copy, CG_RSET_CG_3_0_scsi].
Box and VM share ESX = SAME_ESX_NOT_SAME

RPA2:
Marking (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = Yes - side defined as source. Site=Source_Site.
Source backlog mirroring (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = NOT NEEDED
Transfer (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = Yes
Journal (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = Yes
Target backlog mirroring (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = NOT NEEDED
Preferred = NO
Box and VM share ESX = SAME_ESX_SAME
Number of VMs in same ESX is 1

Error in replication logs (extracted.*/files/home/kos/replication/result.log):
2017/01/16 13:03:30.187 - #1 - 5508/5482 - DataCommIoRequest: Got NACK from splitter, Error code = 2 *m_kboxDataCommMessage = KboxDataCommMessage, DataCommMessage, m_multiIoId: 1804479667 m_msgId: 17968710 m_type: 1 m_lbaAndLens: 1  lengthInBlocks: 1024  m_guid: 0x69f6a49648317aec m_version: 0 m_isFastPath: 1 m_hostId: ESX 0x10f9536584e11e30 m_priority: 6

Errors in the splitter logs on the host identified in the replication log above (m_hostId: ESX 0x10f9536584e11e30):
2017/01/16 13:11:13.714 - #2 - 570188/570154 - KS: krnl:[13:11:13.594] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa45240 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:13.594] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250463060, startTC = 0
krnl:[13:11:13.594] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
2017/01/16 13:11:14.828 - #2 - 570188/570154 - KS: krnl:[13:11:14.289] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa41210 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:14.289] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250463777, startTC = 0
krnl:[13:11:14.289] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
2017/01/16 13:11:15.937 - #2 - 570188/570154 - KS: krnl:[13:11:15.118] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa44c90 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:15.118] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250464606, startTC = 0
krnl:[13:11:15.118] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
krnl:[13:11:15.676] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x4113298f6108 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:15.676] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250465165, startTC = 0
krnl:[13:11:15.676] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
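
Because the failed-write message repeats many times, it can help to count occurrences per status pair instead of reading the log line by line. The sketch below is an illustration only. The regular expression is derived from the log lines shown above and may need adjusting for other splitter versions, and summarize_failed_ios is a hypothetical name.

```python
import re
from collections import Counter

# Matches lines such as:
#   IO 0x41132aa45240 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
FAILED_IO = re.compile(
    r"IO (0x[0-9a-fA-F]+) Failed\. "
    r"Host_Status = (0x[0-9a-fA-F]+), "
    r"Device_Status = (0x[0-9a-fA-F]+), "
    r"dataLength = (\d+)"
)


def summarize_failed_ios(log_text):
    """Count failed IOs per (Host_Status, Device_Status) pair."""
    counts = Counter()
    for match in FAILED_IO.finditer(log_text):
        _io, host_status, device_status, _length = match.groups()
        counts[(host_status, device_status)] += 1
    return counts


sample = (
    "krnl:[13:11:13.594] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: "
    "IO 0x41132aa45240 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216\n"
    "krnl:[13:11:14.289] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: "
    "IO 0x41132aa41210 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216\n"
)
for (host_status, device_status), count in summarize_failed_ios(sample).items():
    print(f"Host_Status={host_status} Device_Status={device_status}: {count} failed IOs")
```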

Cause

The Data Store on the target side for these Consistency Groups had no free space, so RecoverPoint could not write to it.
 

Resolution

On the target-side ESX, verify that all target Data Stores (for target hosts and journals) have free space.
For any Data Store that is out of space, free up space or increase the Data Store size so that RecoverPoint can write to it.
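
The free-space review can be scripted once the capacity and free-space figures have been collected from vSphere. The sketch below is a hedged illustration. The Datastore record, the datastore names, and the 5% threshold are all assumptions made for this example. This article states only that the Data Stores must have free space and does not specify a minimum.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical record; values would be copied from the vSphere inventory.
    name: str
    capacity_gb: float
    free_gb: float


def datastores_low_on_space(datastores, min_free_fraction=0.05):
    """Return the names of datastores whose free fraction is below the threshold.

    The default threshold is an arbitrary example value, not vendor guidance.
    """
    low = []
    for ds in datastores:
        free_fraction = ds.free_gb / ds.capacity_gb if ds.capacity_gb > 0 else 0.0
        if free_fraction < min_free_fraction:
            low.append(ds.name)
    return low


inventory = [
    Datastore("target_vm_ds", capacity_gb=2048, free_gb=0),
    Datastore("journal_ds", capacity_gb=1200, free_gb=300),
]
print(datastores_low_on_space(inventory))
```

Any Data Store reported by such a check would then be handled as described above, by freeing space or increasing its size.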

Affected Products

RecoverPoint for Virtual Machines

Article Properties
Article Number: 000054855
Article Type: Solution
Last Modified: 20 Nov 2020
Version:  2