Consistency groups are in an Error state, or cycle through pause/init/error states.
Sample output of the get_group_state command:
CG:
  Enabled: YES
  Transfer source: Source
  Copy:
    Target:
      Enabled: YES
      Regulation Status: REGULATED
      Active primary RPA: RPA 1
      Journal: LONG RESYNC
      Storage access: NO ACCESS
      Max journal size: 1.09 TB
    Source:
      Enabled: YES
      Active primary RPA: RPA 1
      Storage access: DIRECT ACCESS (marking data)
  Link:
    Source->Target:
      Data Transfer: ERROR
Events shown in the GUI or returned by the CLI get_events_log command:
Time: Mon Jan 16 15:29:30 2017
Topic: GROUP
Scope: DETAILED
Level: ERROR
Event ID: 4009
Cluster: Target_Site_vRPA
Global links: None
Groups: [CG, CG_Copy]
Links: [CG, CG_Prod->CG_Copy]
Summary: Pausing data transfer for group
Details: Reason= distributor error.
Time: Mon Jan 16 15:31:03 2017
Topic: GROUP
Scope: DETAILED
Level: WARNING
Event ID: 4001
Cluster: Source_Site
Global links: None
Groups: [CG, CG_Prod]
Summary: Minor problem in group capabilities
Details: Copies are linked.
RPA1:
Marking (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = Yes - side defined as source. Site=Source_Site.
Source backlog mirroring (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = NOT NEEDED
Transfer (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = NO - can't maintain history - not paused on snapshot and user volume problem - Volume issue. Site=Target_Site, RPA1, Device=[CG, CG_Copy, CG_RSET_CG_3_0_scsi].
Box and VM share ESX = SAME_ESX_NOT_SAME
RPA2:
Marking (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = Yes - side defined as source. Site=Source_Site.
Source backlog mirroring (GlobalCopy(SiteUID(0x1207d88a9b552bf9) 0) ) = NOT NEEDED
Transfer (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = Yes
Journal (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = Yes
Target backlog mirroring (GlobalCopy(SiteUID(0x7b8822840a7e317f) 0) ) = NOT NEEDED
Preferred = NO
Box and VM share ESX = SAME_ESX_SAME
Number of VMs in same ESX is 1
Error in replication logs (extracted.*/files/home/kos/replication/result.log):
2017/01/16 13:03:30.187 - #1 - 5508/5482 - DataCommIoRequest: Got NACK from splitter, Error code = 2 *m_kboxDataCommMessage = KboxDataCommMessage, DataCommMessage, m_multiIoId: 1804479667 m_msgId: 17968710 m_type: 1 m_lbaAndLens: 1 lengthInBlocks: 1024 m_guid: 0x69f6a49648317aec m_version: 0 m_isFastPath: 1 m_hostId: ESX 0x10f9536584e11e30 m_priority: 6
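The m_hostId value in the NACK line is what ties the replication log to the splitter log on a specific host. As a minimal sketch (the function name nack_hosts and the regular expression are illustrative assumptions, not part of any RecoverPoint tooling), the host IDs could be pulled out of result.log lines like this:

```python
import re

# Matches the host identifier in a NACK line, e.g.
# "m_hostId: ESX 0x10f9536584e11e30". The pattern is an assumption
# based only on the sample line shown in this article.
NACK_HOST_RE = re.compile(r"m_hostId:\s*(\w+)\s+(0x[0-9a-fA-F]+)")


def nack_hosts(lines):
    """Return the set of (host_type, host_id) pairs found in NACK lines."""
    hosts = set()
    for line in lines:
        # Only consider lines that report a splitter NACK.
        if "Got NACK from splitter" not in line:
            continue
        match = NACK_HOST_RE.search(line)
        if match:
            hosts.add((match.group(1), match.group(2)))
    return hosts
```

Passing the lines of result.log to nack_hosts gives the hosts whose splitter logs are worth inspecting next.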
Errors in the splitter logs on the host identified in the replication logs above (m_hostId: ESX 0x10f9536584e11e30):
2017/01/16 13:11:13.714 - #2 - 570188/570154 - KS: krnl:[13:11:13.594] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa45240 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:13.594] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250463060, startTC = 0
krnl:[13:11:13.594] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
2017/01/16 13:11:14.828 - #2 - 570188/570154 - KS: krnl:[13:11:14.289] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa41210 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:14.289] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250463777, startTC = 0
krnl:[13:11:14.289] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
2017/01/16 13:11:15.937 - #2 - 570188/570154 - KS: krnl:[13:11:15.118] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x41132aa44c90 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:15.118] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250464606, startTC = 0
krnl:[13:11:15.118] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
krnl:[13:11:15.676] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: IO 0x4113298f6108 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216
krnl:[13:11:15.676] 0/0 #0 - IoEsx_ToStorage_v_isSucceeded_i: numFSSRetries = 0, numFDSRetries = 0, absTimeoutMS = 250465165, startTC = 0
krnl:[13:11:15.676] 0/0 #0 - CommandIoDataCommWrite_v_storageEndIo_i: Failed write to storage. io_index = 0. Io status 0. Failing DataComm Write.
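When the splitter log is long, it can help to count the failed writes by status rather than read them one by one. The following is a rough sketch only: the function name summarize_failed_ios and the regular expression are assumptions derived from the log lines shown above, and other splitter versions may format these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
# "IO 0x41132aa45240 Failed. Host_Status = 0x0, Device_Status = 0x8, dataLength = 393216"
# \s+ after the IO address tolerates a line break inside the message.
FAILED_IO_RE = re.compile(
    r"IO (0x[0-9a-fA-F]+)\s+Failed\.\s+"
    r"Host_Status = (0x[0-9a-fA-F]+), Device_Status = (0x[0-9a-fA-F]+), "
    r"dataLength = (\d+)"
)


def summarize_failed_ios(text):
    """Count failed IOs per (Host_Status, Device_Status) pair in log text."""
    counts = Counter()
    for _io, host_status, device_status, _length in FAILED_IO_RE.findall(text):
        counts[(host_status, device_status)] += 1
    return counts
```

A large count for a single status pair, as in the excerpt above where every failure carries Host_Status = 0x0 and Device_Status = 0x8, suggests one underlying storage problem rather than several unrelated ones.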
Resolution:
On the target-side ESX hosts, verify that every target datastore (those holding the target hosts and those holding the journals) has free space.
For any datastore that is out of space, free up space or increase the datastore size so that RecoverPoint can write to it.
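If the capacity and free-space figures for the target datastores have already been collected by other means, the check itself can be sketched as below. Everything here is hypothetical: the function name datastores_low_on_space, the input mapping, and the threshold parameter are illustrative and do not correspond to any RecoverPoint or VMware API.

```python
def datastores_low_on_space(datastores, min_free_bytes=0):
    """Return the names of datastores with free space at or below a threshold.

    `datastores` is a hypothetical mapping of datastore name to a
    (capacity_bytes, free_bytes) tuple gathered by the administrator.
    """
    return sorted(
        name
        for name, (_capacity, free) in datastores.items()
        if free <= min_free_bytes
    )
```

With the default threshold of 0 only completely full datastores are reported; a higher min_free_bytes flags datastores that are close to full before writes start failing.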