Unsolved
MD3260: Failed thin virtual disk with no hints as to what the problem is, help!
Hello,
Really hoping for some hints, as otherwise we are facing a 230TB recovery.
We have two MD3260 enclosures with four thin virtual disks bundled into a single 4x60TB LVM volume with XFS on top (actually 3x60TB + 1x58TB, with 3TB available).
We recently reached 230TB and hit a critical problem with tons of hardware errors. The problem is that SMcli reports the smaller thin virtual disk as failed but gives no further detail, and all hardware components (disks and controller) seem to be OK:
SMcli -n aspera_md_1 -c 'show storagearray healthStatus;'
Performing syntax check...
Syntax check complete.
Executing script...
The following failures have been found:
Thin Virtual Disk Failed
Storage array: aspera_md_1
Disk pool: Disk_Pool_1
Thin Virtual Disk: gaia_virtual_p2
Status: Failed
On the OS side, the kernel log is full of I/O errors:
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:10 2023] blk_update_request: 60 callbacks suppressed
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev sdg, sector 4055288
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 23 fa f8 00 00 00 08 00 00
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev sdg, sector 19135224
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev dm-3, sector 4055288
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev dm-3, sector 19135224
[Sun Oct 15 05:24:10 2023] XFS (dm-7): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x7bb2f8 len 8 error 121
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 01 00 00 00 01 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 23622319105
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 02 00 00 00 01 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 23622319106
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev dm-3, sector 23622319105
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev dm-3, sector 23622319106
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 4055288
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 23 fa f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 19135224
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 01 00 00 00 02 00 00
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
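As a cross-check on the log above, here is a minimal Python sketch that decodes the Write(16) CDBs the kernel printed. Per the SCSI Block Commands standard, WRITE(16) uses opcode 0x8A, carries the LBA as an 8-byte big-endian field in bytes 2 to 9, and carries the transfer length (in logical blocks) in bytes 10 to 13. The helper names, the two-entry sample excerpt and the 512-byte block size are assumptions for illustration; the block size is only inferred from the fact that the decoded LBAs equal the sectors reported by blk_update_request.

```python
import re

# Per SBC, WRITE(16) has opcode 0x8A, an 8-byte big-endian LBA in bytes 2..9
# and a 4-byte big-endian transfer length (in logical blocks) in bytes 10..13.
WRITE16_OPCODE = 0x8A

# Hypothetical patterns matching the kernel message format shown above.
CDB_RE = re.compile(r"CDB: Write\(16\) ((?:[0-9a-f]{2} ?){16})")
SECTOR_RE = re.compile(r"critical target error, dev (\S+), sector (\d+)")


def decode_write16(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length) for a WRITE(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 16 or cdb[0] != WRITE16_OPCODE:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    length = int.from_bytes(cdb[10:14], "big")
    return lba, length


def failed_lbas(log: str) -> list[tuple[int, int]]:
    """Decode every WRITE(16) CDB printed in a kernel log excerpt."""
    return [decode_write16(m.group(1)) for m in CDB_RE.finditer(log)]


def reported_sectors(log: str, dev: str) -> set[int]:
    """Collect sectors from blk_update_request errors for one device."""
    return {int(s) for d, s in SECTOR_RE.findall(log) if d == dev}


# Two entries copied from the kernel log excerpt above.
SAMPLE = """\
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev sdg, sector 4055288
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 01 00 00 00 01 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 23622319105
"""

if __name__ == "__main__":
    BLOCK = 512  # assumption: 512-byte logical blocks on sdg
    for lba, length in failed_lbas(SAMPLE):
        print(f"LBA {lba} len {length} -> {lba * BLOCK / 2**40:.2f} TiB into the device")
    # True if every decoded LBA matches a sector reported for sdg
    print({lba for lba, _ in failed_lbas(SAMPLE)} == reported_sectors(SAMPLE, "sdg"))
```

Decoded this way, the failing writes on sdg land at LBA 4055288 (about 2 GB in) and LBA 23622319105 (just under 11 TiB in, assuming 512-byte blocks), far below both the 58TB and the 60TB mark. So, at least for these entries, they do not look like writes past the end of the device's address space.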
It seems that the VD's reported size is 60TB, but the actual size of the thin VD is 58TB.
We tried several things:
1. Increasing the size of the repository associated with this thin VD, but it errors out (we have one spare that could be used for this). SMClient shows no details with the error.
2. Running the repository consistency check, which reports the errors below:
SMcli -n aspera_md_1 -c 'check virtualDisk [gaia_virtual_p2] repositoryConsistency file="/tmp/p2_consistency.txt";'
Performing syntax check...
Syntax check complete.
Executing script...
Script execution complete.
SMcli completed successfully.
[root@gaiaftp01: disks]$ less /tmp/p2_consistency.txt
ThinRepo 0xc002 State:FAILED RollbackState:NONE
Image 5(xc002) validation starting
Error: Dir2CN:0x000000440bbb(Current) Bad signature
Error: Dir2CN:0x000000440bbb(Current) Bad Level Found:0x47 Expected:0x32
Error: Dir2CN:0x000000440bbb(Current) Bad Location DirA:0x5a3c806b33570a1d DirB:0xdb8e591145f3a3a2
Error: Dir2CN:0x000000440bbb(Current) Cluster out of bounds DataCN:0x67fa78c873cc Offset:0x30
Dumping Dir Level 2 (Current) CN:x000000440bbb LBA:x0002205dd8
0x0000 | ...].{m8..3Gr... | 7F94AB5D 987B6D38 DF1B3347 72CCF7E0 |
0x0010 | ..W3k.<Z...E.Y.. | 1D0A5733 6B803C5A A2A3F345 11598EDB |
0x0020 | ...*R....}....h. | E0A4D72A 52CEB290 B87DB292 E6C9689C |
0x0030 | .s.x.g{.O@]..pIk | CC73C878 FA677BCE 4F405DA5 9670496B |
0x0040 | .^.n....1wY..... | BC5E076E FC8DAA10 31775915 B20308B8 |
0x0050 | .7j.(i.4...yme.. | 99376AF2 28698234 84E0F879 6D65F112 |
0x0060 | ..Uf].Eo.V.....e | DFED5566 5DED456F C356D485 10C19565 |
0x0070 | ...Y..yK.*...i.. | F4939759 0181794B 9E2AB1E6 F069EFE9 |
0x0080 | N..&........T..f | 4E8BCE26 04010A91 819DDEB8 54DF9B66 |
0x0090 | (<...>...m...@.f | 283CFA09 FB3E961D 976D9CCB 1040DB66 |
0x00a0 | ~........;...... | 7EDC07E8 07ECE2E9 C43BEDFE FBBFD2B1 |
0x00b0 | ......&.uo5.M5m. | 8DCBEDDA F1F62692 756F35FE 4D356D8A |
0x00c0 | .....G.....2.... | E5F49EE1 F347C5F5 BC9ED632 E8C5B815 |
0x00d0 | H6{.lv...J..<.]. | 48367B18 6C76DD19 B84AA887 3CA05DA1 |
0x00e0 | ....Bc`....5V].. | 83D7D00D 42636018 C89CAF35 565DDD1F |
0x00f0 | ....X..B.F/.V..1 | 9DA4A4F5 581AFE42 F1462FB9 56A1CB31 |
0x0100 | '.{"...E..).."0. | 27EC7B22 7FD3C945 1FC229A0 0322301D |
0x0110 | ..CF..(...N..Wx. | 041A4346 D18D2894 CAFA4EC1 8B577817 |
0x0120 | 3..5Z$.!.j#JSf.} | 33FCCA35 5A240821 9A6A234A 5366E27D |
0x0130 | .......j..-..... | A49A1FBE ED0BA56A A6E32DB2 D4AD07E9 |
0x0140 | .I..z;..'.,..Zr. | DD49B38A 7A3B97B7 27972CA3 0B5A7204 |
0x0150 | .^t..)...6..n.'. | F45E74D4 8229180C 0A3682C3 6E0B27B2 |
0x0160 | .A.%3G..*AD..;.) | 8A41B125 3347B995 2A4144F8 C23BCF29 |
0x0170 | ..-.....*...N.:. | 1F8C2DBB B2078718 2AB3EBEA 4EC63AF1 |
0x0180 | .N.~...-y..`.5.. | 8D4E017E 82C2852D 79DDEB60 C535D4EA |
0x0190 | ,.....Q.).....V+ | 2CCBEC04 12C451A1 29C8D004 BF88562B |
0x01a0 | ..K>.....j....zl | B1824B3E 7FE9D4C2 846AA094 E7157A6C |
0x01b0 | ..S...Vx...K...a | 0EDB53A3 BD985678 BAF8BA4B 149E8661 |
0x01c0 | .."...+.J....... | FE032203 E2B82BD6 4AAAB813 88AFDAE5 |
0x01d0 | ....hi.RQ.C..." | 0DEDB0EF 6869E252 51DC43BD DC1B2220 |
0x01e0 | ....0.).../+.Jv{ | F6BDFEE6 30AE29FA 8E022F2B A44A767B |
0x01f0 | .^.p.G%,.~.).Su. | 155ECB70 F047252C 117E9229 905375A6 |
Directory Structure - Dir Level 0 CN:x000000000005 Offset:x0
Dir Level 1 CN:x000000000007 Offset:x72
Dir Level 2 CN:x000000440bbb
Image 5(xc002) validation failed with 4 errors 0 warnings
Validation error limit reached, stopping validation
ThinRepo 0xc002 validation failed with 4 errors 0 warnings
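The figures in this report can be sanity-checked with simple arithmetic. The dump shows Dir Level 2 cluster 0x440bbb at LBA 0x2205dd8, which is exactly 8 times the cluster number. That suggests, though nothing in the output documents it, that a repository cluster is 8 blocks (4 KiB with 512-byte blocks) addressed from LBA 0. The minimal Python sketch below makes that assumption and converts the out-of-bounds DataCN 0x67fa78c873cc into a byte address. The constant and function names and the 58TB repository size used for comparison are illustrative assumptions.

```python
# Values copied from the repositoryConsistency report above.
DIR2_CN = 0x000000440BBB   # "Dir Level 2 (Current) CN"
DIR2_LBA = 0x0002205DD8    # "LBA" printed for that CN in the dump header
DATA_CN = 0x67FA78C873CC   # "Cluster out of bounds DataCN"

BLOCK_BYTES = 512          # assumption: 512-byte logical blocks
REPO_BYTES = 58 * 10**12   # assumption: ~58TB repository (decimal TB)


def blocks_per_cluster(cn: int, lba: int) -> int:
    """Infer cluster size in blocks from one CN -> LBA pair (assumes LBA = CN * k)."""
    if lba % cn:
        raise ValueError("LBA is not an exact multiple of the cluster number")
    return lba // cn


def cluster_byte_address(cn: int, k: int, block: int = BLOCK_BYTES) -> int:
    """Byte address of cluster `cn` if clusters are `k` blocks starting at LBA 0."""
    return cn * k * block


if __name__ == "__main__":
    k = blocks_per_cluster(DIR2_CN, DIR2_LBA)
    addr = cluster_byte_address(DATA_CN, k)
    print(f"blocks per cluster: {k} ({k * BLOCK_BYTES} bytes)")
    print(f"DataCN byte address: about {addr / 10**15:.0f} PB")
    print(f"beyond the repository: {addr > REPO_BYTES}")
```

Under these assumptions the DataCN points hundreds of petabytes into the repository, orders of magnitude past its real size. That is consistent with the report flagging it as out of bounds, and with the "Bad signature" and "Bad Level" errors suggesting that this Dir Level 2 block does not hold valid directory metadata.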
Please help, as we are really out of ideas. The logical first step was to expand the thin VD repository, but that errored out, again with no hint as to why. Is there any chance of bringing it back to life?
The deeper problem is that we do not know whether this is really an unreported hardware error. We would not like to discard the enclosures, but we cannot be sure the same situation will not resurface if we delete the failed VD (and lose the whole LVM/XFS volume spanning all four VDs).
Regards,
Chris
DELL-Joey C
Moderator
October 23rd, 2023 03:02
Hi,
You could check whether the error output gives any information about why the storage is not accessible: https://dell.to/3tJpkSs. You can also check the physical disk status, since this may be a multiple disk failure: https://dell.to/3FtV3K3
DPCG Gaia
October 23rd, 2023 08:38
Thank you for the hints, @DELL-Joey C!
However, what's puzzling is that no hardware errors are reported. I am attaching an excerpt from allPhysicalDisks:
Otherwise, the excerpt hinted at why the repository expansion failed. I'm not sure whether there is newer firmware, though: are you aware of any newer version that supports expanding the underlying repository of a thin VD?
Regards,
Chris
DPCG Gaia
October 24th, 2023 11:28
Hello @DELL-Joey C
Thanks, we will try updating Storage Manager and retry the repository expansion!
The OS is CentOS 7.9 and the machine is an R720.
DPCG Gaia
October 26th, 2023 13:39
Hello,
We managed to expand the underlying repository capacity by enabling manual expansion and then adding capacity in 256GB chunks. Unfortunately it did not help: the thin VD is still marked as failed.
We are inclined to reset it, but it's a pity that no reason or explanation is given for this error.
DPCG Gaia
October 27th, 2023 14:49
@Dell-ErmanO, thanks for the hints. Before I saw your suggestion, we ran some reports from SMClient to look for anything unusual, and indeed we see a strange error in the event log (even though the drives are marked as OK).
We will run the test on all the disks of this VD as well; maybe we'll see more surprises.
DPCG Gaia
October 27th, 2023 22:26
Hello @Dell-ErmanO
I tried to run the "set physicalDisk" command, but to no avail:
This is with the updated SMcli (still from 2016).
DPCG Gaia
November 1st, 2023 15:00
Hello,
We are on Modular Disk Storage Manager 11.25.0A06.0026, which seems to be newer; it comes from DELL_MDSS_Consolidated_RDVD_6_5_0_1.iso (2018).
I suspect the "check virtualDisk" command does not work for thin VDs: it also fails for the healthy thin VD (gaia_parition_p1). I also tried quoting the VD name:
DPCG Gaia
November 1st, 2023 15:04
(Sorry, I have trouble formatting this so that the lines are not cut off, either in a code block or as preformatted text. Let me know if the full command is visible; it should be:)
/SMcli -n aspera_md_1 -c ' check virtualDisk ["gaia_virtual_p2"] consistency consistencyErrorFile="/tmp/ConsistencyError.txt" mediaErrorFile="/tmp/mediaError.txt" priority=high verbose=true;'
Performing syntax check...
Syntax check complete.
Executing script...
check virtual disk["gaia_virtual_p2"] consistency command has started.
Unable to execute the Check Virtual Disk Consistency command on virtual disk "gaia_virtual_p2" using the command at line 1.
Error 34 - The operation cannot complete because the virtual disk specified in the request is not valid (unknown virtual disk reference). The virtual disk may have been deleted or modified by a user on another management station accessing this storage array.
The command at line 1 that caused the error is:
check virtualDisk ["gaia_virtual_p2"] consistency consistencyErrorFile="/tmp/ConsistencyError.txt" mediaErrorFile="/tmp/mediaError.txt" priority=high verbose=true;
Script execution halted due to error.
SMcli failed.
DPCG Gaia
November 6th, 2023 13:30
@DELL-Chris H
Thank you for the hint,
This will be difficult, as we no longer have a support contract.
I guess we must accept the data loss (well, we have already lost it).
Thanks for trying to help anyway.
Regards,
Chris