NVIDIA v100 32GB SXM2 (Dell p/n NWWWX) in Dell C4130 Configuration K - HW Power Brake Slowdown
Hello,
We're running a couple of Dell C4130s in "Configuration K" (each with four NVIDIA V100 16GB SXM2 accelerators, Dell p/n YMV9T) and decided to upgrade one of them with 32GB V100 accelerators (Dell p/n NWWWX).
The system boots up without warnings; however, the GPU is throttled by "HW Power Brake":
HW Slowdown : Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Active
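The three lines above come from the "Clocks Throttle Reasons" section that `nvidia-smi -q -d PERFORMANCE` prints. NVML exposes the same information as a bitmask (`nvmlDeviceGetCurrentClocksThrottleReasons`), where "HW Slowdown" is the umbrella bit and "HW Thermal Slowdown" / "HW Power Brake Slowdown" indicate which hardware cause is behind it. A minimal Python sketch of decoding that mask, assuming the bit values from NVML's `nvml.h` (`decode_throttle_reasons` is a hypothetical helper, not part of NVML):

```python
from typing import Dict

# nvmlClocksThrottleReason* bit values as defined in nvml.h (assumed; verify
# against the nvml.h that ships with your driver version).
THROTTLE_REASONS: Dict[int, str] = {
    0x0000000000000004: "SW Power Cap",
    0x0000000000000008: "HW Slowdown",              # umbrella bit for hardware causes
    0x0000000000000020: "SW Thermal Slowdown",
    0x0000000000000040: "HW Thermal Slowdown",      # temperature too high
    0x0000000000000080: "HW Power Brake Slowdown",  # external power brake asserted
}


def decode_throttle_reasons(mask: int) -> Dict[str, str]:
    """Map an NVML throttle-reason bitmask to nvidia-smi style Active/Not Active."""
    return {
        name: "Active" if mask & bit else "Not Active"
        for bit, name in THROTTLE_REASONS.items()
    }


if __name__ == "__main__":
    # 0x88 = HW Slowdown | HW Power Brake Slowdown, i.e. the state shown above.
    for name, state in decode_throttle_reasons(0x88).items():
        print(f"{name:<24}: {state}")
```

On a live system the mask would come from pynvml, e.g. `pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)` after `pynvml.nvmlInit()` and `handle = pynvml.nvmlDeviceGetHandleByIndex(0)`. Power Brake active while Thermal is not points at an external signal rather than GPU temperature.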
The system is equipped with dual 1600W power supplies (Dell p/n 95HR5). BIOS and iDRAC have been upgraded to the latest available versions.
It seems that this behavior is caused by the BMC asserting the PCIe power brake signal for some reason. Would it be possible to override it somehow (e.g. with an ipmitool raw command)? The NWWWX accelerator is in fully working condition (it works fine in a Dell C4140).
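For reference, `ipmitool raw` sends an arbitrary IPMI request in the form `ipmitool raw <netfn> <cmd> [data bytes ...]`, with each value given in hex. The Python sketch below only assembles such a command line; the helper name `ipmitool_raw_args` is hypothetical, and the example uses the standard IPMI Get Device ID request (NetFn App 0x06, Cmd 0x01), not a power-brake override, since no such command is known from this thread.

```python
import shlex
from typing import List, Optional, Sequence


def ipmitool_raw_args(netfn: int, cmd: int, data: Sequence[int] = (),
                      host: Optional[str] = None,
                      user: Optional[str] = None) -> List[str]:
    """Build argv for `ipmitool raw <netfn> <cmd> [data ...]` (hypothetical helper)."""
    args = ["ipmitool"]
    if host is not None:
        # Remote BMC over IPMI v2.0 (lanplus); -E reads the password from IPMI_PASSWORD.
        args += ["-I", "lanplus", "-H", host]
        if user is not None:
            args += ["-U", user]
        args.append("-E")
    # NetFn, command and data bytes are all passed as two-digit hex values.
    args += ["raw", f"0x{netfn:02x}", f"0x{cmd:02x}"]
    args += [f"0x{b:02x}" for b in data]
    return args


if __name__ == "__main__":
    # Standard Get Device ID request, shown only to illustrate the syntax.
    print(shlex.join(ipmitool_raw_args(0x06, 0x01)))
```

The resulting list can be passed directly to `subprocess.run(args, capture_output=True, text=True, check=True)`; building argv as a list avoids shell-quoting issues with hostnames or usernames.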
Sincerely,
Mike
DELL-Shine K
October 11th, 2021 04:00
The NVIDIA V100 32GB GPGPU is not supported on the C4130. Because it is unsupported, it has not been validated, and we cannot guarantee that it will work. Do you have the latest BIOS and iDRAC firmware installed on the server? If not, please update both and try again.
apescairos
October 11th, 2021 07:00
I already have the latest BIOS and iDRAC FW (2.13.0 and 2.81 respectively) installed. I realize that it is not a supported GPGPU.
My question is whether there is a BMC command to override the assertion of the PCIe power throttling signal (or PCIe power monitoring) on the C4130.
Sincerely,
Mike
apescairos
November 9th, 2021 06:00
Bumping this topic since it is still unresolved.
Could anybody please advise on an ipmitool raw command to override this behaviour, where the BIOS/IPMI signals the BMC to assert the PCIe power brake and/or CPU hot signals for an SXM2 GPU in an unsupported configuration?
The same SXM2 carrier board and V100 combination works fine (i.e. without the power brake) when moved from the C4130 to a C4140.
Sincerely,
Mike
DELL-Shine K
November 9th, 2021 17:00
I do not have any details on an ipmitool command to change this behavior. Sorry I could not help you with this.
apescairos
November 16th, 2021 07:00
Update:
Unsupported SXM2 GPGPUs get throttled with "HW Power Brake Slowdown" because the BMC (iDRAC) cannot read their power data from the FRU, falls back to its built-in power tables, and finds no entry for their VID/DID:
Nov 16 10:04:32 idrac-FFDCWL2 L5, S55 [9322]: DellGPGPUPwrBudget: FRU read unsuccessful, getting power data from tables
Nov 16 10:04:32 idrac-FFDCWL2 L4, S55 [9322]: GetGPGPUPwr: Looking for VID=0x10de DID=0x1db5 SVID=0x10de SDID=0x1249
Nov 16 10:04:32 idrac-FFDCWL2 L4, S55 [9322]: GetGPGPUPwr: End of table reached (Entry 92). Didn't find a power table match for device
Adding an entry to the power table makes the GPGPU run at full speed. However, it does not seem possible to do this via IPMI. I managed to find a workaround and am working on a more permanent solution. I might post an update later.
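The log lines above describe a two-step lookup: iDRAC first tries to read the GPU's power data from its FRU and, when that fails, scans a table keyed by PCI vendor/device/subsystem IDs; a miss ("End of table reached") leaves the GPU power-braked. The Python sketch below models that behaviour only to make the failure mode concrete. It is not iDRAC code: the table layout, the helper names `parse_lookup` and `power_budget`, and the wattage values are all hypothetical, and it does not show how the workaround was actually applied.

```python
import re
from typing import List, Optional, Tuple

# (VID, DID, SVID, SDID) identifying a GPGPU, as printed in the iDRAC log.
PowerKey = Tuple[int, int, int, int]

LOOKUP_RE = re.compile(
    r"VID=0x([0-9a-fA-F]+) DID=0x([0-9a-fA-F]+) "
    r"SVID=0x([0-9a-fA-F]+) SDID=0x([0-9a-fA-F]+)"
)


def parse_lookup(line: str) -> Optional[PowerKey]:
    """Extract the (VID, DID, SVID, SDID) tuple from a GetGPGPUPwr log line."""
    m = LOOKUP_RE.search(line)
    if m is None:
        return None
    vid, did, svid, sdid = (int(g, 16) for g in m.groups())
    return (vid, did, svid, sdid)


def power_budget(key: PowerKey, table: List[Tuple[PowerKey, int]],
                 fru_watts: Optional[int] = None) -> Optional[int]:
    """Hypothetical model: FRU data first, then a linear scan of the table."""
    if fru_watts is not None:
        return fru_watts  # FRU read succeeded, no table needed
    for entry_key, watts in table:
        if entry_key == key:
            return watts
    return None  # end of table reached: no budget, so the GPU stays power-braked


if __name__ == "__main__":
    line = ("Nov 16 10:04:32 idrac-FFDCWL2 L4, S55 [9322]: GetGPGPUPwr: "
            "Looking for VID=0x10de DID=0x1db5 SVID=0x10de SDID=0x1249")
    key = parse_lookup(line)
    assert key is not None
    table: List[Tuple[PowerKey, int]] = []  # hypothetical: no entry for this GPU
    print(power_budget(key, table))         # None -> power brake stays asserted
    table.append((key, 300))                # hypothetical entry and wattage
    print(power_budget(key, table))         # 300 -> full speed
```

The model mirrors the observation in the update: the GPU itself is unchanged, and only the presence of a matching table entry decides whether a power budget is found.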