
PowerEdge: TMP0203: CPU temperature is greater than the upper critical threshold

Summary: Dell EMC PowerEdge servers with iDRAC9 may report "CPU x temperature is greater than the upper critical threshold" events when the CPU workload spikes at or near CPU Power Max.


Symptoms

Dell EMC PowerEdge servers with iDRAC9 may report "CPU x temperature is greater than the upper critical threshold" events when the CPU workload spikes at or near CPU Power Max. When a transient power burst occurs to meet additional CPU demand, the processor temperature may briefly exceed the upper critical threshold. For example, a server that is operating at 50-60% utilization and spikes to 100% utilization for 5-20 seconds may briefly exceed the upper critical threshold for CPU temperature. When the threshold is exceeded, an event is recorded in the System Event Log and the Lifecycle Log. When the spike is over and the CPU temperature returns to normal, a second event is recorded indicating that CPU x temperature is within range.

For transient spikes like this, the two events typically occur within 5-20 seconds of each other. The examples below show what these transient events look like in each log.

 

System Event Log:

2020-04-09 11:14:11  85  CPU 2 temperature is within range.
2020-04-09 11:14:06  84  CPU 2 temperature is greater than the upper critical threshold.
2020-04-09 09:16:31  83  CPU 2 temperature is within range.
2020-04-09 09:16:16  82  CPU 2 temperature is greater than the upper critical threshold.
2020-04-09 08:58:33  81  CPU 2 temperature is within range.
2020-04-09 08:58:17  80  CPU 2 temperature is greater than the upper critical threshold.
2020-04-09 08:25:47  79  CPU 2 temperature is within range.
2020-04-09 08:25:27  78  CPU 2 temperature is greater than the upper critical threshold.
2020-04-09 06:57:02  77  CPU 2 temperature is within range.
2020-04-09 06:56:57  76  CPU 2 temperature is greater than the upper critical threshold.

 

 

Lifecycle Log:

2020-04-09 00:44:15  7851  TMP0205  CPU 2 temperature is within range.
2020-04-09 00:44:07  7850  TMP0203  CPU 2 temperature is greater than the upper critical threshold.
2020-04-08 22:46:31  7773  TMP0205  CPU 2 temperature is within range.
2020-04-08 22:46:18  7772  TMP0203  CPU 2 temperature is greater than the upper critical threshold.
2020-04-08 22:28:34  7769  TMP0205  CPU 2 temperature is within range.
2020-04-08 22:28:18  7768  TMP0203  CPU 2 temperature is greater than the upper critical threshold.
2020-04-08 21:55:49  7736  TMP0205  CPU 2 temperature is within range.
2020-04-08 21:55:29  7735  TMP0203  CPU 2 temperature is greater than the upper critical threshold.
2020-04-08 20:27:03  7697  TMP0205  CPU 2 temperature is within range.
2020-04-08 20:26:58  7696  TMP0203  CPU 2 temperature is greater than the upper critical threshold.
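To check whether a set of events matches the transient pattern described above, it can help to pair each TMP0203 entry with the TMP0205 entry that follows it and measure the gap. The Python sketch below does this for the Lifecycle Log rows shown above. The function name, the tuple layout, and the 20-second cutoff are illustrative assumptions, not part of any Dell tool; the cutoff simply reuses the upper end of the 5-20 second window mentioned earlier.

```python
from datetime import datetime

# Rows copied from the Lifecycle Log example above: (timestamp, message ID).
EVENTS = [
    ("2020-04-09 00:44:15", "TMP0205"),
    ("2020-04-09 00:44:07", "TMP0203"),
    ("2020-04-08 22:46:31", "TMP0205"),
    ("2020-04-08 22:46:18", "TMP0203"),
    ("2020-04-08 22:28:34", "TMP0205"),
    ("2020-04-08 22:28:18", "TMP0203"),
    ("2020-04-08 21:55:49", "TMP0205"),
    ("2020-04-08 21:55:29", "TMP0203"),
    ("2020-04-08 20:27:03", "TMP0205"),
    ("2020-04-08 20:26:58", "TMP0203"),
]

# Assumption: treat anything up to 20 seconds as a transient spike.
TRANSIENT_LIMIT_S = 20


def pair_excursions(events):
    """Pair each TMP0203 (over threshold) with the next TMP0205 (within range).

    Returns a list of (start_time, duration_in_seconds), oldest first.
    """
    durations = []
    started = None
    # ISO-style timestamps sort correctly as plain strings.
    for stamp, msg_id in sorted(events):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if msg_id == "TMP0203":
            started = when
        elif msg_id == "TMP0205" and started is not None:
            durations.append((started, int((when - started).total_seconds())))
            started = None
    return durations


excursions = pair_excursions(EVENTS)
for started, seconds in excursions:
    label = "transient" if seconds <= TRANSIENT_LIMIT_S else "sustained"
    print(f"{started}  {seconds:>3d} s  {label}")
```

Every excursion in the sample lasts between 5 and 20 seconds, which is consistent with the transient behavior this article describes. An excursion that lasts much longer would not fit this pattern and likely warrants separate investigation.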

When these temperature thresholds are met, Intel processors may throttle to reduce power consumption and lower CPU temperature.

Cause

These events occur when the CPU temperature is already running near its target for optimum performance and the CPU then transitions to a higher workload. The transient temperature increase often occurs when a processor core wakes from a C-state or when on-demand Turbo Mode is invoked.

Resolution

While these transient CPU performance spikes are not abnormal, iDRAC9 Engineering continues to fine-tune the thermal algorithm to prevent these events. For example, iDRAC9 4.22.00.00 and iDRAC9 4.40.00.00 include thermal improvements specific to this sequence. To ensure that the latest dynamic thermal algorithms are installed on Dell EMC PowerEdge servers, update to the latest available iDRAC9 firmware.

Workarounds:

End users can manually adjust the system thermal settings to prevent these transient events from spiking the CPU temperature. Use either of the following workarounds to raise the fan speed baseline and keep the CPU temperature lower.

Max Performance Profile

System Thermal Profile Optimization can be modified to Maximum Performance (Performance Optimized). This thermal profile carries the following advantages:

  • Reduced probability of memory or CPU throttling
  • Increased probability of turbo mode activation
  • Generally, higher fan speeds at idle and stress loads

Thermal Profile Optimization can be modified through the following methods:

iDRAC9 GUI >> Configuration >> System Settings >> Hardware Settings >> Cooling Configuration

racadm set System.ThermalSettings.ThermalProfile

racadm>>racadm set System.ThermalSettings.ThermalProfile 1

[Key=System.Embedded.1#ThermalSettings.1]

Object value modified successfully

 

Supported Values:

0 - Default Thermal Profile Settings

1 - Maximum Performance

2 - Minimum Power

3 - Sound Cap
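When the profile change is scripted across several servers, a small lookup table can guard against passing the wrong numeric value. The following Python sketch only builds the racadm argument list; it does not run anything. The dictionary keys and the function name are hypothetical labels chosen for readability, while the attribute name and numeric values are the ones listed above.

```python
# Numeric values copied from the "Supported Values" list above.
# The dictionary keys are illustrative names, not racadm keywords.
THERMAL_PROFILES = {
    "default": 0,
    "maximum_performance": 1,
    "minimum_power": 2,
    "sound_cap": 3,
}


def thermal_profile_command(profile):
    """Return the racadm argument list that selects the named thermal profile."""
    if profile not in THERMAL_PROFILES:
        raise ValueError(f"unknown thermal profile: {profile!r}")
    return [
        "racadm",
        "set",
        "System.ThermalSettings.ThermalProfile",
        str(THERMAL_PROFILES[profile]),
    ]


print(" ".join(thermal_profile_command("maximum_performance")))
```

On a host where racadm is installed, the returned list could be passed to `subprocess.run`. Returning a list rather than a single string avoids shell quoting problems.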

 

 

Fan Speed Offset

Fan speed offset allows you to increase the system fan speed with four incremental steps. These steps are equally divided between the typical baseline speed and the maximum speed of the server system fans. A fan speed offset causes fan speeds to increase (by the offset % value) over baseline fan speeds calculated by the Thermal Control algorithm. Possible values are:

  • Low Fan Speed — Drives fan speeds to a moderate fan speed.
  • Medium Fan Speed — Drives fan speeds close to medium.
  • High Fan Speed — Drives fan speeds close to full speed.
  • Max Fan Speed — Drives fan speeds to full speed.
  • Off — Fan speed offset is set to off. This is the default value. When set to off, the percentage does not display. The default fan speed is applied with no offset. Conversely, the maximum setting results in all fans running at maximum speed.

Fan Speed Offset can be modified through the following methods:

iDRAC9 GUI >> Configuration >> System Settings >> Hardware Settings >> Cooling Configuration

racadm set System.ThermalSettings.FanSpeedOffset

 

racadm>>racadm set System.ThermalSettings.FanSpeedOffset 2

[Key=System.Embedded.1#ThermalSettings.1]

Object value modified successfully

 

Supported Values:

0 - Low

1 - High

2 - Medium

3 - Max

255 - Off
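Note that the numeric codes above do not follow the order of fan intensity: 1 selects High and 2 selects Medium. A script that steps the offset up or down is therefore safer working with level names than with the numbers. The Python sketch below illustrates one way to do that. The function names and the intensity ordering are assumptions based on the descriptions earlier in this section; only the attribute name and the numeric codes come from this article.

```python
# Numeric codes copied from the "Supported Values" list above.
FAN_SPEED_OFFSET_CODES = {"Off": 255, "Low": 0, "Medium": 2, "High": 1, "Max": 3}

# Assumed ordering from least to most airflow, following the level descriptions.
BY_INTENSITY = ["Off", "Low", "Medium", "High", "Max"]


def fan_offset_command(level):
    """Return the racadm argument list that applies the named fan speed offset."""
    if level not in FAN_SPEED_OFFSET_CODES:
        raise ValueError(f"unknown fan speed offset: {level!r}")
    return [
        "racadm",
        "set",
        "System.ThermalSettings.FanSpeedOffset",
        str(FAN_SPEED_OFFSET_CODES[level]),
    ]


def next_level_up(level):
    """Return the next stronger offset, or the same level if already at Max."""
    index = BY_INTENSITY.index(level)
    return BY_INTENSITY[min(index + 1, len(BY_INTENSITY) - 1)]


# Stepping up from Medium yields High, which is code 1 rather than code 3.
print(" ".join(fan_offset_command(next_level_up("Medium"))))
```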

 

Note: Increasing the fan speed baseline increases fan power consumption and fan noise.

Affected Products

iDRAC9, PowerEdge R450, PowerEdge R540, PowerEdge R550, PowerEdge R640, PowerEdge R6415, PowerEdge R650, PowerEdge R650xs, PowerEdge R660, PowerEdge R660xs

Products

PowerEdge R740, PowerEdge R740XD, PowerEdge R740XD2, PowerEdge R7415, PowerEdge R7425, PowerEdge R750, PowerEdge R750XA, PowerEdge R750xs, PowerEdge R760, PowerEdge R760XA, PowerEdge R760xd2, PowerEdge R760xs, PowerEdge R840, PowerEdge R860, PowerEdge R940, PowerEdge R940xa, PowerEdge R960, PowerEdge T440, PowerEdge T550, PowerEdge T560, PowerEdge T640 ...
Article Properties
Article Number: 000123186
Article Type: Solution
Last Modified: 04 Dec 2024
Version: 9