"Found SFM #, last power-cycle reason:", as highlighted below in a sample of show trace output.Force10#show trace 100 | grep SFM[2/19 13:18:59] RAM-(RpmAvailMgr):Send data sync msg (42) to task 4 SFM Config State ).[2/19 13:22:47] TSM-(tsm):Receive SFM 7 SFM_DETECT REMOVE event.[2/19 13:22:47] TSM-(tsm):tsmSfmRemove: Remove SFM 7[2/19 13:22:47] TSM-(tsm):tsmSfmRemove: SFM 7 is powered off.[2/19 13:22:48] TSM-(tsm):tsmSfmRemove: SFM 7 is powered on.[2/19 13:22:49] TSM-(tsm):Set SFM minor alarm[2/19 13:22:49] TSM-(tsm):tsmSfmRemove:8: SW FAB is good after removing SFM 7 [2/19 13:22:50] TSM-(tsm):Receive SFM 7 SFM_DETECT INSERT event. [2/19 13:22:50] TSM-(tsm):SFM 7 is reset with SFM Card insert event, bring up the card [2/19 13:22:50] TSM-(tsm):Found SFM 7, last power-cycle reason: power on with cause of DEFAULT [2/19 13:22:50] TSM-(tsm):TSM initilizes SFM 7... [2/19 13:22:51] ****** ERROR CHMGR-(chmgr):SFM 7 not present or bad slot id [2/19 13:22:52] TSM-(tsm):Clear SFM minor alarm [2/19 13:22:52] TSM-(tsm):tsmSfmAdd:8: LC is in service, no PP test. SFM 7 standby. numSfmFound = 9 [2/19 13:22:52] TSM-(tsm):Receive SFM 7 RESET_DETECT ASSERT event. [2/19 13:22:52] TSM-(tsm):SFM 7 reset is cleared, no action
show trace
show logging
Dec 30 11:12:20 PST: %RPM0:CP %CHMGR-2-MINOR_SFM: Minor alarm: No working standby SFM
Dec 30 11:12:20 PST: %RPM0:CP %TSM-2-SFM_RESET_PRESENT: SFM 2 reset unexpectedly
Dec 30 11:12:22 PST: %RPM0:CP %TSM-6-SFM_DISCOVERY: Found SFM 2
Dec 30 11:12:23 PST: %RPM0:CP %CHMGR-5-MINOR_SFM_CLR: Minor alarm cleared: Working standby SFM present
Dec 30 11:12:23 PST: %RPM0:CP %TSM-6-SFM_DISCOVERY: Found 9 SFMs
show sfm all
"m" - MDIO error "I" - I2C access error
Feb 19 04:44:02: %RPM0:CP %TSM-6-SFM_SWITCHFAB_STATE: Switch Fabric: DOWN
Feb 19 04:44:02: %RPM0:CP %TSM-2-SFM_GENERAL_ACCESS_M: SFM 3 found general access error (type m)
Feb 19 04:44:05: %RPM0:CP %TSM-6-SFM_DISCOVERY: Found SFM 3
Feb 19 04:44:06: %RPM0:CP %TSM-6-SFM_SWITCHFAB_STATE: Switch Fabric: UP
Feb 19 04:44:36: %RPM0:CP %TSM-6-SFM_SWITCHFAB_STATE: Switch Fabric: DOWN
Feb 19 04:44:37: %RPM0:CP %CHMGR-0-MAJOR_SFM: Major alarm: Switch fabric down
Feb 19 04:44:38: %RPM0:CP %TSM-2-SFM_UNDER_VOLT: SFM 3 powered off due to under voltage

SFM Simba PSI access error
show trace Output
[6/4 2:13:13] TSM-(tsm):Receive SFM 1 ERR_DETECT event
[6/4 2:13:13] TSM-(tsm):tsmSfmRemove: Remove SFM 1
[6/4 2:13:13] TSM-(tsm):tsmSfmRemove: SFM 1 is powered off.
[6/4 2:13:13] POLLER-(PM):doSfmSaSanErr: eventId=17, slotId=1, state=1, value[0]=0x1fd, value[1]=0x0
[6/4 2:13:14] TSM-(tsm):tsmSfmRemove: SFM 1 is powered on.
[6/4 2:13:14] CHMGR-(chmgr):add min alrm 12 UNKNOWN 0 0
[6/4 2:13:14] CHMGR-(tsm):0x1382 log alrm 12 to chmgr (rc=84)
[6/4 2:13:14] TSM-(tsm):Set SFM minor alarm
[6/4 2:13:14] TSM-(tsm):Change SW FAB state from SW_FAB_UP_9 to SW_FAB_UP_8
! The Etherscale supports one SFM in standby mode. The Terascale requires all 9 SFMs to be operationally active.
[5/4 2:13:14] ***** WARNING TSM-(tsm):Turn off SFM 1 active LED fail.
[5/4 2:13:14] ***** WARNING TSM-(tsm):Turn on SFM 1 Status LED Amber fail.
! During a failure, check the Status LED.
[5/4 2:13:15] ****** ERROR TSM-(tsm):tsmIsSfmPowerOn: f10SysRpmSfmCardInfoGet() failed for SFM 1 power status
[5/4 2:13:15] ****** ERROR TSM-(tsm):CheckSFMCardPower: tsmIsSfmPowerOn() failed for SFM 1 power status
[5/4 2:13:15] ****** ERROR TSM-(tsm):tsmHandleSfmError: Different error detected on SFM 1 (erro = 262163). SFM already in SFM_ERROR state
[6/4 2:13:15] TSM-(tsm):SFM 1 ERR_DETECT event is confirmed
[6/4 2:13:15] TSM-(tsm):Receive SFM 1 SIMAB_DETECT event
[5/4 2:13:15] ****** ERROR TSM-(tsm):tsmIsSFMReset: SFM 1 is not accessible via scratch pad (SFM_FAITH_CR = 0)
[6/4 2:13:15] TSM-(tsm):tsmSfmRemove: Remove SFM 1
[6/4 2:13:15] TSM-(tsm):tsmSfmRemove: SFM 1 is powered off.
[6/4 2:13:16] TSM-(tsm):tsmSfmRemove: SFM 1 is powered on.
[5/4 2:13:16] ***** WARNING TSM-(tsm):Turn off SFM 1 active LED fail.
[5/4 2:13:16] ***** WARNING TSM-(tsm):Turn on SFM 1 Status LED Amber fail.
[5/4 2:13:17] ****** ERROR TSM-(tsm):tsmIsSfmPowerOn: f10SysRpmSfmCardInfoGet() failed for SFM 1 power status
show sfm all
Force10#sh sfm all
Switch Fabric State: up

-- Switch Fabric Modules --
Slot  Status
---------------------------------------------------------------------------
0     card problem (SFM Simba PSI access error)
1     active
2     active
3     active
4     active
5     active
6     active
7     active
8     active
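On a chassis with many slots, the rows that are not "active" can be picked out of saved show sfm all output with a short script. This is a minimal sketch under stated assumptions: it expects the saved output to have one table row per line, and the function name problem_sfms is hypothetical.

```python
import re

# A table row starts with a slot number, followed by the status text.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")

def problem_sfms(show_sfm_all_text):
    """Return {slot: status} for every SFM row whose status is not 'active'."""
    problems = {}
    for line in show_sfm_all_text.splitlines():
        match = ROW_RE.match(line)
        if match and match.group(2) != "active":
            problems[int(match.group(1))] = match.group(2)
    return problems

sample_output = """\
Switch Fabric State: up
-- Switch Fabric Modules --
Slot Status
---------------------------------------------------------------------------
0 card problem (SFM Simba PSI access error)
1 active
2 active
"""
print(problem_sfms(sample_output))
```

Header and separator lines do not begin with a slot number, so they are skipped without special handling.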
Force10#show chassis brief
Chassis Type : E300
Chassis Mode : TeraScale
Chassis Epoch : 10.4 micro-seconds

-- Line cards --
Slot  Status       NxtBoot  ReqTyp  CurTyp  Version   Ports
---------------------------------------------------------------------------
0     online       online   EX1YE3  EX1YE3  5.3.1.2b  1
1     online       online   EX1YE3  EX1YE3  5.3.1.2b  1
2     online       online   EX1YE3  EX1YE3  5.3.1.2b  1
3     online       online   EX1YE3  EX1YE3  5.3.1.2b  1
4     online       online   E12PE3  E12PE3  5.3.1.2b  12
5     not present

-- Route Processor Modules --
Slot  Status       NxtBoot  Version
---------------------------------------------------------------------------
0     active       online   5.3.1.2b
1     not present

Switch Fabric State: up

-- Switch Fabric Modules --
Slot  Status
---------------------------------------------------------------------------
0     SW FAB diags failed (Multiple SFMs failed SW FAB portpipe diags)
1     active
[output omitted]
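A major alarm is reported under several conditions. One of them is an SFM exceeding its safe operating temperature, as detected by the environmental monitoring hardware and software. In addition to the error messages, the show environment command may also capture the high-temperature condition: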
Feb 27 04:52:16 UTC: %RPM0:CP %CHMGR-2-TEMP_SHUTDOWN_WARN: WARNING! SFM 6 temperature is 85C; approaching shutdown threshold of 80C)
Feb 27 04:52:16 UTC: %RPM0:CP %CHMGR-2-MAJOR_TEMP: Major alarm: chassis temperature high (SFM temperature reaches or exceeds threshold of 75C)
Feb 27 04:52:21 UTC: %RPM0:CP %CHMGR-2-MAJOR_TEMP_CLR: Major alarm cleared: chassis temperature lower (SFM 6 temperature is within threshold of 70C)
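When this happens, either the SFM is genuinely overheating or its sensor is faulty. If the adjacent SFMs report normal temperatures, suspect a faulty sensor. If the adjacent SFMs do not report normal temperatures, suspect a genuine overheat condition.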
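The neighbor comparison described above can be expressed as a small decision helper. This is a minimal sketch, not vendor logic. The function name diagnose_hot_sfm is hypothetical, and the default "normal" limit of 65C is an assumption borrowed from the SFM minor alarm value in this article's sample show alarms threshold output. Treating a single hot neighbor as evidence of a real overheat is also an assumption.

```python
def diagnose_hot_sfm(temps_by_slot, hot_slot, normal_limit_c=65):
    """Classify a hot SFM reading by comparing it with the adjacent slots.

    temps_by_slot: dict mapping slot number to temperature in degrees C.
    normal_limit_c: assumed upper bound for a "normal" reading (hypothetical default).
    """
    neighbors = [
        temps_by_slot[slot]
        for slot in (hot_slot - 1, hot_slot + 1)
        if slot in temps_by_slot
    ]
    if not neighbors:
        return "undetermined: no adjacent SFM readings"
    if all(temp < normal_limit_c for temp in neighbors):
        # Neighbors are cool, so the hot reading is likely a sensor fault.
        return "suspect faulty sensor"
    # At least one neighbor is also hot, so treat it as a real overheat.
    return "suspect genuine overheat"

print(diagnose_hot_sfm({5: 40, 6: 85, 7: 42}, hot_slot=6))
```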
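When the system detects a genuine over-temperature condition, it powers off the SFM until the card has cooled and the software determines that it is safe to power it back on. After power is restored, the hardware reports the SFM reset reason as "over temperature". If the software detects the over-temperature event and manually shuts down the SFM, the system reports the SFM reset reason as "remote shutdown".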
To view the programmed alarm threshold levels, run the show alarms threshold command:
E600-TAC-3#show alarms threshold

-- Temperature Limits (deg C) --
-----------------------------------------------------------
          Minor  Minor Off  Major  Major Off  Shutdown
Linecard     75         70     80         77        85
RPM          65         60     75         70        80
SFM          65         60     75         70        80
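To see how a given reading relates to these limits, the sample values can be put in a small lookup. This is a minimal sketch, assuming the thresholds from the sample output above, which may differ on other systems. The function name alarm_level is hypothetical, and the sketch only classifies a rising temperature: it does not model the "Off" columns, which are assumed here to be the levels at which an alarm clears.

```python
from collections import namedtuple

Limits = namedtuple("Limits", "minor minor_off major major_off shutdown")

# Values copied from the sample show alarms threshold output (deg C).
THRESHOLDS = {
    "Linecard": Limits(75, 70, 80, 77, 85),
    "RPM": Limits(65, 60, 75, 70, 80),
    "SFM": Limits(65, 60, 75, 70, 80),
}

def alarm_level(card_type, temp_c):
    """Return the highest alarm level a rising temperature has reached."""
    limits = THRESHOLDS[card_type]
    if temp_c >= limits.shutdown:
        return "shutdown"
    if temp_c >= limits.major:
        return "major"
    if temp_c >= limits.minor:
        return "minor"
    return "normal"

for temp in (60, 66, 75, 85):
    print(temp, alarm_level("SFM", temp))
```

The comparisons use "reaches or exceeds", matching the wording of the MAJOR_TEMP message in this article.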
Use the following steps to troubleshoot this condition:
Resetting an active SFM with the reset sfm command may disrupt traffic, and the system displays this message:
Force10#reset sfm 0
SFM 0 is active. Resetting it might temporarily impact traffic.
Proceed with reset? Confirm [yes/no]:
SFM powered off due to an under-voltage condition
Force10>show sfm 3
Switch Fabric State: up

-- SFM card 3 --
Status        : power off - SFM powered off due to under-voltage
Card Type     : SFM - Switch Fabric Module
Up Time       : 0 sec
Temperature   : 33C
Power Status  : PEM0: up PEM1: up
Serial Number : 0012632
Part Number   : 7520003706 Rev A
Vendor Id     : 01
Date Code     : 01442003