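Firmware download between FOS v8.1.x and FOS v8.2.x
Impact:
- Cold recovery of the director switch
- A user performs an lscfg operation; the operation fails and a PMGR-1006 raslog event is logged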
Environment:
Dell EMC Hardware: Connectrix ED-DCX6-4B
Dell EMC Hardware: Connectrix ED-DCX6-8B
Dell EMC Software: Connectrix B-Series Fabric OS (FOS) 8.1
Dell EMC Software: Connectrix B-Series Fabric OS (FOS) 8.2
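Issue:
Upgrading from FOS 8.1.x to FOS 8.2.x may create a set of invalid port ranges in the PORTMAP entries of the Virtual Fabric (VF) configuration file.
This occurs only on X6-8 or X6-4 director switches, and only when the switch is upgraded from an earlier FOS release to FOS 8.2.0 or later and the PORTMAP entry size has grown to 1024 characters or more.
The PORTMAP entry size grows beyond 1024 characters only after ports have been moved repeatedly between logical switches with the "lscfg" CLI command. The current PORTMAP entry size can be calculated as follows:
- Run the CLI command "configupload -vf", locate the uploaded configuration file, run "grep PORTMAP <uploaded-configuration-filename>", and count the characters from the "F" or "G" through the closing "]" character.
X6-8 or X6-4 director switches that were factory-installed with FOS 8.2.0 or later do not experience this issue unless they are downgraded to a FOS release earlier than 8.2.0 and then upgraded again to FOS 8.2.0 or later.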
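Symptoms:
Switches without VF enabled:
- Users without VF enabled see no external symptoms.
- Inspecting the PORTMAP entries in the VF configuration may show invalid port numbers in the map, but as long as VF is not enabled these invalid entries do not affect switch operation.
- Any port number between 1800 and 3399 is considered an "invalid entry".
- This does not affect systems without VF. However, if logical fabrics are in use, or are planned to be enabled in the future, these invalid entries should be removed.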
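The check for invalid entries can be sketched in a few lines of Python. This is only an illustration, not a vendor tool: the function names are hypothetical, and it assumes the 1800 to 3399 window is inclusive and that PORTMAP lines look like the samples shown in this article.

```python
import re

# Window taken from this article; treating both bounds as inclusive is an assumption.
INVALID_LOW, INVALID_HIGH = 1800, 3399


def parse_portmap(line):
    """Return (kind, [(start, end), ...]) for a 'PORTMAP FC:[...]' line, else None."""
    match = re.search(r"PORTMAP\s+(FC|GE):\[([^\]]*)\]", line)
    if match is None:
        return None
    kind, body = match.groups()
    ranges = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        start, _, end = item.partition("-")
        # A single port such as "12" is treated as the range 12-12.
        ranges.append((int(start), int(end or start)))
    return kind, ranges


def invalid_ranges(ranges):
    """Keep only the ranges that overlap the invalid window."""
    return [(s, e) for s, e in ranges if s <= INVALID_HIGH and e >= INVALID_LOW]


sample = "PORTMAP GE:[0-255,1816-1823,1848-1855]"
kind, ranges = parse_portmap(sample)
print(kind, invalid_ranges(ranges))
```

Any range the sketch reports is a candidate for removal; how the entries are actually removed is outside the scope of this example.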
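If the PORTMAP character count is above or approaching 1024, it should be reduced to resolve or prevent the symptoms described in this article.
Switches with VF enabled:
Users with VF enabled see an impact only when the size of a PORTMAP entry in the VF configuration file exceeds 1024 characters.
This happens when ports are moved repeatedly from one logical switch to another. Administrators can check the size of the map to determine whether it is approaching the point of failure.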
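Checking the PORTMAP:
- Run the command "configupload -vf" as the admin user. The output can be used to view the PORTMAP entries in the VF configuration file.
It shows both the FC and GE PORTMAP entries, for all logical switches.
- To check against the 1024 limit manually, count the characters from the "F" or "G" through the final "]", or contact Support.
Example:
The FC PORTMAP in the sample uploaded file contains 528 characters.
The GE PORTMAP in the sample uploaded file contains 510 characters.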
Symptoms before the character limit is reached:
Example of an uploaded file:
# BROCADE
# VERSION 822
# PLATFORM 166
# SWITCHCONF
SYSTEM max
ATTRIBUTE SYS_NAME:sw0
ATTRIBUTE VF:0
ATTRIBUTE ETHSW_ENABLED:0
ATTRIBUTE BLADE_IDS1:00afafbab20000
ATTRIBUTE BLADE_IDS2:b1b10000000000
SWITCH fcsw-0
ATTRIBUTE FID:128 SWNAME:sw0 USR:3400 GE:256 ICL:128 DS:1 TID:775683370
PIN 5
PORTMAP FC:[0-447,768-895,1152-1215,1816-1823,1848-1855,1880-1887,1912-1919,1944-1951,1976-1983,2008-2015,2040-2047,2072-2079,2104-2111,2136-2143,2168-2175,2200-2207,2232-2239,2264-2271,2296-2303,2328-2335,2360-2367,2392-2399,2424-2431,2456-2463,2488-2495,2520-2527,2552-2559,2584-2591,2616-2623,2648-2655,2680-2687,2712-2719,2744-2751,2776-2783,2808-2815,2840-2847,2872-2879,2904-2911,2936-2943,2968-2975,3000-3007,3032-3039,3064-3071,3096-3103,3128-3135,3160-3167,3192-3199,3224-3231,3256-3263,3288-3295,3320-3327,3352-3359,3384-3391]
PORTMAP GE:[0-255,1816-1823,1848-1855,1880-1887,1912-1919,1944-1951,1976-1983,2008-2015,2040-2047,2072-2079,2104-2111,2136-2143,2168-2175,2200-2207,2232-2239,2264-2271,2296-2303,2328-2335,2360-2367,2392-2399,2424-2431,2456-2463,2488-2495,2520-2527,2552-2559,2584-2591,2616-2623,2648-2655,2680-2687,2712-2719,2744-2751,2776-2783,2808-2815,2840-2847,2872-2879,2904-2911,2936-2943,2968-2975,3000-3007,3032-3039,3064-3071,3096-3103,3128-3135,3160-3167,3192-3199,3224-3231,3256-3263,3288-3295,3320-3327,3352-3359,3384-3391]
The PORTMAP can also be found in the RAS supportsave file (RAS only, not RAS_POST).
It may appear more than once, because supportsave runs "cat" on the vf-conf.<swbd>, switch-conf.<swbd>, and .save files. The "cat" of vf-conf.<swbd> is the best one to use.
Example from the supportsave file switch0-xxx.xxx.xx.xxx-S1cp-202001152137.RAS.txt:
********************************************************
SWITCHCMD /bin/cat /etc/fabos/config/vf-conf.166:
********************************************************
/bin/cat /etc/fabos/config/vf-conf.166:
SYSTEM max
ATTRIBUTE SYS_NAME:sw0
ATTRIBUTE VF:0
ATTRIBUTE ETHSW_ENABLED:0
ATTRIBUTE BLADE_IDS1:00afaf00000000
ATTRIBUTE BLADE_IDS2:b1b10000000000
SWITCH fcsw-0
ATTRIBUTE FID:128 SWNAME:sw0 USR:3400 GE:256 ICL:128 DS:1 TID:901059396
PIN 5
PORTMAP FC:[0-447,768-895,1152-1215,1816-1823,1848-1855,1880-1887,1912-1919,1944-1951,1976-1983,2008-2015,2040-2047,2072-2079,2104-2111,2136-2143,2168-2175,2200-2207,2232-2239,2264-2271,2296-2303,2328-2335,2360-2367,2392-2399,2424-2431,2456-2463,2488-2495,2520-2527,2552-2559,2584-2591,2616-2623,2648-2655,2680-2687,2712-2719,2744-2751,2776-2783,2808-2815,2840-2847,2872-2879,2904-2911,2936-2943,2968-2975,3000-3007,3032-3039,3064-3071,3096-3103,3128-3135,3160-3167,3192-3199,3224-3231,3256-3263,3288-3295,3320-3327,3352-3359,3384-3391]
PORTMAP GE:[0-255,1816-1823,1848-1855,1880-1887,1912-1919,1944-1951,1976-1983,2008-2015,2040-2047,2072-2079,2104-2111,2136-2143,2168-2175,2200-2207,2232-2239,2264-2271,2296-2303,2328-2335,2360-2367,2392-2399,2424-2431,2456-2463,2488-2495,2520-2527,2552-2559,2584-2591,2616-2623,2648-2655,2680-2687,2712-2719,2744-2751,2776-2783,2808-2815,2840-2847,2872-2879,2904-2911,2936-2943,2968-2975,3000-3007,3032-3039,3064-3071,3096-3103,3128-3135,3160-3167,3192-3199,3224-3231,3256-3263,3288-3295,3320-3327,3352-3359,3384-3391]
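Because the PORTMAP can appear several times in the RAS file, it helps to pull out only the copies that belong to the vf-conf section. The Python sketch below does that under one assumption: that every section starts with a "SWITCHCMD" header line, as in the excerpt above. The function name is hypothetical.

```python
def vf_conf_portmaps(ras_text):
    """Collect PORTMAP lines that follow a 'SWITCHCMD ... vf-conf.<swbd>' header."""
    portmaps = []
    in_vf_conf = False
    for line in ras_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("SWITCHCMD"):
            # Each SWITCHCMD header opens a new section; remember whether
            # this one is the cat of the vf-conf file.
            in_vf_conf = "/etc/fabos/config/vf-conf." in stripped
        elif in_vf_conf and stripped.startswith("PORTMAP"):
            portmaps.append(stripped)
    return portmaps


excerpt = """\
SWITCHCMD /bin/cat /etc/fabos/config/vf-conf.166:
PORTMAP FC:[0-447]
SWITCHCMD /bin/cat /etc/fabos/config/switch-conf.166:
PORTMAP FC:[0-15]
"""
print(vf_conf_portmaps(excerpt))
```

The returned lines can then be measured or inspected in the same way as the lines from an uploaded configuration file.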
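Symptoms after the character limit is reached:
pdmd and hasmd crash symptoms:
Because pdmd crashes, the standby CP2 takes over by initiating a reboot of CP1.
A hasmd panic is then triggered on CP2, which causes both CPs on the switch to reboot, followed by a cold recovery.
- pdm crashes on the active CP1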
[KSWD-1002], 204763/5877, SLOT 1 | FFDC | CHASSIS, WARNING, , Detected termination of process pdmd:2942
[HAM-1014], 204765/5878, SLOT 1 | CHASSIS, CRITICAL, , Non restartable component (pdm (pid=2942)) died
- The standby CP2 takes over, but it also crashes (hasmd)
[HAM-1004], 152469/1316, SLOT 2 | CHASSIS, INFO, BPGLCG01SL35V, Processor rebooted - Reset., reboot.c
- The active CP initiates a reset of the standby CP and a cold recovery.
[EM-5012], 204809/0, SLOT 1 | CHASSIS, INFO, BPGLCG01SL35V, start emd FSS_RECOV_COLD
[HAM-1004], 5879, SLOT 1 | CHASSIS, INFO, BPGLCG01SL35V, Processor rebooted - Reset.
[HAM-1004], 5942, SLOT 2 | CHASSIS, INFO, BPGLCG01SL35V, Processor rebooted - Reset.
- Switchshow may show persistently disabled ports with the "Area has been acquired" status:
BASE:FID128:admin> switchshow | grep Area
256 1 32 338840 N16 No_Sync Disabled (Persistent) (Area has been acquired)
258 1 34 338a40 N16 No_Sync Disabled (Persistent) (Area has been acquired)
264 1 40 338040 N16 No_Sync Disabled (Persistent) (Area has been acquired)
266 1 42 338240 N16 No_Sync Disabled (Persistent) (Area has been acquired)
268 1 44 338440 N16 No_Sync Disabled (Persistent) (Area has been acquired)
270 1 46 338640 N16 No_Sync Disabled (Persistent) (Area has been acquired)
272 2 32 339840 N16 No_Sync Disabled (Persistent) (Area has been acquired)
[truncated]
- The lscfg --show output may show all ports with a value of -1:
SW0:FID128:admin> lscfg --show
Created switches FIDs(Domain IDs): 128(ds)(51) 127(51) 100(51) 77(51)
Slot 1 2 3 4 5 6 7 8 9 10 11 12
-------------------------------------------------------------------------------
Port
0 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
1 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
2 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
3 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
4 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
5 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
6 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
7 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
8 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
9 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
10 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
11 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
12 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
13 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
14 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
15 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
16 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
17 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
18 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
19 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
20 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
21 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
22 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
23 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
24 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
25 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
26 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
27 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
28 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
29 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
30 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
31 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
32 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
33 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
34 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
35 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
36 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
37 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
38 | | | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
- When the lscfg --show output is in the state shown above, HAfailover fails with the following error:
hafailover >>>
can't failover because system is not ready yet or other LS/HA config is in progress.
Hashow >>>
Local CP (Slot 6, CP0): Active, Cold Recovered
Remote CP (Slot 7, CP1): Standby, Faulted
HA enabled, Heartbeat Up, HA State synchronized
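The raslog signatures listed above can be searched for mechanically. The following Python sketch scans raslog text for the message IDs and text fragments quoted in this article. The labels and the function name are illustrative assumptions, and a match only indicates that the log resembles this issue; it does not confirm the root cause.

```python
import re

# Message IDs and text fragments copied from the raslog excerpts in this article.
SIGNATURES = {
    "pdmd terminated": ("KSWD-1002", "pdmd"),
    "non-restartable component died": ("HAM-1014", "died"),
    "cold recovery started": ("EM-5012", "FSS_RECOV_COLD"),
    "processor rebooted": ("HAM-1004", "Processor rebooted"),
}


def match_signatures(raslog_text):
    """Return {label: [matching lines]} for the signatures defined above."""
    found = {}
    for line in raslog_text.splitlines():
        header = re.match(r"\s*\[([A-Z]+-\d+)\]", line)
        if header is None:
            continue
        for label, (msg_id, fragment) in SIGNATURES.items():
            if header.group(1) == msg_id and fragment in line:
                found.setdefault(label, []).append(line.strip())
    return found


log = (
    "[KSWD-1002], 204763/5877, SLOT 1 | FFDC | CHASSIS, WARNING, , "
    "Detected termination of process pdmd:2942\n"
    "[EM-5012], 204809/0, SLOT 1 | CHASSIS, INFO, BPGLCG01SL35V, "
    "start emd FSS_RECOV_COLD\n"
)
for label, lines in match_signatures(log).items():
    print(label, len(lines))
```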
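FOS 8.2.x has a larger PORTMAP buffer size to support FCoE ports. The data synchronized from FOS 8.1.x during HA is smaller, which can cause corrupted data to be observed in the higher port ranges.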