vic_engle

1 Rookie

•

48 Posts

0

7704

October 24th, 2014 07:00

lpfc_nodev_tmo

Question for anyone with Linux hosts using multipathing and VMAX storage. The default setting for lpfc_nodev_tmo is 30 seconds. EMC recommends 10 seconds if PowerPath is in use. Same for VxDMP and MPIO. Obviously, lowering this to 10 seconds will result in a faster path failover, but is there any real issue with leaving it at the default other than slightly slower path failovers?

Responses(6)

A

Anonymous

5 Practitioner

•

274.2K Posts

0

May 4th, 2016 23:00

fast_io_fail_tmo is a timeout value of the HBA and not of PowerPath. It determines how long SCSI will wait after an issue is detected on the remote port before failing an IO.

From PowerPath for Linux 6.0.0, there are two timeout parameters that can be configured in PowerPath.

path_retry_timeout: This is set to 5 seconds by default. When PowerPath is active, it uses this value to set the fast_io_fail_tmo of the HBAs. SCSI then fails the IOs back to PowerPath faster, and PowerPath can retry the IO on another live path.

all_paths_dead_retry_timeout: This is set to 'default' by default. 'default' means PowerPath takes the dev_loss_tmo value of the HBA and keeps retrying the IOs for that long after all paths to a LUN are marked dead. If any of the paths comes back alive before this timeout, the IO is retried on that path and succeeds.

Instead of 'default', it can be set to any value the user wants. If it is set to another value, take care to set it back to 'default', or to at least 30 seconds, before a Symm NDU is done. During a Symm NDU all the paths can briefly go dead, and a low all_paths_dead_retry_timeout may mean that the IO is failed back up before any of the paths comes back alive.
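The NDU caveat comes down to a comparison between the retry timeout and the length of the all-paths-dead window. A minimal sketch of that check follows. The function name and the outage figures are hypothetical and chosen only for illustration; this is not PowerPath's internal logic, just the rule as described in this post.

```shell
# Return 0 if I/O is expected to survive an all-paths-dead window.
# $1: retry timeout in seconds (all_paths_dead_retry_timeout, or the
#     HBA dev_loss_tmo when the parameter is left at 'default')
# $2: assumed length of the outage in seconds (hypothetical input)
survives_all_paths_dead() {
    retry_timeout="$1"
    outage="$2"
    # A path must come back before the retry timeout expires.
    [ "$outage" -lt "$retry_timeout" ]
}

# Illustrative 12-second outage against a 30-second and a 10-second timeout.
survives_all_paths_dead 30 12 && echo "30s timeout: I/O is retried and succeeds"
survives_all_paths_dead 10 12 || echo "10s timeout: I/O is failed back to the application"
```

The real duration of an NDU path outage is not given in this thread, so the second argument has to come from your own measurements.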

Information on both of these parameters and their usage can be found in the PowerPath 6.x CLI and message reference guide.

HurseB

11 Posts

2

October 24th, 2014 10:00

The main issue is that native multipathing waits (as does PowerPath) until it is told by the host that the device is not accessible. It relies on the SCSI timeout, then the lpfc timeout. While the path will eventually fail over, leaving the default causes host applications to wait on IO for an additional 20 seconds (30 seconds instead of 10). Applications may not be able to survive that long a wait. This can also cause issues during operations such as NDU upgrades of storage.

Per the Emulex host connectivity guide, it is a required HBA setting:

https://support.emc.com/docu6349_Host-Connectivity-with-Emulex-Fibre-Channel-Host-Bus-Adapters-%28HBAs%29-and-Fibre-Channel-over-Ethernet-Converged-Network-Adapters-%28CNAs%29-for-the-Linux-Environments.pdf

On page 88 see:

“* If EMC PowerPath,® Veritas DMP, or Linux native multipath (DM-MPIO) is installed, lpfc_nodev_tmo must be set to 10”

This indicates that the value must be set to 10.

(Note there is a slight typo in the doc where the P for PowerPath is in front of EMC, which I have corrected in the quote above.)

modinfo lpfc | grep devloss

parm: lpfc_devloss_tmo:Seconds driver will hold I/O waiting for a device to come back (int)

modinfo lpfc | grep nodev

parm: lpfc_nodev_tmo:Seconds driver will hold I/O waiting for a device to come back (int)
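modinfo only lists the parameters the driver accepts, not the value currently in effect. A minimal sketch for reading the live value back, assuming the loaded lpfc module exposes its parameters under /sys/module/lpfc/parameters (the usual sysfs location for readable module parameters). The function name and the optional directory argument are illustrative, not part of any EMC or Emulex tooling.

```shell
# Print the live value of an lpfc module parameter.
# $1: parameter name, e.g. lpfc_devloss_tmo
# $2: parameters directory (assumed default below; override for testing)
show_lpfc_param() {
    param="$1"
    dir="${2:-/sys/module/lpfc/parameters}"
    if [ -r "$dir/$param" ]; then
        printf '%s=%s\n' "$param" "$(cat "$dir/$param")"
    else
        printf '%s: not exposed by the loaded driver\n' "$param" >&2
        return 1
    fi
}
```

For example, `show_lpfc_param lpfc_devloss_tmo` prints the value the running driver is using, or reports that the parameter is not exposed on that host.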

vic_engle

1 Rookie

•

48 Posts

1

October 25th, 2014 07:00

Thanks. Follow-up question. When an HBA nodev timeout is actually reached, is that the mechanism that triggers path failover within PowerPath, or does PowerPath have its own internal timeout for nodev?

Alex_Ye

109 Posts

1

October 26th, 2014 14:00

PowerPath doesn't have any timeout setting of its own. So when the HBA nodev timeout is reached (which is when PowerPath gets the notification from the HBA driver), PowerPath immediately fails the corresponding path.

A

Anonymous

5 Practitioner

•

274.2K Posts

0

October 27th, 2014 03:00

I believe the lpfc_nodev_tmo parameter has been deprecated and dev_loss_tmo is used instead now. There are two main parameters that determine how long the lower layers hold an IO when a path failure occurs before returning it to upper layers such as the multipathing layer. They are:

fast_io_fail_tmo – This value determines how long an IO is held in the lower layers before returning either a success or failure for an IO. By default, this value is not set and hence this parameter does not normally have any effect on a host.

dev_loss_tmo – This value determines how long an IO should be held when the SCSI device goes away (on a path failure). By default this value is set to 30 seconds for lpfc drivers.
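Both values can be inspected per remote port. A minimal sketch, assuming the host exposes the standard Linux FC transport attributes under /sys/class/fc_remote_ports; the function name and the optional directory argument are illustrative only.

```shell
# List fast_io_fail_tmo and dev_loss_tmo for each FC remote port.
# $1: fc_remote_ports directory (assumed default below; override for testing)
show_rport_timeouts() {
    root="${1:-/sys/class/fc_remote_ports}"
    for port in "$root"/rport-*; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -d "$port" ] || continue
        fast=$(cat "$port/fast_io_fail_tmo" 2>/dev/null || echo '?')
        loss=$(cat "$port/dev_loss_tmo" 2>/dev/null || echo '?')
        printf '%s fast_io_fail_tmo=%s dev_loss_tmo=%s\n' \
            "${port##*/}" "$fast" "$loss"
    done
}
```

Running `show_rport_timeouts` with no argument prints one line per remote port. An unset fast_io_fail_tmo typically reads back as "off" rather than a number.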

Prior to PowerPath for Linux 6.0.0, PowerPath does not modify the above parameters on install and start. Hence, on a path failure, an IO can hang for up to 30 seconds (the dev_loss_tmo value) in the lower layers before an error is returned to PowerPath, which can then redirect the IO to another valid path. So prior to 6.0.0, if customers change the lpfc_nodev_tmo value from its default of 30 seconds to 10 seconds, IO will not hang for more than 10 seconds on a path failure.

AFAIK, there is only one disadvantage of lowering the dev_loss_tmo value from its default. In case of a Symmetrix/VMAX NDU, all paths can go dead briefly. In such a scenario, if the dev_loss_tmo value is lowered to 10 seconds and the Symm paths don’t come back within 10 seconds during the NDU, there is a chance of IO failures.

From PowerPath for Linux 6.0.0, things have changed a bit. On install and start, PowerPath will modify the fast_io_fail_tmo value to 5 seconds. This means that on a path failure, the IO will hang for no more than 5 seconds. This parameter can also be changed by users through a new powermt command (set path_dead_retry_timeout). In case of an all-paths failure (such as during a Symm NDU), PowerPath will retry the IO for a period of time equal to the dev_loss_tmo value of that port.

Also, from PowerPath for Linux 6.0.0, users do not have to change the HBA timeout values directly.

A

Anonymous

5 Practitioner

•

274.2K Posts

0

May 3rd, 2016 15:00

Tangophi,

I'm having difficulty finding information regarding fast_io_fail_tmo. I took a look at the PowerPath for Linux install/admin guide and the release notes for version 6 and found little information.

I found more information about path_retry_timeout in the PowerPath Family 6.x CLI and system message guide.

My understanding is that if we set this value, we don't need to change any HBA parameter such as devloss. Regarding the value itself: I have read some older papers that say not to set devloss below 10 because of VMAX/Symm NDU, and the PowerPath default is 5. Should we also increase it to 10 using path_retry_timeout?

Thanks in advance,
