October 16th, 2009 10:00

Emulex HBAs go offline every day at 2:00 am

Strange case from a customer. They have a Solaris 8 server with Emulex LP-9002 HBAs, attached to Brocade switches and a DMX3 array.

I ran an EMCGrab because of this issue, in which the host loses connectivity every day at 2:00 am:

Oct 16 02:00:32 sunsdat06 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/fibre-channel@1/sd@2,c (sd1091):
Oct 16 02:00:32 sunsdat06 offline
Oct 16 02:00:32 sunsdat06 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@1d,600000/fibre-channel@1/sd@2,c (sd1346):
Oct 16 02:00:32 sunsdat06 offline
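To tell whether both HBAs drop at the same moment over several nights of the messages file (/var/adm/messages by default on Solaris), you can pair each sd WARNING line with the "offline" line that follows it and group the results by timestamp and HBA path. Below is a minimal Python sketch that assumes the two-line layout shown in the excerpt above. The function names are hypothetical, and the regular expressions may need adjusting for other message formats.

```python
import re

# Sample lines copied from the messages excerpt above.
LOG = """\
Oct 16 02:00:32 sunsdat06 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@19,600000/fibre-channel@1/sd@2,c (sd1091):
Oct 16 02:00:32 sunsdat06 offline
Oct 16 02:00:32 sunsdat06 scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@1d,600000/fibre-channel@1/sd@2,c (sd1346):
Oct 16 02:00:32 sunsdat06 offline
"""

# WARNING line: timestamp, host, then the device path split into the HBA part
# (everything up to and including fibre-channel@N) and the sd target part.
WARN_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+scsi:.*WARNING:\s+"
    r"(?P<hba>\S+?/fibre-channel@[^/]+)/(?P<target>sd@\S+)\s+\((?P<inst>sd\d+)\):"
)
# Continuation line that carries only the word "offline".
OFFLINE_RE = re.compile(r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+offline\s*$")


def offline_events(text):
    """Return one dict per WARNING line that is immediately followed by 'offline'."""
    events = []
    pending = None
    for line in text.splitlines():
        m = WARN_RE.match(line)
        if m:
            pending = m.groupdict()
            continue
        if pending and OFFLINE_RE.match(line):
            events.append(pending)
        pending = None  # any other line breaks the WARNING/offline pair
    return events


def hbas_by_second(events):
    """Map each timestamp to the set of HBA paths that reported an offline target."""
    by_ts = {}
    for ev in events:
        by_ts.setdefault(ev["ts"], set()).add(ev["hba"])
    return by_ts


if __name__ == "__main__":
    # dict keeps insertion order, so timestamps print in log order.
    for ts, hbas in hbas_by_second(offline_events(LOG)).items():
        print(ts, len(hbas), sorted(hbas))
```

Run against the four lines above, it groups both fibre-channel paths (pci@19,600000 and pci@1d,600000) under 02:00:32. In other words, the same target (sd@2,c) went offline on both HBAs in the same second. Feeding it several days of logs shows whether that pattern repeats every night.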

The host HBA drivers are a little old. They are currently at 6.02f, while the EMC-supported versions are 6.10gx1 and up, so I suggested that the customer update them to a more recent version (at least one supported by EMC).

I don't know whether the problem is caused by the HBAs or by something else within the host. I checked everything in the SAN, and it appears to be working correctly.

I checked the crontab for a script that could be causing this. There is one that runs at XX:01 and apparently checks disk occupancy, but it runs at 2, 6, 10, 14, 18, and 22 (and the problem occurs only at 2 am).
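A crontab check like this can be scripted so that every user's crontab (on Solaris, the files under /var/spool/cron/crontabs) is searched for jobs starting around 02:00. Note that the excerpt above logs the offline events at 02:00:32, which is before a job scheduled for XX:01 would start. The Python below is a minimal sketch. It handles only "*", single values, comma lists, and ranges in the minute and hour fields. The sample crontab entries and script paths are hypothetical, and the first one merely mirrors the XX:01 schedule described above.

```python
# Hypothetical crontab content for illustration only.
CRONTAB = """\
# minute hour day-of-month month day-of-week command
1 2,6,10,14,18,22 * * * /usr/local/bin/check_disk_usage.sh
0 2 * * * /usr/local/bin/nightly_job.sh
30 3 * * 0 /usr/local/bin/weekly_job.sh
"""


def expand_field(field, lo, hi):
    """Expand a cron field such as '*', '2', '1-5' or '2,6,10' into a set of ints."""
    if field == "*":
        return set(range(lo, hi + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values


def jobs_starting_between(crontab_text, hour, first_minute, last_minute):
    """Return commands scheduled to start between hour:first_minute and hour:last_minute."""
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        minute_f, hour_f, _dom, _mon, _dow, command = line.split(None, 5)
        if hour not in expand_field(hour_f, 0, 23):
            continue
        minutes = expand_field(minute_f, 0, 59)
        if any(first_minute <= m <= last_minute for m in minutes):
            hits.append(command)
    return hits


if __name__ == "__main__":
    # Jobs that could be running around the 02:00 offline events.
    for cmd in jobs_starting_between(CRONTAB, 2, 0, 1):
        print(cmd)
```

Day-of-month, month, and day-of-week are ignored here because the problem occurs every day. A job restricted to certain days would still be listed.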

Any ideas? Is there anything else I can check?

October 17th, 2009 16:00

Do both HBAs go offline at the same time?

October 19th, 2009 01:00

Hi Jose,

What is the sd_io_time parameter set to in the /etc/system file?
What path failover software are you using?

Is it possible that the ECC agent is probing devices at 2 am during its repository rebuild?

Conor

October 19th, 2009 08:00

Hmmm... good point, Conor. We have at least two confirmed hosts in our environment where the ControlCenter Host Agent took the HBA offline when it scanned during the DCP. We spent weeks troubleshooting and ended up leaving the Host Agent off those hosts.
Oddly enough, I had some "identical" hosts that didn't have the problem. We never determined why it happened; we just left it alone.

October 30th, 2009 01:00

Hi. Sorry for the delay.

The /etc/system entry is set sd:sd_io_time=0x3C (60 seconds), and the host is currently using PowerPath 5.1.

I asked the UNIX team to stop all ECC agents on the host to see whether this solves the issue. If it does, I'll look into how to fix the agent-related problem.

I'll let you know.

Thank you.