September 25th, 2009 09:00

MDS 9509 Oversubscription

We have a few MDS 9509's and while in Device Manager I noticed a few blades that show we may be "Oversubscribed"...

v3.3(1c). I doubt this, as one of the 24-port blades (port group 3) has only three active connections, all set to 2 Gbps shared. All of the other ports are down and set to shared.

Other port groups have several 4 Gbps connections, but utilization is relatively low.

Is there an easy way to track oversubscription and get alerts when such an event occurs? Does anyone know how I could be oversubscribed with only 3 active ports running at 2 Gbps?

Thanks

141 Posts

November 28th, 2017 07:00

Hi there,

In our efforts to clean up the forum, we came across your question / statement.

If the question / statement is still valid, not expired, and you need an update, please reach out again and we will try to get it answered.

As for now we set it to “answered.”

Regards,

Jim

1 Rookie

 • 

5.7K Posts

September 28th, 2009 04:00

Oversubscription is not an event, but a configuration issue. If you configure ports to be used at certain speeds and the sum of the theoretical speeds is over the total blade speed, this is called oversubscription.
This isn't necessarily a bad thing as long as the hosts never use the total blade capacity. This gives you the opportunity to use relatively cheap (?) blades and create a high density switch.

If you absolutely need to be sure that certain ports can actually get the performance they need (like ISLs), you need to configure the port as dedicated at a certain speed. This might mean that some ports need to be shut down, as they can't be used at all once no shared bandwidth is left to play with.

I'm not sure, but I think a blade has a total of 48Gbps of bandwidth to share across all ports. On a 24 port blade you can, for example, configure 20 ports to use a max of 2Gbps and 2 ports as dedicated 4Gbps, which adds up to 48 (20 x 2 + 2 x 4), leaving no room for the last 2 ports, which you need to shut down. Another solution is to configure the 2 4Gbps ports as dedicated 4Gbps ports and leave the rest shared. But you say you have configured all ports in shared mode.

A 24 port blade has 4 port groups of 6 ports each (12Gbps each).
A 48 port blade has 4 port groups of 12 ports each (12Gbps each).
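
The definition above (sum of configured speeds versus available bandwidth) can be sketched in a few lines of Python. This is only a simplified model built on the figures quoted in this post (12Gbps per port group); the `Port` class and `port_group_summary` function are hypothetical names, and this is not how the switch itself computes anything.

```python
from dataclasses import dataclass

# Assumption taken from the post: each port group has 12 Gbps to share.
PORT_GROUP_GBPS = 12

@dataclass
class Port:
    name: str
    speed_gbps: int        # configured maximum speed
    dedicated: bool        # True = dedicated rate mode, False = shared
    up: bool = True        # ports that are down are ignored in this model

def port_group_summary(ports, capacity_gbps=PORT_GROUP_GBPS):
    """Sum configured speeds of active ports and compare with capacity."""
    dedicated = sum(p.speed_gbps for p in ports if p.up and p.dedicated)
    shared = sum(p.speed_gbps for p in ports if p.up and not p.dedicated)
    configured = dedicated + shared
    return {
        "dedicated_gbps": dedicated,
        "shared_gbps": shared,
        "configured_gbps": configured,
        "ratio": configured / capacity_gbps,
        "oversubscribed": configured > capacity_gbps,
    }

# The case from the original question: 3 active shared ports at 2 Gbps,
# the other 3 ports in the group down.
question_group = (
    [Port(f"fc3/{n}", 2, dedicated=False) for n in (13, 14, 15)]
    + [Port(f"fc3/{n}", 2, dedicated=False, up=False) for n in (16, 17, 18)]
)

# A fully populated group with every port at 4 Gbps shared.
full_group = [Port(f"fc3/{n}", 4, dedicated=False) for n in range(1, 7)]

if __name__ == "__main__":
    print(port_group_summary(question_group))
    print(port_group_summary(full_group))
```

In this model the group from the original question is configured for 6 of its 12Gbps, so by the definition above it should not count as oversubscribed. The port names are made up for illustration.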

My mistake a while ago was to think that ISLs work in shared mode, but they do not. An ISL is dedicated, so if you have a few 4 Gbps ISLs, take this into consideration.

32 Posts

September 28th, 2009 05:00

The message I get from Device Manager appears to be an event. Normally, it shows RX and TX speeds, and as long as they are under 1.2GBytes per port group everything appears normal.

But I sometimes get a notification that we are "Oversubscribed...?" This counter appears to be 'real time', so I was trying to find a better way of getting this data on a periodic basis, or to somehow set up a notification to inform us when we may have exceeded capacity.

I think the GUI isn't all that accurate, since my example had 3 of the 6 ports active, shared and configured for 2 Gbps, with all of the other ports in a down state. I don't understand how that could ever be "Oversubscribed"...

1 Rookie

 • 

5.7K Posts

September 28th, 2009 06:00

I'm speechless. I know what oversubscribing is, but I don't understand what you are seeing. Did you buy the 9509 from EMC? Create a case and wait for an official answer from them; that's what I used to do when I simply needed an answer. Through a service request, which can be a question, EMC has to put something on paper to satisfy you.

;)

32 Posts

September 28th, 2009 07:00

Thanks. We did not purchase them through EMC, but I do have a support ticket open with Cisco. I'll pass along what I find out.

32 Posts

September 30th, 2009 09:00

The response I received from Cisco was they don't have a good way to monitor and report against port groups and specifically the overutilization of 'shared' bandwidth per port group.

They recommended that I export data from Fabric Manager Server or IBM TPC and manually sum up bandwidth utilizations based upon each group...
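
The manual summing Cisco suggested could be scripted rather than done by hand. The sketch below is only an illustration under several assumptions: the CSV column names (`module`, `port`, `rx_mbytes_per_s`, `tx_mbytes_per_s`) are hypothetical and not a real Fabric Manager Server or TPC export format, ports are assumed to be assigned to groups in consecutive blocks of 6 (24-port blade), and the 1200 MBytes/s ceiling is the 1.2GBytes per port group figure mentioned earlier in this thread.

```python
import csv
import io
from collections import defaultdict

PORTS_PER_GROUP = 6            # 24-port blade: 4 groups of 6 ports (per thread)
GROUP_CAPACITY_MBYTES = 1200   # 1.2 GBytes/s per port group (per thread)

def port_group_of(port, ports_per_group=PORTS_PER_GROUP):
    """Map a 1-based port number to a 1-based port group number.

    Assumes ports are grouped in consecutive blocks, which is an assumption.
    """
    return (port - 1) // ports_per_group + 1

def sum_by_port_group(rows):
    """Add up rx and tx throughput per (module, port group)."""
    totals = defaultdict(lambda: {"rx": 0.0, "tx": 0.0})
    for row in rows:
        key = (int(row["module"]), port_group_of(int(row["port"])))
        totals[key]["rx"] += float(row["rx_mbytes_per_s"])
        totals[key]["tx"] += float(row["tx_mbytes_per_s"])
    return dict(totals)

def groups_over_capacity(totals, capacity=GROUP_CAPACITY_MBYTES):
    """Return the port groups whose rx or tx total exceeds the capacity."""
    return sorted(k for k, v in totals.items()
                  if max(v["rx"], v["tx"]) > capacity)

# Made-up sample data in the hypothetical export format.
SAMPLE_CSV = """module,port,rx_mbytes_per_s,tx_mbytes_per_s
3,1,150,40
3,2,90,20
3,3,60,10
3,7,300,100
"""

if __name__ == "__main__":
    totals = sum_by_port_group(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for (module, group), t in sorted(totals.items()):
        print(f"module {module} group {group}: rx={t['rx']} tx={t['tx']}")
    print("over capacity:", groups_over_capacity(totals))
```

Running something like this against periodic exports would give a per-group trend, which is the piece the GUI does not provide.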

They also recommended that I not use Device Manager's GUI to monitor possible oversubscription conditions as 'it has some major reporting flaws' - But there doesn't appear to be any other way to monitor this...

They are telling me what I'm seeing is a 'cosmetic issue' and they don't know of any fix in the near future.

I don't like the answers I received, but there doesn't appear to be much I can do about it...

2.2K Posts

September 30th, 2009 09:00

Rob,
You are correct. Cisco markets the 4Gb line cards as having 96Gbps of 'total bandwidth', which is full duplex, so in reality that is 48Gbps in each direction. At a 4Gb line rate Cisco states that the 24-port card is 2:1 oversubscribed and the 48-port card is 4:1 oversubscribed.

There is a section in the following whitepaper that discusses this:
http://www.cisco.com/en/US/prod/collateral/modules/ps5991/prod_white_paper0900aecd8044c807_ps5990_Products_White_Paper.html

Aran
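
The ratios quoted above follow directly from the port count, the line rate, and the 48Gbps per direction. A small Python check of that arithmetic (the function name is just for illustration):

```python
from fractions import Fraction

CARD_GBPS_PER_DIRECTION = 48  # 96 Gbps full duplex, per the post above

def oversubscription_ratio(port_count, port_speed_gbps,
                           card_gbps=CARD_GBPS_PER_DIRECTION):
    """Total port line rate divided by card bandwidth, as an exact fraction."""
    return Fraction(port_count * port_speed_gbps, card_gbps)

def as_ratio_string(ratio):
    """Format a Fraction as 'N:M'."""
    return f"{ratio.numerator}:{ratio.denominator}"

if __name__ == "__main__":
    for ports in (24, 48):
        ratio = oversubscription_ratio(ports, 4)
        print(f"{ports}-port card at 4 Gbps: {as_ratio_string(ratio)}")
```

For the 24-port card this is 24 x 4 / 48 = 2:1, and for the 48-port card 48 x 4 / 48 = 4:1, matching the figures Cisco states.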

32 Posts

September 30th, 2009 10:00

I've started monitoring the switch a little closer due to port availability constraints. We also have budget constraints that have prevented us from purchasing additional 4 Gbps blades for the foreseeable future.

I already have several storage, ISL, and high bandwidth connections spread out throughout my available 4 Gbps port groups.

We have a bunch of ESX hosts and DMX FA ports that need to be connected, and I'm concerned about available bandwidth, especially when the GUI tells me from time to time that we've exceeded available bandwidth.

I was hoping for an easy way of determining what was happening at the switch level with trending reports like Fabric Manager Server has for ISL connections.

2.2K Posts

September 30th, 2009 10:00

Have you been digging into this because of the alert in Device Manager or because you are actually seeing performance bottlenecks on the switch?

The reason I ask is that I tend to monitor the host and storage array performance, since those are the easier places to monitor and the switches, when properly configured, are usually not the bottleneck.

2.2K Posts

September 30th, 2009 12:00

I have similar challenges and at this point have used up nearly all the ports on the 48-port and 24-port modules in a pair of production MDS 9509s. But even though just about every port is in use, hardly any hosts ever run at their actual line rate. You can perform spot checks of the bandwidth in use by right-clicking on a module in Device Manager and selecting Check Oversubscription. It will list the bandwidth in use for each port group on the line card.

This is not a good monitoring method, but if you perform a few spot checks during what you know to be heavy usage periods you may find that your fears are unfounded. I have seen in our environment, with lots of VMs (concentrated I/O), heavy-use OLTP databases, CLARiiONs, and a DMX-4, that while my ports are just about all filled up I still have plenty of bandwidth available.

32 Posts

September 30th, 2009 12:00

Thanks for the info...
This was how I found out we were 'oversubscribed'... Device manager indicates we 'oversubscribe' a few of our port groups from time to time.

Problem is, we only use 3 of the 6 ports per group in our 24 port blades... After we sent screenshots, show tech-support dumps, etc., Cisco came back and told us not to use Device Manager as it is not very reliable - which I now agree with, based on the info posted above...

So, we don't have a good way of really knowing what the true aggregate throughput is without manual reporting via spreadsheets.

I agree that we probably are not really oversubscribed, but before I add FA Ports I really need to make sure - They don't appear to have a good way of monitoring this...

I would like to add a couple of 12-port blades into each of our core 9509s for ISL and storage connectivity, but then we start to lose a lot of port density in the core and will need to move connections to the edge - Rewire & No Budget - Yuck

32 Posts

December 11th, 2009 06:00

I was reading through a couple of release notes and found one, for 4.2(1a), that points to a possible solution:

Port Group Monitoring

The new port-group-monitor command allows you to monitor port groups that go above and below a configurable bandwidth threshold. When the traffic for a particular port group reaches 80 percent of the maximum supported bandwidth for that port group (rx and tx), NX-OS generates a syslog message that identifies individual port bandwidth (for rx and tx). A syslog message is generated for a rising threshold and for a falling threshold. When port group monitoring is enabled, monitoring occurs every ten seconds.

Where Documented

For information about the port group monitoring feature, see the Cisco MDS 9000 Family NX-OS Interfaces Configuration Guide.

We don't have any of our switches at this level so I can't test today, but wondered if anyone has tried this?

Thanks,

John
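
The rising/falling threshold behaviour described in the release note can be illustrated with a short Python simulation. This is only a sketch of the general idea, not NX-OS behaviour or configuration: the 80 percent rising threshold and ten second interval come from the release note quoted above, while the 70 percent falling threshold, the sample values, and the function name are assumptions made for the example.

```python
POLL_INTERVAL_SECONDS = 10   # from the release note
RISING_PCT = 80.0            # from the release note
FALLING_PCT = 70.0           # assumption: the note gives no falling value

def threshold_events(samples_pct, rising=RISING_PCT, falling=FALLING_PCT):
    """Return (sample_index, kind, value) for each threshold crossing.

    A 'rising' event fires once when utilization reaches the rising
    threshold; no further event fires until utilization drops to the
    falling threshold, which produces a 'falling' event.
    """
    above = False
    events = []
    for i, value in enumerate(samples_pct):
        if not above and value >= rising:
            above = True
            events.append((i, "rising", value))
        elif above and value <= falling:
            above = False
            events.append((i, "falling", value))
    return events

if __name__ == "__main__":
    # Made-up port group utilization, one sample per polling interval.
    samples = [40, 75, 82, 90, 78, 65, 85]
    for index, kind, value in threshold_events(samples):
        seconds = index * POLL_INTERVAL_SECONDS
        print(f"t={seconds}s {kind} threshold crossed at {value}%")
```

Using two thresholds instead of one keeps a port group that hovers around 80 percent from generating a message on every poll.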
