
October 5th, 2009 03:00

ISM Book Chapter 2 - performance and IOPS

Hey all,

I was going over chapter 2 in the ISM book once more, and I still have some trouble with it. In every other chapter you find explanations for things, but with the formulas (Little's law, utilization law, service time, etc.) you get a lot of condensed info at once, and I found (and still find) this one of the hardest parts of the book to digest.

So, to see if I actually understood the chapter, I picked a random drive and crunched some numbers. This might help others, and if I made a mistake somewhere, others might be able to help me.

So, as a reference drive, I took the Seagate Cheetah 15K.6 450GB drive with an FC interface (Model ST3450856FC). Data spec sheet can be found here.

Specs:

Model Number: ST3450856FC
Interface: Fibre Channel 4 Gb/s
Cache: 16 MB
Capacity: 450 GB
Areal density (avg): 165 Gbits/inch²
Guaranteed sectors: 879,097,968
Spindle speed: 15,000 rpm
Sustained data transfer rate: 171 MB/s
Average latency: 2.0 ms
Random read seek time: 3.4 ms
Random write seek time: 3.9 ms

Looks like this should contain everything we need to do the calculation. Let's also assume we are connected to a SCSI3 U320 controller and running 4 disks in a RAID 1+0 configuration. We have a server set up with a DB that has block sizes ranging from 4 KB up to 64 KB.

The problem I face each time is that some of the formulas explained in the book seem to depend on other formulas, but where I actually get that information from is a bit unclear to me. The book talks about service time but never states how to calculate it, or where to get it from. Perhaps I overlooked it, or perhaps it might be a good idea to provide some more explanation of these equations in a new printing of the book?

As far as I got with my number crunching, I arrived at around 250 IOPS for this drive. For one, I am not certain that this is correct, and second, my calculations seem to vary a bit each time.

Can anyone tell me if this is correct, and perhaps explain how to calculate this number a bit more clearly than in the book?

October 6th, 2009 03:00

< Re-post to change one letter in one of the equations below >

Hello Sebastian,

I do not want to comment on the details of any particular disk. However, here is an explanation as to how you might approach some of these statistics (and ideally benchmark/test).

Let  S = Service Time = average time taken to complete an I/O request.

So,  S = {mechanical delay reading from platters} + {transfer time through the internal drive buffer onto the external transport}.

So,  S = M + T

Where     M = mechanical delay reading from platters
And          T = transfer time through the internal drive buffer onto the external transport

Consider M first.

M = {time to seek to correct track} + {average rotational latency}

M = {time to seek to correct track} + {time for 1/2 of a full revolution}   .............. (average of 1/2 is defined by ISM)

M = {3.6 ms} + { 1/2 x  60/15000 x 1000 ms}      ............. (I just chose 3.6 as an example)

M = {3.6 ms} + {2 ms} = 5.6 ms

Now consider T. Well, we do not have enough information, but that is OK for the sake of this. Let's imagine that the drive internal transfer rate is a modest 40MB/s and that the application random I/O size is 4KB.

T = 4/(40 x 1000) = 0.1 ms     (i.e. a small percentage of S)

So, S = 5.6 + 0.1 ms/IO = 5.7 ms/IO    ................ <<<< reason for the re-post

Therefore IO/s = IOPS = 1000/5.7 = 175.

Now, regarding Little's law and the disk controller: you do not want disk controller utilization to go above 70%, otherwise the response time increases exponentially.

So,

0.7 x 175 = 122 IOPS as a working limit for the disk. If you have an application requiring 1000 IOPS, then a recommended drive count (to manage random performance) would be 1000/122 ~ 9. Always nice to be able to benchmark and test these figures, especially to test our understanding.
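
In case it helps to replay this arithmetic with other drive specs, here is a minimal Python sketch of the same S = M + T model. The 3.6 ms seek, 40 MB/s internal rate, 4 KB I/O size and 70% working limit are just the example values assumed above, not figures for any particular drive.

import math

def disk_iops(seek_ms, rpm, io_size_kb, internal_rate_mb_s, max_utilization=0.7):
    # M: mechanical delay = seek time + half a rotation; T: transfer through the drive buffer
    rotational_latency_ms = 0.5 * (60.0 / rpm) * 1000.0
    transfer_ms = io_size_kb / float(internal_rate_mb_s)   # 1 MB/s == 1 KB/ms
    service_time_ms = seek_ms + rotational_latency_ms + transfer_ms   # S = M + T
    max_iops = 1000.0 / service_time_ms
    return max_iops, max_utilization * max_iops

max_iops, working_iops = disk_iops(seek_ms=3.6, rpm=15000, io_size_kb=4, internal_rate_mb_s=40)
print(round(max_iops))                 # ~175 IOPS theoretical maximum for one disk
print(round(working_iops))             # ~123, i.e. the ~122 IOPS working limit above
print(math.ceil(1000 / working_iops))  # 9 drives for a 1000 IOPS random workload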

Of course, we then need to build redundancy using a suitable RAID level.

HTH, best regards, Richard.

October 6th, 2009 04:00

Hey Richard,

First of all, thanks so much for your answer!

I see now where part of my error lies: the calculation of the time needed for 1/2 of a full revolution. Somehow I got my math screwed up in that part. But regarding Little's law, do we really just take the maximum number of theoretical IOPS that a controller can handle, or do we take the maximum number of IOPS for a single disk and just use 70% of that? From a theoretical point of view I would say that with one disk, the limit lies not with the controller but with the physical disk itself. As soon as we start talking about arrays I would assume the limit would be located with the controller, since we get a combined value for the I/O that an array of disks can handle. Or have I overlooked something (again)?

Cheers,

Bas

October 6th, 2009 05:00

Hi Bas,

Thanks.

Actually in my explanation I am referring to the disk controller (not a RAID controller). The disk controller is physically part of the drive. When sizing for random I/O we should use a figure of 70% of the maximum IOPS that the disk can theoretically support (disk here being the combination of platters, mechanics and disk controller). If we go above 70% of disk controller utilization then response times can increase very dramatically. In my example calculation, I have used 70%. (The disk controller is doing a lot of good work - e.g. command re-ordering / tag queuing etc).

You bring up a good point. If the drives are behind a RAID controller (with intelligent cache algorithms, and support for read-cache hits, and write-behind cache operations), then things can only improve. In my example calculation, you might decide to use 9 disks as a basis/foundation for a benchmark. And then you can re-benchmark with a different drive count, and work to understand the results. You will never really know for sure until you have performed a benchmark in some way.

Cheers, Richard.

October 6th, 2009 06:00

Hmm, this poses some interesting questions. When I take into account things like smaller external disk arrays (for example the HP MSA30 or MSA50), I don't always have the full spec on the disks used, and my performance will vary depending on the configuration of the array that I decide to implement. In such a case I need to go with the amount required by my application (or perhaps even calculate the maximum my controller or adapter can give me) and check with the vendor whether they are able to match a product to my requirements.

One thing that comes to mind is whether the manufacturers of such solutions also follow these calculations and give out numbers that assume a certain percentage of saturation. As far as I can recollect, I haven't seen that many manufacturers publish performance numbers in combination with an assumed saturation level (be it EMC or any other provider).

Could it be that it all boils down to "pinning down" providers to the SLAs or performance spec they guarantee? If so, this info is useful in the sense that it helps me understand performance, but not really relevant when implementing a new storage system? I calculate what performance I need, go to the manufacturer of my choice, and ask if they are willing to put in writing that their solution can live up to the IOPS I require?

I know, probably the wrong place to ask, and not really something technical, but still interesting in my opinion.

Cheers,

Bas

October 6th, 2009 07:00

Well... I hadn't quite finished, but I have just learnt that if you hit that key combination (as I do automatically in my editor of choice), then this will post the page. Anyway, I think I had just about finished, apart from a spell-check.

Cheers, Richard.

October 6th, 2009 07:00

Hello Bas,

More good points. I think the example calculation I gave is just to help you think about configuring the correct number of disks required in the stripe. If you cannot get any specifications on the disk capabilities, then that may be a problem. Ideally, you would understand the I/O profile of your application and the disk capabilities as published by the disk vendor, and then size an appropriate number of disks for the stripe. Oh, and it would be very nice indeed if we always had the luxury and time to benchmark; I realise that, commercially, many engineers are under pressure to complete the task yesterday. I agree, I have not seen too much reporting of saturation, but the 70% guideline can be tested through the benchmarking process.

As you know, as far as performance is concerned, the performance 'stack' is huge. Consider that a software house writes a tele-sales order entry application for a customer, and the order entry clerk always wants sub-second response times when keying in the customer's order and moving from one screen to the next (effectively defining an application SLA). Here are just some of the physical and logical objects in the performance 'stack' (I thought it might be a good place to reference an 'incomplete' list; I expect many people would want to add additional points):

- how well does the compiler optimize

- how well is the application coded

- has the relational database been designed according to sound relational database design principles

- how well are structured queries optimized (relating to the previous point)

- does the filesystem have a suitable block size

- tuning of kernel drivers

- server CPU utilization

- server memory utilization

- OS memory paging concerns

- qdepth on HBA ports

- HBA fcode optimization

- physical integrity of transport from host to FC switch (injected bit errors leading to kernel driver timeouts)

- ASIC layout on fabric switches

- switch <-> switch flow control

- supported queue depth on storage ports

- storage port utilization (cf. fan-out ratio)

- RAID controller cache algorithms

- how well has the RAID stripe been configured (this is partly covered in the discussion above)

- is the logical address space seen by the host application aligned on a stripe boundary

- physical placement of data on the platters

- drive mechanics and supported IOPS

In my example, if the software house is going to provide a sub-second SLA, then they would really need to work with the customer, and the customer's equipment in order to benchmark and understand the results.

October 6th, 2009 08:00

++ will do the same for you in Firefox, but I know what that feels like.

Let me start by agreeing with you. Benchmarking and proofs of concept are highly valuable in my opinion and should be performed if you have the chance to do so. Also, the 70% rule is a good one (I found that it's the same for CPU load on a system: as soon as you go above 70% load, your I/O performance will plummet), but it would be interesting to know if the manufacturers of the various products/arrays also apply this 70% guideline and communicate their IOPS based on it. Perhaps you know how EMC handles this?

I just looked over your list, and as a true techie you list some valid points there. You could also include things like I/O optimization of driver and kernel source code, or just plain and simple things like RAM type (clock speeds of RAM, etc.) and CPU speeds, although some of those will obviously have a greater impact on the total maximum IOPS than others.

If this is actually the only way to get data that you can rely upon, you really have no other choice than to work with people who know this by heart and are experts in the implementation scenario, benchmark all the viable options you are considering, or perhaps even (taking new tech like cloud into account) agree upon a certain service/performance level for parts of the stack and go to a service provider with your requirements.

Perhaps we can add to the list a bit more, and people might be able to comment on individual items and on how they found they relate to the I/O performance of a system as a whole?

October 20th, 2009 00:00

Hello Richard

A small remark about Little's Law.
If you check the classical definition of Little's Law (Wikipedia) you will not find any information about exponential growth and saturation. Little's Law actually corresponds to a linear dependency. Exponential response time growth follows from the M/M/m model of queuing theory under certain assumptions. Of course it is connected with the Law, but it does not follow from it.

I think you are referring to the information given in EMC performance courses, but from my perspective there is a terminological inaccuracy there. We shouldn't use the term Little's Law in that context.

Anyway, you are right in the main idea. According to theoretical and experimental data, the capacity zone of utilization is about 62-76%, with the saturation point at ~72%.
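
To illustrate the distinction, here is a minimal Python sketch, assuming a simple M/M/1 queue rather than the general M/M/m case: Little's Law itself is only the linear relation N = X x R, while the sharp response time growth near saturation comes from the queuing term R = S / (1 - U).

# Little's Law is linear: N (items in system) = X (throughput) * R (response time).
# The steep growth near saturation comes from the queuing model, here M/M/1: R = S / (1 - U).

service_time_ms = 5.7  # single-disk service time from the example earlier in the thread

for utilization in (0.50, 0.62, 0.70, 0.76, 0.90, 0.95):
    response_ms = service_time_ms / (1.0 - utilization)        # queuing model, not Little's Law
    throughput_iops = utilization * (1000.0 / service_time_ms)
    n_in_system = throughput_iops * (response_ms / 1000.0)     # Little's Law: N = X * R
    print(f"U={utilization:.2f}  R={response_ms:6.1f} ms  N={n_in_system:6.2f}")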

With regards
Vasily

December 2nd, 2010 11:00

Hi Vasily.... thanks for that (and sorry for the delay in my reply.... just looking through some old posts..... better late than never.... Cheers and best wishes).

January 21st, 2011 01:00

Hello Richard,

may I ask you a question regarding the "Let's imagine that the drive internal transfer rate is a modest 40MB/s" part?

I'm now trying to calculate the IOPS for a single disk. Here is the calculation method in our ISM books:

The application I/O size is 4 KB, the disk size is 73 GB, the rotation speed is 15,000 rpm, the average seek time is 5 ms, and the transfer rate is 40 MB/s.

(1) Calculate the time for one I/O = seek time + latency + transfer time, i.e. 5 + 1/2*(60*1000)/15000 + 4*1000/(40*1024) = 7.1 ms.

(2) Calculate the max IOPS: 1*1000/7.1 = 140.
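
(Just to double-check the book's arithmetic, here is the same calculation in Python; nothing is assumed beyond the values listed above.)

# Re-running the book's example: 4 KB I/O, 15,000 rpm, 5 ms average seek, 40 MB/s transfer rate
seek_ms = 5.0
latency_ms = 0.5 * (60.0 * 1000.0) / 15000.0     # 2.0 ms, half a rotation
transfer_ms = 4.0 * 1000.0 / (40.0 * 1024.0)     # ~0.098 ms (the book uses 1024 here)
io_time_ms = seek_ms + latency_ms + transfer_ms  # ~7.1 ms
print(io_time_ms, 1000.0 / io_time_ms)           # ~7.1 ms and ~140 IOPS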

However, in real cases the transfer rate is not listed in the specifications. See the doc here: http://www.emc.com/collateral/hardware/specification-sheet/h5506-emc-clariion-cx4-960-ss.pdf

How can I do the IOPS estimation in real cases? Is 40 MB/s a general speed we could use for this kind of estimation? And is it the interface transfer rate or the disk-to-buffer transfer rate?

Thanks,

Betty

January 21st, 2011 04:00

Hello Betty,

You would really need to look at the technical specification for the drive itself, as published by the drive manufacturer.

For example, the following link is for a Seagate drive that has a 'Typical Sustained' internal transfer rate of 150MB/s (page 2).

http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah_ns_2.pdf

Does that help?

Thanks, Richard.

January 21st, 2011 04:00

Hello again Betty,

As regards the second part of your question, it is not the external/interface transfer rate; it is the internal transfer rate, reading from the platters themselves.

Thanks, Richard.

August 25th, 2012 09:00

I am wondering how all of the above calculations work out in RAID controller and SAN situations.

Richard and Sebastian touched on it at one point!

But anyway, really good info here.

Also, I am not sure whether the OS manipulates the read/write size, as it caters to various applications at the same time and could combine I/Os going to the same target.

They (OSes) all use file system caching (in memory), so applications might not be affected at all.

Thanks

Nee

September 4th, 2014 01:00

Hello Nee,

Regarding your OS questions, there is a really excellent undergraduate textbook written by Professor Andrew S. Tanenbaum called "Structured Computer Organization". It really is an excellent read (currently in its 6th edition). There is so much detail that could be discussed around your questions; this book would, I'm sure, provide all the answers you need (and much more). I was introduced to this book back in 1988, and I still have occasions today when I refer to it. (I believe the 6th edition update was 2012.)

Best regards, Richard.
