Hello. In this video I'd like to show you one way to examine the results of a storport trace. This is following up on a previous video that explained ‘What is a storport trace?’, and how to capture one. For this demonstration we'll be using a storport ETL file that has been previously collected.
If you don't have this file already I suggest checking out that previous video for details on how to obtain it. I will be putting a link to it in the description below. So, this is the storport .etl file that I will use for this demonstration. I will need PowerShell, Microsoft Excel, and Xperf. Xperf is a command line tool that is installed as part of the Windows Performance Toolkit which you can download directly from Microsoft.
So, here I have the storport .etl file. To open it in Microsoft Excel I first have to convert it to a comma-delimited file, or CSV. To do so I'll use Xperf. So, in PowerShell I'll type xperf. I have previously installed the Windows Performance Toolkit on this system, and you can see here some of the Xperf usage options. To convert the ETL into CSV I need to type 'xperf', then '-i' to specify the path and name of the input ETL file that will be converted, then '-o' and the path and name of the destination CSV. So I'll just hit enter.
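For reference, the conversion command looks something like this; the paths here are just placeholders for wherever you keep your own trace files:

    # Convert the collected ETL trace into a plain-text CSV using Xperf.
    # -i is the input ETL we captured; -o is the destination CSV.
    xperf -i C:\Traces\storport.etl -o C:\Traces\storport.csv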
I need to wait for this conversion to complete. When it's done we'll have a storport .csv file that we can then open in Microsoft Excel. Just give it a few seconds. Storport traces tend to be pretty heavy, especially if the trace ran for an extended number of minutes, which is why it's recommended to keep the collection time to a minimum. So, you can see here it's finished converting to CSV. I'll just double-click it to open it in Excel. Usually one minute of data is enough. Just remember: the longer the storport trace runs, the heavier the file will be, and the harder it will be to manipulate it to get the required information.
Another reason to keep the storport sample time low is that there's a limit to the number of rows you can work with in Microsoft Excel, 1,048,576 per worksheet, and a long storport trace can easily have more rows than that. This is actually what's happening here. If your storport trace has more than roughly a million rows you will get this message when trying to open the CSV file in Microsoft Excel. We can still work with this, but later I will show how we can use PowerShell to filter the trace before opening it in Excel, so we view just the data we need for troubleshooting. For now we'll just click 'OK'. The file opens and there's a lot of info. Here, for instance, we can see the OS version and the trace size.
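If you want to know in advance whether the file will fit, a quick line count in PowerShell will tell you. This is just an optional sanity check, reading the file in chunks so it works on very large traces:

    # Count the lines in the CSV without loading it all into memory.
    # Excel can display at most 1,048,576 rows per worksheet.
    $lines = 0
    Get-Content .\storport.csv -ReadCount 10000 |
        ForEach-Object { $lines += $_.Count }
    $lines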
We can also calculate the trace start time from this number, see that the trace ran for six minutes, and see where it was collected. The actual storport measurements begin down here, and if we go all the way down in this document you can see that this storport trace has more rows than Microsoft Excel supports, so Excel has trimmed the file short. Another detail I want to highlight: when troubleshooting a performance problem, the storport trace lines we're actually interested in from this ETL file all start with this text right here, 'Microsoft-Windows-StorPort/Port/win:info'. So, basically this CSV file needs to be edited to show only the lines we're interested in before we open it in Excel. For now I'll just close this file without saving the changes, go back to PowerShell, and type this command right here.
I will be including it in the video description. So I'll just hit enter to start it running while I explain what it does. Basically, we're going to scan all of the lines in the storport.csv file that we originally created from the ETL, looking for the lines that start with 'Microsoft-Windows-StorPort/Port/win:info', and we're putting all of those lines into a new file that I've named 'filtered_storport.csv'. Like I said, I've included this command in the video description.
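For reference, the filter amounts to something along these lines. This is a minimal sketch using the file names from this demo; the exact one-liner is the one in the video description:

    # Stream the large CSV and keep only the StorPort port event lines.
    # StartsWith() does an exact prefix match on each line.
    Get-Content .\storport.csv |
        Where-Object { $_.StartsWith('Microsoft-Windows-StorPort/Port/win:info') } |
        Set-Content .\filtered_storport.csv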
This will run for a few seconds or minutes depending on the size of the trace, and at the end I will open this 'filtered_storport.csv' file. We can see that it's already completed, so I will just double-click this file and wait for it to open. OK, so if we scroll all the way down we can see, first of all, that this file has far fewer rows than the original one, and this is because it's only showing the lines that we're interested in. And like I said, the lines start with this text right here. If we go back to the top, you will notice that the first two rows are actually duplicates of each other, and these are the column titles. We don't need both, so I will just remove one line.
Then there are three columns that we will be discussing. First of all is 'TimeStamp': this is basically a clock. Then we have 'RequestDuration', which I'll expand here; you can see that it is expressed in hundreds of nanoseconds. We're also interested in this 'Target' column, which represents the disk number on which the operation took place. For instance, I'll add a filter first of all, and you can see here that this storport trace was collected on a computer that had a disk 0, a disk 1, and also a disk 2. RequestDuration is pretty much latency, how long the IO operation took to complete, and it is measured, like I said, in hundreds of nanoseconds. So let's highlight the 'RequestDuration' column and take a look down here.
We can see the average latency reported as this number down here, and remember it is in hundreds of nanoseconds. So, to convert it to milliseconds we have to multiply by a hundred (to get nanoseconds) and then divide by a million, or, more simply, shift the decimal point four places to the left. If we do that, we get 8.8 milliseconds as the average latency across all disks in this storport trace. Now I will filter by 'Target' to show just the data for disk number 2, go back to the 'RequestDuration' column, and take a look at the average.
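To make the arithmetic concrete, here is the same conversion on an illustrative raw value; the 88,000 figure is just an example chosen to match the 8.8 millisecond average we see here:

    # RequestDuration is logged in units of 100 nanoseconds.
    $raw = 88000            # illustrative raw value from the trace
    $ms  = $raw * 100 / 1e6 # 100 ns units -> nanoseconds -> milliseconds
    $ms                     # 8.8, the same as dividing the raw value by 10,000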
Notice that this average latency is just 0.06 milliseconds, but if I go back to 'Target', filter to show the data for disk number 1, and highlight that same column, the average goes back to around the nine milliseconds that we saw earlier when the data for all disks was combined. So, since we have a clock, we can insert a chart to analyze the trace graphically. For instance, we can use this 'TimeStamp' column and 'RequestDuration', and I will insert a scatter chart. Let me move this to its own worksheet.
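As a side note, if you prefer to skip Excel, something along these lines computes the per-disk averages directly in PowerShell. This is a sketch under two assumptions: the duplicate header row has already been removed, and the 'Target' and 'RequestDuration' column names import cleanly (xperf sometimes pads its CSV fields with spaces, in which case you would need to trim the header first):

    # Average latency per disk from the filtered CSV (see assumptions above).
    Import-Csv .\filtered_storport.csv |
        Group-Object Target |
        ForEach-Object {
            $avg = ($_.Group |
                Measure-Object -Property RequestDuration -Average).Average
            # RequestDuration is in 100 ns units; divide by 10,000 for ms.
            '{0}: {1:N2} ms' -f $_.Name, ($avg / 10000)
        }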
Let me filter 'Target' by disk number 2 and go back to the chart. Here we see a disk with latencies well below one millisecond; notice the number of zeros in this scale. Disk number 2 in the computer where we captured this storport trace happens to be a solid-state drive, and as you can see its performance is really good. Let's go back to the 'Target' column, filter to show the data for disk number 1, and then go back to the chart. Notice that the number of zeros in the scale has increased, and the graph shows that this disk has latency values mostly around 100 milliseconds.
Some are around 200 or 300 milliseconds, and some even reach 600 or 700 milliseconds. Disk number 1 in this particular server happens to be a regular rotational hard drive, and as you can see it's having trouble keeping up with the workload. So this is how you can use Xperf, PowerShell, and Microsoft Excel to do a basic analysis of a storport trace.
I would like to thank Luis A. for his knowledge contribution to this video, and also thank you for watching.