Tutorial on Monitoring ScaleIO Systems
Published on Sep. 05, 2024
This video demonstrates how to monitor ScaleIO systems.
In this video, we will be demonstrating how to monitor your ScaleIO software-defined storage system and investigate performance issues with the ScaleIO GUI. There are many options available for managing and monitoring your ScaleIO system. Foremost among the monitoring options for most enterprises is SNMP, and ScaleIO provides full support, with the ScaleIO MIB file included in the user guide and in every ScaleIO cluster installation. This feature does need to be enabled, as it is off by default. And depending on the SNMP monitoring and management tool used, some configuration may be required to enable email alerts.
ScaleIO also exposes APIs through the software developer's kit, allowing administrators to interact via their custom applications. Personally, my favorite tool for performance monitoring, Grafana, can be leveraged to show performance both live and historical, providing a relative view of system performance and allowing for more context when evaluating performance over time. The focus of our demonstration today is the ScaleIO GUI. This is the native ScaleIO monitoring and management tool. Let's go ahead and dive into our demo and see what the GUI allows us to monitor.
When you log into the ScaleIO GUI, you'll be presented with the dashboard. The dashboard shows a high-level view of the system with capacity, system performance, and configuration info: the number of volumes, storage consumers and suppliers, etc. This is the view that most operators or administrators will have open all of the time. Looking at these sections individually, the capacity diagram is color-coded, showing the possible states. The performance module shows both throughput and IOPS and can be toggled to highlight either. It can also show an average of the last 10 seconds or simply display the latest measurement.
Oh, sorry about that. Excuse me a minute. Hello? OK. Which application? All right, I'll get right on it.
OK. An application owner just called and said the jobs are taking about 50% longer than usual to run. He wants us to see if there are any issues with the storage. With our GUI already open, we can change our dashboard to only show the specific storage pool that this application is using. We see the volumes are online and appear to be doing quite a bit of work. Certainly, no red flags here. So let's take a deeper look into the volume details. We can start by looking at the volume hosting the application under the front-end tab. Expanding out the SSD pool, we see our database volume right there. Clicking the arrows up above shows us the performance data, which looks to be in line with our expectations for both reads and writes. But this doesn't rule out a lower-level issue, so we'll need to investigate the storage supplier details, which we do by selecting the back-end tab above.
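As a rough illustration of the two display modes mentioned for the dashboard's performance module (latest measurement versus an average of the last 10 seconds), here is a minimal Python sketch. It is not how the ScaleIO GUI is implemented; the class name `PerformanceWindow`, the sample format, and the window handling are all assumptions made for the example.

```python
from collections import deque


class PerformanceWindow:
    """Hypothetical helper: keeps timestamped samples (e.g. IOPS) and
    reports either the latest value or the average over a time window."""

    def __init__(self, window_seconds=10.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # entries are (timestamp_seconds, value)

    def add(self, timestamp, value):
        """Record a sample and drop anything older than the window."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def latest(self):
        """Return the most recent measurement, or None if empty."""
        return self.samples[-1][1] if self.samples else None

    def average(self):
        """Return the mean of the samples still inside the window."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


# Example: one made-up IOPS sample per second for 15 seconds.
window = PerformanceWindow(window_seconds=10.0)
for second in range(15):
    window.add(float(second), second * 100.0)
```

The averaged view smooths out short spikes, while the latest-measurement view reacts immediately, which is why being able to toggle between them is useful when watching a busy system.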
We'll expand this out to show the SSD pool. Here we can see that IO is distributed fairly evenly across all the devices in the storage pool. This defaults to showing performance for all activities, including any back-end data processing. Showing just the application performance limits the display to just the storage-consumer-based IO. The application owner said jobs were taking longer to run, which points to a latency issue. Viewing the device latency will show us the response times. While fairly busy, these response times are well under half a millisecond. There do not appear to be any problems here. We do have some system alerts, as indicated on the dashboard earlier. Maybe one of those is impacting performance.
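The two back-end checks made by eye here (IO spread evenly across devices, and response times under half a millisecond) could also be scripted against exported device statistics. The following Python sketch shows the idea under stated assumptions: the device names, the sample numbers, the 25% imbalance tolerance, and both function names are hypothetical, and only the 0.5 ms figure comes from the demo.

```python
from statistics import mean

# From the demo: response times were "well under half a millisecond".
LATENCY_THRESHOLD_MS = 0.5


def slow_devices(latency_ms_by_device, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return names of devices whose latency is at or above the threshold."""
    return sorted(
        name for name, ms in latency_ms_by_device.items() if ms >= threshold_ms
    )


def uneven_devices(iops_by_device, tolerance=0.25):
    """Return names of devices whose IOPS deviate from the pool mean by
    more than `tolerance` (an assumed fraction, not a ScaleIO setting)."""
    average = mean(iops_by_device.values())
    if average == 0:
        return []
    return sorted(
        name
        for name, iops in iops_by_device.items()
        if abs(iops - average) / average > tolerance
    )


# Made-up sample data for illustration only.
latencies = {"ssd0": 0.21, "ssd1": 0.34, "ssd2": 0.62}
iops = {"ssd0": 1000, "ssd1": 1050, "ssd2": 950, "ssd3": 400}

print(slow_devices(latencies))
print(uneven_devices(iops))
```

With healthy data like that seen in the demo, both functions would return empty lists, matching the conclusion that the devices show no latency or distribution problem.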
We see that there are three nodes with an empty CPU socket, but we know they were ordered that way, so there's no reason for concern. There don't seem to be any performance red flags here. Let's view the hardware details and health before we provide our analysis and conclusions back to the application owner. This shows a selectable image of the nodes, with the properties of the selected device shown on the right. The node properties show the IPs and software versions for the hypervisor and ScaleIO virtual machine and allow us to verify there are no temperature or voltage issues. Selecting an SSD will show us its details. We don't see any media errors, and endurance indicates there's plenty of life left. Finally, let's check the data network interfaces to confirm they are all up and operational.
It appears all of our 10-gigabit data connections are up and healthy. Whatever the performance issue is, it does not appear to be the fault of the storage. Having looked into the storage performance and found no issues, our next step would be to report our findings to the application owner and suggest contacting the virtualization and network teams for further investigation. We hope you found this demonstration of monitoring and troubleshooting performance within the ScaleIO GUI helpful, and that you will consider partnering with ScaleIO to power your data center transformation.