Demystifying IO Performance


In this article we are going to look at some of the indicators that help us determine the speed (performance) of an IO system.

Apart from data corruption, nothing gives a storage developer more sleepless nights than performance. This is rightly so: given the diverse workloads our applications generate these days and the high stakes involved, an IO (input/output) bottleneck can make or break a business.

Let us attempt to demystify those factors that make up the performance metrics.

1. IOPS: IOPS (input/output operations per second) is the number of read and write operations a device completes in one second. For a hard disk, it is the number of times data is read from or written to the disk each second. IOPS gives a bird's-eye view of the underlying device's capability. However, just as a car's top speed does not reflect how it performs on real roads, IOPS alone is not a comprehensive measure of a storage system's performance. It depends on a number of factors, which we will look at one by one.
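To make the definition concrete, here is a minimal Python sketch that derives IOPS by timing a fixed number of reads, issued one at a time, against a scratch file. It assumes a POSIX system (it relies on `os.pread`), and the function name `measure_read_iops` and its defaults are illustrative choices, not a standard tool. Because the scratch file has just been written, most reads will be served from the OS page cache rather than the disk, so the figure reflects the whole IO path on your machine rather than raw device capability; dedicated benchmarks such as fio, run with direct IO, are the usual way to measure a device itself.

```python
import os
import random
import tempfile
import time


def measure_read_iops(path: str, io_size: int = 4096, num_ios: int = 2000,
                      sequential: bool = False) -> float:
    """Issue num_ios reads of io_size bytes, one at a time, and return IOPS.

    Illustrative sketch: results include page-cache effects.
    """
    blocks = os.path.getsize(path) // io_size  # whole io_size blocks in the file
    if blocks == 0:
        raise ValueError("file is smaller than one IO")
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for i in range(num_ios):
            # Sequential: walk the blocks in order; random: pick any block.
            block = i % blocks if sequential else random.randrange(blocks)
            os.pread(fd, io_size, block * io_size)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return num_ios / elapsed  # operations completed per second


if __name__ == "__main__":
    # Build a 4 MiB scratch file to read from.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(4 * 2**20))
        path = f.name
    try:
        print(f"random 4 KiB reads: {measure_read_iops(path):,.0f} IOPS")
    finally:
        os.remove(path)
```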

2. Latency: This is the delay, or time taken, to complete one IO, and it is usually measured in milliseconds or microseconds. If I read or write 1KB of data to a device, the time that read or write operation takes to complete is the latency of my device.

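If 10,000 such write operations, issued one after another, take one second to complete, my IOPS is 10,000 and my latency is 1/10,000 = 0.0001 seconds, which is 100 microseconds or 0.1 milliseconds. This simple relationship holds only when IOs are issued one at a time; when several IOs are in flight at once, each one can take longer than 1/IOPS to complete.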
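The arithmetic above can be written as a small helper, under the same assumption that only one IO is outstanding at a time (queue depth 1). The function name `serial_latency` is hypothetical; it simply inverts the IOPS figure and expresses the result in three units.

```python
def serial_latency(iops: float) -> dict[str, float]:
    """Average per-IO latency implied by an IOPS figure.

    Assumes IOs are issued one at a time (queue depth 1), so each IO
    finishes before the next starts and latency is simply 1 / IOPS.
    """
    return {
        "seconds": 1 / iops,
        "milliseconds": 1_000 / iops,
        "microseconds": 1_000_000 / iops,
    }


print(serial_latency(10_000))
# {'seconds': 0.0001, 'milliseconds': 0.1, 'microseconds': 100.0}
```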

3. IO size: Having looked at IOPS and latency, I still do not have a complete picture of the storage system's performance unless I look at the size of my IOs. IO size is the amount of data read or written in one IO request; it is comparable to the payload being carried. The size and pattern of IOs are very important in defining a workload type, which in turn dictates my storage system's design and the resulting opex or capex investment. So, if Ajay claims his vehicle is very powerful because it runs 100 miles in one hour, Vijay can counter that his vehicle is more powerful still, because it runs at 50 miles per hour while carrying a load of 2.5 tonnes.

A device that reads 1KB of data 10,000 times in one second is not necessarily faster than a device that writes 16KB of data 5,000 times in one second. The second device completes half as many IOs, but it moves 80,000KB per second against the first device's 10,000KB, eight times as much data, and it does so with writes, which on many devices are more expensive than reads.
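A quick way to see why IOPS alone misleads is to multiply each device's IO size by its IOPS. This sketch (the helper name `data_rate_kb_per_s` is hypothetical) only counts data moved per second; it does not model the extra cost of writes over reads.

```python
def data_rate_kb_per_s(io_size_kb: float, iops: float) -> float:
    """Amount of data moved per second: IO size multiplied by IOPS."""
    return io_size_kb * iops


reader = data_rate_kb_per_s(io_size_kb=1, iops=10_000)   # 1KB reads
writer = data_rate_kb_per_s(io_size_kb=16, iops=5_000)   # 16KB writes
print(reader, writer, writer / reader)  # 10000 80000 8.0
```

Despite posting half the IOPS, the second device moves eight times as much data each second.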

4. IO pattern: The most common IO patterns are sequential and random, along with various combinations of the two. Sequential IOs read or write disk blocks consecutively, so the disk head does not need to move back and forth to find the next block. This avoids most of the seek time and the wait for the platter to rotate into position. Sequential reads can be sped up further by readahead (prefetch) algorithms that predict the next block and fetch it before it is requested. With random IOs, each request may need a seek and a rotational wait before the read or write on that block can even begin. As a result, random read/write operations are far more taxing.
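The cost of random IO on a hard disk can be sketched with a simple service-time model: a random IO pays an average seek, an average rotational delay of half a revolution, and the transfer time, while a perfectly sequential IO pays only the transfer time. The class `DiskModel` and its figures (7,200 RPM, 8.5 ms average seek, 150 MB/s transfer rate) are illustrative assumptions, not measurements of any particular drive, and the model ignores caching, readahead, and track-to-track seeks.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    """Simplified hard-disk model; all figures are illustrative assumptions."""
    rpm: int = 7200
    avg_seek_ms: float = 8.5
    transfer_mb_per_s: float = 150.0  # sustained media rate, MB = 10**6 bytes

    def rotational_latency_ms(self) -> float:
        # On average the target sector is half a revolution away.
        return 0.5 * 60_000 / self.rpm

    def transfer_ms(self, io_size_bytes: int) -> float:
        return io_size_bytes / (self.transfer_mb_per_s * 1e6) * 1000

    def service_time_ms(self, io_size_bytes: int, sequential: bool) -> float:
        if sequential:
            # Next block is already under the head: no seek, no rotational wait.
            return self.transfer_ms(io_size_bytes)
        return (self.avg_seek_ms + self.rotational_latency_ms()
                + self.transfer_ms(io_size_bytes))

    def iops(self, io_size_bytes: int, sequential: bool) -> float:
        # One IO at a time: IOPS is the inverse of the service time.
        return 1000 / self.service_time_ms(io_size_bytes, sequential)


disk = DiskModel()
print(f"random 4 KiB:     {disk.iops(4096, sequential=False):,.0f} IOPS")  # about 79
print(f"sequential 4 KiB: {disk.iops(4096, sequential=True):,.0f} IOPS")   # about 36,600
```

Under these assumptions, seek and rotation account for more than 99% of a random 4 KiB IO's service time, which is why the same drive can differ by orders of magnitude between the two patterns.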

A device that writes 10KB of random data 5,000 times in one second may well be more capable than a device that reads 10KB of sequential data 10,000 times in one second, even though its raw numbers are lower, because random writes are much harder to sustain than sequential reads. Still, it goes without saying that you should buy a device based on your workload. If your workload is dominated by sequential reads, buying a device on the strength of its random-write performance will give a poor return on investment.

5. Queue depth: In a utopian world, there would be just one thread doing read and write operations. In the real world, however, hundreds or even thousands of threads perform IOs on multiple disks. These disks are connected to a controller, which in turn could be connected to another controller. At its most basic, an IO is just a SCSI (or other relevant protocol) command sent to a SCSI device to perform an operation on a SCSI disk. Every IO request becomes an entry in the device's queue. Queue depth is the maximum number of IO requests that can be outstanding at the device at one time. The greater the queue depth, the more IO requests can be submitted concurrently. However, if the device cannot process those requests fast enough, requests wait longer in the queue, and we may end up in a QUEUE_FULL state, where new requests are turned away and must be retried; both effects increase latency. Queue depth is a tunable parameter that the OS sets for the disk or its controller. If the latency of a device is very high, it makes sense to examine its queue depth.
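The link between queue depth, IOPS and latency can be sketched with Little's Law, which says the average number of outstanding requests equals throughput multiplied by average time in the system. The device below is hypothetical: `model_queue`, its 100 µs unloaded latency and its 50,000 IOPS ceiling are assumptions chosen only to show the shape of the curve, not properties of any real disk.

```python
def model_queue(queue_depth: int, base_latency_us: float = 100.0,
                max_iops: float = 50_000.0) -> tuple[float, float]:
    """Return (iops, avg_latency_us) for a hypothetical device.

    Assumption: each IO takes base_latency_us while the device is not
    saturated, and the device cannot exceed max_iops however many IOs
    are queued.
    """
    unconstrained_iops = queue_depth * 1_000_000 / base_latency_us
    iops = min(unconstrained_iops, max_iops)
    # Little's Law: outstanding IOs = IOPS x latency, so latency = QD / IOPS.
    latency_us = queue_depth * 1_000_000 / iops
    return iops, latency_us


for qd in (1, 2, 4, 8, 16, 32):
    iops, lat = model_queue(qd)
    print(f"QD {qd:>2}: {iops:>8,.0f} IOPS, {lat:>6.1f} us average latency")
```

Up to the saturation point, a deeper queue raises IOPS at constant latency; beyond it, IOPS stays flat and every extra queued request only adds waiting time, which is why very high latency is a cue to look at queue depth.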

6. Throughput: I have deliberately kept this metric for the end, as most people are familiar with it and evaluate a storage device by its throughput. Put simply, throughput is the product of IOPS and IO size. So if a device reads and writes 4KB of data 10,000 times in one second, its throughput is about 40MB/s (megabytes per second), or roughly 320Mbps if expressed in megabits.
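Because storage throughput is quoted in bytes while network links are quoted in bits, it is easy to mix up MB/s and Mbps. This sketch (the helper name `throughput` is hypothetical) multiplies IO size by IOPS and reports the result in decimal megabytes, binary mebibytes and megabits per second, taking 4KB as 4,096 bytes.

```python
def throughput(io_size_bytes: int, iops: float) -> dict[str, float]:
    """Throughput = IO size x IOPS, reported in several common units."""
    bytes_per_s = io_size_bytes * iops
    return {
        "MB/s": bytes_per_s / 1e6,        # megabytes, 10**6 bytes
        "MiB/s": bytes_per_s / 2**20,     # mebibytes, 2**20 bytes
        "Mbps": bytes_per_s * 8 / 1e6,    # megabits, 8 bits per byte
    }


print(throughput(4096, 10_000))
# {'MB/s': 40.96, 'MiB/s': 39.0625, 'Mbps': 327.68}
```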

Then there are partial IOs, synchronous and asynchronous IOs, direct and cached IOs, and a few other variants. However, these lean more towards application-specific workloads. A deep understanding of these performance indicators can help users choose the right storage devices and build effective strategies to configure and tune the system, so that there are no bottlenecks in the IO path and applications perform optimally and work seamlessly. Configuration and tuning strategies could be the topic of our next article on performance.

