How to Use a Disk Throughput Tester to Diagnose Slow Storage

Disk Throughput Tester: Tools, Methodology, and Best Practices

Measuring disk throughput accurately is essential for diagnosing storage bottlenecks, validating system performance, and sizing infrastructure. This article covers the tools to use, a step-by-step methodology for reliable results, and practical best practices to make your measurements meaningful and repeatable.

Key Concepts

  • Throughput: The volume of data transferred per second (typically MB/s or GB/s).
  • IOPS: Input/output operations per second; important for small-random workloads.
  • Sequential vs Random: Sequential reads/writes move contiguous blocks and show peak bandwidth; random patterns stress latency and IOPS.
  • Block Size (I/O size): Larger blocks generally yield higher throughput; smaller blocks drive up IOPS demand.
  • Queue Depth: The number of outstanding I/O requests; higher depths can improve throughput on devices that support concurrency.
  • Read vs Write: Some storage performs differently for reads and writes; test both.
  • Warm vs Cold Cache: Cached hits inflate numbers; ensure you measure both cached and uncached conditions.
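These quantities are related: throughput is roughly IOPS multiplied by block size. A minimal shell sketch of the arithmetic (the IOPS and block-size figures are purely illustrative):

```shell
# Throughput (MB/s) ≈ IOPS × block size (bytes) / 1,000,000.
# Illustrative numbers: 20,000 IOPS at 4 KiB blocks.
iops=20000
bs=4096
echo "$(( iops * bs / 1000000 )) MB/s"   # → 81 MB/s
```

This is why a device can post impressive MB/s at 1M blocks yet modest MB/s at 4k blocks: the bottleneck shifts from bandwidth to per-operation overhead.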

Recommended Tools

  • fio — Flexible I/O tester: supports many workloads, scripting, and output formats.
  • dd — Simple sequential read/write checks (useful for quick sanity checks).
  • iozone — Filesystem and file I/O benchmark with varied test types.
  • bonnie++ — Filesystem benchmark focusing on large-file operations.
  • CrystalDiskMark — GUI for Windows, easy sequential/random tests.
  • perf or blktrace (Linux) — For low-level tracing and deeper analysis.
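For the dd-style quick sanity check mentioned above, a hedged sketch (the path /tmp/ddtest is an assumption; point it at a disposable file on the storage under test):

```shell
# Rough sequential write: 1 GiB of zeros, bypassing the page cache
# (oflag=direct) and forcing a final flush (conv=fdatasync).
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct conv=fdatasync
# Rough sequential read of the same file, again bypassing the cache.
dd if=/tmp/ddtest of=/dev/null bs=1M iflag=direct
rm /tmp/ddtest
```

dd prints its throughput estimate to stderr. Note that O_DIRECT is not supported on every filesystem (tmpfs, for example), so run this against real storage.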

Test Environment Preparation

  1. Isolate the device: Run tests on an unmounted raw block device where possible to avoid filesystem effects, unless filesystem performance is what you intend to measure.
  2. Ensure reproducible state: Reboot or flush caches between test sets when needed.
  3. Disable background jobs: Stop backups, indexing, antivirus scans, and other I/O-heavy services.
  4. Record system specs: CPU, RAM, OS, kernel version, storage controller, device model, firmware, RAID config.
  5. Measure baseline idle: Capture baseline I/O and CPU while idle (iostat, vmstat, top).
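Steps 2 and 5 can be scripted on Linux; a sketch, assuming root access for the cache drop and sysstat's iostat installed:

```shell
# Flush dirty pages, then drop page cache, dentries, and inodes (root required).
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
# Idle baseline: extended per-device stats, 1 s interval, 5 samples.
iostat -x 1 5 > baseline-iostat.txt
vmstat 1 5 > baseline-vmstat.txt
```

Keep the baseline files alongside your results so later anomalies can be compared against the idle state.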

Methodology — Step-by-Step with fio (recommended)

Assumption: Linux environment, device at /dev/sdx. Adjust paths and sizes for your setup.

  1. Prepare a test file or raw device

    • For raw device: ensure it’s not mounted and you have backups.
    • For file tests: create a file of appropriate size (≥ 2× RAM) to avoid caching.
  2. Test sequential read

    • fio job example:

      Code

      [seq-read]
      rw=read
      bs=1M
      ioengine=libaio
      direct=1
      size=10G
      runtime=60
      numjobs=1
      group_reporting
    • Run multiple times, increasing numjobs and queue depth to see scaling.
  3. Test sequential write

    • Same as read but rw=write. For safety, use a disposable device/file.
  4. Test random read/write (small block)

    • Typical settings:

      Code

      [rand-read]
      rw=randread
      bs=4k
      iodepth=32
      size=10G
      runtime=60
      numjobs=4
      direct=1
    • Repeat for randwrite and mixed (rw=randrw with rwmixread=70).
  5. Vary parameters systematically

    • Block sizes: 4k, 16k, 64k, 256k, 1M.
    • Queue depths: 1, 4, 8, 16, 32, 64.
    • Number of jobs: 1, 2, 4, 8.
  6. Record metrics

    • Throughput (MB/s), IOPS, average/median/max latency, 99th/99.9th percentile latencies, CPU utilization.
  7. Post-test validation

    • Verify no residual caching effects, check device SMART data, and compare results to vendor specs.
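The parameter sweep in step 5 lends itself to a loop. A sketch, assuming a disposable test file at the hypothetical path /mnt/test/fio.dat and JSON output for later parsing:

```shell
# Sweep block sizes and queue depths for random reads; one JSON file per run.
# /mnt/test/fio.dat is a placeholder; size it >= 2x RAM for file-based tests.
TESTFILE=/mnt/test/fio.dat
for bs in 4k 64k 1M; do
  for qd in 1 8 32; do
    fio --name="rr-${bs}-qd${qd}" --filename="$TESTFILE" \
        --rw=randread --bs="$bs" --iodepth="$qd" --direct=1 \
        --ioengine=libaio --size=10G --runtime=60 --time_based \
        --output-format=json --output="rr-${bs}-qd${qd}.json"
  done
done
```

The JSON files can then be collated into the test-matrix table recommended in the report structure below; extend the two loop lists to cover the full matrix.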

Interpreting Results

  • High sequential MB/s close to the device's rated spec indicates the available bandwidth is saturated.
  • High IOPS with low latencies for small-block random tests indicates good transactional performance.
  • Rising tail latencies (p99/p999) point to congestion or firmware issues even if average latency looks fine.
  • If throughput doesn’t scale with increased queue depth or jobs, controller, driver, or device limits may be present.

Common Pitfalls

  • Testing with cached I/O (not using direct I/O) — inflates numbers.
  • Using test file smaller than RAM — measures cache, not disk.
  • Running on mounted filesystem without accounting for filesystem effects.
  • Single-run conclusions — variability requires multiple runs.
  • Ignoring mixed-workload patterns that reflect real usage.

Best Practices

  • Use direct I/O (direct=1 in fio) to bypass page cache when measuring raw device performance.
  • Make test file size ≥ 2× RAM for file-based tests.
  • Run each test multiple times and report median plus variance.
  • Include latency percentiles (p95, p99, p999) alongside throughput.
  • Test real-world workload profiles (mixtures of read/write, burstiness, and think time).
  • Automate and script tests for consistency (bash, Ansible, or CI pipelines).
  • Compare with vendor specs and document firmware/driver versions.
  • Use monitoring (iostat, blktrace) concurrently to spot bottlenecks outside the disk (CPU, network, controller).
  • For cloud disks, test across instance types and AZs, and expect noisy neighbors—report ranges.
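Reporting the median across repeated runs, as recommended above, needs only standard tools. A sketch, assuming one MB/s figure per line in a hypothetical results.txt:

```shell
# Median of per-run throughput figures (MB/s), one number per line.
sort -n results.txt | awk '{ a[NR] = $1 }
  END { if (NR % 2) print a[(NR + 1) / 2];
        else printf "%.1f\n", (a[NR/2] + a[NR/2 + 1]) / 2 }'
```

The same pattern extends to min/max or a spread figure; the point is to script the summary so every test set is reduced identically.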

Example Report Structure

  • Test objective and environment specs
  • Tool and exact command lines used
  • Test matrix (block sizes, qdepths, jobs) in a table
  • Results: throughput, IOPS, latency percentiles per test (table or CSV)
  • Analysis: bottlenecks and actionable recommendations
  • Reproducibility notes and next steps

Quick Reference fio Command Examples

  • Sequential read:

    Code

    fio --name=seq-read --rw=read --bs=1M --size=10G --direct=1 --ioengine=libaio --runtime=60 --numjobs=1 --group_reporting
  • Random 4k mixed:

    Code

    fio --name=randmix --rw=randrw --bs=4k --rwmixread=70 --size=10G --iodepth=32 --numjobs=4 --direct=1 --runtime=60 --group_reporting

Conclusion

Accurate disk throughput measurement combines the right tools, a controlled methodology, and disciplined reporting. Use fio for flexible, scriptable tests, vary block sizes and queue depths to reveal different bottlenecks, record latency percentiles, and repeat tests to ensure reliability. Document environment and commands so results are reproducible and actionable.
