blog dds

2008.04.18

LTO Tape Drive Compression Considered Harmful

I used to think that tape drive compression was a silly marketing trick used by manufacturers to inflate the advertised capacity of their tape drives. Apparently it is worse than that.

It all started when a Dell PowerVault LTO-2-L 200GB (native capacity) drive we use in our group returned a "No space left on device" error after writing about 7 GB less than its stated capacity. Initially I thought that I had mixed binary (1024-based) and decimal powers, but it turned out that the drive had stored considerably less than its 200 billion bytes of stated capacity.

My next idea was to see if uncompressible data was somehow interfering with the drive's compression algorithm. The theory didn't support this idea. Wikipedia stated that uncompressible data didn't have any overhead. I knew better than to trust Wikipedia, so I checked the actual Streaming Lossless Data Compression Algorithm (SLDC) standard, which ECMA, to its credit, provides online for free on its site. It turns out that when data can't be compressed SLDC represents all bytes as themselves, except for 0xff, which has a zero bit appended to it. According to my calculations, one extra bit per eight for one byte value out of 256 gives an overhead of (1 / 8 / 256), i.e. 4.88e-4 or 0.0488%; probably what the standard refers to as 0.05%. This was orders of magnitude less than what I was seeing.
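The arithmetic above can be reproduced with a few lines of shell. This is only a sketch: the function name sldc_passthrough_overhead is made up for illustration, and it assumes uniformly random bytes, so that 0xff turns up once in every 256 bytes on average.

```shell
# Expected expansion when SLDC passes uncompressible data through unchanged.
# Assumption: bytes are uniformly random, and only 0xff costs one extra bit.
sldc_passthrough_overhead() {
    awk -v capacity="$1" 'BEGIN {
        p_ff = 1 / 256            # chance that a byte is 0xff
        extra = 1 / 8             # one extra bit on an 8-bit byte
        overhead = p_ff * extra   # expected relative expansion
        printf "overhead: %.4f%%\n", overhead * 100
        printf "extra: %.0f bytes\n", capacity * overhead
    }'
}

# For the drive's 200 billion bytes of native capacity:
sldc_passthrough_overhead 200000000000
```

On a 200 billion byte tape this comes to roughly 98 million extra bytes, nowhere near the gigabytes that went missing.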

Given that the theory didn't match practice, my next step was to experiment. I tried to fill a tape using the OpenSSL rand(1) command, but to my dismay I realized that the command silently truncated the specified output length to 2GB. It was also slow, preventing the drive from streaming. I therefore wrote those 2GB into a file, and then had a small command sequence repeatedly block them correctly onto the tape:

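# Feed the 2GB file endlessly, reblock it into 64k blocks, and write
# to the tape until the drive reports that it is full.
while :
do
	cat randbytes
done |
dd conv=sync obs=64k |
dd of=/dev/st0 bs=64k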
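The experiment rests on the file's contents being truly uncompressible. One way to check that before spending hours on a tape run is sketched below. The helper name is_incompressible is hypothetical, and the check assumes gzip is installed and that a general-purpose compressor is a fair stand-in for the drive's SLDC.

```shell
# Succeeds if gzip cannot make the file any smaller.
# Assumption: gzip stands in for the drive's compressor; it is not SLDC.
is_incompressible() {
    file=$1
    orig=$(( $(wc -c < "$file") ))              # original size in bytes
    comp=$(( $(gzip -c "$file" | wc -c) ))      # size after gzip
    [ "$comp" -ge "$orig" ]
}
```

Running "is_incompressible randbytes && echo ok" should print ok for random data, whereas a file of zeros fails the check.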

The first time I ran the command the drive stored just 190 billion bytes:

dd: writing `/dev/st0': No space left on device
2907774+0 records in
2907773+0 records out
190563811328 bytes (191 GB) copied, 9104.54 s, 20.9 MB/s

My colleague Georgios Gousios, who is the machine's administrator, then disabled the drive's compression (which was indeed enabled).

# mt -f /dev/st0 datcompression 1
Compression on.
# mt -f /dev/st0 datcompression 0
Compression off.

(For some strange reason this command is only accessible to the root user.) When I ran the same script again the results were better than I expected:

dd: writing `/dev/st0': No space left on device
3241533+0 records in
3241532+0 records out
212437041152 bytes (212 GB) copied, 9104.37 s, 23.3 MB/s
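
The two runs can be compared directly from dd's record counts, since each record written is one 64k block. The following is a sketch; compare_runs is a name chosen here for illustration, and the loss is expressed relative to the run with compression off.

```shell
# Compare two dd runs given their "records out" counts (64k blocks each).
compare_runs() {
    awk -v on="$1" -v off="$2" 'BEGIN {
        bs = 64 * 1024                  # block size used by dd
        on_bytes = on * bs              # stored with compression on
        off_bytes = off * bs            # stored with compression off
        printf "compression on:  %.0f bytes\n", on_bytes
        printf "compression off: %.0f bytes\n", off_bytes
        printf "capacity lost:   %.1f%%\n", (off_bytes - on_bytes) / off_bytes * 100
    }'
}

compare_runs 2907773 3241532
```

The byte totals match the figures dd printed, and the difference works out to 10.3% of what the drive stores with compression off.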

The moral? Although in theory LTO compression should be harmless, in practice on this particular drive it costs about 10% of the capacity for truly random data: 190.6 billion bytes stored with compression on versus 212.4 billion bytes with it off. Disabling the compression fixes this problem. Without access to the source code of the drive's firmware (if the compression algorithm is indeed implemented in software) this is all I can say. With compression disabled I can choose whether I want to compress the data (compressing MP3 audio or JPEG images is unlikely to deliver any substantial benefits), and, if I want to compress it, which algorithm to use. Furthermore, by using an open-source host-based compression scheme (like gzip or bzip2), I can always examine (and even fix) the compression algorithm's implementation.
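
Host-based compression with the drive's own compression switched off could look something like the sketch below. The function backup_compressed and its arguments are invented for this example; the destination would be /dev/st0 in the setup described here, but any file works for trying it out.

```shell
# backup_compressed SRC_DIR DEST [COMPRESSOR]
# Archive SRC_DIR, compress it on the host, and write it in 64k blocks.
# Assumption: COMPRESSOR accepts -c to write to standard output,
# as gzip and bzip2 do. It defaults to gzip.
backup_compressed() {
    src=$1
    dest=$2
    compressor=${3:-gzip}
    tar -cf - -C "$src" . |
    "$compressor" -c |
    dd of="$dest" obs=64k 2>/dev/null    # reblock output into 64k blocks
}
```

For example, "backup_compressed /home /dev/st0 bzip2" would write a bzip2-compressed archive, and for data that is already compressed, such as MP3 or JPEG files, the compressor can be replaced with cat (which ignores nothing here, as "cat -c" is not valid everywhere, so a plain tar piped into dd is the safer choice in that case).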