lzbench Compression Benchmark (2024)

lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors.

The benchmark currently consists of 36 datasets, tested against 40 codecs at every compression level they offer. It is run on 1 test machine, yielding a grand total of 7200 datapoints.


The following benchmark covers the most common compression methods, tested via the low-level C libraries with their available flags. The large number of compressors and the similarity between them can cause confusion. The comparison is easier to understand once you realize that a compressor is just the combination of the following 3 things:

  • An algorithm, with adjustable settings. (lzma, deflate, lzo, lz4, etc...)
  • An archive format. (.gz, tar.xz, tar.gz, .7z, .zip, etc...)
  • A tool or a library, also known as the implementation. (gzip, tar, 7-zip, zlib, liblzma, libdeflate, etc...)

The algorithm family is the most defining characteristic by far, then comes the implementation. A C library that has been well-optimized for a decade should do a bit better than a random Java library from GitHub.

For example, gzip designates both the tool and its archive format (specific to that tool), but it's based on deflate. It has similar results to everything else that is based on deflate (particularly the zlib library).
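To make the split concrete, here is a minimal C sketch (my own illustration, not part of lzbench) that runs the same deflate algorithm through the zlib implementation and either wraps the output in the gzip format or leaves it raw. It uses zlib's real deflateInit2() API; error handling is trimmed for brevity, link with -lz.

    /* Same algorithm (deflate), same implementation (zlib), two formats. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    static size_t deflate_buf(const unsigned char *src, size_t srcLen,
                              unsigned char *dst, size_t dstLen, int windowBits) {
        z_stream s;
        memset(&s, 0, sizeof s);
        /* windowBits 15+16 emits the gzip wrapper; -15 emits raw deflate */
        deflateInit2(&s, Z_DEFAULT_COMPRESSION, Z_DEFLATED, windowBits, 8,
                     Z_DEFAULT_STRATEGY);
        s.next_in  = (unsigned char *)src; s.avail_in  = (uInt)srcLen;
        s.next_out = dst;                  s.avail_out = (uInt)dstLen;
        deflate(&s, Z_FINISH);            /* one-shot compression */
        size_t out = s.total_out;
        deflateEnd(&s);
        return out;
    }

    int main(void) {
        const unsigned char src[] = "the same bytes, two formats, one algorithm";
        unsigned char gz[256], raw[256];
        size_t gzLen  = deflate_buf(src, sizeof src, gz,  sizeof gz,  15 + 16);
        size_t rawLen = deflate_buf(src, sizeof src, raw, sizeof raw, -15);
        printf("gzip-framed: %zu bytes, raw deflate: %zu bytes\n", gzLen, rawLen);
        return 0;
    }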

There are some bugs and edge cases to account for, so you should always test your implementation against your use case. For instance, Kafka has offered snappy compression for a few years (off by default), but the buffers are misconfigured and it cannot achieve any meaningful compression.

Let's split the compressors into three categories: the slow, the medium and the fast:

  • The slow ones are in the 0 - 10 MB/s range at compression. They're mostly LZMA derivatives (LZMA, LZMA2, XZ, 7-zip default), bzip2 and brotli (from Google).
  • The medium ones are in the 10 - 500 MB/s range at compression. They're mostly deflate (used by gzip) and zstd (Facebook). Note that deflate is on the lower end while zstd is on the higher end.
  • The fast ones are around 1 GB/s and above (yes, a whole gigabyte per second) at both compression and decompression. They're mostly lzo, lz4 (Facebook) and snappy (Google).

The strongest and slowest algorithms are ideal when you compress once and decompress many times. For example, Linux packages have been distributed as .tar.xz (LZMA) for the last few years; historically they were .tar.gz. The switch to stronger compression must have saved a lot of bandwidth on the Linux mirrors.

The medium algorithms are ideal to save storage space and/or network transfer at the expense of CPU time. For example, backups and logs are often gzipped on archival. Static web assets (html, css, js) can be compressed on the fly by some web servers to save bandwidth.

The fastest algorithms are ideal to reduce storage/disk/network usage and make applications more efficient. Their compression and decompression speed is actually faster than most I/O, so using compression reduces I/O and makes the application faster whenever I/O was the bottleneck. For example, ElasticSearch (Lucene) compresses indexes with lz4 by default.
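As an illustration of how simple these fast codecs are to use, here is a minimal C sketch (not from lzbench) doing a one-shot round trip with the real LZ4 block API from lz4.h; link with -llz4.

    /* One-shot LZ4 compression and decompression of an in-memory buffer. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int main(void) {
        const char *src = "fast fast fast fast fast fast fast fast";
        int srcSize = (int)strlen(src);

        int bound = LZ4_compressBound(srcSize);    /* worst-case output size */
        char *dst = malloc((size_t)bound);
        int csize = LZ4_compress_default(src, dst, srcSize, bound);

        char *back = malloc((size_t)srcSize);
        int dsize = LZ4_decompress_safe(dst, back, csize, srcSize);

        printf("%d -> %d -> %d bytes\n", srcSize, csize, dsize);
        free(dst); free(back);
        return 0;
    }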

The slower the compression, the better the ratio; however, it is not necessarily a good idea to waste entire minutes just to save a few megabytes. It's all about balance.

If you ever try to "7-zip ultra" a 4 GB DVD image, or "gzip --strong" a 100 GB database dump, you might realize that it takes 20 hours to run: 20 hours of wasted CPU, electricity and heat, notwithstanding that 20 hours is far too long for the daily backup it was supposed to be. That's when zstd and lz4 come in handy and save the day.

Not shown here, but worth keeping in mind, is the memory usage. Stronger compression usually comes at the cost of higher memory usage for both compression and decompression. The worst case is probably LZMA, which requires a gigabyte of memory per core at the strongest levels (a bit less for decompression). That prevents usage on low-end machines, mobiles and embedded devices.

We're in the 3rd millennium, yet there has been surprisingly little progress in general compression in the past decades. deflate, lzma and lzo are from the 90's, and the origin of LZ compression traces back to at least the 70's.

Actually, it's not true that nothing happened. Google and Facebook have people working on compression; they have a lot of data and a ton to gain by shaving off a few percent here and there.

Facebook in particular hired the top compression research scientist and rolled out 2 compressors based on a novel compression approach that is doing wonders. That could very well be the biggest advance in computing in the last decade.

See zstd (medium) and lz4 (fast):

  • zstd blows deflate out of the water, achieving a better compression ratio than gzip while being multiple times faster to compress.
  • lz4 beats lzo and Google snappy on all metrics, by a fair margin.

Better yet, they come with a wide range of compression levels that can adjust the speed/ratio tradeoff almost linearly. The slower end pushes against the other slow algorithms, while the fast end pushes against the other fast algorithms. It's incredibly friendly for developers and users alike. All it takes is a single algorithm to support (zstd) with a single tunable setting (1 to 20) and it's possible to accurately trade off speed for compression. It's unprecedented.
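For instance, here is a minimal C sketch (my illustration, not lzbench code) of one-shot zstd compression where the level is the single knob. It uses the real ZSTD_compress()/ZSTD_compressBound() API from zstd.h; link with -lzstd.

    /* One-shot zstd compression with a tunable level. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    int main(void) {
        const char *src = "example payload example payload example payload";
        size_t srcSize = strlen(src);
        int level = 3;                     /* 1 = fastest ... higher = smaller */

        size_t bound = ZSTD_compressBound(srcSize);  /* worst-case output size */
        void *dst = malloc(bound);

        size_t written = ZSTD_compress(dst, bound, src, srcSize, level);
        if (ZSTD_isError(written)) {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(written));
            return 1;
        }
        printf("compressed %zu -> %zu bytes at level %d\n", srcSize, written, level);
        free(dst);
        return 0;
    }

Raising the level leaves the code unchanged; only the single integer moves the result along the speed/ratio curve.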

Of course one could say that gzip already offered tunable compression levels (1-9), but it doesn't cover a remotely comparable range of speed/ratio. Not to mention that the upper half is hardly useful: it's already slow, and making it slower brings little benefit.

Note that Google released 3 options, none of which are noteworthy in my opinion: brotli (comparable to xz and zstd, a resource hog), gipfeli (medium-fast, never had a release on github) and snappy (fast, strictly inferior to lz4 by all metrics).

To conclude: don't take my word for it. Check out the benchmark results below. You can also download the lzbench project && make && ./lzbench -eall file.txt to see for yourself.

Will you add «insert compression codec»?

In order to be included in the benchmark, the software must be supported by lzbench.

If the codec is reliable, works on Linux, is accessible from C or C++, and open-source, the odds are good that it can be added.

Will you include proprietary codecs?

If it is available on GitHub and can be integrated into lzbench, as stated above, it could be added. Either way, you can clone the repository and test any codec on your own, without redistributing it or the results. The only widespread proprietary compression library I know of is Oodle from RAD Game Tools. Proprietary compression tools are often delivered as executables with a graphical user interface, whereas the current testing methodology can only apply to a library.

How are the values calculated?

The benchmark collects the compressed size, compression time, and decompression time. Those are then used to calculate the values used in the benchmark:

Ratio

uncompressed size / compressed size

Compression Speed

uncompressed size / compression time

Decompression Speed

uncompressed size / decompression time

Round Trip Speed

(2 × uncompressed size) / (compression time + decompression time)

Sizes are presented using binary prefixes—1 KiB is 1024 bytes, 1 MiB is 1024 KiB, and so on.
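For illustration, here is a tiny C sketch (not lzbench's code; the sample numbers are made up) that derives the four metrics above from raw measurements:

    /* Derive ratio and speeds from measured size and time values. */
    #include <stdio.h>

    int main(void) {
        double uncompressed = 211938580.0;   /* bytes, hypothetical sample */
        double compressed   =  73186370.0;   /* bytes, hypothetical sample */
        double ctime = 1.84, dtime = 0.31;   /* seconds                    */

        double ratio      = uncompressed / compressed;
        double cspeed     = uncompressed / ctime;            /* bytes/s */
        double dspeed     = uncompressed / dtime;
        double round_trip = (2.0 * uncompressed) / (ctime + dtime);

        const double MiB = 1024.0 * 1024.0;  /* binary prefix, per above */
        printf("ratio %.2f, comp %.1f MiB/s, decomp %.1f MiB/s, "
               "round trip %.1f MiB/s\n",
               ratio, cspeed / MiB, dspeed / MiB, round_trip / MiB);
        return 0;
    }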

What about memory usage?

I would really like to include that data. If you have a good way to capture it on the C side, please get it merged into lzbench.

Is time CPU time or wall-clock?

lzbench captures wall clock time and runs with realtime priority.
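For reference, requesting realtime priority on Linux looks roughly like this (a sketch using the POSIX sched_setscheduler call, not lzbench's actual code; it needs root or CAP_SYS_NICE):

    /* Request realtime (FIFO) scheduling before running timed loops. */
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        struct sched_param sp = { .sched_priority = 1 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler");  /* falls back to normal priority */
        /* ... run the timed compression loop here ... */
        return 0;
    }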

What about multi-threading?

That's a tricky question. For one thing it would explode the number of data points, up to 32 times for a server-class CPU with 32 cores. The benchmark would take even longer to run and the visualizations would be meaningless.

Can you switch the graphs to use a logarithmic scale?

You can. Just click on the label for either axis and it will toggle between linear and logarithmic. It's not intuitive and I would be happy to merge a PR to improve that.

I don't want to switch the default because I think that linear is probably better for most people. Logarithmic tends to be better if all you care about is compression ratio, not speed.

Can you make it easier to compare machines or datasets (instead of codecs)?

I would love to. Feel free to submit a pull request with more dynamic charts. In the meantime, the raw data can be downloaded and loaded into an Excel Pivot Chart.

My library isn't performing as well as I think it should!

Refer to lzbench to find out how the library is used. It may or may not be optimal.

Can you add «insert machine, CPU, architecture, OS, etc.»?

Only if I have, or at least have access to, a machine which fits that description.

I included what I had available. If you would like to donate other hardware I'm willing to add it to the benchmark.

What compiler flags were used?

Refer to lzbench. As far as performance is concerned, most plugins are compiled with -O3 except a few that only work with -O2.

How long does it take to run the benchmark?

For each level of each codec, the benchmark compresses the data repeatedly until 5 seconds have elapsed, then does the same for decompression. Each level therefore executes for a minimum of 10 seconds. There are 36 datasets and 7200 datapoints per machine, so the full run takes a minimum of 20 hours per machine.

That said, for larger datasets not all codecs will be able to complete even one iteration in 5 seconds. In practice, it takes a whole day on a desktop, more or less.
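The timing loop described above looks roughly like this (a sketch, not lzbench's actual code; compress_once() is a hypothetical placeholder):

    /* Repeat an operation until at least 5 s of wall time have passed,
     * then report the average time per iteration. */
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        long iters = 0;
        double start = now_sec(), elapsed;
        do {
            /* compress_once();  hypothetical: one full compression pass */
            iters++;
            elapsed = now_sec() - start;
        } while (elapsed < 5.0);
        printf("%ld iterations, %.6f s/iteration\n", iters, elapsed / iters);
        return 0;
    }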

Will you add different sets of compiler flags?

No. There are a huge number of different possible options, which can be combined in any way, leading to a combinatorial explosion of the number of times the benchmark would have to be run. Given the time it takes to run the benchmark, this is simply not feasible.

If you are curious about specific flags you should run the benchmark yourself. Or, even better, create your own benchmark with the codec from your software tested against your data.

This doesn't work in my browser.

This should work in any modern browser, including Internet Explorer, Firefox, Chrome and their mobile equivalents. Make sure JavaScript is enabled.

Can I have the raw data?

Of course! The table in the "choose a machine" section includes a link to a CSV which can be imported into your favorite spreadsheet application.

It's also available from the data folder of the git repository.

If you do something interesting with it please let us know! Or, even better, submit a pull request so everyone can benefit from your brilliance!

Can I link to a specific configuration?

Some things can be configured by passing parameters in the query string:

dataset
The dataset to show; the default is selected randomly.
machine
The machine to show; the default is selected randomly.
speed
Transfer speed (in KiB/s) for the Transfer + Processing chart.
speed-scale
The default scale for the speed axis of charts (linear or logarithmic).
visible-plugins
A comma-separated list of plugins to show in the scatter plots. All other plugins will be disabled, though they can be re-enabled by clicking on their entry in the legend.
hidden-plugins
A comma-separated list of plugins to hide in the scatter plots. Note that, if used, this parameter overrides visible-plugins.
