lzbench Compression Benchmark (2024)

lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors.

The benchmark currently consists of 36 datasets, tested against 40 codecs at every compression level they offer. It is run on 1 test machine, yielding a grand total of 7200 datapoints.


The following benchmark covers the most common compression methods, tested via the low-level C libraries with their available flags. The large number of compressors and the similarity between them can cause confusion. The comparison is easier to understand once you realize that a compressor is just the combination of the following 3 things:

  • An algorithm, with adjustable settings. (lzma, deflate, lzo, lz4, etc...)
  • An archive format. (.gz, tar.xz, tar.gz, .7z, .zip, etc...)
  • A tool or a library, also known as the implementation. (gzip, tar, 7-zip, zlib, liblzma, libdeflate, etc...)

The algorithm family is the most defining characteristic by far, then comes the implementation. A C library that has been well-optimized for a decade should do a bit better than a random Java library from GitHub.

For example, gzip designates both the tool and its archive format (specific to that tool), but it's based on deflate. It has similar results to everything else that is based on deflate (particularly the zlib library).
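To make the split concrete, here is a minimal C sketch (my own illustration, not part of lzbench) that runs the same deflate algorithm through the zlib implementation and either wraps the output in the gzip format or leaves it raw. It uses zlib's real deflateInit2() API; error handling is trimmed for brevity, link with -lz.

    /* Same algorithm (deflate), same implementation (zlib), two formats. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    static size_t deflate_buf(const unsigned char *src, size_t srcLen,
                              unsigned char *dst, size_t dstLen, int windowBits) {
        z_stream s;
        memset(&s, 0, sizeof s);
        /* windowBits 15+16 emits the gzip wrapper; -15 emits raw deflate */
        deflateInit2(&s, Z_DEFAULT_COMPRESSION, Z_DEFLATED, windowBits, 8,
                     Z_DEFAULT_STRATEGY);
        s.next_in  = (unsigned char *)src; s.avail_in  = (uInt)srcLen;
        s.next_out = dst;                  s.avail_out = (uInt)dstLen;
        deflate(&s, Z_FINISH);            /* one-shot compression */
        size_t out = s.total_out;
        deflateEnd(&s);
        return out;
    }

    int main(void) {
        const unsigned char src[] = "the same bytes, two formats, one algorithm";
        unsigned char gz[256], raw[256];
        size_t gzLen  = deflate_buf(src, sizeof src, gz,  sizeof gz,  15 + 16);
        size_t rawLen = deflate_buf(src, sizeof src, raw, sizeof raw, -15);
        printf("gzip-framed: %zu bytes, raw deflate: %zu bytes\n", gzLen, rawLen);
        return 0;
    }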

There are some bugs and edge cases to account for, so you should always test your implementation against your use case. For instance, Kafka has offered snappy compression for a few years (off by default), but the buffers are misconfigured and it cannot achieve any meaningful compression.

Let's split the compressors into three categories: the slow, the medium and the fast:

  • The slow ones are in the 0 - 10 MB/s range at compression. They're mostly LZMA derivatives (LZMA, LZMA2, XZ, 7-zip default), bzip2 and brotli (from Google).
  • The medium ones are in the 10 - 500 MB/s range at compression. They're mostly deflate (used by gzip) and zstd (Facebook). Note that deflate is on the lower end while zstd is on the higher end.
  • The fast ones are around 1 GB/s and above (yes, a whole gigabyte per second) at both compression and decompression. They're mostly lzo, lz4 (Facebook) and snappy (Google).

The strongest and slowest algorithms are ideal when you compress once and decompress many times. For example, Linux packages have been distributed as .tar.xz (LZMA) for the last few years; historically they were .tar.gz. The switch to stronger compression must have saved a lot of bandwidth on the Linux mirrors.

The medium algorithms are ideal to save storage space and/or network transfer at the expense of CPU time. For example, backups and logs are often gzipped on archival. Static web assets (html, css, js) can be compressed on the fly by some web servers to save bandwidth.

The fastest algorithms are ideal to reduce storage/disk/network usage and make applications more efficient. Their compression and decompression speed is actually faster than most I/O, so using compression reduces I/O and makes the application faster whenever I/O was the bottleneck. For example, ElasticSearch (Lucene) compresses indexes with lz4 by default.
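As an illustration of how simple these fast codecs are to use, here is a minimal C sketch (not from lzbench) doing a one-shot round trip with the real LZ4 block API from lz4.h; link with -llz4.

    /* One-shot LZ4 compression and decompression of an in-memory buffer. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int main(void) {
        const char *src = "fast fast fast fast fast fast fast fast";
        int srcSize = (int)strlen(src);

        int bound = LZ4_compressBound(srcSize);    /* worst-case output size */
        char *dst = malloc((size_t)bound);
        int csize = LZ4_compress_default(src, dst, srcSize, bound);

        char *back = malloc((size_t)srcSize);
        int dsize = LZ4_decompress_safe(dst, back, csize, srcSize);

        printf("%d -> %d -> %d bytes\n", srcSize, csize, dsize);
        free(dst); free(back);
        return 0;
    }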

The slower the compression, the better the ratio; however, it is not necessarily a good idea to waste entire minutes just to save a few megabytes. It's all about balance.

If you ever try to "7-zip ultra" a 4 GB DVD image, or "gzip --strong" a 100 GB database dump, you might realize that it takes 20 hours to run: 20 hours of wasted CPU, electricity and heat, notwithstanding that 20 hours is far too long for the daily backup it was supposed to be. That's when zstd and lz4 come in handy and save the day.

Not shown here, but worth keeping in mind, is the memory usage. Stronger compression usually comes at the cost of higher memory usage for both compression and decompression. The worst case is probably LZMA, which requires a gigabyte of memory per core at the strongest levels (a bit less for decompression). That prevents usage on low-end machines, mobiles and embedded devices.

We're in the 3rd millennium, yet there has been surprisingly little progress in general compression in the past decades. deflate, lzma and lzo are from the 90's, and the origin of LZ compression traces back to at least the 70's.

Actually, it's not true that nothing happened. Google and Facebook have people working on compression; they have a lot of data and a ton to gain by shaving off a few percent here and there.

Facebook in particular hired the top compression research scientist and rolled out 2 compressors based on a novel compression approach that is doing wonders. That could very well be the biggest advance in computing in the last decade.

See zstd (medium) and lz4 (fast):

  • zstd blows deflate out of the water, achieving a better compression ratio than gzip while being multiple times faster to compress.
  • lz4 beats lzo and Google snappy on all metrics, by a fair margin.

Better yet, they come with a wide range of compression levels that can adjust the speed/ratio tradeoff almost linearly. The slower end pushes against the other slow algorithms, while the fast end pushes against the other fast algorithms. It's incredibly friendly for developers and users alike. All it takes is a single algorithm to support (zstd) with a single tunable setting (1 to 20) and it's possible to accurately trade off speed for compression. It's unprecedented.
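For instance, here is a minimal C sketch (my illustration, not lzbench code) of one-shot zstd compression where the level is the single knob. It uses the real ZSTD_compress()/ZSTD_compressBound() API from zstd.h; link with -lzstd.

    /* One-shot zstd compression with a tunable level. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    int main(void) {
        const char *src = "example payload example payload example payload";
        size_t srcSize = strlen(src);
        int level = 3;                     /* 1 = fastest ... higher = smaller */

        size_t bound = ZSTD_compressBound(srcSize);  /* worst-case output size */
        void *dst = malloc(bound);

        size_t written = ZSTD_compress(dst, bound, src, srcSize, level);
        if (ZSTD_isError(written)) {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(written));
            return 1;
        }
        printf("compressed %zu -> %zu bytes at level %d\n", srcSize, written, level);
        free(dst);
        return 0;
    }

Raising the level leaves the code unchanged; only the single integer moves the result along the speed/ratio curve.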

Of course one could say that gzip already offered tunable compression levels (1-9), but it doesn't cover a remotely comparable range of speed/ratio. Not to mention that the upper half is hardly useful: it's already slow, and making it slower brings little benefit.

Note that Google released 3 options, none of which are noteworthy in my opinion: brotli (comparable to xz and zstd, a resource hog), gipfeli (medium-fast, never had a release on github) and snappy (fast, strictly inferior to lz4 by all metrics).

To conclude: don't take my word for it. Check out the benchmark results below. You can also download the lzbench project && make && ./lzbench -eall file.txt to see for yourself.

Will you add «insert compression codec»?

In order to be included in the benchmark, the software must be supported by lzbench.

If the codec is reliable, works on Linux, is accessible from C or C++, and open-source, the odds are good that it can be added.

Will you include proprietary codecs?

If it is available on GitHub and can be integrated into lzbench, as stated above, it could be added. Either way, you can clone the repository and test any codec on your own, without redistributing it or the results. The only widespread proprietary compression library I know of is Oodle from RAD Game Tools. Proprietary compression tools are often delivered as executables with a graphical user interface, whereas the current testing methodology can only apply to a library.

How are the values calculated?

The benchmark collects the compressed size, compression time, and decompression time. Those are then used to calculate the values used in the benchmark:

Ratio

uncompressed size / compressed size

Compression Speed

uncompressed size / compression time

Decompression Speed

uncompressed size / decompression time

Round Trip Speed

(2 × uncompressed size) / (compression time + decompression time)

Sizes are presented using binary prefixes—1 KiB is 1024 bytes, 1 MiB is 1024 KiB, and so on.
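For illustration, here is a tiny C sketch (not lzbench's code; the sample numbers are made up) that derives the four metrics above from raw measurements:

    /* Derive ratio and speeds from measured size and time values. */
    #include <stdio.h>

    int main(void) {
        double uncompressed = 211938580.0;   /* bytes, hypothetical sample */
        double compressed   =  73186370.0;   /* bytes, hypothetical sample */
        double ctime = 1.84, dtime = 0.31;   /* seconds                    */

        double ratio      = uncompressed / compressed;
        double cspeed     = uncompressed / ctime;            /* bytes/s */
        double dspeed     = uncompressed / dtime;
        double round_trip = (2.0 * uncompressed) / (ctime + dtime);

        const double MiB = 1024.0 * 1024.0;  /* binary prefix, per above */
        printf("ratio %.2f, comp %.1f MiB/s, decomp %.1f MiB/s, "
               "round trip %.1f MiB/s\n",
               ratio, cspeed / MiB, dspeed / MiB, round_trip / MiB);
        return 0;
    }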

What about memory usage?

I would really like to include that data. If you have a good way to capture it on the C side, please get it merged into lzbench.

Is time CPU time or wall-clock?

lzbench captures wall clock time and runs with realtime priority.
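For reference, requesting realtime priority on Linux looks roughly like this (a sketch using the POSIX sched_setscheduler call, not lzbench's actual code; it needs root or CAP_SYS_NICE):

    /* Request realtime (FIFO) scheduling before running timed loops. */
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        struct sched_param sp = { .sched_priority = 1 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler");  /* falls back to normal priority */
        /* ... run the timed compression loop here ... */
        return 0;
    }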

What about multi-threading?

That's a tricky question. For one thing it would explode the number of data points, up to 32 times for a server-class CPU with 32 cores. The benchmark would take even longer to run and the visualizations would be meaningless.

Can you switch the graphs to use a logarithmic scale?

You can. Just click on the label for either axis and it will toggle between linear and logarithmic. It's not intuitive and I would be happy to merge a PR to improve that.

I don't want to switch the default because I think that linear is probably better for most people. Logarithmic tends to be better if all you care about is compression ratio, not speed.

Can you make it easier to compare machines or datasets (instead of codecs)?

I would love to. Feel free to submit a pull request with more dynamic charts. In the meantime, the raw data can be downloaded and loaded into an Excel Pivot Chart.

My library isn't performing as well as I think it should!

Refer to lzbench to find out how the library is used. It may or may not be optimal.

Can you add «insert machine, CPU, architecture, OS, etc.»?

Only if I have, or at least have access to, a machine which fits that description.

I included what I had available. If you would like to donate other hardware I'm willing to add it to the benchmark.

What compiler flags were used?

Refer to lzbench. As far as performance is concerned, most plugins are compiled with -O3 except a few that only work with -O2.

How long does it take to run the benchmark?

For each level of each codec, the benchmark compresses the data repeatedly until 5 seconds have elapsed, then does the same for decompression. Each level therefore executes for a minimum of 10 seconds. There are 36 datasets and 7200 datapoints per machine, so the full run takes a minimum of 20 hours per machine.

That said, for larger datasets not all codecs will be able to complete even one iteration in 5 seconds. In practice, it takes a whole day on a desktop, more or less.
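The timing loop described above looks roughly like this (a sketch, not lzbench's actual code; compress_once() is a hypothetical placeholder):

    /* Repeat an operation until at least 5 s of wall time have passed,
     * then report the average time per iteration. */
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        long iters = 0;
        double start = now_sec(), elapsed;
        do {
            /* compress_once();  hypothetical: one full compression pass */
            iters++;
            elapsed = now_sec() - start;
        } while (elapsed < 5.0);
        printf("%ld iterations, %.6f s/iteration\n", iters, elapsed / iters);
        return 0;
    }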

Will you add different sets of compiler flags?

No. There are a huge number of different possible options, which can be combined in any way, leading to a combinatorial explosion of the number of times the benchmark would have to be run. Given the time it takes to run the benchmark, this is simply not feasible.

If you are curious about specific flags you should run the benchmark yourself. Or, even better, create your own benchmark with the codec from your software tested against your data.

This doesn't work in my browser.

This should work in any modern browser, including Internet Explorer, Firefox, Chrome and their mobile equivalents. Make sure JavaScript is enabled.

Can I have the raw data?

Of course! The table in the "choose a machine" section includes a link to a CSV which can be imported into your favorite spreadsheet application.

It's also available from the data folder of the git repository.

If you do something interesting with it please let us know! Or, even better, submit a pull request so everyone can benefit from your brilliance!

Can I link to a specific configuration?

Some things can be configured by passing parameters in the query string:

dataset
The dataset to show; the default is selected randomly.
machine
The machine to show; the default is selected randomly.
speed
Transfer speed (in KiB/s) for the Transfer + Processing chart.
speed-scale
The default scale for the speed axis of charts (linear or logarithmic).
visible-plugins
A comma-separated list of plugins to show in the scatter plots. All other plugins will be disabled, though they can be re-enabled by clicking on their entry in the legend.
hidden-plugins
A comma-separated list of plugins to hide in the scatter plots. Note that, if used, this parameter overrides visible-plugins.
