Beating QOI - Part 1

February 11, 2024 - images rust

This is the first part of a series on how the png crate was able to acheive performance on-par or better than QOI, while remaining fully compatible with the PNG standard.

The purpose of lossless image compression is to reduce the size of an image while retaining the full pixel information. Modern formats generally use the same couple conceptual steps...

First for each pixel in the image the encoder computes a "prediction" of the color based on the previous pixels. There are various possible predictors, but a simple one is to take the (wrapping) difference from the previous pixel.
Next LZ compression is applied: The encoder searches for repeated runs of values and replaces them with back-references that indicate the position and number of bytes to repeat.
And finally entropy coding is applied to reduce the number of bits required to store the data. A common choice here is to apply huffman coding that encodes each literal byte and back-reference with a different prefix code. In most cases, this is the step that produces the largest reduction in image size.

The QOI format

The Quite OK Image format (QOI) was proposed as an extremely simple image format. It achieves compression nearly as good as PNG while being many times faster than prior PNG encoders and decoders.

A quick read of the QOI spec might suggest that it is using a very different approach. But in fact, the format actually fits into the same framework.

In QOI, the entropy coding is done using a fixed set of prefix codes all of which are a multiple of eight bits. A code word starting with 0b11 indicates a run, 0b01 indicates small delta from the predicted value (i.e. the difference from the previous pixel), 0xFF indicates a literal, etc.

Using a fixed set of codes reduces the maximum possible compression ratio, but it turns out to have less of an impact than one might expect. The QOI format achieve quite respectable compression ratios across a wide range of images.

Applying fixed codes to PNG

It turns out that we can use the same strategy for encoding PNGs. Ordinarily, PNG encoding requires two passes over the image data: once to gather statistics to determine the ideal entropy code, and then a second time to process the image data. However, we can instead use a offline process to build a single entropy code trained on a large corpus. At runtime, our encoder can always pick this same code, thereby skipping an entire pass over the image contents.

Unlike with QOI, the resulting codes do not have to be multiples of 8-bits. Small deltas from the predicted value and smaller RLE counts get assigned shorter code lengths because they occur more frequently. Another adavantage of the PNG format is that it allows the "Paeth" predictor which incorporates three prevously seen pixel values and tends to produce slightly better predictions than QOI's simple "previous" predictor (which PNG also supports).

At the same time, there are some limitations. QOI has the concept of a indexed lookup of a previously seen pixel value which PNG lacks. Similarly, to acheive QOI-class encoding performance, we cannot devote much time to finding back-references and so only RLE encoding is used.