Tuesday, 29 March 2016

Identifying a Possible Optimization for the Zopfli Library

The Zopfli Compression Algorithm is a compression library programmed in C to perform very well, but slow for deflate or zlib compression. The library is only able to compress, not decompress.

I chose this library as the community was the largest and most active on Github. There are some pros and cons for this. One being that any problems that occur are reported and able to be fixed quicker as it's a quick and easy way to post the issue on Github. However this can also be a problem because although there are many issues on how to improve the library, many of them are solved and closed for the community being so active.

Tests:

The follow is done by creating a random size text file on the ARM64 platform with the command

dd if=/dev/urandom of=10mb.txt count=1M bs=10
and transferring it over to the x86_64 platform so that the tests were running on the same data. I tested both a 1M and 10M file on the platforms. Here are the results ran with time ./zopfli 10mb.txt:

System1M10M
x86_64real    0m3.309sreal    0m33.970s
ARMv8real    0m27.127sreal    4m26.325s

They weren't kidding when they said it was "slow". Although any changes I do, good or bad, are more easily noticeable.

 

Optimizations Choices:

There are three avenues I believe I can follow in a priority order for possible ways to optimize the library. In the deflate.c there is a TODO of

static void AddBits(unsigned symbol, unsigned length,
                    unsigned char* bp, unsigned char** out, size_t* outsize) {
  /* TODO(lode): make more efficient (add more bits at once). */
  unsigned i;
  for (i = 0; i < length; i++) {
    unsigned bit = (symbol >> i) & 1;
    if (*bp == 0) ZOPFLI_APPEND_DATA(0, out, outsize);
    (*out)[*outsize - 1] |= bit << *bp;
    *bp = (*bp + 1) & 7;
  }
}

for the AddBits function call. Perhaps there's something I can change to help the code along.

My second choice afterwards will be trying different flags for the build:

gcc src/zopfli/*.c -O2 -W -Wall -Wextra -ansi -pedantic -lm -o zopfli
as of right now, there may be some other other flags in O3 optimization that may help some edge cases, but could make it unstable in other areas. This will require patience a lot of compiling and test running.

Finally, the least enjoyable one would be trying to make it platform optimized for the ARM64. No issues were made on the Github so this might be a platform that has been overlooked.

No comments:

Post a Comment