Tuesday 29 March 2016

Identifying a Possible Optimization for the Zopfli Library

The Zopfli Compression Algorithm is a compression library programmed in C to perform very well, but slow for deflate or zlib compression. The library is only able to compress, not decompress.

I chose this library as the community was the largest and most active on Github. There are some pros and cons for this. One being that any problems that occur are reported and able to be fixed quicker as it's a quick and easy way to post the issue on Github. However this can also be a problem because although there are many issues on how to improve the library, many of them are solved and closed for the community being so active.

Tests:

The follow is done by creating a random size text file on the ARM64 platform with the command

dd if=/dev/urandom of=10mb.txt count=1M bs=10
and transferring it over to the x86_64 platform so that the tests were running on the same data. I tested both a 1M and 10M file on the platforms. Here are the results ran with time ./zopfli 10mb.txt:

System1M10M
x86_64real    0m3.309sreal    0m33.970s
ARMv8real    0m27.127sreal    4m26.325s

They weren't kidding when they said it was "slow". Although any changes I do, good or bad, are more easily noticeable.

 

Optimizations Choices:

There are three avenues I believe I can follow in a priority order for possible ways to optimize the library. In the deflate.c there is a TODO of

static void AddBits(unsigned symbol, unsigned length,
                    unsigned char* bp, unsigned char** out, size_t* outsize) {
  /* TODO(lode): make more efficient (add more bits at once). */
  unsigned i;
  for (i = 0; i < length; i++) {
    unsigned bit = (symbol >> i) & 1;
    if (*bp == 0) ZOPFLI_APPEND_DATA(0, out, outsize);
    (*out)[*outsize - 1] |= bit << *bp;
    *bp = (*bp + 1) & 7;
  }
}

for the AddBits function call. Perhaps there's something I can change to help the code along.

My second choice afterwards will be trying different flags for the build:

gcc src/zopfli/*.c -O2 -W -Wall -Wextra -ansi -pedantic -lm -o zopfli
as of right now, there may be some other other flags in O3 optimization that may help some edge cases, but could make it unstable in other areas. This will require patience a lot of compiling and test running.

Finally, the least enjoyable one would be trying to make it platform optimized for the ARM64. No issues were made on the Github so this might be a platform that has been overlooked.

Monday 14 March 2016

fipa-pure-const & fdse Compiler Options

fipa-pure-const

The fipa-pure-const option allows the compiler to "Discover which functions are pure or constant. Enabled by default at -O and higher."

To understand pure functions there are two requirements:
  1. The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
  2. Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
Pure functions examples are:

  • sin(x), returning the sine of a number x
  • length(s), returning the size of a string s
Impure functions would be functions that are not the same value if the conditions are the same, like the day of the week or a random().

Implication

for(int i = 0; i < fibonacci(100); ++i)

with the fipa-pure-const the loop becomes

for
(int i = 0,end = fibonacci(100); i < end; ++i)

as the fibonacci() doesn't require to be calculated more than once.

fdse

The fdse option allows the compiler to "Perform dead store elimination (DSE) on RTL. Enabled by default at -O and higher."

Dead store is a variable that is assigned a value but is not read by any subsequent instruction. Dead store values waste processor time and memory.

Implication


function func(a, b) {
    var x;
    var i = 300;
    while (i--) {
        x = a + b; // dead store
    }
}
 
Removing dead store values can have the unintended consequence of not allowing 
overwrites to happen. 

Sources: 

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
https://en.wikipedia.org/wiki/Pure_function
https://en.wikipedia.org/wiki/Constant_function
https://en.wikipedia.org/wiki/Dead_store