Sunday 10 April 2016

The Trials and Tribulations of Optimizing Zopfli

What Didn't Work

When I first started on this, my testing methodology was flawed and there was research I should have done more thoroughly up front.

For example, after reading more about Zopfli compression, I learned that its slowness is by design: it trades speed for density, running roughly 80x slower than gzip in exchange for output that is only up to about 8% smaller. That made improving it a tall order.

Here's a comparison of the output sizes between the two.

gzip (bytes)    zopfli (bytes)
7599324         7550970

As you can see, for the 10 MB file made with the command:

base64 /dev/urandom | head -c 10000000 > 10mb.txt

zopfli only shaves about 0.6% off the gzip size. With less random text it would probably do better, but still, we're dealing with razor-thin margins.
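If you want to reproduce the comparison, something along these lines should do it. This is a rough sketch: it assumes a zopfli binary built from the upstream source sits in the current directory, and it uses gzip -9 since the exact gzip level isn't the interesting part (the byte counts will also differ run to run, because the input is random).

# make the 10 MB test file as above
base64 /dev/urandom | head -c 10000000 > 10mb.txt

# gzip to a separate file so the original is kept
gzip -9 -c 10mb.txt > 10mb.txt.gzip.gz

# zopfli keeps the original and writes 10mb.txt.gz by default
./zopfli 10mb.txt

# compare the sizes in bytes
ls -l 10mb.txt.gzip.gz 10mb.txt.gz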

What I first tried was following this guide, but I ended up with worse performance than just building with the makefile. It wasn't a viable solution anyway, since I wouldn't be able to port the resulting exe file to Linux.

Next I tried changing the code anywhere I could in the hope of even a small boost, swapping data types where I thought it might help. The problem was that my changes wouldn't compile correctly. The code is dense and carefully written, so it was difficult to pinpoint what to change and where, and I eventually gave this approach up as a bust.

What Did Work 

So I figured my only remaining option was to play with the compiler options and see what sticks. The makefile and the documentation both suggest compiling the program with -O2, along with some other options needed for it to build correctly. That struck me as odd: why not go for the full -O3?
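Comparing the two levels is just a matter of rebuilding and re-timing the same compression. Roughly like this, as a sketch: it assumes the Makefile picks up a CFLAGS override from the command line and has a clean target; if not, editing the -O2 in the Makefile by hand and deleting the old binary works just as well.

# build at -O2 (the default level) and time a run
make clean && make CFLAGS="-O2"
time ./zopfli -c 10mb.txt > /dev/null

# rebuild at -O3 and time the same input
make clean && make CFLAGS="-O3"
time ./zopfli -c 10mb.txt > /dev/null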

Here are the results when comparing -O2 and -O3 on the xerxes server.

-O2                  -O3
real    0m29.852s    real    0m29.847s

As you can see, there is no difference that can really be attributed to the higher optimization level. I ran it many times at both levels, and -O2 was sometimes faster than -O3 and vice versa. I can only assume the documentation suggests -O2 over -O3 because it compiles quicker, and maybe the binary uses less memory, although I did not test for this.

So again I was stumped, since it seemed the optimization level didn't matter, until I remembered that I should try it on the aarchie server! And the results were terrific.

-O2                  -O3
real    3m56.909s    real    3m50.971s

I tested multiple times at both levels to make sure it wasn't just a fluke, but no, the results were always within +/- 1 second of their respective times. That means switching levels gives roughly a 2.5% difference in speed (about 6 seconds out of 237 here), and when compression takes this long, that kind of gain is noticeable once you're working with jobs that could take hours.
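If you want to eyeball the run-to-run variance yourself, a quick loop is enough. This is a sketch only, using the same hypothetical 10mb.txt input as before.

# repeat the same compression a few times and watch the spread
for i in 1 2 3 4 5; do
    time ./zopfli -c 10mb.txt > /dev/null
done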
  
Testing with larger files on the xerxes server at different optimization levels still showed the same speed across all of them, unlike on ARM64 where there was a difference. That means changing the level is an ARM64-only improvement, which is probably why it was never noticed or included in the makefile.

What Now?

Before submitting these findings upstream I'll have to do more research into why this happens. There must be one or two specific flags in the -O3 level that help only on the ARM64 platform. Once I find them, I could push an ARM64-only makefile change so others can see the same performance boost.
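One way to start narrowing that down is to ask GCC itself which optimization passes each level turns on and diff the two lists. This is standard GCC behaviour, though I haven't gone through the output yet.

# dump the optimizer flags enabled at each level, then diff them
gcc -Q --help=optimizers -O2 > o2-flags.txt
gcc -Q --help=optimizers -O3 > o3-flags.txt
diff o2-flags.txt o3-flags.txt

From there, candidate flags (for example -ftree-vectorize, which -O3 typically enables and -O2 does not) could be added on top of -O2 one at a time until the ARM64 speedup reappears.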
