Thursday 21 April 2016

Providing for the Upstream

I have submitted my work upstream through GitHub.

https://github.com/google/zopfli/issues/104

I thought it would be best practice to first show my findings to the community so they could comment on any pros and cons I might not know about. Unfortunately, even four days after posting, no one has responded yet. I assume the same thing would have happened had I opened a pull request, since no recent pull requests have been answered either.

For the sake of time though, if the work had been accepted, I would have had to go through their contributing guidelines, which require signing the Google Individual Contributor License Agreement (CLA). They state the following:

Before you contribute

Before we can use your code, you must sign the Google Individual Contributor License Agreement (CLA), which you can do online. The CLA is necessary mainly because you own the copyright to your changes, even after your contribution becomes part of our codebase, so we need your permission to use and distribute your code. We also need to be sure of various other things—for instance that you'll tell us if you know that your code infringes on other people's patents. You don't have to sign the CLA until after you've submitted your code for review and a member has approved it, but you must do it before we can put your code into our codebase. Before you start working on a larger contribution, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you. Coordinating up front makes it much easier to avoid frustration later on.

Code reviews

All submissions, including submissions by project members, require review. We use Github pull requests for this purpose.

The small print

Contributions made by corporations are covered by a different agreement than the one above, the Software Grant and Corporate Contributor License Agreement.

 I followed their suggestion of:

Before you start working on a larger contribution, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you. Coordinating up front makes it much easier to avoid frustration later on.

But since they have not answered yet, I have not been able to continue on to signing the CLA.

Sunday 10 April 2016

The Trials and Tribulations of Optimizing Zopfli

What Didn't Work

When I first went about doing this, my testing methodology was flawed and there was research I should have done more thoroughly beforehand.

For example, after reading more about Zopfli compression, I learned that it is slow by design: it is meant to be around 80x slower than gzip while producing output that is only up to about 8% smaller. So improving on it became a tall order.

Here is a comparison of the compressed output sizes from the two (in bytes):

gzip        zopfli
7599324     7550970

As you can see, in this case, for the 10 MB file generated with the command:

base64 /dev/urandom | head -c 10000000 > 10mb.txt

zopfli's output is only about 0.6% smaller. With less random input it would probably do better, but we are still dealing with razor-thin margins.
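For reference, here is roughly how such a comparison can be reproduced (a sketch only: it assumes the zopfli binary has already been built in the current directory, and the file names are just examples):

# create the 10 MB base64 test file
base64 /dev/urandom | head -c 10000000 > 10mb.txt

# gzip at its maximum setting, writing to its own file
gzip -9 -c 10mb.txt > 10mb.gzip.gz

# zopfli with default settings, sent to stdout so the gzip output is left alone
./zopfli -c 10mb.txt > 10mb.zopfli.gz

# compare the resulting sizes
ls -l 10mb.gzip.gz 10mb.zopfli.gz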

What I first tried was following this guide, but I ended up with worse performance than simply building with the provided makefile. It wasn't a portable solution either, since I would not be able to bring the resulting exe over to Linux.

Next I tried changing the code anywhere I could in hopes of even a small boost, swapping data types wherever I thought it might help. The problem was that my changes would not compile correctly. The code is so tightly written that it was difficult to pinpoint what to change and where, so I gave up on that approach.

What Did Work 

So I thought my only chance was to change the compiler options around and see what sticks. The makefile and the documentation both suggest compiling the program with -O2 (along with a few other options needed for it to build correctly). I thought this was odd: why not go for the full -O3?
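A minimal sketch of how the two builds can be produced, assuming the repository layout keeps the C sources under src/zopfli/ (compiling directly with gcc mirrors what the makefile does; the zopfli_O2 and zopfli_O3 names are just labels for this example):

# build once at the makefile's default level
gcc -O2 src/zopfli/*.c -o zopfli_O2 -lm

# and once at the higher level
gcc -O3 src/zopfli/*.c -o zopfli_O3 -lm

# time a compression of the test file with each build
time ./zopfli_O2 -c 10mb.txt > /dev/null
time ./zopfli_O3 -c 10mb.txt > /dev/null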

Here are the results when comparing -O2 and -O3 on the xerxes server.

-O2                  -O3
real    0m29.852s    real    0m29.847s

As you can see, there is no difference that can really be attributed to the higher optimization level. I ran it many times with both builds, with -O2 sometimes coming out faster than -O3 and vice versa. I can only assume the documentation suggests -O2 over -O3 because it compiles quicker, and perhaps uses less memory, although I did not test for this.

So again I was stumped, since that seemed to mean the optimization level doesn't matter, until I remembered that I should try it on the aarchie server! And the results were terrific.

-O2                  -O3
real    3m56.909s    real    3m50.971s

I tested multiple times at both levels to make sure it wasn't just a fluke, but no, the results were always within +/- 1 second of their respective times. That means switching levels provides roughly a 2.5% difference in speed! When compression takes this long, that is noticeable, especially on jobs that could take hours.
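Repeated timings like these can be gathered with a simple loop (again a sketch, reusing the hypothetical zopfli_O2 and zopfli_O3 builds from the earlier example):

# five timed runs per build, discarding the compressed output
for i in 1 2 3 4 5; do time ./zopfli_O2 -c 10mb.txt > /dev/null; done
for i in 1 2 3 4 5; do time ./zopfli_O3 -c 10mb.txt > /dev/null; done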
  
Testing with larger files on the xerxes server at the different optimization levels still showed the same speed across all of them, unlike on the ARM64 machine where there was a clear difference. That means this change is an ARM64-only improvement, which is probably why it was never noticed or included in the makefile.

What Now?

Before submitting these findings to the upstream I'll have to do more research into why this happens. There must be only one or two of the optimizations enabled at -O3 that help specifically on the ARM64 platform. Once I find out which, I could push an ARM64-only makefile change so others can see the same performance boost.
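One way to narrow that down (a sketch, assuming gcc is the compiler on both servers) is to ask gcc itself which optimizations each level enables and diff the two lists:

# dump the optimizer flags enabled at each level
gcc -O2 -Q --help=optimizers > o2-flags.txt
gcc -O3 -Q --help=optimizers > o3-flags.txt

# the lines that differ are the candidates to test one at a time on aarchie
diff o2-flags.txt o3-flags.txt

Each differing flag could then be tried individually on top of -O2 until the one responsible for the ARM64 speedup turns up.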