Sunday 28 February 2016

Vectorization

When programs require crunching of numbers a bottleneck that can always occur is how long it takes. Sometimes these things can't be avoided, especially if the issue is just have a lot of elements to parse through. But there are some tricks that programmers can use to make sure the problem doesn't occur. One of these is Vectorization. Vectorization is the process of rewriting a loop so that instead of processing a single element, it can process multiple elements simultaneously. 

To show this in action I used the following code:


#include <stdio.h>
#include <stdlib.h>
#define N 1000
int main()
{
    int array1[N], array2[N];
    srand(time(NULL));
    int i;
    for (i = 0; i < N; i++) {
            array1[i] = rand() % (1000 + 1 - 0) + 0;
    }  
    for (i = 0; i < N; i++) {
            array2[i] = rand() % (1000 + 1 - 0) + 0;
    }
  
    long int total = 0;
    int array3[N];
    for (i = 0; i < N; i++){
        array3[N] = array1[i] + array2[i];
    }
    for (i = 0; i < N; i++){
        total += array3[N];
    }
    printf("\nThe total is: %d\n", total);
}
   
The code above creates two 1000-element integer arrays and fills them with random numbers, then sums those two arrays to a third array, and finally sums the third array to a long int and prints the result.

I compiled the code with the command line: gcc -O3 -o vect vect.c
The -O3 - allows options for vectorization.

Here is a portion of the objdump of main:


  402388: 91420222  add x2, x0, #0x80, lsl #12
  40238c: 3c407c00  ld1 {v0.2d}, [x0]
  402390: 3cdf7c21  ld1 {v1.2d}, [x1], #16
  402394: 3e61d400  fadd v0.2d, v0.2d, v1.2d
  402398: 4c9f2c00  st1 {v0.2d}, [x0], #16
  40239c: eb02001f  cmp x0, x2

Looking at the objdump, the intrsuctions new to me are st1 and ld1. Looking it up,st1 stores a single element structure to one lane of of one register. ld1 on the other hand loads a single element structure instead. Those two commands seem to help the process of vectorization.

No comments:

Post a Comment