The following code is a cut-down version for easier understanding and will not run exactly as is. One thing to keep in mind for the code to work is that many of the variables need to be used in some shape or form: compiler optimizations skip calculations whose results are never used anywhere else. I've removed those parts from the sample code below to reduce verbosity.
#include <stdint.h>
#include <stdlib.h>
#define MAX 1550000
#define FACTOR 0.500
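int16_t* sample; // lookup table shared by main() and method2()
int16_t method1(int16_t s, float f);
int16_t method2(int16_t s, float f);
int main() {
    int16_t* meth1 = (int16_t*) malloc(MAX * sizeof(int16_t));
    int16_t* meth2 = (int16_t*) malloc(MAX * sizeof(int16_t));
    int i;
    // 32769 entries (0..32768) so that -(-32768) is still a valid index
    sample = (int16_t*) malloc(32769 * sizeof(int16_t));
    int16_t* sound;
    sound = (int16_t*) malloc(MAX * sizeof(int16_t));
    // Fill the input with random 16-bit samples
    for(i=0;i<MAX;i++){
        sound[i] = rand() % 65536 - 32768;
    }
    for(i=0;i<MAX;i++){
        meth1[i] = method1(sound[i],FACTOR);
    }
    // Precompute the scaled value for every possible magnitude
    for(i=0;i<32769;i++){
        sample[i] = i * FACTOR;
    }
    for(i=0;i<MAX;i++){
        meth2[i] = method2(sound[i],FACTOR);
    }
    return 0;
}
// Method 1 Calculate result each time
int16_t method1(int16_t s,float f){
    return s*f;
}
// Method 2 Using Lookup Table to get result (f is unused; the table already holds the scaled values)
int16_t method2(int16_t s,float f){
    int16_t result;
    if(s >= 0){
        result = sample[s];
    }
    else{
        result = -sample[-s];
    }
    return result;
}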
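Before comparing speed, it is worth checking that the two methods produce the same output. The following is a minimal sketch, not the original test code: the names `scale_direct`, `build_table`, `scale_lookup`, `count_mismatches` and `check_methods_agree` are made up for illustration, and it assumes a factor between 0 and 1 so that every scaled value still fits in an `int16_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 32769 /* indices 0..32768, so the magnitude of -32768 is covered */

/* Direct scaling: multiply, then truncate toward zero. */
int16_t scale_direct(int16_t s, float f) {
    return (int16_t)(s * f);
}

/* Build table[i] = i * f for every magnitude a 16-bit sample can have.
   Returns NULL if the allocation fails; the caller frees the table. */
int16_t *build_table(float f) {
    int16_t *table = malloc(TABLE_SIZE * sizeof *table);
    if (table == NULL) {
        return NULL;
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i] = (int16_t)(i * f);
    }
    return table;
}

/* Lookup scaling: index by magnitude and restore the sign afterwards. */
int16_t scale_lookup(const int16_t *table, int16_t s) {
    return s >= 0 ? table[s] : (int16_t)-table[-s];
}

/* Count how many of the 65536 possible inputs give different results. */
long count_mismatches(const int16_t *table, float f) {
    long bad = 0;
    for (int s = INT16_MIN; s <= INT16_MAX; s++) {
        if (scale_direct((int16_t)s, f) != scale_lookup(table, (int16_t)s)) {
            bad++;
        }
    }
    return bad;
}

/* Aborts via assert if the two methods ever disagree. */
void check_methods_agree(const int16_t *table, float f) {
    assert(count_mismatches(table, f) == 0);
}
```

Calling `check_methods_agree(build_table(0.5f), 0.5f)` once at start-up (after checking the table for NULL) would catch an off-by-one in the table size, since the input -32768 needs index 32768.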
The results on the x86_64 and ARMv8 systems are as follows:
System | Scaling | Lookup |
---|---|---|
x86_64 | ~0.169ms | ~0.166ms |
ARMv8 | ~0.95ms | ~0.44ms |
From the results above, we can see that both methods ran fastest on the x86_64 system, probably because it has faster hardware. What's interesting to note, however, is that on ARMv8 the lookup took less than half the time of the scaling (~0.44ms versus ~0.95ms), while on x86_64 the two methods were almost identical (~0.166ms versus ~0.169ms).
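For anyone wanting to reproduce numbers like these, here is a rough sketch of how a timing could be taken while keeping the results "used" so the optimizer cannot drop the loop. This is an assumption about one possible approach, not the code that produced the table: `checksum` and `time_scaling_ms` are hypothetical helpers, and `clock()` measures CPU time with limited resolution, so very short runs may need to be repeated and averaged.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum every output sample so the compiler has to actually compute them. */
long checksum(const int16_t *buf, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Scale n samples by factor into out and return the elapsed CPU time
   in milliseconds, measured with the standard clock() function. */
double time_scaling_ms(const int16_t *in, int16_t *out, size_t n, float factor) {
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        out[i] = (int16_t)(in[i] * factor);
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Printing both the elapsed time and `checksum(out, n)` after each run gives the output buffer a visible use, which is the kind of "used in some shape or form" the full program needs.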