Lab6

In this lab we look at the vector (SIMD) capabilities of the arm64 processor and how they can be used to speed up our code.

In this experiment we take two integer arrays of 1000 elements each, fill them with random numbers, sum them element by element into a third array, and print the results. In my first attempt I filled the arrays with random numbers, summed them and printed each result, all in the same loop. (Note that array3 is of type long, while array1 and array2 are of type int.)

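for (int i = 0; i < 1000; i++) {
    array1[i] = rand();                  /* rand() call prevents vectorization */
    array2[i] = rand();
    array3[i] = array1[i] + array2[i];   /* added as int, then stored as long; can overflow for large values */
    printf("%ld\n", array3[i]);          /* printf() call also prevents it */
}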
Compiling this code with -O3 (which turns on auto-vectorization in GCC) did not vectorize the loop. Recompiling with a flag that reports the compiler's vectorization attempts (-ftree-vectorizer-verbose=2; newer GCC versions use -fopt-info-vec-missed for the same purpose) made it clear that the compiler could not vectorize the loop because of the function calls (rand() and printf()) inside it. Two things confirmed this: the verbose output reported that the loop could not be vectorized, and running objdump -d on the ELF file showed that no vector registers were used.

To get the compiler to vectorize the code, I split the work into three separate loops: one that fills array1 and array2 with random numbers, one that sums them, and one that prints the results. Compiling with the same flags, the compiler now reported that it had vectorized the summation loop, and the objdump -d output shows vector registers being used. The relevant instructions look like this:

ld1 {v1.4s}, [x21]
ld1 {v0.4s}, [x20]
add v0.4s, v1.4s, v0.4s

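The first ld1 loads four 32-bit integers (the .4s arrangement) from the address in x21 into the 128-bit vector register v1, and the second loads four more from the address in x20 into v0. The add instruction then sums all four pairs of lanes at once, so four elements of the arrays are summed with a single instruction instead of one element per instruction.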

To vectorize our previous lab, we could treat array1 as the sound samples, fill array2 (an array of the same size) with the volume scaling factor, and reuse the loop above with a multiplication instead of an addition. Alternatively, we could simply recompile our existing code with -O3: since the only operation in that loop is the scaling, -O3 alone may be enough to vectorize it as it currently stands.
