...
With increasing model and batch sizes we expect the runtime to be dominated by the actual matrix operations.
Convolutional benchmark
AlexNet
Let's take a look at the smaller AlexNet first, since it's expected to show the most pronounced differences.
The control group shows, as expected, almost no difference between the setups – again, recall that we use the same OpenMP and precompiled MKL.
...
We see the same behaviour in the treatment group, no matter which OpenMP is used.
Control group
The treatment group shows no difference other than that "GCC swing". Normalizing the data gives us average scores within ~1% of each other, which is close to the standard error.
Treatment group
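To make the "close to standard error" claim concrete, here is a minimal sketch of how such a normalized comparison can be done. The throughput samples are hypothetical, illustrative numbers, not the actual measurements from these runs:

```python
import statistics

# Hypothetical throughput samples (images/s) for two setups -- illustrative only.
gcc_runs = [101.2, 100.8, 101.5, 100.9, 101.1]
icc_runs = [102.0, 101.7, 102.3, 101.9, 102.1]

def rel_diff_and_sem(baseline, candidate):
    """Relative mean difference (%) of candidate vs baseline,
    plus the candidate's standard error of the mean, normalized the same way."""
    mean_b = statistics.mean(baseline)
    mean_c = statistics.mean(candidate)
    diff_pct = (mean_c - mean_b) / mean_b * 100
    sem_pct = statistics.stdev(candidate) / (len(candidate) ** 0.5) / mean_b * 100
    return diff_pct, sem_pct

diff, sem = rel_diff_and_sem(gcc_runs, icc_runs)
print(f"difference: {diff:.2f}% (SEM: {sem:.2f}%)")
```

If the observed difference is on the order of the SEM, the setups are effectively indistinguishable for that benchmark.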
ResNet152
Now we can observe a beautiful saturation of the throughput. The optimal batch size is between 16 and 32.
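The saturation point can be located with a simple batch-size sweep. The sketch below uses a NumPy matmul as a stand-in workload (the real benchmark runs a full network); the dimensions and iteration counts are arbitrary choices for illustration:

```python
import time
import numpy as np

def throughput(batch_size, dim=512, iters=20):
    """Samples/sec for a stand-in matmul workload (proxy for a conv net forward pass)."""
    x = np.random.rand(batch_size, dim).astype(np.float32)
    w = np.random.rand(dim, dim).astype(np.float32)
    start = time.perf_counter()
    for _ in range(iters):
        x @ w
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

# Sweep batch sizes; throughput typically climbs, then flattens once
# the matrix kernels saturate the available cores.
for bs in (1, 2, 4, 8, 16, 32, 64):
    print(f"batch {bs:3d}: {throughput(bs):12.0f} samples/s")
```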
...
We can see pretty obvious patterns.
- Newer compilers perform better than older ones.
- GOMP is slower than IOMP.
But the overall differences are close to the standard error and don't even reach 2%.
faster-rcnn Benchmark
...
As we can see, GOMP delivers ~3-5% worse performance than IOMP.
...