...
With increasing model and batch sizes we expect the runtime to be dominated by the actual matrix operations.
Convolutional benchmark
AlexNet
Let's take a look at the smaller AlexNet first, since it's expected to show the most pronounced differences.
The control group shows, as expected, almost no difference between the setups – again, recall that we use the same OpenMP and precompiled MKL.
...
We see the same behaviour in the treatment group, no matter which OpenMP is used.
Control group
The treatment group shows no difference other than that "GCC swing". Normalizing the data gives us average scores within ~1% of each other, which is close to the standard error.
Treatment group
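To make the "close to standard error" claim concrete, here is a minimal sketch of how such a normalized comparison can be done. The throughput samples are hypothetical, illustrative numbers, not the actual measurements from these runs:

```python
import statistics

# Hypothetical throughput samples (images/s) for two setups -- illustrative only.
gcc_runs = [101.2, 100.8, 101.5, 100.9, 101.1]
icc_runs = [102.0, 101.7, 102.3, 101.9, 102.1]

def rel_diff_and_sem(baseline, candidate):
    """Relative mean difference (%) of candidate vs baseline,
    plus the candidate's standard error of the mean, normalized the same way."""
    mean_b = statistics.mean(baseline)
    mean_c = statistics.mean(candidate)
    diff_pct = (mean_c - mean_b) / mean_b * 100
    sem_pct = statistics.stdev(candidate) / (len(candidate) ** 0.5) / mean_b * 100
    return diff_pct, sem_pct

diff, sem = rel_diff_and_sem(gcc_runs, icc_runs)
print(f"difference: {diff:.2f}% (SEM: {sem:.2f}%)")
```

If the observed difference is on the order of the SEM, the setups are effectively indistinguishable for that benchmark.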
ResNet152
Now we can observe a beautiful saturation of the throughput. The optimal batch size is between 16 and 32.
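The saturation point can be located with a simple batch-size sweep. The sketch below uses a NumPy matmul as a stand-in workload (the real benchmark runs a full network); the dimensions and iteration counts are arbitrary choices for illustration:

```python
import time
import numpy as np

def throughput(batch_size, dim=512, iters=20):
    """Samples/sec for a stand-in matmul workload (proxy for a conv net forward pass)."""
    x = np.random.rand(batch_size, dim).astype(np.float32)
    w = np.random.rand(dim, dim).astype(np.float32)
    start = time.perf_counter()
    for _ in range(iters):
        x @ w
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed

# Sweep batch sizes; throughput typically climbs, then flattens once
# the matrix kernels saturate the available cores.
for bs in (1, 2, 4, 8, 16, 32, 64):
    print(f"batch {bs:3d}: {throughput(bs):12.0f} samples/s")
```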
...
We can see pretty obvious patterns.
- Newer compilers perform better than older ones.
- GOMP is slower than IOMP.
But the overall differences are close to the standard error and don't even reach 2%.
faster-rcnn Benchmark
...
As we can see, GOMP delivers ~3-5% worse performance than IOMP.
...