Deep learning inference engines

by Erik Smistad · Published January 25, 2019 · Updated April 11, 2019

I have been working a lot lately with different deep learning inference engines, integrating them into the FAST framework. Specifically, I have been working with Google’s TensorFlow (with cuDNN acceleration), NVIDIA’s TensorRT and Intel’s OpenVINO. In this post, I compare these three engines and their pros and cons, and share some tricks on how to convert models from Keras/TensorFlow to run on these engines.

TensorFlow 1.10 (Google)

Pros

Supports many layers
NVIDIA cuDNN acceleration
Open-source and freely available on GitHub. However, cuDNN is proprietary and requires registration to download.

Cons

Huge library with a big footprint
Many third-party dependencies
Very long compile time, especially on Windows
Uses Google’s Bazel build system for compilation

TensorRT 5.0 (NVIDIA)

Pros

Performs auto-tuning when loading the model, which gives better performance than TensorFlow with cuDNN. Note, however, that this increases the loading time of the model. This can be reduced by caching/serializing the tuned model; see here for a guide on how to do this.
Lightweight
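The caching idea behind the first point can be sketched generically. This is a minimal sketch of the pattern and not TensorRT’s API. `build_engine` is a hypothetical stand-in for the slow auto-tuning step. The cache is keyed on a hash of the model file, so a changed model triggers a new build. In a real TensorRT setup, the bytes would come from serializing the built engine.

```python
import hashlib
import os
import tempfile


def load_or_build(model_path, cache_dir, build_engine):
    """Return serialized engine bytes, running the slow build only on a cache miss.

    build_engine is a hypothetical callable standing in for the auto-tuning
    step: it takes the raw model bytes and returns serialized engine bytes.
    """
    with open(model_path, "rb") as f:
        model_bytes = f.read()
    # Key the cache on the model contents, so an edited model is tuned again.
    key = hashlib.sha256(model_bytes).hexdigest()
    cache_path = os.path.join(cache_dir, key + ".engine")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()
    engine_bytes = build_engine(model_bytes)
    with open(cache_path, "wb") as f:
        f.write(engine_bytes)
    return engine_bytes


def demo():
    """Load the same model twice and count how often the builder actually runs."""
    calls = []

    def fake_build(model_bytes):
        calls.append(model_bytes)  # record each (slow) build
        return b"engine:" + model_bytes

    with tempfile.TemporaryDirectory() as d:
        model_path = os.path.join(d, "model.bin")  # hypothetical model file
        with open(model_path, "wb") as f:
            f.write(b"weights")
        first = load_or_build(model_path, d, fake_build)
        second = load_or_build(model_path, d, fake_build)
    return first, second, len(calls)
```

Calling `demo()` returns the same engine bytes twice, but the builder runs only once. Keying on the model file alone is a simplification. Serialized TensorRT engines are specific to the GPU and the TensorRT version, so a real cache key would likely need to include both.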

Cons

Supports very few TensorFlow layers (ONNX support is better); see the list of supported ops here.
The model converter cannot convert from channel-last to channel-first ordering (NHWC -> NCHW), which means you have to retrain the network with channel-first ordering.
You need to provide the names of the input and output nodes and their shapes manually.
Proprietary – Requires registration to download
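Here is what the two orderings mean in practice. In channel-last (NHWC) data, all channel values of a pixel sit next to each other. In channel-first (NCHW) data, each channel is stored as its own H x W plane. The pure-Python sketch below uses a hypothetical helper, `hwc_to_chw`, to reorder the input data of a single image, with the batch axis N omitted. Reordering data like this is simple. The limitation above is that the converter does not make the equivalent change to the network’s own layers.

```python
def hwc_to_chw(image):
    """Reorder one image from channel-last (H x W x C) to channel-first (C x H x W).

    The image is a nested list indexed image[h][w][c]; the batch axis N is omitted.
    """
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 RGB image: two pixels, three channels each.
hwc = [[[1, 2, 3], [4, 5, 6]]]
chw = hwc_to_chw(hwc)  # [[[1, 4]], [[2, 5]], [[3, 6]]]
```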

Caveats

Currently, TensorRT only works with channel first ordering (NCHW).
Has issues with the TensorFlow softmax layer in channel-first ordering. Thus, you have to convert to channel-last ordering, apply the softmax, and then convert back to channel-first, like so:

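import tensorflow as tf
from tensorflow.keras.layers import Lambda, Permute

# x is the channel-first (CHW) output of the previous layer
x = Permute((2, 3, 1))(x)  # Convert tensor from CHW to HWC
# Use TensorFlow's native softmax, since Keras' softmax has issues with tensors with more than 2 dimensions
def tf_softmax(x):
    return tf.nn.softmax(x, axis=-1)  # softmax over the last (channel) axis
x = Lambda(tf_softmax)(x)
x = Permute((3, 1, 2))(x)  # Convert tensor back from HWC to CHW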
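The Permute round trip can be checked without TensorFlow. The pure-Python sketch below works on a single C x H x W sample. `permute` and the other helpers are hypothetical re-implementations written for illustration. `permute` follows the Keras `Permute` convention: axis indices are 1-based and exclude the batch axis. The sketch shows three things: (2, 3, 1) moves the channels last, (3, 1, 2) undoes that move, and a softmax over the last axis in between is a softmax over the channels.

```python
import math


def permute(t, dims):
    """Mimic Keras Permute on one sample (3-D nested list, batch axis excluded).

    dims uses Keras' 1-based convention: output axis m is input axis dims[m] - 1.
    """
    shape = (len(t), len(t[0]), len(t[0][0]))
    out_shape = [shape[d - 1] for d in dims]
    out = [[[None] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                src = (i, j, k)
                o0, o1, o2 = (src[d - 1] for d in dims)
                out[o0][o1][o2] = t[i][j][k]
    return out


def softmax_last_axis(t):
    """Softmax over the innermost axis, like tf.nn.softmax(x, axis=-1)."""
    result = []
    for plane in t:
        rows = []
        for row in plane:
            m = max(row)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in row]
            total = sum(exps)
            rows.append([e / total for e in exps])
        result.append(rows)
    return result


def channel_softmax_chw(x):
    """Softmax over the channel axis of a C x H x W sample via an HWC round trip."""
    hwc = permute(x, (2, 3, 1))     # CHW -> HWC
    hwc = softmax_last_axis(hwc)    # channels are now the last axis
    return permute(hwc, (3, 1, 2))  # HWC -> CHW


x = [[[1.0, 2.0]], [[3.0, 0.0]]]  # C=2, H=1, W=2
y = channel_softmax_chw(x)
```

At every pixel of `y`, the values across the two channels sum to 1. Note that this is a property of the channel axis, not of the spatial axes.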

Upsampling with TensorFlow is not supported, so for a U-net you currently need to use deconvolution, also known as transposed convolution (ConvTranspose), instead.
Only fused batch normalization is supported with TensorFlow. If you use Keras, you need to use tf.keras with the BatchNormalization parameter fused=True to get this.

OpenVINO 2018 R5 (Intel)

Pros

Nice API
Can automatically find both the input and output nodes, as well as their shapes, after loading the model.
Lightweight
Supports most layers – here is a list of supported TensorFlow layers.

Cons

Proprietary – Requires registration to download

Caveats

CMake bug

OpenVINO currently has a bug on Windows when used with CMake, giving an error message like this:

CMake Error at C:/Intel/computer_vision_sdk/inference_engine/src/extension/cmake/CPUID.cmake
    file STRINGS file "C:/path/cpuid.txt" cannot be read.

The issue is that OpenVINO’s CMake script tries to write a file, cpuid.txt, with a list of your CPU’s capabilities, but it fails to write this file. You can work around this by editing C:/Intel/computer_vision_sdk/inference_engine/src/extension/cmake/CPUID.cmake and changing line 251 so that it writes to a specific path instead of the current directory. Change:

std::ofstream of(\"cpuid.txt\");

to:

std::ofstream of(\"${CMAKE_BINARY_DIR}/cpuid.txt\");

It should work after this, but note that you have to delete your project’s CMake cache for this change to take effect (in the CMake GUI, you can do this with File -> Delete Cache).

Resample layer

If you use the UpSampling2D layer in Keras (the ResizeNearestNeighbor op in TensorFlow) in your model, the model conversion will succeed, but you will get the following error message when trying to execute the neural network:

Unsupported primitive of type: Resample name: up_sampling2d_4/ResizeNearestNeighbor

even though Intel lists this layer as supported. This error only happens on the CPU. On the GPU with the OpenCL NEO driver, this layer just outputs all zeros.
The solution so far has been to use deconvolution (transposed convolution) instead.
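Swapping in a transposed convolution does not lose the ability to upsample. With stride 2 and a 2 x 2 kernel of ones, a single-channel transposed convolution reproduces 2x nearest-neighbour upsampling, which is what UpSampling2D does by default. The sketch below is pure Python for illustration, and the helper names are hypothetical. In a trained network the kernel is learned (in Keras, for example with `Conv2DTranspose(filters, (2, 2), strides=(2, 2))`), so in general it will not equal nearest-neighbour upsampling.

```python
def conv_transpose_2d(x, kernel, stride):
    """Single-channel 2-D transposed convolution without padding.

    Each input pixel scatters a scaled copy of the kernel into the output.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(x), len(x[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def upsample_nearest_2x(x):
    """Reference 2x nearest-neighbour upsampling: repeat every row and column."""
    return [[v for v in row for _ in range(2)] for row in x for _ in range(2)]


x = [[1.0, 2.0], [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
same = conv_transpose_2d(x, ones, stride=2) == upsample_nearest_2x(x)  # True
```

Because the stride equals the kernel size, the scattered kernel copies never overlap. Each output pixel therefore receives exactly one input value, just as in nearest-neighbour upsampling.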

Incorrect output

If your neural network gives the wrong output, try updating your Intel HD Graphics driver. On Windows, I have found that drivers from 2016 give wrong neural network output, while updating to the newest 2017 version solves this.

Deploying Keras models

For some reason, Keras doesn’t use the native TensorFlow softmax operation if the tensor has more than 2 dimensions. Thus, if you have a U-net with softmax activation, Keras will instead create multiple operations, including exponential (Exp) and Sum operations. Neither TensorRT nor OpenVINO currently supports these, and you will get error messages like these:

[ ERROR ]  List of operations that cannot be converted to IE IR:
[ ERROR ]      Exp (1)
[ ERROR ]          conv2d_23/Exp
[ ERROR ]      Sum (1)
[ ERROR ]          conv2d_23/Sum
[ ERROR ]  Part of the nodes was not translated to IE. Stopped.

The solution in Keras is to replace your final U-net layer:

x = Convolution2D(nr_of_classes, (1, 1), activation='softmax')(x)

with:

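import tensorflow as tf
from tensorflow.keras.layers import Convolution2D, Lambda

x = Convolution2D(nr_of_classes, (1, 1), activation='linear')(x)
# Apply TensorFlow's native softmax op over the last (class) axis
def tf_softmax(x):
    return tf.nn.softmax(x, axis=-1)
x = Lambda(tf_softmax)(x)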
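This is why the converters report Exp and Sum. For tensors with more than 2 dimensions, Keras 2.x builds softmax from primitive operations, roughly exp(x - max(x)) / sum(exp(x - max(x))) along the given axis. The pure-Python sketch below mirrors that decomposition for one vector. It is an illustration, not Keras’ source. Because it computes the same function as a single softmax op, swapping in tf.nn.softmax changes which operations end up in the graph, not what the network computes (up to floating-point rounding).

```python
import math


def keras_style_softmax(row):
    """Softmax built from primitive ops, as Keras does for tensors with > 2 dims.

    Mirrors exp(x - max(x)) / sum(exp(x - max(x))) along one axis.
    """
    m = max(row)                        # Max
    e = [math.exp(v - m) for v in row]  # Sub + Exp
    s = sum(e)                          # Sum
    return [v / s for v in e]           # RealDiv


def naive_softmax(row):
    """Textbook softmax without the max shift (can overflow for large inputs)."""
    e = [math.exp(v) for v in row]
    s = sum(e)
    return [v / s for v in e]
```

The max subtraction keeps the decomposed version from overflowing. For example, `naive_softmax([1000.0, 1001.0])` raises `OverflowError` in Python, while `keras_style_softmax` returns finite probabilities.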
