Deep learning inference engines

I have been working a lot lately with different deep learning inference engines, integrating them into the FAST framework. Specifically, I have been working with Google’s TensorFlow (with cuDNN acceleration), NVIDIA’s TensorRT and Intel’s OpenVINO. In this post, I compare these three engines and their pros and cons, and share some tricks for converting models from Keras/TensorFlow to run on them.

TensorFlow 1.10 (Google)


Pros

  • Supports many layers
  • NVIDIA cuDNN acceleration
  • Open-source and freely available on GitHub. However, cuDNN is proprietary and requires registration to download.


Cons

  • Huge library with a big footprint
  • Many third-party dependencies
  • Very long compile time, especially on Windows
  • Uses Google’s Bazel for compilation

TensorRT 5.0 (NVIDIA)


Pros

  • Performs auto-tuning when loading the model, which gives better performance than TensorFlow with cuDNN. Note, however, that this increases the loading time of the model. This can be reduced by caching/serializing the tuned model, see here for a guide on how to do this.
  • Lightweight
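
The caching idea from the first point can be sketched generically as follows. This is a minimal sketch and not TensorRT code: `load_or_build_engine` and `build_fn` are hypothetical names, where `build_fn` stands in for whatever slow step produces the tuned engine in serialized form. Keying the cache on a hash of the model file is my own assumption, chosen so that a changed model triggers a rebuild.

```python
import hashlib
import os


def load_or_build_engine(model_path, cache_dir, build_fn):
    """Return serialized engine bytes, rebuilding only when the model changes.

    build_fn(model_path) -> bytes is a hypothetical stand-in for the slow
    auto-tuning step; it is not part of any real library.
    """
    # Name the cache file after the model's content hash
    with open(model_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cache_path = os.path.join(cache_dir, digest + ".engine")

    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()  # Cache hit: skip the tuning step

    engine_bytes = build_fn(model_path)  # Cache miss: take the slow path once
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "wb") as f:
        f.write(engine_bytes)
    return engine_bytes
```

With this pattern, only the first load of a given model pays the tuning cost; later loads just read the cached bytes from disk.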


Cons

  • Supports very few TensorFlow layers (better support for ONNX), see the list of supported ops here.
  • The model converter is not able to convert from channel last to channel first ordering (NHWC -> NCHW), which means you have to retrain the network with channel first ordering.
  • You need to provide the names of the input and output nodes and their shapes manually.
  • Proprietary – Requires registration to download
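
To make the channel-ordering point above concrete, here is a small NumPy sketch of what the two layouts mean for a batch of images. It only illustrates how the data is arranged in NHWC versus NCHW. It does not convert a trained model, and the array shapes are made up for illustration.

```python
import numpy as np

# A batch of 2 images, 4x5 pixels, 3 channels, stored channel last (NHWC)
nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# Move the channel axis in front of the spatial axes: NHWC -> NCHW
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# And back again: NCHW -> NHWC
restored = np.transpose(nchw, (0, 2, 3, 1))

print(nhwc.shape, nchw.shape)
```

The value at pixel (h, w) and channel c of image n sits at `nhwc[n, h, w, c]` in one layout and at `nchw[n, c, h, w]` in the other.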


Currently, TensorRT only works with channel first ordering (NCHW).
TensorRT also has issues with the TensorFlow softmax layer in channel first ordering. You therefore have to convert to channel last, apply the softmax, and then convert back to channel first, like so:

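import tensorflow as tf
from tensorflow.keras.layers import Lambda, Permute

# x is the channel first (CHW) output tensor of the preceding layer
x = Permute((2, 3, 1))(x)  # Convert tensor to HWC
# Using TensorFlow's native softmax, due to issues with tensors with dimensions > 2
def tf_softmax(x):
    return tf.nn.softmax(x, axis=-1)
x = Lambda(tf_softmax)(x)
x = Permute((3, 1, 2))(x)  # Convert tensor back to CHW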
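
As a sanity check of the permute trick above, the following NumPy sketch shows that permuting CHW to HWC, applying softmax over the last axis, and permuting back gives the same result as a softmax taken directly over the channel axis. This is not Keras code: `softmax` is a helper defined here for illustration, and the tensor shape is arbitrary.

```python
import numpy as np


def softmax(x, axis):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


rng = np.random.default_rng(0)
chw = rng.normal(size=(3, 4, 5))  # One sample: 3 classes, 4x5 pixels

hwc = np.transpose(chw, (1, 2, 0))              # CHW -> HWC, like Permute((2, 3, 1))
probs_hwc = softmax(hwc, axis=-1)               # Softmax over the class axis
probs_chw = np.transpose(probs_hwc, (2, 0, 1))  # HWC -> CHW, like Permute((3, 1, 2))

direct = softmax(chw, axis=0)  # Softmax taken directly over the channel axis
print(np.allclose(probs_chw, direct))
```

Note that the Keras `Permute` indices are 1-based and skip the batch axis, whereas the NumPy axes here are 0-based on a single sample.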

Upsampling with TensorFlow is not supported, so for a U-net you will need to use deconvolution (aka ConvTranspose) for now.
Only fused batch normalization is supported with TensorFlow. If you use Keras, you need to use tf.keras and set the BatchNormalization parameter fused=True in order to get this.

OpenVINO 2018 R5 (Intel)



Cons

  • Proprietary – Requires registration to download


CMake bug

OpenVINO currently has a bug on Windows when using CMake, which produces an error along these lines:

CMake Error at C:/Intel/computer_vision_sdk/inference_engine/src/extension/cmake/CPUID.cmake
    file STRINGS file "C:/path/cpudid.txt" cannot be read.

The issue is that OpenVINO’s CMake script tries to write a file, cpuid.txt, listing your CPU’s capabilities, but fails to write it. You can work around this by opening C:/Intel/computer_vision_sdk/inference_engine/src/extension/cmake/CPUID.cmake and changing line 251 so that it writes to a specific path instead of the current path. Change:

std::ofstream of(\"cpuid.txt\");

to:

std::ofstream of(\"${CMAKE_BINARY_DIR}/cpuid.txt\");

It should work after this, but note that you will have to delete your project’s CMake cache for the change to take effect. (In the CMake GUI you can do this via File->Delete Cache.)

Resample layer

If you use the UpSampling2D layer in Keras (the ResizeNearestNeighbor op in TensorFlow) in your model, the model conversion will succeed, but you will get an error message when trying to execute the neural network:

Unsupported primitive of type: Resample name: up_sampling2d_4/ResizeNearestNeighbor

even though Intel lists this layer as supported. This error only occurs on the CPU. On the GPU with the OpenCL NEO driver, I just get all zeros from this layer.
The solution so far has been to use deconvolution (transposed convolution) instead.
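
If you want the deconvolution to behave like the nearest-neighbour upsampling it replaces, it may help to know that a stride-2 transposed convolution with a 2x2 kernel of ones reproduces 2x nearest-neighbour upsampling exactly. The NumPy sketch below checks this for a single channel. `transposed_conv2d` is a hypothetical helper written for this illustration, not a library function, and in a real network you would normally let the layer learn its weights instead.

```python
import numpy as np


def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution without padding."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a scaled copy of the kernel into the output
            top, left = i * stride, j * stride
            out[top:top + kh, left:left + kw] += x[i, j] * kernel
    return out


x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

deconv = transposed_conv2d(x, np.ones((2, 2)), stride=2)
nearest = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)  # 2x nearest neighbour

print(np.array_equal(deconv, nearest))
```

Because the stride equals the kernel size, the stamped copies never overlap, which is why each input pixel simply becomes a 2x2 block.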

Incorrect output

If your neural network outputs the wrong answer, try updating your Intel HD Graphics driver. On Windows, I have found that the drivers from 2016 give the wrong neural network output, while updating to the newest 2017 version solves this.

Deploying Keras models

For some reason, Keras doesn’t use the native TensorFlow softmax operation when the tensor has more than 2 dimensions. Thus, if you have a U-net with softmax activation, Keras will create multiple operations, including exponential (Exp) and Sum operations, which are currently supported by neither TensorRT nor OpenVINO, and you will get error messages like these:

[ ERROR ]  List of operations that cannot be converted to IE IR:
[ ERROR ]      Exp (1)
[ ERROR ]          conv2d_23/Exp
[ ERROR ]      Sum (1)
[ ERROR ]          conv2d_23/Sum
[ ERROR ]  Part of the nodes was not translated to IE. Stopped.

The solution in Keras is to replace your final U-net layer:

x = Convolution2D(nr_of_classes, (1, 1), activation='softmax')(x)

with:

# Assumes tensorflow is imported as tf, and Convolution2D and Lambda come from tf.keras.layers
x = Convolution2D(nr_of_classes, (1, 1), activation='linear')(x)
def tf_softmax(x):
    return tf.nn.softmax(x, axis=-1)
x = Lambda(tf_softmax)(x)
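
The Exp and Sum nodes in the error above come from softmax being spelled out as elementary operations. The sketch below, in plain NumPy rather than Keras, writes out that decomposition and checks that it yields per-pixel class probabilities over the last axis, which is what a single softmax over axis=-1 computes. The tensor shape and the name `manual_softmax` are illustrative assumptions.

```python
import numpy as np


def manual_softmax(logits):
    # The decomposed form: an exponential, then a sum and a division
    e = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


# Hypothetical U-net output: 1 image, 2x2 pixels, 3 classes (channel last)
logits = np.zeros((1, 2, 2, 3))
logits[0, 0, 0] = [2.0, 0.0, 0.0]  # One pixel clearly favours class 0

probs = manual_softmax(logits)
print(probs.sum(axis=-1))  # Probabilities at every pixel sum to 1
```

The result is the same either way. The point of the Lambda workaround is only that the graph then contains one softmax op instead of separate Exp and Sum ops.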
