Improve TensorFlow Serving Performance with GPU Support


TensorFlow is an open source software toolkit developed by Google for machine learning research. It has widespread applications for research, education and business and has been used in projects ranging from real-time language translation to identification of promising drug candidates.

Bitnami offers cloud and container images, virtual machines and native installers for TensorFlow Serving and the TensorFlow Inception model, which is a model for machine-based image recognition. These images and installers make it easy to get started immediately with TensorFlow Serving. And if your host system has an NVIDIA GPU, you can leverage its additional processing capabilities to increase the performance of your TensorFlow Serving deployment.

This guide will walk you through the process of rebuilding the Bitnami TensorFlow Serving stack with NVIDIA GPU support.

Assumptions and prerequisites

This guide makes the following assumptions:

  • You have deployed the Bitnami TensorFlow Serving stack on a host equipped with an NVIDIA GPU.
  • You have sudo privileges on the host to install packages and manage services.

Step 1: Compile TensorFlow Serving with NVIDIA GPU support

To enable NVIDIA GPU support in TensorFlow Serving, follow these steps:

  • Install the build tools and Git (if not already installed):

    $ sudo apt-get install git build-essential
  • Install the kernel sources and headers for your running kernel:

    $ sudo apt-get install linux-source
    $ sudo apt-get install linux-headers-$(uname -r)
  • Download the CUDA Toolkit and latest patches for your platform.
  • Run the following command to install the CUDA Toolkit:

    $ chmod +x cuda_X.Y.Z_linux-run
    $ sudo ./cuda_X.Y.Z_linux-run

    Read and confirm your acceptance of the EULA, and answer the pre-installation questions when prompted. Make a note of the CUDA Toolkit installation directory.

    NOTE: The remaining steps in this section will assume that the CUDA Toolkit was installed to the default location of /usr/local/cuda.
    To troubleshoot issues related to your CUDA installation, refer to this helpful troubleshooting guide by Victor Antonino.
  • Repeat the previous step for each CUDA Toolkit patch you downloaded.
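  • Optionally, make the CUDA tools and libraries visible to future shell sessions by adding them to your environment. This is a sketch that assumes the default /usr/local/cuda prefix; adjust the paths if you installed the toolkit elsewhere:

```shell
# Persist the CUDA binary and library paths for future shell sessions.
# /usr/local/cuda is an assumption - use your actual installation prefix.
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
```

    Start a new shell (or source ~/.bashrc) for the changes to take effect.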
  • Once the CUDA Toolkit is installed, sign up for the free NVIDIA Developer Program (if you are not already a member) to download the NVIDIA CUDA Deep Neural Network library (cuDNN) v6.0.

    NOTE: The cuDNN v6.0 library is available for different versions of the CUDA Toolkit. Ensure that you download the cuDNN v6.0 library that also matches the previously-installed CUDA Toolkit version.
  • Run the following commands to install the cuDNN library:

    $ tar -xzvf cudnn-X.Y-linux-x64.tgz
    $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
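  • You can confirm that the installed header matches the cuDNN release you downloaded by printing its version macros. A small sketch, assuming the default CUDA prefix used above (for cuDNN v6.0 you should see CUDNN_MAJOR 6):

```shell
# Print the cuDNN version macros from the installed header; guarded so the
# command reports a clear message if the header is not where we expect it.
CUDNN_H=/usr/local/cuda/include/cudnn.h
if [ -f "$CUDNN_H" ]; then
  grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' "$CUDNN_H"
else
  echo "cudnn.h not found at $CUDNN_H - check the installation path"
fi
```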
  • Download and install the latest NCCL library from its GitHub repository:

    $ git clone https://github.com/NVIDIA/nccl.git
    $ cd nccl/
    $ make CUDA_HOME=/usr/local/cuda
    $ sudo make install
    $ sudo mkdir -p /usr/local/include/external/nccl_archive/src
    $ sudo ln -s /usr/local/include/nccl.h /usr/local/include/external/nccl_archive/src/nccl.h
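  • After installation, you can verify that the NCCL header is reachable both at the install prefix and via the symlinked path the build expects (paths as used in the commands above):

```shell
# Both locations should resolve once "make install" and the symlink
# command above have completed.
for f in /usr/local/include/nccl.h /usr/local/include/external/nccl_archive/src/nccl.h; do
  if [ -e "$f" ]; then echo "OK: $f"; else echo "MISSING: $f"; fi
done
```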
  • Download TensorFlow Serving from its GitHub repository into your home directory using the command below:

    $ git clone --recurse-submodules https://github.com/tensorflow/serving.git ~/serving
  • Configure the build, making sure to say "Yes" when prompted to enable GPU processing. Leave the remaining options at their default values.

    $ cd ~/serving/tensorflow
    $ ./configure
  • Edit the tools/bazel.rc file in the repository root directory and make the following changes:

    • Due to a bug, change @org_tensorflow//third_party/gpus/crosstool to @local_config_cuda//crosstool:toolchain.

    • Update all instances of the PYTHON_BIN_PATH variable to use the Python binary included in the Bitnami TensorFlow Serving Stack at /opt/bitnami/python/bin/python.

    After making these changes, the edited bazel.rc file should look like this:

    build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain
    build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
    build --force_python=py2
    build --python2_path=/opt/bitnami/python/bin/python
    build --action_env PYTHON_BIN_PATH="/opt/bitnami/python/bin/python"
    build --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
    test --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
    run --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
    build --spawn_strategy=standalone --genrule_strategy=standalone
    test --spawn_strategy=standalone --genrule_strategy=standalone
    run --spawn_strategy=standalone --genrule_strategy=standalone    
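    If you prefer, the two edits can be applied non-interactively with sed. The sketch below operates on a scratch copy so the commands can be tried safely; run the same sed lines against tools/bazel.rc in your checkout. The pre-edit PYTHON_BIN_PATH value (/usr/bin/python) is an assumption, so verify it against your file first:

```shell
# Demonstrated on a scratch copy of the two affected lines; the pre-edit
# PYTHON_BIN_PATH value (/usr/bin/python) is an assumption - check yours.
cat > /tmp/bazel.rc <<'EOF'
build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool
build --action_env PYTHON_BIN_PATH="/usr/bin/python"
EOF
# Swap the buggy crosstool target for the working one.
sed -i 's|@org_tensorflow//third_party/gpus/crosstool|@local_config_cuda//crosstool:toolchain|' /tmp/bazel.rc
# Point PYTHON_BIN_PATH at the Bitnami stack's Python binary.
sed -i 's|PYTHON_BIN_PATH="[^"]*"|PYTHON_BIN_PATH="/opt/bitnami/python/bin/python"|' /tmp/bazel.rc
cat /tmp/bazel.rc
```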
  • Compile TensorFlow Serving with GPU support with the commands below. Depending on the server specification, this process can take an hour or longer.

    $ cd ~/serving
    $ bazel clean --expunge && export TF_NEED_CUDA=1
    $ bazel build --config=opt --config=cuda tensorflow_serving/...
  • Stop the TensorFlow Serving service:

    $ sudo /opt/bitnami/ctlscript.sh stop tensorflowserving
  • Copy the newly-compiled binary files and libraries for TensorFlow Serving into the Bitnami stack directory:

    $ sudo mv /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server.old
    $ sudo cp ~/serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server_test_client.runfiles/local_config_cuda/cuda/cuda/lib/* /lib
    $ sudo cp ~/serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /opt/bitnami/tensorflow-serving/bin/
  • Start the TensorFlow Serving service:

    $ sudo /opt/bitnami/ctlscript.sh start tensorflowserving

You should now be able to use TensorFlow Serving with GPU support enabled.

Step 2: Test TensorFlow Serving

Confirm that the TensorFlow Serving service is running with NVIDIA GPU support using either of these methods:

  • Use the ldd utility on the TensorFlow Serving binary and confirm that the output lists the CUDA, cuDNN and NVIDIA libraries, as shown in the example below:

     $ ldd /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server
         linux-vdso.so.1 (0x00007ffdb69d1000)
         libcudart.so.8.0 => /lib/libcudart.so.8.0 (0x00007fb5efe90000)
         libcudnn.so.6 => /lib/libcudnn.so.6 (0x00007fb5ece4b000)
         libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fb5ec454000)
         ...
         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb5d4ab7000)
         /lib64/ld-linux-x86-64.so.2 (0x000055e6cf710000)
  • Check the server log for messages indicating that the NVIDIA GPU modules have been loaded, as shown in the example below:

     $ tail -f /var/log/messages
     kernel: [ 2780.447221] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 250      
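  • As a further check, run the nvidia-smi utility (installed with the NVIDIA driver) and confirm that the GPU is visible; while the server is handling requests, tensorflow_model_server should appear in the process list it reports:

```shell
# Shows driver version, GPU utilization and the processes using the GPU.
# Guarded so the command degrades gracefully on hosts without the driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
else
  echo "nvidia-smi not found - is the NVIDIA driver installed?"
fi
```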

To learn more about the topics in this guide, visit the following links: