
Enable NVIDIA GPU support in TensorFlow Serving

NOTE: The steps below require you to download various libraries and recompile TensorFlow Serving with GPU support. Before proceeding, ensure that the host system has the necessary disk space, CPU and RAM to handle heavy compilation workloads.

To enable NVIDIA GPU support in TensorFlow Serving, follow these steps:

  • Install the build tools and Git (if not already installed):

      $ sudo apt-get install git build-essential
    
  • Install the kernel sources for your running kernel:

      $ sudo apt-get install linux-source
      $ sudo apt-get install linux-headers-$(uname -r)
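
    The headers package must match the kernel you are booted into. A quick sanity check (a sketch; it only prints the name of the package that the command above installs):

```shell
# Print the kernel headers package name for the running kernel; it should
# match the linux-headers package installed above.
echo "linux-headers-$(uname -r)"
```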
    
  • Download the CUDA Toolkit and latest patches for your platform.

  • Run the following command to install the CUDA Toolkit:

      $ chmod +x cuda_X.Y.Z_linux-run
      $ sudo ./cuda_X.Y.Z_linux-run
    

    Read and confirm your acceptance of the EULA, and answer the pre-installation questions when prompted. Make a note of the CUDA Toolkit installation directory.

    NOTE: The remaining steps in this section will assume that the CUDA Toolkit was installed to the default location of /usr/local/cuda.

    TIP: To troubleshoot issues related to your CUDA installation, refer to the troubleshooting guide by Victor Antonino.

  • Repeat the previous step for any CUDA Toolkit patches that were downloaded as well.
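
    After installing the toolkit and any patches, later build steps need to find the CUDA binaries and libraries. A minimal environment sketch, assuming the default /usr/local/cuda prefix (adjust if you installed elsewhere):

```shell
# Assumption: the CUDA Toolkit was installed to the default prefix.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$CUDA_HOME"
```

    Adding these lines to ~/.bashrc makes the settings persist across shell sessions.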

  • Once the CUDA Toolkit is installed, sign up for the free NVIDIA Developer Program (if you are not already a member) to download the NVIDIA CUDA Deep Neural Network library (cuDNN) v6.0.

    NOTE: The cuDNN v6.0 library is available for different versions of the CUDA Toolkit. Ensure that you download the cuDNN v6.0 library that also matches the previously-installed CUDA Toolkit version.

  • Run the following commands to install the cuDNN library:

      $ tar -xzvf cudnn-X.Y-linux-x64.tgz
      $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
      $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
      $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
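
    To confirm that the copied header really provides cuDNN v6.0, you can read its version macros. A self-contained sketch (it generates a stand-in header under /tmp so it runs anywhere; on your server, point HEADER at /usr/local/cuda/include/cudnn.h instead):

```shell
# Stand-in header so the sketch is self-contained; on a real system use
# HEADER=/usr/local/cuda/include/cudnn.h and skip the cat block.
HEADER=/tmp/cudnn.h
cat > "$HEADER" <<'EOF'
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21
EOF
# A cuDNN v6.0 header defines CUDNN_MAJOR 6 and CUDNN_MINOR 0.
grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' "$HEADER"
```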
    
  • Download and install the latest NCCL library from its GitHub repository:

      $ git clone https://github.com/NVIDIA/nccl.git
      $ cd nccl/
      $ make CUDA_HOME=/usr/local/cuda
      $ sudo make install
      $ sudo mkdir -p /usr/local/include/external/nccl_archive/src
      $ sudo ln -s /usr/local/include/nccl.h /usr/local/include/external/nccl_archive/src/nccl.h
    
  • Download TensorFlow Serving from its GitHub repository into your home directory using the command below:

      $ git clone --recurse-submodules https://github.com/tensorflow/serving ~/serving
    
  • Configure the build, making sure to say “Yes” when prompted to enable GPU processing. Leave the remaining options at their default values.

      $ cd ~/serving/tensorflow
      $ ./configure
    
  • Edit the tools/bazel.rc file in the repository root directory and make the following changes:

    • Due to a bug, change @org_tensorflow//third_party/gpus/crosstool to @local_config_cuda//crosstool:toolchain.

    • Update all instances of the PYTHON_BIN_PATH variable to use the Python binary included in the Bitnami TensorFlow Serving Stack at /opt/bitnami/python/bin/python.

      After making these changes, the edited bazel.rc file should look like this:

        build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain
        build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
        build --force_python=py2
        build --python2_path=/opt/bitnami/python/bin/python
        build --action_env PYTHON_BIN_PATH="/opt/bitnami/python/bin/python"
        build --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
        test --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
        run --define PYTHON_BIN_PATH=/opt/bitnami/python/bin/python
        build --spawn_strategy=standalone --genrule_strategy=standalone
        test --spawn_strategy=standalone --genrule_strategy=standalone
        run --spawn_strategy=standalone --genrule_strategy=standalone
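
      If you prefer not to edit the file by hand, the crosstool change can be scripted. A sketch using sed, shown here against a scratch copy so it is safe to try; on your server, run the same substitution on tools/bazel.rc after backing it up:

```shell
# Scratch copy for illustration; on your server, operate on tools/bazel.rc.
RC=/tmp/bazel.rc
printf 'build:cuda --crosstool_top=@org_tensorflow//third_party/gpus/crosstool\n' > "$RC"
# Swap the buggy crosstool target for the local CUDA toolchain
# (sed -i.bak keeps the original file as bazel.rc.bak).
sed -i.bak 's|@org_tensorflow//third_party/gpus/crosstool|@local_config_cuda//crosstool:toolchain|' "$RC"
cat "$RC"
```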
      
  • Compile TensorFlow Serving with GPU support using the commands below. Depending on the server's specifications, this process can take an hour or longer.

      $ cd ~/serving
      $ bazel clean --expunge && export TF_NEED_CUDA=1
      $ bazel build --config=opt --config=cuda tensorflow_serving/...
    
  • Stop the TensorFlow Serving service:

      $ sudo /opt/bitnami/ctlscript.sh stop tensorflowserving
    
  • Back up the original TensorFlow Serving binary, then copy the newly-compiled binary and CUDA libraries into the Bitnami stack directory:

      $ sudo mv /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server.old
      $ sudo cp ~/serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server_test_client.runfiles/local_config_cuda/cuda/cuda/lib/* /lib
      $ sudo cp ~/serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /opt/bitnami/tensorflow-serving/bin/
    
  • Start the TensorFlow Serving service:

      $ sudo /opt/bitnami/ctlscript.sh start tensorflowserving
    

You should now be able to use TensorFlow Serving with GPU support enabled.

Check that TensorFlow Serving is running with NVIDIA GPU support

Confirm that the TensorFlow Serving service is running with NVIDIA GPU support using either of these methods:

  • Use the ldd utility on the TensorFlow Serving binary and confirm that the output lists the CUDA, cuDNN and NVIDIA libraries, as shown in the example below:

      $ ldd /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server
      linux-vdso.so.1 (0x00007ffdb69d1000)
      libcusolver.so.8.0 => /lib/libcusolver.so.8.0 (0x00007fb5efe90000)
      libcublas.so.8.0 => /lib/libcublas.so.8.0 (0x00007fb5ece4b000)
      libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fb5ec454000)
      libcudnn.so.6 => /lib/libcudnn.so.6 (0x00007fb5e2ef2000)
      libcufft.so.8.0 => /lib/libcufft.so.8.0 (0x00007fb5da0a3000)
      libcurand.so.8.0 => /lib/libcurand.so.8.0 (0x00007fb5d612c000)
      libcudart.so.8.0 => /lib/libcudart.so.8.0 (0x00007fb5d5ec6000)
      librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb5d5cbe000)
      libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb5d5aa0000)
      libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb5d589c000)
      libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fb5d5686000)
      libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb5d5384000)
      libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb5d5079000)
      libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb5d4e63000)
      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb5d4ab7000)
      libnvidia-fatbinaryloader.so.375.26 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.26 (0x00007fb5d486b000)
      /lib64/ld-linux-x86-64.so.2 (0x000055e6cf710000)
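
    On a busy server the full ldd listing is long; filtering it down to the GPU-related entries makes the check easier to read. A sketch (a canned sample of the output above keeps it self-contained; on your server, pipe the real ldd output through the same grep):

```shell
# Sample lines from the ldd output; on a real server use:
#   ldd /opt/bitnami/tensorflow-serving/bin/tensorflow_model_server | grep -E 'libcu|nvidia'
LDD_OUT='libcusolver.so.8.0 => /lib/libcusolver.so.8.0
libcudnn.so.6 => /lib/libcudnn.so.6
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6'
printf '%s\n' "$LDD_OUT" | grep -E 'libcu|nvidia'
```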
    
  • Check the server log for messages indicating that the NVIDIA GPU modules have been loaded, as shown in the example below:

      $ tail -f /var/log/messages
      kernel: [ 2780.447221] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 250
    
Last modification December 21, 2022