TensorFlow-GPU in Docker with NVIDIA

  1. Instructions for installing and running TensorFlow with an NVIDIA GPU inside a Docker container.

  2. My setup:
     NVIDIA driver: 450.80.02
     CUDA: 11.0
     Python: 3.7
     TensorFlow: 2.4.0
  3. Add the Docker Engine repository’s key and address to apt’s repository index:
     $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - && sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  4. Update the package index and install Docker Engine:
     $ sudo apt-get update && sudo apt-get install docker-ce docker-ce-cli containerd.io
  5. Run the hello-world Docker image to confirm that Docker works:
     $ sudo docker run hello-world
  6. Uninstall any existing NVIDIA driver installations:
     # To uninstall a runfile-based installation (use the .run file you installed from)
     $ sudo ./NVIDIA-Linux-x86-310.19.run --uninstall
     # To uninstall a package-manager-based installation (use the package name installed on your system)
     $ sudo apt-get remove nvidia-430

    More uninstallation instructions are available from NVIDIA.

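  7. Install the recommended drivers for your GPU:
     $ sudo ubuntu-drivers autoinstall
  8. Run the NVIDIA System Management Interface to confirm that the driver is loaded (a reboot may be needed first if the driver was just installed):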
     $ nvidia-smi
  9. Add the NVIDIA Container Toolkit repository key and address to apt:
     $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
     $ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
     $ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  10. Install the NVIDIA Container Toolkit:
     $ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
  11. Restart Docker:
     $ sudo systemctl restart docker
  12. Check that your TensorFlow, NVIDIA driver, and CUDA versions are compatible:
    • TensorFlow: https://www.tensorflow.org/install/source#gpu
    • NVIDIA: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver
  13. Run the NVIDIA System Management Interface inside a CUDA Docker container to confirm that containers can see the GPU:
     $ sudo docker run -u $(id -u):$(id -g) --gpus all --rm nvidia/cuda:11.0-base nvidia-smi
  14. Run bash inside a TensorFlow GPU Docker container. The container uses the host machine’s GPU. The optional -v flag maps a directory on the host (here /home/kyber/workspaces/rl-tfagents/) to /src/ inside the container:
     $ sudo docker run -u $(id -u):$(id -g) --gpus all -it --network=host -v /home/kyber/workspaces/rl-tfagents/:/src/ tensorflow/tensorflow:2.4.0-gpu bash
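
Steps 12 to 14 can be wrapped in a few shell functions. This is a minimal sketch rather than part of the original instructions: version_ge, driver_ok, and tf_gpu_shell are hypothetical helper names, the DRY_RUN variable is an assumption added for previewing the command, and sort -V assumes GNU coreutils (the default on Ubuntu). The minimum driver version is not hard-coded; take it from the NVIDIA compatibility table linked in step 12.

```shell
# Sketch only: all function names below are hypothetical helpers.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; $1 >= $2 exactly when $2 sorts first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when the installed driver is at least the minimum version in $1.
# $2 optionally overrides the version detected through nvidia-smi.
driver_ok() {
    local min="$1" installed="${2:-}"
    if [ -z "$installed" ]; then
        installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    fi
    version_ge "$installed" "$min"
}

# Opens bash in the TensorFlow GPU container with a host directory mounted at /src/.
# $1 is the host directory (default: current directory), $2 the image.
# Set DRY_RUN=1 to print the docker command instead of running it.
tf_gpu_shell() {
    local src_dir="${1:-$PWD}" image="${2:-tensorflow/tensorflow:2.4.0-gpu}"
    ${DRY_RUN:+echo} sudo docker run -u "$(id -u):$(id -g)" --gpus all -it \
        --network=host -v "${src_dir}:/src/" "$image" bash
}
```

After sourcing these functions, a call such as `driver_ok MIN && tf_gpu_shell ~/project` (with MIN replaced by the minimum driver version for your CUDA release) only starts the container when the driver check passes. Inside the container, `python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'` should print a non-empty list if TensorFlow can see the GPU.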

Other Docker commands

  1. List running Docker containers:
     $ sudo docker container ls
  2. Stop a Docker container (replace <CONTAINERID> with the ID of the container to stop):
     $ sudo docker stop <CONTAINERID>
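
The two commands above can be combined to stop every running container created from one image without looking up IDs by hand. This is a minimal sketch under stated assumptions: stop_by_image is a hypothetical helper name, and the DOCKER variable is an assumption added so the Docker invocation can be overridden (for example, to drop sudo). It relies on docker ps with the -q flag and the ancestor filter.

```shell
# Command used to invoke Docker; override it if your user does not need sudo.
DOCKER="${DOCKER:-sudo docker}"

# Hypothetical helper: stops all running containers created from the image in $1.
stop_by_image() {
    local image="$1" id
    # "ps -q" prints only container IDs; the ancestor filter selects by image.
    for id in $($DOCKER ps -q --filter "ancestor=${image}"); do
        $DOCKER stop "$id"
    done
}
```

For example, `stop_by_image tensorflow/tensorflow:2.4.0-gpu` would stop each running container started from the image used in the steps above.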
