Tensorflow-GPU in Docker with Nvidia
Instructions to install and run Tensorflow with an Nvidia GPU inside a Docker container.
- My setting:
NVidia Driver: 450.80.02
CUDA: 11.0
Python: 3.7
Tensorflow: tensorflow-2.4.0
- Add the Docker Engine repository’s key and address to apt’s repository index:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - && sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
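On newer Ubuntu releases apt-key is deprecated, so the step above may print a warning. A minimal sketch of the keyring-based alternative follows. The keyring path /etc/apt/keyrings/docker.gpg and the helper names docker_source_line and add_docker_repo are my own choices for illustration, not something the commands above depend on.

```shell
# Where the dearmored Docker key is stored (assumed path for this sketch).
KEYRING=/etc/apt/keyrings/docker.gpg

# Build the sources.list entry from an architecture and an Ubuntu codename,
# so the line can be inspected before it is written anywhere.
docker_source_line() {
    echo "deb [arch=$1 signed-by=$KEYRING] https://download.docker.com/linux/ubuntu $2 stable"
}

# Fetch the key into the keyring and register the repository (hypothetical helper).
add_docker_repo() {
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o "$KEYRING"
    docker_source_line "$(dpkg --print-architecture)" "$(lsb_release -cs)" \
        | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
}
```

Calling docker_source_line amd64 focal prints the entry without touching the system; add_docker_repo performs the actual change.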
- Update package index and install the Docker engine:
$ sudo apt-get update && sudo apt-get install docker-ce docker-ce-cli containerd.io
- Run hello-world Docker image:
$ sudo docker run hello-world
- Uninstall any existing NVidia installations:
# To uninstall a run-file based installation
$ sudo ./NVIDIA-Linux-x86-310.19.run --uninstall
# To uninstall a package-manager based installation
$ sudo apt-get remove nvidia-430
More uninstallation instructions are available at NVidia.
- Install recommended drivers for your GPU:
$ sudo ubuntu-drivers autoinstall
- Launch NVIDIA system management interface:
$ nvidia-smi
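If only the driver version is needed (for example, to compare it against the compatibility tables later), nvidia-smi can print it without the full table. A small sketch: the NVIDIA_SMI variable and the function names are hypothetical helpers I added so the parsing can be tried on a machine without a GPU.

```shell
# Command used to reach nvidia-smi; overridable for testing without a GPU.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Print the driver version reported for the first GPU, e.g. 450.80.02.
driver_version() {
    $NVIDIA_SMI --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Print only the driver branch (the part before the first dot), e.g. 450.
driver_branch() {
    driver_version | cut -d. -f1
}
```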
- Add NVIDIA Container Toolkit key and address to apt:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
- Install NVIDIA Container Toolkit:
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
- Restart Docker:
$ sudo systemctl restart docker
- Check for compatible Tensorflow, NVIDIA driver, and CUDA versions:
- Tensorflow: https://www.tensorflow.org/install/source#gpu
- NVidia: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver
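Once the minimum driver version for a CUDA release has been looked up in the NVidia table, the comparison can be scripted. The sketch below assumes GNU sort (for its -V version sort). MIN_DRIVER is only a placeholder set to the driver version from my setting, not a value taken from the table, and version_ge and check_driver are hypothetical helper names.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort), which ships with Ubuntu's coreutils.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder minimum: replace with the value from NVidia's compatibility table.
MIN_DRIVER="450.80.02"

# Report whether the given driver version meets the minimum.
check_driver() {
    if version_ge "$1" "$MIN_DRIVER"; then
        echo "driver $1 OK (>= $MIN_DRIVER)"
    else
        echo "driver $1 too old (need >= $MIN_DRIVER)"
        return 1
    fi
}
```

For example, check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" checks the installed driver.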
- Run the NVIDIA system management interface inside of a CUDA Docker container:
$ sudo docker run -u $(id -u):$(id -g) --gpus all --rm nvidia/cuda:11.0-base nvidia-smi
- Run bash inside a Tensorflow-GPU Docker container. The container uses the host machine’s GPU. The optional -v flag maps a directory on the host to /src/ inside the container:
$ sudo docker run -u $(id -u):$(id -g) --gpus all -it --network=host -v /home/kyber/workspaces/rl-tfagents/:/src/ tensorflow/tensorflow:2.4.0-gpu bash
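To confirm that Tensorflow itself sees the GPU, and not just nvidia-smi, the container can run a one-line Python probe instead of bash. This is a sketch rather than part of the original setup: tf_gpu_check and the DRY_RUN switch are hypothetical names, and DRY_RUN=1 only prints the command so it can be reviewed first.

```shell
# Image tag from the step above.
TF_IMAGE="tensorflow/tensorflow:2.4.0-gpu"

# Python one-liner run inside the container: lists the GPUs visible to Tensorflow.
TF_GPU_PROBE='import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'

# Run the probe, or just print the docker command when DRY_RUN=1.
tf_gpu_check() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker run --gpus all --rm $TF_IMAGE python -c '$TF_GPU_PROBE'"
    else
        sudo docker run --gpus all --rm "$TF_IMAGE" python -c "$TF_GPU_PROBE"
    fi
}
```

An empty list in the output would suggest the container cannot reach the GPU, in which case the driver and NVIDIA Container Toolkit steps are worth rechecking.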
Other Docker commands
- List running Docker containers:
$ sudo docker container ls
- Stop Docker container (replace CONTAINERID with ID of container to be stopped):
$ sudo docker stop <CONTAINERID>
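Looking up the container ID by hand can be skipped by filtering on the image name. A minimal sketch, assuming every container started from the given image should be stopped: stop_by_image and the DOCKER variable are hypothetical helpers, while ps -q and --filter ancestor= are standard docker options.

```shell
# Docker invocation; overridable so the function can be tried without Docker.
DOCKER="${DOCKER:-sudo docker}"

# Stop every running container that was started from image $1.
stop_by_image() {
    ids=$($DOCKER ps -q --filter "ancestor=$1")
    if [ -z "$ids" ]; then
        echo "no running containers for $1"
        return 0
    fi
    # $ids is left unquoted on purpose: one argument per container ID.
    $DOCKER stop $ids
}
```

For example: stop_by_image tensorflow/tensorflow:2.4.0-gpu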