
Nvidia CUDA Development Environment: Enabling the GPU in Docker Containers

1. Prepare a Docker 19.03 or later environment and configure nvidia-container-toolkit.
2. Check which GPU driver version is installed on the host and pick a container version that matches it.
3. Pull the base Docker image from NVIDIA NGC or Docker Hub (a quick sanity check of the setup is sketched after the links):
https://ngc.nvidia.com/catalog/containers/nvidia:cuda/tags
https://gitlab.com/nvidia/container-images/cuda
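
If the toolkit is configured correctly, the host GPU should already be visible from a throwaway CUDA container. A minimal sanity check, assuming a CUDA 10.0 base tag (the exact tag is only an example and may since have been retired on Docker Hub):

# --gpus requires Docker 19.03 or later
docker --version

# note the driver version and the highest CUDA version it supports
nvidia-smi

# the same GPU table should appear from inside a disposable CUDA container
docker run --rm --gpus all nvidia/cuda:10.0-base-ubuntu18.04 nvidia-smi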

Dockerfile for cuda10-py36-conda

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
MAINTAINER Limc <limc@limc.com.cn>
 
# disable interactive prompts from apt during the build
ENV DEBIAN_FRONTEND noninteractive
 
# add cuda user
# --disabled-password = Don't assign a password
# using root group for OpenShift compatibility
ENV CUDA_USER_NAME=cuda10
ENV CUDA_USER_GROUP=root
 
# add user
RUN adduser --system --group --disabled-password --no-create-home --disabled-login $CUDA_USER_NAME
RUN adduser $CUDA_USER_NAME $CUDA_USER_GROUP
 
# Install basic dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        git \
        wget \
        libopencv-dev \
        libsnappy-dev \
        python-dev \
        python-pip \
        #tzdata \
        vim
 
# Install conda for python
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh
 
# Set locale
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
 
ENV PATH /opt/conda/bin:$PATH
 
RUN ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh  && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc && \
    find /opt/conda/ -follow -type f -name '*.a' -delete && \
    find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
    /opt/conda/bin/conda clean -afy
 
 
# copy entrypoint.sh
#COPY ./entrypoint.sh /entrypoint.sh
# install 
#ENTRYPOINT ["/entrypoint.sh"]
 
# Initialize workspace
COPY ./app /app
# make workdir
WORKDIR /app
 
# update pip if necessary
#RUN pip install --upgrade --no-cache-dir pip
# install gunicorn
# RUN pip install --no-cache-dir -r ./requirements.txt
 
# install use conda
#RUN conda install --yes --file ./requirements.txt
RUN while read requirement; do conda install --yes $requirement; done < requirements.txt
 
 
# copy entrypoint.sh
COPY ./entrypoint.sh /entrypoint.sh
# install 
ENTRYPOINT ["/entrypoint.sh"]
 
# switch to non-root user
USER $CUDA_USER_NAME
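
A build command for this image might look like the following; the tag cuda10-py36-conda is an assumption, and the conda loop above expects a requirements.txt inside the app/ directory that gets copied to /app:

# build context must contain the Dockerfile, entrypoint.sh and an app/ directory with requirements.txt
docker build -t cuda10-py36-conda .

# the Makefile below reads the image name from a file called Name
echo cuda10-py36-conda > Name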

Makefile for running the container

 
IMG:=`cat Name`
GPU_OPT:=all
MOUNT_ETC:=
MOUNT_LOG:=
MOUNT_APP:=-v `pwd`/work/app:/app
MOUNT:=$(MOUNT_ETC) $(MOUNT_LOG) $(MOUNT_APP)
EXT_VOL:=
PORT_MAP:=
LINK_MAP:=
RESTART:=no
CONTAINER_NAME:=docker-cuda10-py36-hello
 
echo:
    echo $(IMG)
 
run:
    docker rm $(CONTAINER_NAME) || echo
    docker run -d --gpus $(GPU_OPT) --name $(CONTAINER_NAME) $(LINK_MAP) $(PORT_MAP) --restart=$(RESTART) \
                         $(EXT_VOL) $(MOUNT) $(IMG)
 
run_i:
    docker rm $(CONTAINER_NAME) || echo
    docker run -i -t --gpus $(GPU_OPT) --name $(CONTAINER_NAME) $(LINK_MAP) $(PORT_MAP) \
                         $(EXT_VOL) $(MOUNT) $(IMG) /bin/bash 
 
exec_i:
    docker exec -i -t $(CONTAINER_NAME) /bin/bash
 
stop:
    docker stop $(CONTAINER_NAME)
 
rm: stop
    docker rm $(CONTAINER_NAME)
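
With the image name written to the Name file (see the build step above), typical usage of these targets looks like this:

# start the container in the background with all GPUs attached
make run

# or start a fresh interactive container with a bash shell
make run_i

# open another shell inside the running container
make exec_i

# stop and remove the container (the rm target depends on stop)
make rm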

entrypoint.sh

#!/bin/bash
set -e
 
# Add python as command if needed
if [ "${1:0:1}" = '-' ]; then
    set -- python "$@"
fi
 
# Drop root privileges if we are running python as root;
# this also allows the container to be started with `--user`
if [ "$1" = 'python' -a "$(id -u)" = '0' ]; then
    # Change the ownership of user-mutable directories to the cuda10 user
    for path in \
        /app \
        /usr/local/cuda/ \
    ; do
        chown -R cuda10:root "$path"
    done
 
    # su-exec's first argument is the user to switch to; "$@" already starts with "python".
    # note: su-exec is not installed by the Dockerfile above; add it (or gosu) if this branch is needed
    set -- su-exec cuda10 "$@"
    #exec su-exec elasticsearch "$BASH_SOURCE" "$@"
fi
 
# If the argument is not python,
# assume the user wants to run their own process,
# for example a `bash` shell to explore this image
exec "$@"
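
Because the script ends with exec "$@", whatever follows the image name on the docker run command line simply becomes the container's process. A few hypothetical invocations (train.py and the image tag are placeholders, not part of the original repository):

# run a training script with python inside the container
docker run --rm --gpus all cuda10-py36-conda python train.py

# a leading dash is treated as python options, so python is prepended automatically
docker run --rm --gpus all cuda10-py36-conda -c "print('hello from the container')"

# anything else bypasses the python branch, e.g. an interactive shell
docker run --rm -i -t --gpus all cuda10-py36-conda /bin/bash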

A few things to note
1. GPU access from the container requires root privileges; otherwise the run fails with an error such as
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345
For better security, create a new user inside the container and add it to the root group.
2. The host GPU driver and CUDA version must match the version of the official container image. cuDNN does not need to match exactly: several different cuDNN versions can be used, as long as they stay within the range supported by the driver (a quick check is sketched after this list).
3. If a container exits abnormally it can keep hold of the GPU. When that happens the GPU becomes unusable outside the container, restarting the Docker daemon does not help, and the only remedy is to reboot the machine.
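
For the driver/CUDA pairing in note 2, a minimal check might look like this; the image tag cuda10-py36-conda is the one assumed in the build example above:

# the nvidia-smi header prints the driver version and the highest CUDA version that driver supports;
# the CUDA version of the base image (10.0 here) must not exceed that value
nvidia-smi

# run nvidia-smi through the freshly built image to confirm the pairing end to end
docker run --rm --gpus all cuda10-py36-conda nvidia-smi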

Full source code
https://github.com/limccn/ultrasound-nerve-segmentation-in-tensorflow/commit/d7de1cbeb641d2fae4f5a78ff590a0254667b398

References
https://gitlab.com/nvidia/container-images/cuda
