Containerized Installation for Inference on Linux GPU Servers

  1. Ensure Docker is installed and able to run NVIDIA containers (requires sudo). Skip this step if the system can already run NVIDIA containers. The example here is for Ubuntu; see NVIDIA Containers for other distributions.

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
        && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
        && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
    sudo apt-get install -y nvidia-container-runtime
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  2. Build the container image:

    docker build -t h2ogpt .
    
  3. Run the container (you can also use finetune.py and all of its parameters as shown above for training):

    For the fine-tuned h2oGPT with 20 billion parameters:

    docker run -it --runtime=nvidia --shm-size=64g -p 7860:7860 \
        -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py \
        --base_model=h2oai/h2ogpt-oasst1-512-20b
    

    If you have a private Hugging Face token, you can instead run:

    docker run -it --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
        -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
        -v ${HOME}/.cache:/root/.cache --rm h2ogpt \
        -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
    

    For your own fine-tuned model, for example one starting from the gpt-neox-20b foundation model:

    docker run -it --runtime=nvidia --shm-size=64g -p 7860:7860 \
        -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py \
        --base_model=EleutherAI/gpt-neox-20b \
        --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
    
  4. Open http://localhost:7860 in the browser (the UI is served over plain HTTP unless TLS has been configured separately).
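
The `docker run` variants in step 3 differ only in the arguments passed to generate.py. The following is a minimal sketch of a wrapper that assembles the command and prints it for review instead of running it. The function name `h2ogpt_run_cmd` is hypothetical and not part of h2oGPT, and the sketch assumes the image's entrypoint is a Python interpreter, so that `generate.py` can be given as the first argument after the image name.

```shell
#!/bin/sh
# Hypothetical helper (not part of h2oGPT): print the docker run command for
# a given base model and optional LoRA weights, without executing it.
# Assumption: the image is tagged "h2ogpt" and its entrypoint runs Python.
h2ogpt_run_cmd() {
    base_model="$1"
    lora_weights="${2:-}"

    # -it is given as a Docker option, i.e. before the image name.
    cmd="docker run -it --runtime=nvidia --shm-size=64g -p 7860:7860"
    cmd="${cmd} -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py"
    cmd="${cmd} --base_model=${base_model}"

    # LoRA weights are only added when a second argument is supplied.
    if [ -n "${lora_weights}" ]; then
        cmd="${cmd} --lora_weights=${lora_weights} --prompt_type=human_bot"
    fi

    printf '%s\n' "${cmd}"
}

# Print the command for the fine-tuned 20B model, then for a LoRA-based one.
h2ogpt_run_cmd h2oai/h2ogpt-oasst1-512-20b
h2ogpt_run_cmd EleutherAI/gpt-neox-20b h2ogpt_lora_weights
```

Printing rather than executing keeps the helper safe to try on a machine without a GPU; once the output looks right, it can be copied into the shell.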

Docker Compose Setup & Inference

  1. (Optional) Change the desired model and weights under environment in docker-compose.yml.

  2. Build and run the container:

    docker-compose up -d --build
    
  3. Open http://localhost:7860 in the browser (the UI is served over plain HTTP unless TLS has been configured separately).

  4. See logs:

    docker-compose logs -f
    
  5. Clean everything up:

    docker-compose down --volumes --rmi all
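
After `docker-compose up -d --build` returns, the UI from step 3 may not answer right away, since the model still has to be downloaded and loaded. As a rough sketch, a small polling loop can wait for the port to respond before opening the browser. The function name `wait_for_url` and its default attempt count and delay are illustrative assumptions, and the sketch assumes `curl` is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-10}"

    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failure, -sS: quiet but still show errors.
        if curl -fsS -o /dev/null "$url"; then
            echo "ready: $url"
            return 0
        fi
        # Sleep only if another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done

    echo "not ready after $attempts attempts: $url" >&2
    return 1
}
```

For example, `wait_for_url http://localhost:7860 30 10` would try for roughly five minutes; if it gives up, `docker-compose logs -f` is the place to look for the reason.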