|
### Containerized Installation for Inference on Linux GPU Servers |
|
|
|
1. Ensure Docker is installed and able to run NVIDIA containers (requires sudo). You can skip this step if your system can already run NVIDIA containers. The example below is for Ubuntu; see [NVIDIA Containers](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) for other distributions.
|
|
|
```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
sudo apt-get install -y nvidia-container-runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
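
Before building anything, it is worth confirming that the NVIDIA runtime is wired up correctly. A quick check is to run a throwaway CUDA container and make sure `nvidia-smi` can see your GPUs (the CUDA image tag below is only an example; pick one compatible with your driver):

```bash
# Sanity check: the NVIDIA runtime should expose the host GPUs inside the container
docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```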
|
|
|
2. Build the container image: |
|
|
|
```bash
docker build -t h2ogpt .
```
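
The build can take a while. Once it finishes, you can confirm the image is available locally:

```bash
# List the freshly built image and its size
docker images h2ogpt
```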
|
|
|
3. Run the container (you can also use `finetune.py` and all of its parameters as shown above for training): |
|
|
|
For the fine-tuned h2oGPT with 20 billion parameters: |
|
```bash
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
-v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
--base_model=h2oai/h2ogpt-oasst1-512-20b
```
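
On a multi-GPU server you may want to expose only some of the GPUs to the container. A minimal sketch, assuming you want GPU 0 only, is to set `NVIDIA_VISIBLE_DEVICES`, which the NVIDIA runtime honors:

```bash
# Same command as above, but restricted to GPU 0 via NVIDIA_VISIBLE_DEVICES
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
-e NVIDIA_VISIBLE_DEVICES=0 \
-v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
--base_model=h2oai/h2ogpt-oasst1-512-20b
```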
|
|
|
If you have a private Hugging Face token, you can instead run:
|
```bash
docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
-e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
-v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
-c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
```
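
To keep the token out of your shell history, you can instead pass through an environment variable that is already exported on the host; only the `-e` flag changes:

```bash
# Assumes HUGGINGFACE_API_TOKEN is already exported in the host shell
docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
-e HUGGINGFACE_API_TOKEN="$HUGGINGFACE_API_TOKEN" \
-v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
-c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
```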
|
|
|
For example, to run your own fine-tuned model built on the gpt-neox-20b foundation model:
|
```bash
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
-v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
--base_model=EleutherAI/gpt-neox-20b \
--lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
```
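
If the LoRA weights live on the host rather than inside the image, mount them into the container and point `--lora_weights` at the mounted path. This is only a sketch: `${HOME}/h2ogpt_lora_weights` and the `/lora` mount point are placeholders, so adjust both to your setup:

```bash
# Host path and container mount point below are placeholders; adjust to where your weights actually live
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
-v ${HOME}/.cache:/root/.cache \
-v ${HOME}/h2ogpt_lora_weights:/lora/h2ogpt_lora_weights --rm h2ogpt -it generate.py \
--base_model=EleutherAI/gpt-neox-20b \
--lora_weights=/lora/h2ogpt_lora_weights --prompt_type=human_bot
```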
|
|
|
4. Open `http://localhost:7860` in the browser.
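
If the container runs on a remote GPU server, one common way to reach the UI from your workstation is an SSH tunnel (`user@gpu-server` is a placeholder), after which the same URL works locally:

```bash
# Forward local port 7860 to port 7860 on the server; user@gpu-server is a placeholder
ssh -L 7860:localhost:7860 user@gpu-server
```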
|
|
|
### Docker Compose Setup & Inference |
|
|
|
1. (Optional) Change the desired model and weights under `environment` in `docker-compose.yml`.
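
The exact variable names are defined in the repository's `docker-compose.yml`, so check there rather than guessing; a quick way to see what can be changed is:

```bash
# Show the environment section of the compose file (adjust the number of context lines as needed)
grep -A 5 'environment' docker-compose.yml
```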
|
|
|
2. Build and run the container:
|
|
|
```bash
docker-compose up -d --build
```
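
Once the stack is up, you can verify the service is running and see its port mappings:

```bash
# Show container status and port mappings for the compose stack
docker-compose ps
```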
|
|
|
3. Open `http://localhost:7860` in the browser.
|
|
|
4. See logs: |
|
|
|
```bash
docker-compose logs -f
```
|
|
|
5. Clean everything up: |
|
|
|
```bash
docker-compose down --volumes --rmi all
```
|
|
|
|