### Containerized Installation for Inference on Linux GPU Servers
1. Ensure Docker is installed and set up to run NVIDIA containers (requires sudo); you can skip this step if the system can already run NVIDIA containers. The example below is for Ubuntu; see [NVIDIA Containers](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) for other distributions.
```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
sudo apt-get install -y nvidia-container-runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
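To verify Docker can see the GPUs before building anything, run a throwaway CUDA container (the image tag here is just an example; any CUDA base image works):
```bash
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```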
2. Build the container image:
```bash
docker build -t h2ogpt .
```
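If the build succeeds, the image appears in the local image list; a quick check:
```bash
docker images h2ogpt
```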
3. Run the container (you can also use `finetune.py` and all of its parameters as shown above for training).
For the fine-tuned h2oGPT with 20 billion parameters:
```bash
docker run --runtime=nvidia --shm-size=64g -it -p 7860:7860 \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py \
    --base_model=h2oai/h2ogpt-oasst1-512-20b
```
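On a multi-GPU server you can pin the container to specific devices. With the NVIDIA runtime this is done via the `NVIDIA_VISIBLE_DEVICES` environment variable; for example, to use only GPU 0 (a sketch, assuming the model fits on that single GPU):
```bash
docker run --runtime=nvidia --shm-size=64g -it -p 7860:7860 \
    -e NVIDIA_VISIBLE_DEVICES=0 \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py \
    --base_model=h2oai/h2ogpt-oasst1-512-20b
```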
If you have a private Hugging Face token, you can instead run:
```bash
docker run --runtime=nvidia --shm-size=64g -it --entrypoint=bash -p 7860:7860 \
    -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt \
    -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
```
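To keep the token out of your shell history, you can export it once on the host (or source it from a secrets file) and let Docker forward the value; with `-e VAR` and no `=value`, Docker copies the variable from the host environment:
```bash
# Token is read from the host environment, never written on the command line
docker run --runtime=nvidia --shm-size=64g -it --entrypoint=bash -p 7860:7860 \
    -e HUGGINGFACE_API_TOKEN \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt \
    -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
```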
For example, for your own model fine-tuned from the gpt-neox-20b foundation model:
```bash
docker run --runtime=nvidia --shm-size=64g -it -p 7860:7860 \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt generate.py \
    --base_model=EleutherAI/gpt-neox-20b \
    --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
```
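If the LoRA weights live on the host rather than inside the image, mount them into the container as well. The in-container path below is an assumption, not something the image is known to define; adjust it to match the image's working directory:
```bash
# /workspace/h2ogpt_lora_weights is a hypothetical in-container location
docker run --runtime=nvidia --shm-size=64g -it -p 7860:7860 \
    -v ${HOME}/.cache:/root/.cache \
    -v ${HOME}/h2ogpt_lora_weights:/workspace/h2ogpt_lora_weights \
    --rm h2ogpt generate.py \
    --base_model=EleutherAI/gpt-neox-20b \
    --lora_weights=/workspace/h2ogpt_lora_weights --prompt_type=human_bot
```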
4. Open `http://localhost:7860` in the browser
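If the page does not load right away, the model may still be downloading; you can poll the endpoint from the server first:
```bash
# Succeeds (exit code 0) once the UI is serving
curl -sSf http://localhost:7860 > /dev/null && echo "h2oGPT UI is up"
```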
### Docker Compose Setup & Inference
1. (Optional) Change the desired model and weights under `environment` in `docker-compose.yml`
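The exact variable names depend on the compose file itself; to inspect what it currently exposes before editing:
```bash
# Prints the environment section plus a few lines of context
grep -n -A 5 'environment' docker-compose.yml
```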
2. Build and run the container:
```bash
docker-compose up -d --build
```
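Once the stack is up, confirm the service is running and the port mapping is in place:
```bash
docker-compose ps
```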
3. Open `http://localhost:7860` in the browser
4. See logs:
```bash
docker-compose logs -f
```
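On a long-running deployment the full log history can be large; `--tail` limits how much is replayed before following:
```bash
docker-compose logs -f --tail=100
```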
5. Clean everything up:
```bash
docker-compose down --volumes --rmi all
```
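Note that `--volumes --rmi all` also deletes the built image and any named volumes declared in the compose file; to merely stop the stack while keeping the image for a faster restart:
```bash
docker-compose down
```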