|
## h2oGPT Installation Help |
|
|
|
Follow these instructions to get a working Python environment on a Linux system. |
|
|
|
### Install Python environment |
|
|
|
Download Miniconda for [Linux](https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh), [macOS](https://docs.conda.io/en/latest/miniconda.html#macos-installers), or [Windows](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe). Then install conda and set up the environment:
|
```bash
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh  # for Linux x86-64
# accept the license agreement and allow the installer to initialize your shell if prompted
```
|
Open a new shell; you should see `(base)` in the prompt. Then create a new environment:
|
```bash
conda create -n h2ogpt -y
conda activate h2ogpt
conda install -y mamba -c conda-forge  # for speed
mamba install python=3.10 -c conda-forge -y
conda update -n base -c defaults conda -y
```
|
You should see `(h2ogpt)` in the shell prompt. Test your Python:
|
```bash
python --version
```
|
should report `3.10.xx`, and:
|
```bash
python -c "import os, sys ; print('hello world')"
```
|
should print `hello world`. Then clone the repository:
|
```bash
git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt
```
|
Then go back to [README](../README.md) for package installation and use of `generate.py`. |
|
|
|
### Installing CUDA Toolkit |
|
|
|
E.g. for CUDA 12.1, [install the CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local).
|
|
|
E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands: |
|
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
```
|
|
|
Then set the system up to use the freshly installed CUDA location: |
|
```bash
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda/lib64/" >> ~/.bashrc
echo "export CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=\$PATH:/usr/local/cuda/bin/" >> ~/.bashrc
source ~/.bashrc
conda activate h2ogpt
```
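To confirm in the current shell that the new `PATH` entry is in effect, here is a small illustrative check; the first line simulates what sourcing `~/.bashrc` adds:

```shell
# Simulate the PATH edit from ~/.bashrc, then verify the CUDA bin dir is present
PATH="$PATH:/usr/local/cuda/bin/"
case ":$PATH:" in
  *":/usr/local/cuda/bin/:"*) cuda_on_path="yes" ;;
  *)                          cuda_on_path="no" ;;
esac
echo "cuda bin on PATH: $cuda_on_path"
```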
|
|
|
Then reboot the machine to ensure everything is synced up cleanly on restart.
|
```bash
sudo reboot
```
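After the reboot, a quick way to confirm the toolkit is visible is the sketch below; it only checks that `nvcc` resolves and reports a release number:

```shell
# Print the CUDA release if nvcc is on PATH, otherwise print a hint
check_cuda() {
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -o 'release [0-9.]*'
  else
    echo "nvcc not found; re-check PATH and CUDA_HOME"
  fi
}
check_cuda
```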
|
|
|
### Compile bitsandbytes |
|
|
|
For fast 4-bit and 8-bit training, one needs bitsandbytes. [Compiling bitsandbytes](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) is only required if your CUDA version differs from those built into the bitsandbytes PyPI package, which covers CUDA 11.0&ndash;11.8, 12.0, and 12.1. Here we compile for 12.1 as an example.
|
```bash
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
git checkout 7c651012fce87881bb4e194a26af25790cadea4f
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..
```
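The `CUDA_VERSION` value is just the toolkit release with the dot removed (12.1 &rarr; 121). If you are compiling for a different toolkit, it can be derived mechanically; this is a convenience sketch, not part of the official build steps:

```shell
# Turn a CUDA release string into bitsandbytes' CUDA_VERSION form (12.1 -> 121)
release="12.1"                            # substitute your installed release here
CUDA_VERSION=$(printf '%s' "$release" | tr -d '.')
echo "$CUDA_VERSION"                      # -> 121
```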
|
|
|
### Install NVIDIA GPU Manager if you have multiple A100/H100s
|
```bash
sudo apt-key del 7fa2af80
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install -y datacenter-gpu-manager
sudo apt-get install -y libnvidia-nscq-530
sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
```
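The `distribution` line above concatenates `$ID` and `$VERSION_ID` from `/etc/os-release` and strips the dot; on Ubuntu 20.04 it yields `ubuntu2004`, which selects the matching repo path. A standalone illustration with hard-coded sample values:

```shell
# What the distribution detection evaluates to on Ubuntu 20.04
ID="ubuntu"; VERSION_ID="20.04"     # sample values from /etc/os-release
distribution=$(echo "$ID$VERSION_ID" | sed -e 's/\.//g')
echo "$distribution"                # -> ubuntu2004
```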
|
See [GPU Manager](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html) |
|
|
|
### Install and run Fabric Manager if you have multiple A100/H100s
|
|
|
```bash
sudo apt-get install cuda-drivers-fabricmanager
sudo systemctl start nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager
```
|
See [Fabric Manager](https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html) |
|
|
|
Once everything is installed and the system has been rebooted, just run:
|
|
|
```bash
sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
sudo systemctl start nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager
```
|
|
|
### Tensorboard (optional) to inspect training |
|
|
|
```bash
tensorboard --logdir=runs/
```
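If training runs on a remote machine, TensorBoard's default binding to localhost will not be reachable from your workstation; you can either bind to all interfaces or tunnel the port over SSH (the host name below is a placeholder):

```shell
# Option 1: serve on all interfaces (make sure the port is firewalled appropriately)
tensorboard --logdir=runs/ --host 0.0.0.0 --port 6006

# Option 2: from your workstation, tunnel the default port over SSH
ssh -L 6006:localhost:6006 user@training-host   # then browse http://localhost:6006
```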
|
|
|
### Flash Attention |
|
|
|
Update: this is not needed anymore; see https://github.com/h2oai/h2ogpt/issues/128.
|
|
|
To use flash attention with LLaMa, CUDA 11.7 is required so that the flash attention module compiles against torch.
|
|
|
E.g. for Ubuntu, go to the [CUDA Toolkit archive](https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=runfile_local), then:
|
```bash
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo bash ./cuda_11.7.0_515.43.04_linux.run
```
|
When prompted, answer No to the symlink change, choose to continue (not abort), accept the license, keep only the toolkit selected, and select Install.
|
|
|
If CUDA 11.7 is not your base installation, then instead of `pip install -r requirements.txt`, run:
|
```bash
CUDA_HOME=/usr/local/cuda-11.7 pip install -r reqs_optional/requirements_optional_flashattention.txt
```
|
|