There was an error during your training (CUDA on Huggingface)

#92
by MrKan1ster - opened

I made several models using this Space one year ago.
Now I duplicated it again and tried exactly the same as in the past: simply uploading pictures, creating a token, upgrading the GPU and clicking on "Train".

Everything was done in this Huggingface web UI.
But now I get the message:

Unfortunately there was an error during training your Randy404 model.
Please check it out below. Feel free to report this issue to Dreambooth Training:


    CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
    If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
    https://github.com/TimDettmers/bitsandbytes/issues


And Google only tells me stuff I don't understand.
I don't have anything installed locally, nor do I know how to update anything inside the Space.
In the past it worked just by duplicating it here. What has changed?

Same issue for me - hope to use it again soon.

Nothing? I would really like to train new models... Is there an alternative?

Same x10. The Space is a waste of time at the moment.

Same here - there seem to be unresolved dependency issues due to partly upgraded packages ...

@Hugging Face - Team: Please resolve these issues :)
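
One concrete example of such a mismatch is visible in the startup log in this post: pip rejects an xformers wheel tagged `cp38` (CPython 3.8), while the paths in the log point to a Python 3.10.14 interpreter. The sketch below only illustrates that check. The helper names `wheel_tags` and `python_tag_matches` are made up for this example, and the comparison is much simpler than pip's real compatibility logic (it ignores the ABI and platform tags).

```python
import sys


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    # The last three dash-separated fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def python_tag_matches(filename, version=None):
    """Simplified check: does the wheel's Python tag fit the given (major, minor)?"""
    major, minor = version if version is not None else sys.version_info[:2]
    python_tag, _, _ = wheel_tags(filename)
    accepted = {f"cp{major}{minor}", f"py{major}{minor}", f"py{major}"}
    # A tag such as "py2.py3" lists several alternatives separated by dots.
    return any(tag in accepted for tag in python_tag.split("."))


# File name copied from the startup log in this thread.
WHEEL = "xformers-0.0.15.dev0+1515f77.d20221130-cp38-cp38-linux_x86_64.whl"

if __name__ == "__main__":
    print(wheel_tags(WHEEL))                            # ('cp38', 'cp38', 'linux_x86_64')
    print(python_tag_matches(WHEEL, version=(3, 10)))   # False
```

If this reading is right, the pinned wheel was built for an older Python than the one the Space now runs on, which would fit the "partly upgraded packages" description. Whether this is also what breaks training is not confirmed anywhere in this thread.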

===== Application Startup at 2024-05-13 08:10:44 =====

ERROR: xformers-0.0.15.dev0+1515f77.d20221130-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform.

[notice] A new release of pip available: 22.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/gradio/blocks.py:1222: UserWarning: The default_enabled parameter of queue has no effect and will be removed in a future version of gradio.
warnings.warn(
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
Starting single training...
Namespace(pretrained_model_name_or_path='/home/user/.cache/huggingface/hub/models--multimodalart--sd-fine-tunable/snapshots/9dabd4dbbdd4c72e2ffbc8fb4e28debef0254949', tokenizer_name=None, instance_data_dir='instance_images', class_data_dir=None, instance_prompt='', class_prompt='', with_prior_preservation=False, prior_loss_weight=1.0, num_class_images=100, output_dir='output_model', seed=42, resolution=512, center_crop=False, train_text_encoder=True, train_batch_size=1, sample_batch_size=4, num_train_epochs=1, max_train_steps=750, gradient_accumulation_steps=1, gradient_checkpointing=False, learning_rate=2e-06, scale_lr=False, lr_scheduler='polynomial', lr_warmup_steps=0, use_8bit_adam=True, adam_beta1=0.9, adam_beta2=0.999, adam_weight_decay=0.01, adam_epsilon=1e-08, max_grad_norm=1.0, push_to_hub=False, hub_token=None, hub_model_id=None, logging_dir='logs', mixed_precision='fp16', save_n_steps=0, save_starting_step=1, stop_text_encoder_training=225, image_captions_filename=True, dump_only_text_encoder=False, train_only_unet=False, cache_latents=False, Session_dir='', local_rank=-1)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/bitsandbytes/cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/etc/pyenv.d'), PosixPath('/usr/lib/pyenv/hooks'), PosixPath('/etc/pyenv.d')}
warn(msg)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/bitsandbytes/cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Europe/Paris')}
warn(msg)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/bitsandbytes/cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('443'), PosixPath('//172.20.0.1')}
warn(msg)
/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/bitsandbytes/cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('cnywt/dreambooth-training')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 123
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 123
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 123
CUDA SETUP: Required library version not found: libbitsandbytes_cuda123.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=123
python setup.py install
Adding Safety Checker to the model...
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/gradio/routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/gradio/blocks.py", line 1015, in process_api
result = await self.call_function(
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/gradio/blocks.py", line 833, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/gradio/helpers.py", line 584, in tracked_fn
response = fn(*args)
File "/home/user/app/app.py", line 345, in train
push(model_name, where_to_upload, hf_token, which_model, True)
File "/home/user/app/app.py", line 365, in push
convert("output_model", "model.ckpt")
File "/home/user/app/convertosd.py", line 270, in convert
unet_state_dict = torch.load(unet_path, map_location="cpu")
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/serialization.py", line 211, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'output_model/unet/diffusion_pytorch_model.bin'
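
A note on reading the end of this log: the final `FileNotFoundError` comes from the conversion step (`convertosd.py`), which tries to open `output_model/unet/diffusion_pytorch_model.bin`. One plausible reading, not confirmed in this thread, is that training stopped early after the bitsandbytes CUDA setup errors, so no UNet weights were written and the conversion fails only as a follow-on effect. Another possibility is that a newer diffusers release saved the weights under the `.safetensors` name instead. The sketch below shows a pre-flight check that would make either case easier to spot. `find_unet_weights` is a hypothetical helper and is not part of the Space's code.

```python
from pathlib import Path

# The .bin name is taken from the traceback above; the .safetensors name is
# what newer diffusers releases write by default. Listing both is an assumption
# about what the training step might produce.
UNET_CANDIDATES = (
    "diffusion_pytorch_model.bin",
    "diffusion_pytorch_model.safetensors",
)


def find_unet_weights(output_dir):
    """Return the path of the saved UNet weights, or raise a descriptive error."""
    unet_dir = Path(output_dir) / "unet"
    for name in UNET_CANDIDATES:
        candidate = unet_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"no UNet weights found under {unet_dir}; the training step probably "
        "did not finish, so look for the first error earlier in the log"
    )
```

Calling `find_unet_weights("output_model")` before the conversion would either return the file that actually exists or fail with a message that points back to the training step rather than to `torch.load`.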

Nothing? Is there an alternative way to train Dreambooth models?
