Apply for community grant: Academic project (gpu)

#1
by yrshi - opened

This space is for the online demo of the ACL 2024 paper ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining

We kindly request online GPU resources to deploy the demo. This is an open-source project for academic purposes.

Hi @yrshi , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
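The basic usage pattern from that page looks roughly like this (a minimal sketch; the diffusion pipeline here is only a placeholder, not your model):

import spaces
from diffusers import DiffusionPipeline

# Load the model at startup and move it to CUDA at the top level; on ZeroGPU
# the spaces package intercepts this call, and a real GPU is attached only
# while the decorated function runs.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")
pipe.to("cuda")

@spaces.GPU
def generate(prompt: str):
    return pipe(prompt).images[0]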

Thanks for your help @hysts ! Now I can see the free-grant ZeroGPU option under Settings / Space hardware.

However, there's a spinning loading indicator at the top right of the ZeroGPU card, and I cannot select ZeroGPU on this page. Does this mean the GPU I applied for is still queued and I just need to wait?

[screenshot: Space hardware settings showing the ZeroGPU card with a loading spinner]

@yrshi No, the hardware is already assigned, so I think it means the Space is still building (or, in this case, failed to launch). FYI, it should look like the following screenshot (taken from another granted Space) once you fix the build error.

I see the following error in the log:

Collecting flash_attn
  Downloading flash_attn-2.5.9.post1.tar.gz (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 358.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      /tmp/pip-install-s0a7kjdi/flash-attn_efa6a25f031e41fc80fbaf9954824612/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-s0a7kjdi/flash-attn_efa6a25f031e41fc80fbaf9954824612/setup.py", line 134, in <module>
          CUDAExtension(
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1074, in CUDAExtension
          library_dirs += library_paths(cuda=True)
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1201, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2407, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      
      
      torch.__version__  = 2.2.0+cu121
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

I think it's because you added flash_attn to your requirements.txt here, but on ZeroGPU, CUDA is not available at build time, so you need to install it at startup instead, like this:
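Something along these lines at the top of app.py should work (a sketch of the pattern some other ZeroGPU Spaces use; the FLASH_ATTENTION_SKIP_CUDA_BUILD variable tells flash-attn's setup.py to skip the nvcc / CUDA_HOME checks that fail at build time):

import os
import subprocess

# Install flash-attn at startup instead of listing it in requirements.txt,
# skipping the CUDA build checks that cannot pass at ZeroGPU build time.
subprocess.run(
    "pip install flash-attn --no-build-isolation",
    env={**os.environ, "FLASH_ATTENTION_SKIP_CUDA_BUILD": "TRUE"},
    shell=True,
    check=True,
)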

Got it, I'll try to fix it. Thanks for your reply (゚∀゚)

@hysts Sorry to bother you again, but I've run into some problems with spaces.GPU and couldn't find a solution in the docs.

My model class uses an nn.Module imported from another file in my Space, and that imported nn.Module does not run on the GPU.

In my app.py, I do:

@spaces.GPU
@torch.no_grad()
def predict(self, rxn_dict, temperature=1):
    graphs, prompt_tokens = self.tokenize(rxn_dict)
    result_dict = rxn_dict
    samples = {'graphs': graphs, 'prompt_tokens': prompt_tokens}
    prediction = self.model.blip2opt.generate(
        samples,
        do_sample=self.args.do_sample,
        num_beams=self.args.num_beams,
        max_length=self.args.max_inference_len,
        min_length=self.args.min_inference_len,
        num_captions=self.args.num_generate_captions,
        temperature=temperature,
        use_graph=True
    )[0]
    for k, v in result_dict['extracted_molecules'].items():
        prediction = prediction.replace(v, k)
    result_dict['prediction'] = prediction
    return result_dict

Here self.model.blip2opt uses graph_encoder (an instance of GNN in model/gin_model.py, line 213) to encode samples['graphs'], and an OPTForCausalLM to encode the text.

When running the above code, I got the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 216, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/tmp.py", line 205, in predict
    prediction = self.model.generate(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/model/blip2_opt.py", line 378, in generate
    graph_embeds, graph_masks = self.graph_encoder(graphs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/app/model/gin_model.py", line 275, in forward
    x = self.x_embedding1(x[:,0]) + self.x_embedding2(x[:,1])
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

Traceback (most recent call last):
  File "/home/user/app/tmp.py", line 284, in <module>
    main(args)
  File "/home/user/app/tmp.py", line 277, in main
    online_chat(example_inputs[0])
  File "/home/user/app/tmp.py", line 272, in online_chat
    result = infer_runner.predict(data_item, temperature=temperature)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 177, in gradio_handler
    raise res.value
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

I've checked that the input tensors are correctly placed on the GPU, but the model parameters are still on the CPU.
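For reference, a quick way to confirm the mismatch (a rough sketch; it assumes graphs is a PyG Batch with an .x node-feature tensor, as used in gin_model.py):

# Inside predict: compare the device of the inputs with the device of the
# graph encoder's parameters.
print(graphs.x.device)  # cuda:0
print(next(self.model.blip2opt.graph_encoder.parameters()).device)  # cpu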

I tried wrapping the forward function of the GNN class with @spaces.GPU as well, but got this:

/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py:77: UserWarning: Using a ZeroGPU function outside of Gradio caching or request might block the app
  warnings.warn("Using a ZeroGPU function outside of Gradio caching or request might block the app")

Oh, I think I've fixed it by manually moving the model to CUDA at runtime:

    @spaces.GPU
    @torch.no_grad()
    def predict(self, rxn_dict, temperature=1):
        graphs, prompt_tokens = self.tokenize(rxn_dict)
        self.model.blip2opt = self.model.blip2opt.to('cuda')
        result_dict = rxn_dict
        samples = {'graphs': graphs, 'prompt_tokens': prompt_tokens}
        prediction = self.model.blip2opt.generate(
            samples,
            do_sample=self.args.do_sample,
            num_beams=self.args.num_beams,
            max_length=self.args.max_inference_len,
            min_length=self.args.min_inference_len,
            num_captions=self.args.num_generate_captions,
            temperature=temperature,
            use_graph=True
        )[0]
        for k, v in result_dict['extracted_molecules'].items():
            prediction = prediction.replace(v, k)
        result_dict['prediction'] = prediction
        return result_dict
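From my reading of the ZeroGPU docs, the .to('cuda') could probably also be done once at startup rather than on every call, since the spaces package intercepts CUDA calls until a GPU is attached. A rough sketch (the InferRunner class name is made up to match the infer_runner in the traceback above):

import spaces
import torch

class InferRunner:
    def __init__(self, model, args):
        self.model = model
        self.args = args
        # Move the model to CUDA once; on ZeroGPU the actual transfer is
        # deferred until a GPU is attached inside a @spaces.GPU function.
        self.model.blip2opt.to('cuda')

    @spaces.GPU
    @torch.no_grad()
    def predict(self, rxn_dict, temperature=1):
        ...  # same body as above, minus the per-call .to('cuda')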

Thanks for your patience.

yrshi changed discussion status to closed
