Model does not run in Google Colab Free Tier
#2, opened by sanjeev-bhandari01
Install the dependencies in Colab. Note that flash-attention is not supported on the Google Colab free tier.
!pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed -q
!pip install auto-gptq optimum -q
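To make it easier to compare one notebook environment against another, the installed versions can be printed right after the install step. This is only a minimal sketch: the package names come from the pip commands above, `torch` is added as an assumption because the failing call is in PyTorch, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Names taken from the pip commands above; "torch" is an added assumption,
# since the failing call in the traceback is torch.multinomial.
PACKAGES = ["transformers", "accelerate", "auto-gptq", "optimum", "torch"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running the same cell in both environments gives two lists that can be compared line by line.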
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen-1_8B-Chat-Int8",
device_map="auto",
trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。
# Qwen-1.8B-Chat can now do role playing, language style transfer, task setting, and behavior setting by adjusting the system prompt.
response, _ = model.chat(tokenizer, "你好呀", history=None, system="请用二次元可爱语气和我说话")
print(response)
It gives the following error:
----> 9 response, history = model.chat(tokenizer, "你好", history=None)
10 print(response)
11 # 你好!很高兴为你提供帮助。
4 frames
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py in sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2758 # sample
2759 probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2760 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
2761
2762 # finished sentences should have their next token be a padding token
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
The code above runs on Kaggle without any modification. Can you tell me why it does not run on the Google Colab free tier?
Is the problem an older GPU, or something else?
Thank you.
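For reference, the error message says the probabilities passed to `torch.multinomial` contained `inf`, `nan`, or a negative value. One way this can happen is when a score going into the softmax is already non-finite. Whether that is the cause here is not established. The sketch below is plain Python rather than PyTorch, and `softmax` and `invalid_entries` are hypothetical helpers written only to illustrate the condition.

```python
import math


def softmax(scores):
    """Softmax over a list of floats, shifted by the maximum for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def invalid_entries(probs):
    """Return the indices of entries that are nan, inf, or negative."""
    return [
        i for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# Finite scores give a valid probability distribution.
healthy = softmax([1.0, 2.0, 3.0])

# A single infinite score (illustrative input, not taken from the model)
# turns the whole distribution into nan, because inf - inf is nan.
broken = softmax([1.0, float("inf"), 3.0])

print(invalid_entries(healthy))
print(invalid_entries(broken))
```

The same kind of check could be applied to the model's scores before sampling to see whether they are already non-finite at that point.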