Xhaheen committed · Commit 8d9a93c · verified · 1 Parent(s): 28f8798

Update README.md

  1. README.md +112 -0
README.md CHANGED
@@ -19,4 +19,116 @@ base_model: unsloth/gemma-7b-bnb-4bit

  This gemma model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

+
+ # Inference with Unsloth on Colab
+
+ ```python3
+ import torch
+ major_version, minor_version = torch.cuda.get_device_capability()
+
+ # Colab cell: lines starting with "!" are shell commands, not plain Python
+ !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
+ if major_version >= 8:
+     # Use this for newer GPUs such as Ampere and Hopper (RTX 30xx, RTX 40xx, A100, H100, L40)
+     !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
+ else:
+     # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
+     !pip install --no-deps xformers trl peft accelerate bitsandbytes
+
+ from unsloth import FastLanguageModel
+
+ max_seq_length = 2048
+ dtype = None  # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+
+ load_in_4bit = False
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name = "Xhaheen/Gemma_Urdu_Shaheen_1_epoch",
+     max_seq_length = max_seq_length,
+     dtype = dtype,
+     load_in_4bit = load_in_4bit,
+     device_map = "auto",
+ )
+ FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
+
+ input_prompt = """
+ ### Instruction:
+ {}
+
+ ### Input:
+ {}
+
+ ### Response:
+ {}"""
+
+ input_text = input_prompt.format(
+     "دیئے گئے موضوع کے بارے میں ایک مختصر پیراگراف لکھیں۔",  # instruction: "Write a short paragraph about the given topic."
+     "قابل تجدید توانائی کے استعمال کی اہمیت",  # input: "The importance of using renewable energy"
+     "",  # output - leave this blank for generation!
+ )
+
+ inputs = tokenizer([input_text], return_tensors = "pt").to("cuda")
+
+ outputs = model.generate(**inputs, max_new_tokens = 300, use_cache = True)
+
+ # batch_decode returns a list with one decoded string per input sequence
+ response = tokenizer.batch_decode(outputs)
+ ```
+
+ # Inference with Hugging Face transformers
+
+ ```python3
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "Xhaheen/Gemma_Urdu_Shaheen_1_epoch",
+     load_in_4bit = False,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Xhaheen/Gemma_Urdu_Shaheen_1_epoch")
+
+ input_prompt = """
+ ### Instruction:
+ {}
+
+ ### Input:
+ {}
+
+ ### Response:
+ {}"""
+
+ input_text = input_prompt.format(
+     "دیئے گئے موضوع کے بارے میں ایک مختصر پیراگراف لکھیں۔",  # instruction: "Write a short paragraph about the given topic."
+     "قابل تجدید توانائی کے استعمال کی اہمیت",  # input: "The importance of using renewable energy"
+     "",  # output - leave this blank for generation!
+ )
+
+ # Move the inputs to whichever device the model was loaded on
+ inputs = tokenizer([input_text], return_tensors = "pt").to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens = 300, use_cache = True)
+ response = tokenizer.batch_decode(outputs)[0]  # first (and only) decoded sequence
+ ```
+
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
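
Both snippets decode the full generated sequence, so `response` contains the echoed prompt as well as the model's answer. The sketch below shows one possible way to keep only the text after the `### Response:` marker. The helper name `extract_response`, the constant `RESPONSE_MARKER`, the sample string, and the list of special tokens to strip are illustrative assumptions, not part of the README or of any library.

```python
RESPONSE_MARKER = "### Response:"  # same marker used in input_prompt


def extract_response(decoded: str, special_tokens=("<bos>", "<eos>", "<pad>")) -> str:
    """Return the text that follows the first response marker.

    `special_tokens` is an assumed list of tokenizer markers to strip;
    adjust it to match the tokenizer actually in use.
    """
    # split(..., 1) cuts at the first marker only; if the marker is
    # missing, the last element is simply the whole string.
    tail = decoded.split(RESPONSE_MARKER, 1)[-1]
    for token in special_tokens:
        tail = tail.replace(token, "")
    return tail.strip()


# Hypothetical decoded output, shaped like the prompt template above
sample = (
    "<bos>\n### Instruction:\nWrite a paragraph.\n\n"
    "### Input:\nSolar power\n\n"
    "### Response:\nSolar power is clean.<eos>"
)
print(extract_response(sample))
```

With the snippets above, this would be called as `extract_response(response[0])` for the Unsloth example and `extract_response(response)` for the transformers example. Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is an alternative way to drop the special tokens before splitting.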