Xhaheen committed · Commit 457334e · verified · 1 Parent(s): aeb4b57

Update README.md

Files changed (1): README.md (+7 -3)
README.md CHANGED
@@ -24,10 +24,13 @@ This gemma model was trained 2x faster with [Unsloth](https://github.com/unsloth
 # Inference With Unsloth on colab
 
 
-%%capture
+```python3
+
+
 import torch
 major_version, minor_version = torch.cuda.get_device_capability()
-# Must install separately since Colab has torch 2.2.1, which breaks packages
+
+
 !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
 if major_version >= 8:
   # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
@@ -72,9 +75,10 @@ input_text = input_prompt.format(
 inputs = tokenizer([input_text], return_tensors = "pt").to("cuda")
 
 outputs = model.generate(**inputs, max_new_tokens = 300, use_cache = True)
-response = tokenizer.batch_decode(outputs)
 
+response = tokenizer.batch_decode(outputs)
 
+```
 
 
 
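For readers who want to run the updated snippet, the two hunks assemble into roughly the cell below once the new ```python3 fence is closed. Only the install lines, the capability check, and the `generate`/`batch_decode` calls come from the diff itself; the checkpoint name, the prompt template, and the `FastLanguageModel` loading call are illustrative assumptions standing in for the README's elided middle section (the model-loading code between the two hunks), so treat this as a sketch rather than the repository's exact code.

```python3
# Minimal sketch of the inference flow implied by the updated README.
# Assumes the pip-install cell from the first hunk has already run in Colab.
import torch
from unsloth import FastLanguageModel

# Hypothetical checkpoint name; substitute the repository this README belongs to.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Xhaheen/gemma-finetune",  # assumption, not shown in the diff
    max_seq_length = 2048,                  # assumption
    dtype = None,                           # auto-detects bfloat16 vs float16
    load_in_4bit = True,                    # assumption: 4-bit fits Colab GPUs
)
FastLanguageModel.for_inference(model)      # enable Unsloth's fast generation path

# The diff's hunk header references an `input_prompt` template defined in the
# elided section; this Alpaca-style template is a stand-in.
input_prompt = """### Instruction:
{}

### Response:
{}"""
input_text = input_prompt.format("Why is the sky blue?", "")

# These three lines mirror the second hunk of the diff.
inputs = tokenizer([input_text], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 300, use_cache = True)
response = tokenizer.batch_decode(outputs)
print(response[0])
```

Note the role of the capability check in the first hunk: `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple, and a major version of 8 or above indicates an Ampere-or-newer GPU, which is why the README branches the extra package installs on `major_version >= 8`.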