amanrangapur committed
Commit 0836bb8 · 1 Parent(s): 0de69f3
Update README.md
README.md CHANGED
@@ -56,7 +56,7 @@ For faster performance, you can quantize the model using the following method:
 ```python
 AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B",
     torch_dtype=torch.float16,
-    load_in_8bit=True) # Requires bitsandbytes
+    load_in_8bit=True) # Requires bitsandbytes
 ```
 The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
 ```python
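Taken end to end, the hunk above loads the model in 8-bit precision and then warns that inputs should be moved to CUDA explicitly. Below is a minimal, self-contained sketch of that flow, assuming `transformers`, `torch`, and `bitsandbytes` are installed and a CUDA GPU is available; the prompt string and generation settings are illustrative, not from the README.

```python
# Minimal sketch of the quantized-loading path shown in the diff above.
# Assumes transformers, torch, and bitsandbytes are installed and a CUDA
# GPU is present; the prompt text is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B")
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-13B",
    torch_dtype=torch.float16,
    load_in_8bit=True,  # Requires bitsandbytes
)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
# Move the input ids to CUDA explicitly, as the README recommends for
# the quantized model, to avoid device/dtype mismatches.
input_ids = inputs.input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Moving `inputs.input_ids` to CUDA by hand, rather than relying on implicit placement, sidesteps the data-type and device sensitivity the README attributes to the 8-bit model.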