chrisc36 committed on
Commit 35f1df0 · verified · 1 Parent(s): b14fbed

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -98,16 +98,20 @@ print(generated_text)

 To make inference more efficient, run with autocast:

+```python
 with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
     output = model.generate_from_batch(
         inputs,
         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
         tokenizer=processor.tokenizer
     )
+```
+
 We did most of our evaluations in this setting (autocast on, but float32 weights)

 To even further reduce the memory requirements, the model can be run with bfloat16 weights:

+```python
 model.to(dtype=torch.bfloat16)
 inputs["images"] = inputs["images"].to(torch.bfloat16)
 output = model.generate_from_batch(
@@ -115,6 +119,8 @@ output = model.generate_from_batch(
     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
     tokenizer=processor.tokenizer
 )
+```
+
 Note that this can sometimes change the output of the model compared to running with float32 weights.

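
The README's closing note, that bfloat16 weights can change the model's output relative to float32 weights, can be illustrated without loading the model. The sketch below assumes only the Python standard library. `to_bfloat16` is a hypothetical helper written for this illustration, not part of the model's or PyTorch's API. It emulates rounding a float32 value to bfloat16, which keeps the same 8 exponent bits but only 7 of the 23 mantissa bits. It handles ordinary finite values only and ignores NaN and infinity.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (hypothetical helper, illustration only)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep only the upper 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    weight = 0.1
    print(to_float32(weight))   # float32 approximation of 0.1
    print(to_bfloat16(weight))  # 0.10009765625
```

Each weight picks up a small rounding error of this kind when the model is cast with `model.to(dtype=torch.bfloat16)`, and those errors can accumulate across layers, which is one plausible reason generated text may occasionally differ from the float32 run.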