chrisc36 committed on
Commit b14fbed · verified · 1 Parent(s): 9eb32aa

Update README.md

  1. README.md +22 -0
README.md CHANGED
@@ -96,6 +96,28 @@ print(generated_text)
  # wooden deck. The deck's planks, which are a mix of light and dark brown with ...
  ```
 
+ To make inference more efficient, run with autocast:
+
+ with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
+     output = model.generate_from_batch(
+         inputs,
+         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+         tokenizer=processor.tokenizer
+     )
+ We did most of our evaluations in this setting (autocast on, but float32 weights)
108
+
109
+ To even further reduce the memory requirements, the model can be run with bfloat16 weights:
110
+
111
+ model.to(dtype=torch.bfloat16)
112
+ inputs["images"] = inputs["images"].to(torch.bfloat16)
113
+ output = model.generate_from_batch(
114
+ inputs,
115
+ GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
116
+ tokenizer=processor.tokenizer
117
+ )
118
+ Note that this can sometimes change the output of the model compared to running with float32 weights.
119
+
120
+
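The added README text makes two claims about bfloat16 weights: they need less memory, and they can change the model's output. The sketch below illustrates both in plain Python, without PyTorch. It is not part of the commit. The helper names (`weight_bytes`, `to_bfloat16`) and the parameter count are hypothetical, chosen only for illustration, and the rounding helper assumes finite inputs and round-to-nearest-even.

```python
import struct

# Storage size per element: float32 uses 4 bytes, bfloat16 uses 2 bytes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Bytes needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_bfloat16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 and return it as a float.

    bfloat16 keeps the upper 16 bits of the float32 bit pattern, so the
    low 16 bits are rounded away (round-to-nearest-even here).
    NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)          # ties go to even
    bits = (bits + rounding_bias) & 0xFFFF0000           # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical parameter count, not from the README
    for dtype in ("float32", "bfloat16"):
        gib = weight_bytes(num_params, dtype) / 2**30
        print(f"{dtype}: {gib:.1f} GiB of weights")

    # Near 1.0, adjacent bfloat16 values are 2**-7 apart, so small
    # differences between weights or activations can disappear.
    for value in (1.0, 1.001, 1.01):
        print(value, "->", to_bfloat16(value))
```

Halving the bytes per weight halves the memory needed for the weights themselves. The rounding step shows why results may differ: values that are distinct in float32 can map to the same bfloat16 value, and such small differences can add up across layers.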
  ## Evaluations
 
  | Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |