joanllop committed
Commit bd317e4 · verified · 1 Parent(s): 13c5593

Update README.md

Files changed (1)
  1. README.md +135 -1
README.md CHANGED
@@ -134,7 +134,141 @@ The accelerated partition is composed of 1,120 nodes with the following specific

  ## How to use

- <span style="color:red">TODO</span>
+ ### Inference
+ This section covers several ways to run inference: Hugging Face's text-generation pipeline, single- and multi-GPU setups with AutoModel, and vLLM for efficient, scalable generation. Each approach comes with step-by-step instructions.
+
+ #### Inference with Hugging Face's Text Generation Pipeline
+ The Hugging Face text-generation pipeline provides a straightforward way to run inference with the Salamandra-7b model.
+
+ ```bash
+ pip install -U transformers
+ ```
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from transformers import pipeline, set_seed
+
+ model_id = "projecte-aina/salamandra-7b"
+
+ # Sample prompts
+ prompts = [
+     "Las fiestas de San Isidro Labrador de Yecla son",
+     "El punt més alt del Parc Natural del Montseny és",
+     "Sentence in English: The typical chance of such a storm is around 10%. Sentence in Catalan:",
+     "Si le monde était clair",
+     "The future of AI is",
+ ]
+
+ # Create the pipeline
+ generator = pipeline("text-generation", model_id, device_map="auto")
+ generation_args = {
+     "temperature": 0.1,
+     "top_p": 0.95,
+     "max_new_tokens": 25,
+     "repetition_penalty": 1.2,
+     "do_sample": True
+ }
+
+ # Fix the seed for reproducibility
+ set_seed(1)
+ # Generate texts
+ outputs = generator(prompts, **generation_args)
+ # Print outputs
+ for output in outputs:
+     print(output[0]["generated_text"])
+ ```
+ </details>
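
The pipeline above loads the model weights in full precision by default. If GPU memory is tight, loading in half precision is a small change; the sketch below is illustrative and simply reuses the bfloat16 setting from the AutoModel example further down.

```python
import torch
from transformers import pipeline

model_id = "projecte-aina/salamandra-7b"

# Same text-generation pipeline, but loaded in bfloat16 to roughly halve GPU memory use
generator = pipeline(
    "text-generation",
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print(generator("El mercat del barri és", max_new_tokens=25)[0]["generated_text"])
```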
+
+ #### Inference with single / multi GPU
+ Inference code using Hugging Face's AutoModel classes. With device_map="auto", the model is placed automatically across the available GPUs, so the same code covers single- and multi-GPU setups.
+
+ ```bash
+ pip install transformers torch accelerate sentencepiece protobuf
+ ```
+
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "projecte-aina/salamandra-7b"
+
+ # Input text
+ text = "El mercat del barri és"
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Load the model in bfloat16, letting Accelerate choose the device placement
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+
+ generation_args = {
+     "temperature": 0.1,
+     "top_p": 0.95,
+     "max_new_tokens": 25,
+     "repetition_penalty": 1.2,
+     "do_sample": True
+ }
+
+ # Tokenize the input and move it to the model's device
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ # Generate text
+ output = model.generate(**inputs, **generation_args)
+ # Print the output
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
+
+ </details>
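
To generate for several prompts in one batch with AutoModel, the tokenizer needs a padding token, which base causal LMs often do not define. The sketch below is illustrative: it assumes EOS can stand in as the pad token and reuses the model ID and sampling values from the examples above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "projecte-aina/salamandra-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: no dedicated pad token, so reuse EOS and pad on the left for decoder-only generation
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

prompts = [
    "El mercat del barri és",
    "The future of AI is",
]

# Pad the batch to a common length and move it to the model's device
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=25,
    do_sample=True,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)
for sequence in outputs:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```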
+
+ #### Inference with vLLM
+ vLLM is an efficient inference library that enables faster and more scalable text generation.
+
+ ```bash
+ pip install vllm
+ ```
+
+ <details>
+ <summary>Show code</summary>
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ model_id = "projecte-aina/salamandra-7b"
+
+ # Sample prompts
+ prompts = [
+     "Las fiestas de San Isidro Labrador de Yecla son",
+     "El punt més alt del Parc Natural del Montseny és",
+     "Sentence in English: The typical chance of such a storm is around 10%. Sentence in Catalan:",
+     "Si le monde était clair",
+     "The future of AI is",
+ ]
+
+ # Create a sampling params object
+ sampling_params = SamplingParams(
+     temperature=0.1,
+     top_p=0.95,
+     seed=1,
+     max_tokens=25,
+     repetition_penalty=1.2)
+
+ # Create an LLM
+ llm = LLM(model=model_id)
+ # Generate texts
+ outputs = llm.generate(prompts, sampling_params)
+ # Print outputs
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
+
+ </details>
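
On a node with several GPUs, vLLM can also shard the model with tensor parallelism. The sketch below is illustrative; tensor_parallel_size=2 is an arbitrary example value and should match the number of GPUs actually available.

```python
from vllm import LLM, SamplingParams

model_id = "projecte-aina/salamandra-7b"

# Shard the model across 2 GPUs (example value; set tensor_parallel_size to the number of visible GPUs)
llm = LLM(model=model_id, tensor_parallel_size=2)

sampling_params = SamplingParams(temperature=0.1, top_p=0.95, max_tokens=25)
for output in llm.generate(["El mercat del barri és"], sampling_params):
    print(output.outputs[0].text)
```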

  ---