---
base_model: google/gemma-2-9b-it
datasets:
- DiTy/function-calling
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- conversational
- gemma2
- function-calling
- trl
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/gemma-2-9b-it-function-calling-GGUF-GGUF
This is a quantized version of [DiTy/gemma-2-9b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF) created using llama.cpp.

# Original Model Card

# DiTy/gemma-2-9b-it-function-calling-GGUF

This model is a fine-tuned version of [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) for the **Function Calling** task on non-synthetic data,
fully annotated by humans, using the English version of the <ins>*DiTy/function-calling*</ins> dataset.

> [!NOTE]
> This model already offers fairly high quality, but you may want to try its larger sibling, [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF).

In addition to **safetensors**, the model is available in **GGUF** format. In that case you only need to download a single file (*[how to run a GGUF model](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#high-level-api)*; a short llama-cpp-python sketch follows the table below):

| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [gemma-2-9B-it-function-calling-F16.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-F16.gguf) | F16 | 18.5GB | Base model with float16 |
| [gemma-2-9B-it-function-calling-Q8_0.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q8_0.gguf) | Q8_0 | 9.83GB | Extremely high quality, generally unneeded but max available quant. |
| [gemma-2-9B-it-function-calling-Q6_K.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q6_K.gguf) | Q6_K | 7.59GB | Very high quality, near perfect, *recommended*. |
| [gemma-2-9B-it-function-calling-Q5_K_M.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q5_K_M.gguf) | Q5_K_M | 6.65GB | High quality, very usable. |
| [gemma-2-9B-it-function-calling-Q5_K_S.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q5_K_S.gguf) | Q5_K_S | 6.48GB | High quality, very usable. |
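
If you go the GGUF route, a minimal llama-cpp-python sketch might look like the following (this is an assumption-laden example, not part of the original card: it assumes you have installed `llama-cpp-python` and downloaded, e.g., the Q6_K file locally):

```python
# Minimal GGUF inference sketch with llama-cpp-python (hypothetical local path;
# adjust n_ctx / n_gpu_layers to your hardware).
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-2-9B-it-function-calling-Q6_K.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)

# The prompt must already follow the gemma-2 chat format shown later in this card.
# llama.cpp usually prepends the BOS token itself, so it is omitted here.
prompt = (
    "<start_of_turn>user\n"
    "Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>\n"
    "<start_of_turn>model\n"
)
output = llm(prompt, max_tokens=256, stop=["<end_of_turn>"])
print(output["choices"][0]["text"])
```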

## Model card tree

* [How to prepare your functions (tools) for *Function Calling*](#prepare_func_call)
* [Just use chat template for generation](#just_chat_template)
* [Prompt structure and expected content](#roles)
* [Evaluation of function calling models](#eval)

## Usage (HuggingFace Transformers)

Below are some code snippets to help you quickly get started with the model. First, install the Transformers library:
```bash
pip install -U transformers
```

### <a name="prepare_func_call"></a>Prepare your functions for *Function Calling*

You should write the functions (tools) used by the model in *Python code* and make sure to add *Python docstrings*, as in the example below:
```python
def get_weather(city: str):
    """
    A function that returns the weather in a given city.

    Args:
        city: The city to get the weather for.
    """
    import random

    return "sunny" if random.random() > 0.5 else "rainy"


def get_sunrise_sunset_times(city: str):
    """
    A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].

    Args:
        city: The city to get the sunrise and sunset times for.
    """

    return ["6:00 AM", "6:00 PM"]
```
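
Optionally, you can check how such a function will be presented to the model. Recent Transformers releases expose a helper that converts a documented function into a JSON schema; a minimal sketch, assuming `get_json_schema` is available in your installed version:

```python
# Optional check (not from the original card): inspect the JSON schema that the
# chat template receives for a tool passed via `tools`.
from transformers.utils import get_json_schema

print(get_json_schema(get_weather))
# Roughly: {"type": "function", "function": {"name": "get_weather",
#           "description": "A function that returns the weather in a given city.",
#           "parameters": {"type": "object", "properties": {"city": {...}}, "required": ["city"]}}}
```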

### <a name="just_chat_template"></a>Just use chat template

Next, you need to download the model and tokenizer:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "DiTy/gemma-2-9b-it-function-calling-GGUF",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # use float16 or float32 if bfloat16 is not available to you.
    cache_dir=PATH_TO_MODEL_DIR,  # optional
)
tokenizer = AutoTokenizer.from_pretrained(
    "DiTy/gemma-2-9b-it-function-calling-GGUF",
    cache_dir=PATH_TO_MODEL_DIR,  # optional
)
```

To get the result of generation, just use `apply_chat_template`. In order to take our written functions (tools) into account,
we need to pass them as a list through the `tools` argument and also use `add_generation_prompt=True`.
```python
history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
]

inputs = tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,  # adding prompt for generation
    tools=[get_weather, get_sunrise_sunset_times],  # our functions (tools)
)

print(inputs)
```
+
127
+ Then our `inputs` will look like this:
128
+ ```
129
+ <bos><start_of_turn>user
130
+ You are a helpful assistant with access to the following functions. Use them if required - {
131
+ "name": "get_weather",
132
+ "description": "A function that returns the weather in a given city.",
133
+ "parameters": {
134
+ "type": "object",
135
+ "properties": {
136
+ "city": {
137
+ "type": "string",
138
+ "description": "The city to get the weather for."
139
+ }
140
+ },
141
+ "required": [
142
+ "city"
143
+ ]
144
+ }
145
+ },
146
+ {
147
+ "name": "get_sunrise_sunset_times",
148
+ "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
149
+ "parameters": {
150
+ "type": "object",
151
+ "properties": {
152
+ "city": {
153
+ "type": "string",
154
+ "description": "The city to get the sunrise and sunset times for."
155
+ }
156
+ },
157
+ "required": [
158
+ "city"
159
+ ]
160
+ }
161
+ }
162
+
163
+ Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
164
+ <start_of_turn>model
165
+
166
+ ```

Now we can generate the model's response.
Note that after `apply_chat_template` there is no need to *add special tokens* again during tokenization, so use `add_special_tokens=False`:
```python
terminator_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>"),
]

prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device)
generated_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
    bos_token_id=tokenizer.bos_token_id,
)
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False)  # `skip_special_tokens=False` for debug

print(generated_response)
```

We get the generation as a function call:
```
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
```
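
If you want to execute that call automatically rather than by hand, one option (a minimal sketch, not part of the original card; it assumes the model always emits the `Function call: {...}` pattern shown above) is to parse the generated string and dispatch to the matching Python function:

```python
import json

# Hypothetical helper: map tool names to callables and run the call the model emitted.
available_tools = {
    "get_weather": get_weather,
    "get_sunrise_sunset_times": get_sunrise_sunset_times,
}

def run_function_call(generated_response: str) -> dict:
    """Parse 'Function call: {...}<end_of_turn>' and execute the referenced tool."""
    payload = generated_response.split("Function call:", 1)[1].split("<end_of_turn>", 1)[0]
    call = json.loads(payload)
    result = available_tools[call["name"]](**call.get("arguments", {}))
    return {"name": call["name"], "content": result}

print(run_function_call(generated_response))
# e.g. {'name': 'get_sunrise_sunset_times', 'content': ['6:00 AM', '6:00 PM']}
```

The returned value can then be serialized (e.g. with `json.dumps`) into the `function-response` message used in the next step.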

Great, now we can pick up and process the results with our *called function*, and then provide the model with the *function's response*:
```python
history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},  # a hypothetical response from our function
]

inputs = tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,  # adding prompt for generation
    tools=[get_weather, get_sunrise_sunset_times],  # our functions (tools)
)

print(inputs)
```

Let's make sure the `inputs` are correct:
```
<bos><start_of_turn>user
You are a helpful assistant with access to the following functions. Use them if required - {
    "name": "get_weather",
    "description": "A function that returns the weather in a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for."
            }
        },
        "required": [
            "city"
        ]
    }
},
{
    "name": "get_sunrise_sunset_times",
    "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the sunrise and sunset times for."
            }
        },
        "required": [
            "city"
        ]
    }
}

Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
<start_of_turn>model
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
<start_of_turn>user
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn>
<start_of_turn>model

```

Similarly, we generate a response from the model:
```python
prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device)
generated_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
    bos_token_id=tokenizer.bos_token_id,
)
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False)  # `skip_special_tokens=False` for debug

print(generated_response)
```

As a result, we get the model's response:
```
The sunrise time in Los Angeles is 6:00 AM.<end_of_turn>
```

## Usage via transformers `pipeline`

<details>
<summary>
Generation via pipeline
</summary>

```python
import torch
from transformers import pipeline


generation_pipeline = pipeline(
    "text-generation",
    model="DiTy/gemma-2-9b-it-function-calling-GGUF",
    model_kwargs={
        "torch_dtype": torch.bfloat16,  # use float16 or float32 if bfloat16 is not available to you.
        "cache_dir": PATH_TO_MODEL_DIR,  # OPTIONAL
    },
    device_map="auto",
)

history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},
]

inputs = generation_pipeline.tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=[get_weather, get_sunrise_sunset_times],
)

terminator_ids = [
    generation_pipeline.tokenizer.eos_token_id,
    generation_pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>"),
]

outputs = generation_pipeline(
    inputs,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
)

print(outputs[0]["generated_text"][len(inputs):])
```

</details>

## <a name="roles"></a>Prompt structure and expected content

For the model to work correctly, it is assumed that `apply_chat_template` will be used.
The message history must be passed in the following format:
```python
history_messages = [
    {"role": "...", "content": "..."},
    ...
]
```

The following roles are available (a complete multi-turn example follows this list):

* `system` - an optional role; its content is always placed at the very beginning, before the list of functions (tools) available to the model.
You can always use the standard option that was used during training: ***"You are a helpful assistant with access to the following functions. Use them if required - "***
* `user` - the user's request is passed through this role.
* `function-call` - the body of the function call is passed through this role.
Although the model is trained to generate a function call in the form ***"Function call: {...}\<end_of_turn\>"***, you should still pass only the body ***"{...}"***
in the *"content"* field, since `apply_chat_template` adds the surrounding text automatically.
* `function-response` - in this role, pass the response of your function in the *"content"* field as a dictionary ***'{"name_returnable_value": value}'***.
* `model` - content under this role is treated as text generated by the model.
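
For illustration only (this composite example is assembled here and is not taken verbatim from the card), a multi-turn history exercising every role might look like:

```python
# Hypothetical multi-turn history using all five roles (illustrative only).
history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},
    {"role": "model", "content": "The sunrise time in Los Angeles is 6:00 AM."},
    {"role": "user", "content": "Thanks! And what is the weather like there?"},  # hypothetical follow-up turn
]
```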

### Chat history with *Function Calling*

```
[
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},
]
```

It looks like:
```
<bos><start_of_turn>user
You are a helpful assistant with access to the following functions. Use them if required - {
    "name": "get_weather",
    "description": "A function that returns the weather in a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for."
            }
        },
        "required": [
            "city"
        ]
    }
},
{
    "name": "get_sunrise_sunset_times",
    "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the sunrise and sunset times for."
            }
        },
        "required": [
            "city"
        ]
    }
}

Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
<start_of_turn>model
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
<start_of_turn>user
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn>
```


### Chat history with a standard user-model template

```
[
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell me about California"},
]
```

It looks like:
```
<bos><start_of_turn>user
You are a helpful assistant

Tell me about California<end_of_turn>
```

## <a name="eval"></a>Evaluation

During training, the validation loss converged to approximately the following values:

| **Model** | **Generation Language** | **Approximate Validation Loss** |
| :-----: | :-----: | :-----: |
| [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF) | EN | 0.47 |
| [DiTy/gemma-2-9b-it-russian-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-russian-function-calling-GGUF) | RU | 0.57 |
| [**DiTy/gemma-2-9b-it-function-calling-GGUF**](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF) | **EN** | **0.5** |
| [DiTy/gemma-2-2b-it-function-calling](https://huggingface.co/DiTy/gemma-2-2b-it-function-calling) | EN | 0.66 |

## Citation

```none
@article{gemma_2024,
    title={Gemma},
    url={https://www.kaggle.com/m/3301},
    DOI={10.34740/KAGGLE/M/3301},
    publisher={Kaggle},
    author={Gemma Team},
    year={2024}
}
```