edumunozsala commited on
Commit
43d3864
·
1 Parent(s): 7b69d18

Upload README.md

Browse files
Files changed (1) hide show
  1. README.md +125 -0
README.md ADDED
@@ -0,0 +1,125 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ tags:
3
+ - generated_from_trainer
4
+ - code
5
+ - coding
6
+ - llama-2
7
+ model-index:
8
+ - name: Llama-2-7b-4bit-python-coder
9
+ results: []
10
+ license: apache-2.0
11
+ language:
12
+ - code
13
+ datasets:
14
+ - iamtarun/python_code_instructions_18k_alpaca
15
+ pipeline_tag: text-generation
16
+ ---
17
+
18
+
19
+ # LlaMa 2 7b 4-bit Python Coder 👩‍💻
20
+
21
+ **LlaMa-2 7b** fine-tuned on the **CodeAlpaca 20k instructions dataset** by using the method **QLoRA** in 4-bit with [PEFT](https://github.com/huggingface/peft) library.
22
+
23
+ ## Pretrained description
24
+
25
+ [Llama-2](https://huggingface.co/meta-llama/Llama-2-7b)
26
+
27
+ Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
28
+
29
+ Model Architecture Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety
30
+
31
+ ## Training data
32
+
33
+ [python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca)
34
+
35
+ The dataset contains problem descriptions and code in python language. This dataset is taken from sahil2801/code_instructions_120k, which adds a prompt column in alpaca style.
36
+
37
+ ### Training hyperparameters
38
+
39
+ The following `bitsandbytes` quantization config was used during training:
40
+ - load_in_8bit: False
41
+ - load_in_4bit: True
42
+ - llm_int8_threshold: 6.0
43
+ - llm_int8_skip_modules: None
44
+ - llm_int8_enable_fp32_cpu_offload: False
45
+ - llm_int8_has_fp16_weight: False
46
+ - bnb_4bit_quant_type: nf4
47
+ - bnb_4bit_use_double_quant: False
48
+ - bnb_4bit_compute_dtype: float16
49
+
50
+ **SFTTrainer arguments**
51
+ ```py
52
+ # Number of training epochs
53
+ num_train_epochs = 1
54
+ # Enable fp16/bf16 training (set bf16 to True with an A100)
55
+ fp16 = False
56
+ bf16 = True
57
+ # Batch size per GPU for training
58
+ per_device_train_batch_size = 4
59
+ # Number of update steps to accumulate the gradients for
60
+ gradient_accumulation_steps = 1
61
+ # Enable gradient checkpointing
62
+ gradient_checkpointing = True
63
+ # Maximum gradient normal (gradient clipping)
64
+ max_grad_norm = 0.3
65
+ # Initial learning rate (AdamW optimizer)
66
+ learning_rate = 2e-4
67
+ # Weight decay to apply to all layers except bias/LayerNorm weights
68
+ weight_decay = 0.001
69
+ # Optimizer to use
70
+ optim = "paged_adamw_32bit"
71
+ # Learning rate schedule
72
+ lr_scheduler_type = "cosine" #"constant"
73
+ # Ratio of steps for a linear warmup (from 0 to learning rate)
74
+ warmup_ratio = 0.03
75
+ ```
76
+ ### Framework versions
77
+ - PEFT 0.4.0
78
+
79
+ ### Example of usage
80
+ ```py
81
+ import torch
82
+ from transformers import AutoModelForCausalLM, AutoTokenizer
83
+
84
+ model_id = "mrm8488/llama-2-coder-7b"
85
+
86
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
87
+
88
+ model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
89
+
90
+ sample = dataset[randrange(len(dataset))]
91
+
92
+ prompt = f"""### Instruction:
93
+ Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.
94
+
95
+ ### Task:
96
+ {sample['instruction']}
97
+
98
+ ### Input:
99
+ {sample['input']}
100
+
101
+ ### Response:
102
+ """
103
+
104
+ input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
105
+ # with torch.inference_mode():
106
+ outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.5)
107
+
108
+ print(f"Prompt:\n{prompt}\n")
109
+ print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
110
+ print(f"Ground truth:\n{sample['output']}")
111
+
112
+ ```
113
+
114
+ ### Citation
115
+
116
+ ```
117
+ @misc {edumunozsala_2023,
118
+ author = { {Eduardo Muñoz} },
119
+ title = { llama-2-7b-int4-python-coder (Revision d30d193) },
120
+ year = 2023,
121
+ url = { https://huggingface.co/edumunozsala/llama-2-7b-int4-python-18k-alpaca },
122
+ doi = { 10.57967/hf/0931 },
123
+ publisher = { Hugging Face }
124
+ }
125
+ ```