bugdaryan committed on
Commit
bfc73d2
·
1 Parent(s): 8725cd2

Update README.md

Files changed (1)
  1. README.md +82 -153
README.md CHANGED
@@ -1,196 +1,125 @@
1
  ---
2
  license: llama2
 
 
 
 
 
3
  ---
 
4
 
5
- # Model Card for Code-Llama-2-13B-instruct-text2sql
6
 
7
- <!-- Provide a quick summary of what the model is/does. -->
8
 
9
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
10
 
11
- ## Model Details
 
 
12
 
13
- ### Model Description
14
 
15
- <!-- Provide a longer summary of what this model is. -->
 
 
16
 
 
17
 
 
 
 
 
18
 
19
- - **Developed by:** [More Information Needed]
20
- - **Shared by [optional]:** [More Information Needed]
21
- - **Model type:** [More Information Needed]
22
- - **Language(s) (NLP):** [More Information Needed]
23
- - **License:** [More Information Needed]
24
- - **Finetuned from model [optional]:** [More Information Needed]
25
 
26
- ### Model Sources [optional]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
27
 
28
- <!-- Provide the basic links for the model. -->
29
 
30
- - **Repository:** [More Information Needed]
31
- - **Paper [optional]:** [More Information Needed]
32
- - **Demo [optional]:** [More Information Needed]
33
 
34
- ## Uses
35
 
36
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
37
 
38
- ### Direct Use
39
 
40
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
41
 
42
- [More Information Needed]
 
 
43
 
44
- ### Downstream Use [optional]
45
 
46
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
47
 
48
- [More Information Needed]
49
 
50
- ### Out-of-Scope Use
51
 
52
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
53
 
54
- [More Information Needed]
55
 
56
- ## Bias, Risks, and Limitations
57
 
58
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
 
59
 
60
- [More Information Needed]
61
 
62
- ### Recommendations
63
 
64
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
65
 
66
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
67
 
68
- ## How to Get Started with the Model
69
 
70
- Use the code below to get started with the model.
71
 
72
- [More Information Needed]
 
 
 
 
 
 
73
 
74
- ## Training Details
75
 
76
- ### Training Data
 
77
 
78
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
79
 
80
- [More Information Needed]
81
 
82
- ### Training Procedure
83
 
84
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
85
 
86
- #### Preprocessing [optional]
 
 
87
 
88
- [More Information Needed]
89
-
90
-
91
- #### Training Hyperparameters
92
-
93
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
94
-
95
- #### Speeds, Sizes, Times [optional]
96
-
97
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
98
-
99
- [More Information Needed]
100
-
101
- ## Evaluation
102
-
103
- <!-- This section describes the evaluation protocols and provides the results. -->
104
-
105
- ### Testing Data, Factors & Metrics
106
-
107
- #### Testing Data
108
-
109
- <!-- This should link to a Data Card if possible. -->
110
-
111
- [More Information Needed]
112
-
113
- #### Factors
114
-
115
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
116
-
117
- [More Information Needed]
118
-
119
- #### Metrics
120
-
121
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
122
-
123
- [More Information Needed]
124
-
125
- ### Results
126
-
127
- [More Information Needed]
128
-
129
- #### Summary
130
-
131
-
132
-
133
- ## Model Examination [optional]
134
-
135
- <!-- Relevant interpretability work for the model goes here -->
136
-
137
- [More Information Needed]
138
-
139
- ## Environmental Impact
140
-
141
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
142
-
143
-
144
- - **Hardware Type:** [More Information Needed]
145
- - **Hours used:** [More Information Needed]
146
- - **Cloud Provider:** [More Information Needed]
147
- - **Compute Region:** [More Information Needed]
148
- - **Carbon Emitted:** [More Information Needed]
149
-
150
- ## Technical Specifications [optional]
151
-
152
- ### Model Architecture and Objective
153
-
154
- [More Information Needed]
155
-
156
- ### Compute Infrastructure
157
-
158
- [More Information Needed]
159
-
160
- #### Hardware
161
-
162
- [More Information Needed]
163
-
164
- #### Software
165
-
166
- [More Information Needed]
167
-
168
- ## Citation [optional]
169
-
170
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
171
-
172
- **BibTeX:**
173
-
174
- [More Information Needed]
175
-
176
- **APA:**
177
-
178
- [More Information Needed]
179
-
180
- ## Glossary [optional]
181
-
182
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
183
-
184
- [More Information Needed]
185
-
186
- ## More Information [optional]
187
-
188
- [More Information Needed]
189
-
190
- ## Model Card Authors [optional]
191
-
192
- [More Information Needed]
193
-
194
- ## Model Card Contact
195
-
196
- [More Information Needed]
 
1
  ---
2
  license: llama2
3
+ datasets:
4
+ - bugdaryan/sql-create-context-instruction
5
+ language:
6
+ - en
7
+ pipeline_tag: text-generation
8
  ---
9
+ # **Code-Llama-2-13B-instruct-text2sql Model Card**
10
 
11
+ **Model Name**: Code-Llama-2-13B-instruct-text2sql
12
 
13
+ **Description**: This model is a fine-tuned version of Code Llama 13B (13 billion parameters), tailored for text-to-SQL tasks. It generates SQL queries from a database schema and a natural language question.
14
 
15
+ ## Model Information
16
 
17
+ - **Base Model**: [codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)
18
+ - **Finetuning Dataset**: [bugdaryan/sql-create-context-instruction](https://huggingface.co/datasets/bugdaryan/sql-create-context-instruction)
19
+ - **Training Time**: Approximately 4 hours on 2 V100 32GB GPUs
20
 
21
+ ## LoRA Parameters
22
 
23
+ - **lora_r**: 64
24
+ - **lora_alpha**: 16
25
+ - **lora_dropout**: 0.1
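
To make the LoRA values above concrete, here is a minimal arithmetic sketch of what `lora_r` and `lora_alpha` imply under the standard LoRA formulation, where the low-rank update is scaled by `alpha / r`. The 5120 x 5120 projection size is an assumption chosen for illustration; the card does not say which modules were adapted.

```python
# Values taken from the "LoRA Parameters" list.
LORA_R = 64
LORA_ALPHA = 16


def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update: W + (alpha / r) * B @ A."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one adapter pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumption: a square 5120 x 5120 projection, used purely for illustration.
HIDDEN = 5120

scaling = lora_scaling(LORA_ALPHA, LORA_R)
adapter = lora_trainable_params(HIDDEN, HIDDEN, LORA_R)
full = HIDDEN * HIDDEN

print(f"scaling = {scaling}")
print(f"adapter params = {adapter:,} vs full matrix = {full:,} ({adapter / full:.1%})")
```

Under these assumptions the adapter trains a small fraction of the parameters of the matrix it wraps, which is the main reason LoRA fits on modest hardware.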
26
 
27
+ ## bitsandbytes Parameters
28
 
29
+ - **use_4bit**: True
30
+ - **bnb_4bit_compute_dtype**: float16
31
+ - **bnb_4bit_quant_type**: nf4
32
+ - **use_nested_quant**: False
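
As a rough illustration of why 4-bit loading matters here, the sketch below estimates weight storage for 13 billion parameters at 16 and 4 bits per parameter. It assumes every weight is quantized and ignores quantization constants, activations, adapter weights, and optimizer state, so treat the numbers as a back-of-the-envelope estimate rather than measured usage.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # "13 billion parameters" from the description

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half-precision weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit (nf4) weights, constants ignored

print(f"fp16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

On this estimate, the 4-bit weights fit within a single 32GB V100 with room left for activations, whereas half-precision weights alone would take most of it.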
33
 
34
+ ## Training Parameters
 
 
 
 
 
35
 
36
+ - **Output Directory**: ./results
37
+ - **Number of Training Epochs**: 1
38
+ - **Mixed-Precision Training (fp16/bf16)**: False
39
+ - **Batch Size per GPU for Training**: 4
40
+ - **Batch Size per GPU for Evaluation**: 4
41
+ - **Gradient Accumulation Steps**: 1
42
+ - **Gradient Checkpointing**: True
43
+ - **Maximum Gradient Norm (Gradient Clipping)**: 0.3
44
+ - **Initial Learning Rate**: 2e-4
45
+ - **Weight Decay**: 0.001
46
+ - **Optimizer**: paged_adamw_32bit
47
+ - **Learning Rate Scheduler Type**: cosine
48
+ - **Max Steps**: -1
49
+ - **Warmup Ratio**: 0.03
50
+ - **Group Sequences by Length**: True
51
+ - **Save Checkpoint Every X Update Steps**: 0
52
+ - **Log Every X Update Steps**: 25
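
The learning-rate settings above (initial rate 2e-4, cosine scheduler, warmup ratio 0.03) can be sketched as a plain function: linear warmup followed by cosine decay to zero. This is an illustrative reimplementation, not the training script itself. `TOTAL_STEPS` is hypothetical because the card does not state the number of optimizer steps, and `NUM_GPUS = 2` assumes both V100s were used data-parallel.

```python
import math

# Values taken from the "Training Parameters" list.
BASE_LR = 2e-4
WARMUP_RATIO = 0.03
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 1
NUM_GPUS = 2  # assumption: both GPUs process separate batches (data parallel)


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * grad_accum * num_gpus


def lr_at(step: int, total_steps: int,
          base_lr: float = BASE_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not stated in the card

print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_GPUS))
for step in (0, 15, 30, 500, 1000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```

With a warmup ratio this small, the rate reaches its peak within the first few percent of training and spends the rest of the run decaying.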
53
 
54
+ ## License
55
 
56
+ This model is governed by Meta's custom commercial license for Code Llama. For details, see: [Custom Commercial License](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
 
 
57
 
58
+ ## Intended Use
59
 
60
+ **Intended Use Cases**: This model is intended for commercial and research use in English. It is designed for text-to-SQL tasks, enabling users to generate SQL queries from natural language questions.
61
 
62
+ **Out-of-Scope Uses**: Any use that violates applicable laws or regulations, use in languages other than English, or any other use prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.
63
 
64
+ ## Model Capabilities
65
 
66
+ - Code completion.
67
+ - Infilling.
68
+ - Instructions / chat.
69
 
70
+ ## Model Architecture
71
 
72
+ Code-Llama-2-13B-instruct-text2sql is an auto-regressive language model that uses an optimized transformer architecture.
73
 
74
+ ## Model Dates
75
 
76
+ The base Code Llama models were trained between January 2023 and July 2023. The text-to-SQL fine-tuning run took approximately 4 hours.
77
 
78
+ ## Ethical Considerations and Limitations
79
 
80
+ Code-Llama-2-13B-instruct-text2sql is a powerful language model, but it may produce inaccurate or objectionable responses in some instances. Safety testing and tuning are recommended before deploying this model in specific applications.
81
 
82
+ ## Hardware and Software
83
 
84
+ - **Training Libraries**: Custom training libraries
85
+ - **Training Hardware**: 2 V100 32GB GPUs
86
+ - **Carbon Footprint**: Training all Code Llama models required 400K GPU hours on A100-80GB hardware with emissions offset by Meta's sustainability program.
87
 
88
+ ## Training Data
89
 
90
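+ The base Code Llama model was trained and fine-tuned on the same data as Llama 2, with different weights. This model was then fine-tuned on the bugdaryan/sql-create-context-instruction dataset.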
91
 
92
+ ## Evaluation Results
93
 
94
+ For evaluation results, refer to Section 3 of the Code Llama research paper, and to Section 4 for safety evaluations. These results describe the base Code Llama models.
95
 
96
+ ## Example Code
97
 
98
+ You can use the Code-Llama-2-13B-instruct-text2sql model to generate SQL queries from natural language questions, as demonstrated in the following code snippet:
99
 
100
+ ```python
101
+ from transformers import (
102
+     AutoModelForCausalLM,
103
+     AutoTokenizer,
104
+     pipeline
105
+ )
106
+ import torch
107
 
108
+ model_name = 'bugdaryan/Code-Llama-2-13B-instruct-text2sql'
109
 
110
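+ # Load in half precision to reduce memory use (assumes hardware with fp16 support)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', torch_dtype=torch.float16)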
111
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
112
 
113
+ pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
114
 
115
+ table = "CREATE TABLE sales ( sale_id number PRIMARY KEY, product_id number, customer_id number, salesperson_id number, sale_date DATE, quantity number, FOREIGN KEY (product_id) REFERENCES products(product_id), FOREIGN KEY (customer_id) REFERENCES customers(customer_id), FOREIGN KEY (salesperson_id) REFERENCES salespeople(salesperson_id)); CREATE TABLE product_suppliers ( supplier_id number PRIMARY KEY, product_id number, supply_price number, FOREIGN KEY (product_id) REFERENCES products(product_id)); CREATE TABLE customers ( customer_id number PRIMARY KEY, name text, address text ); CREATE TABLE salespeople ( salesperson_id number PRIMARY KEY, name text, region text ); CREATE TABLE product_suppliers ( supplier_id number PRIMARY KEY, product_id number, supply_price number );"
116
 
117
+ question = 'Find the salesperson who made the most sales.'
118
 
119
+ prompt = f"[INST] Write SQLite query to answer the following question given the database schema. Please wrap your code answer using ```: Schema: {table} Question: {question} [/INST] Here is the SQLite query to answer to the question: {question}: ``` "
120
 
121
+ ans = pipe(prompt, max_new_tokens=100)
122
+ print(ans[0]['generated_text'].split('```')[2])
123
+ ```
124
 
125
+ The snippet builds an instruction prompt from the schema and the question, generates up to 100 new tokens, and prints the text that follows the prompt's second ``` marker, which is where the generated SQL begins.
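
The prompt construction and answer extraction can also be factored into two small helpers, sketched below. `build_prompt` and `extract_sql` are hypothetical names introduced here, not part of the model repository. The sketch assumes the pipeline returns the prompt followed by the completion, as in the snippet above, and that the schema and question contain no triple backticks themselves.

```python
# Triple backtick, built this way so it does not clash with this example's own fence.
FENCE = "`" * 3


def build_prompt(schema: str, question: str) -> str:
    """Reproduce the instruction prompt format used in the example above."""
    return (
        "[INST] Write SQLite query to answer the following question given the "
        f"database schema. Please wrap your code answer using {FENCE}: "
        f"Schema: {schema} Question: {question} [/INST] "
        f"Here is the SQLite query to answer to the question: {question}: {FENCE} "
    )


def extract_sql(generated_text: str) -> str:
    """Return the text after the prompt's second fence, up to the next fence."""
    parts = generated_text.split(FENCE)
    if len(parts) < 3:
        raise ValueError("no SQL found after the prompt's opening fence")
    return parts[2].strip()


# Simulated model output: the prompt followed by a completion and a closing fence.
prompt = build_prompt("CREATE TABLE t (id number);", "How many rows are in t?")
fake_output = prompt + "SELECT COUNT(*) FROM t; " + FENCE
print(extract_sql(fake_output))
```

Because the prompt itself contains two fence markers, the generated SQL is always the third segment of the split, which is why the original snippet indexes with `[2]`.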