kurumuz committed · Commit 00b1935 · 1 Parent(s): 00c908b

Create README.md

  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
---
language:
- en
tags:
- pytorch
- causal-lm
license: apache-2.0
datasets:
- the Pile
---

# Genji-python 6B

## Model Description

Genji is a transformer model fine-tuned from EleutherAI's GPT-J 6B. This particular model was trained on Python-only code approaching 4GB in size.
This split version stores the checkpoint in multiple pieces, which lowers system RAM usage while loading and makes loading faster.
It takes more effort to set up, since you need to install git-lfs and clone the repo.

| Hyperparameter    | Value |
|-------------------|-------|
| n_parameters      | 6,053,381,344 |
| n_layers          | 28* |
| d_model           | 4,096 |
| d_ff              | 16,384 |
| n_heads           | 16 |
| d_head            | 256 |
| n_ctx             | 2,048 |
| n_vocab           | 50,400 (same tokenizer as GPT-2/3) |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions   | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |

`*` each layer consists of one feedforward block and one self-attention block

The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384. The model
dimension is split into 16 heads, each with a dimension of 256. Rotary position encodings (RoPE) are applied to 64
dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as
GPT-2/GPT-3. The 50,400 listed as n_vocab in the table is the size of the embedding matrix, which is padded beyond the tokenizer's 50257 entries.

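As a rough illustration of the numbers above, the sketch below derives the head dimension from the table and applies a rotary rotation to only the first 64 entries of one head vector. It is not taken from the GPT-J code: the function name `apply_partial_rope`, the choice to rotate consecutive pairs of entries, and the base of 10000 are assumptions that follow the usual RoPE formulation.

```python
import math

# Values copied from the hyperparameter table above.
D_MODEL = 4096
N_HEADS = 16
D_FF = 16384
ROTARY_DIM = 64

D_HEAD = D_MODEL // N_HEADS  # 4096 / 16 = 256, matching d_head in the table


def apply_partial_rope(head_vec, position, rotary_dim=ROTARY_DIM, base=10000.0):
    """Rotate the first `rotary_dim` entries of one head vector; pass the rest through.

    Assumption: consecutive pairs (x[0], x[1]), (x[2], x[3]), ... are rotated,
    with pair i using the angle position * base ** (-2 * i / rotary_dim).
    """
    out = list(head_vec)
    for i in range(rotary_dim // 2):
        theta = position * base ** (-2.0 * i / rotary_dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = head_vec[2 * i], head_vec[2 * i + 1]
        out[2 * i] = a * cos_t - b * sin_t
        out[2 * i + 1] = a * sin_t + b * cos_t
    return out


query = [float(i % 7) for i in range(D_HEAD)]  # made-up head vector
rotated = apply_partial_rope(query, position=5)
```

Only a quarter of each head (64 of 256 entries) carries position information in this scheme; the remaining 192 entries are left untouched.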
## Training data

GPT-J 6B was pretrained on the [Pile](https://pile.eleuther.ai), a large-scale curated dataset created by EleutherAI for the purpose of training this model. After pretraining, it was fine-tuned on Python code taken from the Pile.

## Training procedure

Genji-python-6B was trained for 20k steps on around 655 million tokens with a learning rate of 2e-06.

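The two figures above imply a rough per-step batch size. The arithmetic below is only a sketch: it assumes that 655 million is the total number of tokens processed across all 20k steps and that every training sequence fills the full 2,048-token context, neither of which is stated here.

```python
STEPS = 20_000
TOTAL_TOKENS = 655_000_000  # "around 655 million", so only approximate
N_CTX = 2048                # context length from the hyperparameter table

# Assumption: the token count is the total seen over the whole run.
tokens_per_step = TOTAL_TOKENS / STEPS
# Assumption: every sequence is a full 2,048-token window.
sequences_per_step = tokens_per_step / N_CTX

print(f"{tokens_per_step:,.0f} tokens per step")
print(f"{sequences_per_step:.2f} full-length sequences per step")
```

Under those assumptions the result is close to 16 sequences per step, and 16 x 2,048 x 20,000 = 655,360,000 tokens, which is consistent with the rounded figure.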
## Intended Use

This model is trained to assist with writing Python code, and for having fun trying weird stuff with it.

### How to use

This model is only usable with our fork because GPT-J has not been merged into the main transformers repo yet. Once it is merged, we will make this model easily loadable.
For now, you need to use this fork:
[Fork](https://github.com/finetuneanon/transformers)

To install it with pip:
```bash
pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3-rp-b
```

**git-lfs** also needs to be installed. On Ubuntu:
```bash
apt install git-lfs
```

After it's installed, initialize git-lfs:
```bash
git lfs install
```

Then clone this repo:
```bash
git clone https://huggingface.co/NovelAI/genji-python-6B-split
```

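If git-lfs was not initialized before cloning, the large checkpoint files arrive as small text pointer stubs instead of real data. The helper below is a minimal sketch for spotting that case; `is_lfs_pointer` and `find_unfetched_files` are hypothetical names, and the check relies only on the fact that git-lfs pointer files begin with a fixed `version` line.

```python
from pathlib import Path

# Every git-lfs pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a git-lfs pointer stub rather than real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched_files(repo_dir):
    """List files under `repo_dir` that are still git-lfs pointer stubs."""
    return sorted(
        str(p)
        for p in Path(repo_dir).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

Calling `find_unfetched_files("genji-python-6B-split")` should return an empty list after a successful clone. If it lists any files, running `git lfs pull` inside the repo should fetch the real contents.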
Now we can load the model.

We recommend running the model in FP16. That way, it fits on cards with 16GB of VRAM.

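The FP16 recommendation follows from simple arithmetic on the parameter count in the table. The sketch below estimates the size of the weights alone; it ignores activations, the attention cache, and framework overhead, so actual VRAM use will be higher. The helper name `weight_gib` is made up for this example.

```python
N_PARAMETERS = 6_053_381_344  # n_parameters from the hyperparameter table
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gib(n_params, dtype):
    """Size of the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_gib(N_PARAMETERS, dtype):.1f} GiB")
```

In FP16 the weights come to a little over 11 GiB, which leaves some headroom on a 16GB card; in FP32 they take twice that and do not fit.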
How to use:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the split checkpoint from the cloned repo in FP16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained("genji-python-6B-split/model").half().eval().cuda()
# GPT-J shares its BPE tokenizer with GPT-Neo.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

text = '''def print_customer_name'''

tokens = tokenizer(text, return_tensors="pt").input_ids
generated_tokens = model.generate(
    tokens.long().cuda(),
    use_cache=True,
    do_sample=True,
    top_k=50,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.125,
    min_length=1,
    max_length=len(tokens[0]) + 400,
    pad_token_id=tokenizer.eos_token_id,
)
# Drop the prompt tokens so only the newly generated text is decoded.
last_tokens = generated_tokens[0][len(tokens[0]):]
generated_text = tokenizer.decode(last_tokens)
print("Prompt:\n" + text)
print("Generation:\n" + generated_text)
```
When run, this code generates:
```python
Prompt:
def print_customer_name
Generation:
(self, customer):
    """Print the name of a customer."""
    if not self.is_valid():
        return

    print("Customer: {}".format(customer))
```

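The sampling arguments used above (`temperature`, `top_k`, `top_p`) interact in a particular order. The pure-Python sketch below shows one simplified way these three filters can be combined for a single decoding step. It is an illustration only, not the transformers implementation; the function name `filter_and_sample` is made up, and the repetition penalty is left out.

```python
import math
import random


def filter_and_sample(logits, temperature=0.3, top_k=50, top_p=0.9, rng=random):
    """Sample one token id after temperature scaling, top-k, and top-p filtering."""
    # A temperature below 1 sharpens the distribution toward high-scoring tokens.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw one token id from what is left, weighted by probability.
    ids, kept_probs = zip(*kept)
    return rng.choices(ids, weights=kept_probs, k=1)[0]
```

With a low temperature such as 0.3, a single strong candidate usually exceeds the `top_p` threshold on its own, so the output stays close to greedy decoding while still allowing some variation when several tokens score similarly.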
For more example usage, see our Colab notebook:
[Notebook](https://colab.research.google.com/drive/1PnWpx02IEUkY8jhLKd_NewUGEXahAska?usp=sharing)

## Eval results

TBD

## Acknowledgements

This project was possible because of the compute provided by the
[TPU Research Cloud](https://sites.research.google/trc/) and [EleutherAI](https://eleuther.ai/) for the pretraining of GPT-J 6B.

Thanks to everyone who contributed to this project:
- [Aero](https://github.com/AeroScripts)
- [Finetune](https://github.com/finetuneanon)
- [Kurumuz](https://github.com/kurumuz)