SivilTaram committed: Update README.md

README.md CHANGED
---
license: mit
datasets:
- sail/regmix-data
- sail/regmix-data-sample
language:
- en
---

# Models Trained with Random Mixture

This is a collection of 64 language models, each with approximately 1B parameters, trained on different random data mixtures. The project aims to validate that the RegMix approach (https://huggingface.co/papers/2407.01492) generalizes from small-scale (e.g., 1M parameters) to large-scale (e.g., 1B parameters) models.

## Key Features

- **Model Size**: 64 separate models, each with ~1B parameters
- **Training Data**: Random data mixtures drawn from the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset
- **Purpose**: To validate the effectiveness of RegMix in identifying high-performing data mixtures

## Dataset

The models were trained on the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset, which is split into different domains drawn from The Pile.
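
To see which domains are actually available, one option is to list the files of the dataset repository. The sketch below uses `huggingface_hub` and groups files by their top-level directory; treating each top-level directory as a domain is an assumption about the repository layout, not something stated in this card.

```python
from collections import defaultdict

from huggingface_hub import list_repo_files

# List every file in the RegMix-Data dataset repository.
files = list_repo_files("sail/regmix-data", repo_type="dataset")

# Group files by top-level directory (assumed to correspond to domains).
domains = defaultdict(list)
for path in files:
    domains[path.split("/")[0]].append(path)

for domain, paths in sorted(domains.items()):
    print(f"{domain}: {len(paths)} files")
```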

## Training Hyperparameters

| Hyperparameter | Value |
|:---------------|:------|
| Batch Size | 1M tokens |
| Learning Rate | 4e-4 |
| Minimum Learning Rate | 1e-5 |
| Learning Rate Schedule | Cosine |
| Warmup Ratio | 4% |
| Total Tokens | 25B |
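
These values imply roughly 25,000 optimizer steps (25B tokens at 1M tokens per batch), about 1,000 of which fall in the 4% warmup window. The snippet below is only an illustrative sketch of such a schedule, not the authors' training code; in particular, the linear warmup shape and the step-based bookkeeping are assumptions.

```python
import math

PEAK_LR = 4e-4                           # Learning Rate
MIN_LR = 1e-5                            # Minimum Learning Rate
TOTAL_STEPS = 25_000                     # 25B tokens / 1M tokens per batch
WARMUP_STEPS = int(0.04 * TOTAL_STEPS)   # 4% warmup ratio

def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS - 1))
```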

## How to Load a Model

You can load any model using the corresponding branch with the Hugging Face Transformers library:

```python
from transformers import AutoModel, AutoTokenizer

# Each model is stored in its own branch; select it via `revision`.
model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
```
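
If you would rather discover the branches programmatically than type them by hand, a sketch along these lines can enumerate them with `huggingface_hub`; note that the `model-index-*` naming for all 64 branches is inferred from the example above rather than documented here.

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModel

REPO_ID = "sail/data-mixture-random-1b"

# Enumerate every branch of the model repository.
refs = list_repo_refs(REPO_ID)
branches = [branch.name for branch in refs.branches]

# Keep only the per-model branches (assumed to follow the "model-index-*" pattern).
model_branches = sorted(b for b in branches if b.startswith("model-index"))
print(f"Found {len(model_branches)} model branches")

# Load one of them by passing the branch name as `revision`.
model = AutoModel.from_pretrained(REPO_ID, revision=model_branches[0])
```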
|
46 |
|
|
|
47 |
## Data Mixture
|
48 |
|
49 |
The specific data mixture used for training each 1B model can be found in the file `train_config.yaml` in each corresponding model branch.
|
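
For instance, to inspect the mixture of a single model without cloning its branch, you could pull just that file with `huggingface_hub`. This is a minimal sketch: it assumes the `pyyaml` package is installed, and it simply prints whatever the config contains, since the exact schema of `train_config.yaml` is not documented here.

```python
import yaml  # provided by the pyyaml package
from huggingface_hub import hf_hub_download

# Download train_config.yaml from one model branch (here: model-index-1).
config_path = hf_hub_download(
    repo_id="sail/data-mixture-random-1b",
    filename="train_config.yaml",
    revision="model-index-1",
)

with open(config_path) as f:
    train_config = yaml.safe_load(f)

# Print the raw config; the domain weights of the data mixture live in here.
print(yaml.dump(train_config, sort_keys=False))
```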