sail /
Text Generation · Transformers · English · llama · Inference Endpoints
SivilTaram committed · verified · commit de7da50 · Parent(s): a356e6c

Update README.md

Files changed (1): README.md (+24 -1)
README.md CHANGED
@@ -2,6 +2,7 @@
  license: mit
  datasets:
  - sail/regmix-data
+ - sail/regmix-data-sample
  language:
  - en
  ---
 
@@ -9,6 +10,29 @@ language:

  # Models Trained with Random Mixture

+ This is a collection of 64 language models, each with approximately 1B parameters, trained on different random mixtures of data. The project aims to validate the generalization capability of the RegMix approach (https://huggingface.co/papers/2407.01492) from small-scale models (e.g., 1M parameters) to large-scale models (e.g., 1B parameters).
+
+ ## Key Features
+
+ - **Model Size**: 64 separate models, each with ~1B parameters
+ - **Training Data**: Random data mixtures drawn from the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset
+ - **Purpose**: To validate the effectiveness of RegMix at identifying high-performing data mixtures
+
+ ## Dataset
+
+ The models were trained on the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset, which is split into different domains drawn from The Pile.
+
+ ## Training Hyperparameters
+
+ | Hyperparameter | Value |
+ |:---------------|:------|
+ | Batch Size | 1M tokens |
+ | Learning Rate | 4e-4 |
+ | Minimum Learning Rate | 1e-5 |
+ | Learning Rate Schedule | Cosine |
+ | Warmup Ratio | 4% |
+ | Total Tokens | 25B |
+
  ## How to Load a Model

  You can load any model using the corresponding branch with the Hugging Face Transformers library:
 
@@ -20,7 +44,6 @@ model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
  tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
  ```

-
  ## Data Mixture

  The specific data mixture used for training each 1B model can be found in the file `train_config.yaml` in each corresponding model branch.
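
For the `## Dataset` section added above, here is a minimal sketch of how one might inspect the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) corpus with the `datasets` library. The configuration and split names are assumptions, so check the dataset card for the actual layout:

```python
# Minimal sketch, assuming the default configuration and a "train" split exist;
# see https://huggingface.co/datasets/sail/regmix-data for the actual layout.
from datasets import load_dataset

ds = load_dataset("sail/regmix-data", split="train", streaming=True)
print(next(iter(ds)))  # print one raw example to inspect its fields
```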
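The hyperparameter table pins down the learning-rate curve: with a 1M-token batch size and 25B total tokens, training runs for roughly 25,000 steps, of which 4% (about 1,000 steps) are warmup before the cosine decay from 4e-4 to 1e-5. The sketch below only illustrates that schedule; it is not the authors' training code, and the linear warmup shape is an assumption:

```python
import math

# Illustrative sketch of the schedule in the table above: linear warmup over
# 4% of steps, then cosine decay from the peak LR (4e-4) to the minimum (1e-5).
PEAK_LR, MIN_LR = 4e-4, 1e-5
TOTAL_STEPS = 25_000                    # 25B tokens / 1M tokens per batch
WARMUP_STEPS = int(0.04 * TOTAL_STEPS)  # 4% warmup ratio

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Peak LR is reached at the end of warmup; the final step lands near MIN_LR.
print(learning_rate(0), learning_rate(WARMUP_STEPS), learning_rate(TOTAL_STEPS - 1))
```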
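The `## How to Load a Model` section selects a checkpoint by passing a branch name as `revision`. A slightly fuller sketch follows; it assumes the checkpoints can be used as causal language models (the page is tagged `llama`), that `bfloat16` is acceptable, and that branches follow the `model-index-1` naming shown in the README:

```python
# Sketch: load one of the 64 branches as a causal LM and run a short generation.
# AutoModelForCausalLM and the bfloat16 dtype are assumptions, not taken from
# the model card; the branch name follows the pattern shown in the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sail/data-mixture-random-1b"
branch = "model-index-1"

tokenizer = AutoTokenizer.from_pretrained(repo, revision=branch)
model = AutoModelForCausalLM.from_pretrained(repo, revision=branch, torch_dtype=torch.bfloat16)

inputs = tokenizer("The Pile is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```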
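Since each branch carries its own `train_config.yaml`, the mixture used for a given model can be fetched programmatically with `huggingface_hub`. The keys inside the YAML are not documented on this page, so the sketch only downloads and prints the file:

```python
# Sketch: download train_config.yaml from a specific model branch and load it.
# The structure of the file is not documented here, so just inspect it.
import yaml
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sail/data-mixture-random-1b",
    filename="train_config.yaml",
    revision="model-index-1",  # branch of the model whose mixture you want
)
with open(path) as f:
    print(yaml.safe_load(f))
```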