sail /
Text Generation · Transformers · English · llama · Inference Endpoints
SivilTaram committed · verified · commit de7da50 · Parent(s): a356e6c

Update README.md

Files changed (1): README.md (+24 -1)
README.md CHANGED
@@ -2,6 +2,7 @@
  license: mit
  datasets:
  - sail/regmix-data
+ - sail/regmix-data-sample
  language:
  - en
  ---
 
@@ -9,6 +10,29 @@ language:

  # Models Trained with Random Mixture

+ This is a collection of 64 language models, each with approximately 1B parameters, trained on different random mixtures of data. The project aims to validate the generalization capability of the RegMix approach (https://huggingface.co/papers/2407.01492) from small-scale models (e.g., 1M parameters) to large-scale models (e.g., 1B parameters).
+
+ ## Key Features
+
+ - **Model Size**: 64 separate models, each with ~1B parameters
+ - **Training Data**: Random data mixtures drawn from the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset
+ - **Purpose**: To validate the effectiveness of RegMix at identifying high-performing data mixtures
+
+ ## Dataset
+
+ The models were trained on the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) dataset, which is split into different domains drawn from The Pile.
+
+ ## Training Hyperparameters
+
+ | Hyperparameter | Value |
+ |:---------------|:------|
+ | Batch Size | 1M tokens |
+ | Learning Rate | 4e-4 |
+ | Minimum Learning Rate | 1e-5 |
+ | Learning Rate Schedule | Cosine |
+ | Warmup Ratio | 4% |
+ | Total Tokens | 25B |
+
  ## How to Load a Model

  You can load any model using the corresponding branch with the Hugging Face Transformers library:
 
@@ -20,7 +44,6 @@ model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
  tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
  ```

-
  ## Data Mixture

  The specific data mixture used for training each 1B model can be found in the file `train_config.yaml` in each corresponding model branch.
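
For the `## Dataset` section added above, here is a minimal sketch of how one might inspect the [RegMix-Data](https://huggingface.co/datasets/sail/regmix-data) corpus with the `datasets` library. The configuration and split names are assumptions, so check the dataset card for the actual layout:

```python
# Minimal sketch, assuming the default configuration and a "train" split exist;
# see https://huggingface.co/datasets/sail/regmix-data for the actual layout.
from datasets import load_dataset

ds = load_dataset("sail/regmix-data", split="train", streaming=True)
print(next(iter(ds)))  # print one raw example to inspect its fields
```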
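The hyperparameter table pins down the learning-rate curve: with a 1M-token batch size and 25B total tokens, training runs for roughly 25,000 steps, of which 4% (about 1,000 steps) are warmup before the cosine decay from 4e-4 to 1e-5. The sketch below only illustrates that schedule; it is not the authors' training code, and the linear warmup shape is an assumption:

```python
import math

# Illustrative sketch of the schedule in the table above: linear warmup over
# 4% of steps, then cosine decay from the peak LR (4e-4) to the minimum (1e-5).
PEAK_LR, MIN_LR = 4e-4, 1e-5
TOTAL_STEPS = 25_000                    # 25B tokens / 1M tokens per batch
WARMUP_STEPS = int(0.04 * TOTAL_STEPS)  # 4% warmup ratio

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Peak LR is reached at the end of warmup; the final step lands near MIN_LR.
print(learning_rate(0), learning_rate(WARMUP_STEPS), learning_rate(TOTAL_STEPS - 1))
```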
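The `## How to Load a Model` section selects a checkpoint by passing a branch name as `revision`. A slightly fuller sketch follows; it assumes the checkpoints can be used as causal language models (the page is tagged `llama`), that `bfloat16` is acceptable, and that branches follow the `model-index-1` naming shown in the README:

```python
# Sketch: load one of the 64 branches as a causal LM and run a short generation.
# AutoModelForCausalLM and the bfloat16 dtype are assumptions, not taken from
# the model card; the branch name follows the pattern shown in the README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sail/data-mixture-random-1b"
branch = "model-index-1"

tokenizer = AutoTokenizer.from_pretrained(repo, revision=branch)
model = AutoModelForCausalLM.from_pretrained(repo, revision=branch, torch_dtype=torch.bfloat16)

inputs = tokenizer("The Pile is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```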
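Since each branch carries its own `train_config.yaml`, the mixture used for a given model can be fetched programmatically with `huggingface_hub`. The keys inside the YAML are not documented on this page, so the sketch only downloads and prints the file:

```python
# Sketch: download train_config.yaml from a specific model branch and load it.
# The structure of the file is not documented here, so just inspect it.
import yaml
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sail/data-mixture-random-1b",
    filename="train_config.yaml",
    revision="model-index-1",  # branch of the model whose mixture you want
)
with open(path) as f:
    print(yaml.safe_load(f))
```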