awels/maximusLLM-14b-128k-gguf


fbeawels committed on Aug 11, 2024

Commit d01551f · verified · 1 Parent(s): 083053b

Update README.md

Files changed (1)

README.md +12 -12


@@ -40,7 +40,7 @@ The dataset used to train Maximus consists of all the public documents available
 ## Training Details
 **Training Data:**
-The training data includes 16,000 Questions and Answers generated by the [Bonito LLM](https://github.com/BatsResearch/bonito). The dataset is split into 3 sets of data (training, test and validation) to ensure robust model performance.
 **Training Procedure:**
 Maximus was trained using supervised learning with cross-entropy loss and the Adam optimizer. The training involved 1 epoch, a batch size of 4, a learning rate of 5.0e-06, and a cosine learning rate scheduler with gradient checkpointing for memory efficiency.
@@ -56,22 +56,22 @@ The training was conducted using PyTorch.
 **Evaluation Metrics:**
 Maximus was evaluated on the training dataset:
-> epoch                    =        1.0
-  total_flos               = 60599604GF
-  train_loss               =     1.9974
-  train_runtime            = 0:18:06.31
-  train_samples_per_second =     11.261
-  train_steps_per_second   =      2.816
 **Performance:**
 The model achieved the following results on the evaluation dataset:
 > epoch                   =        1.0
-  eval_loss               =     1.6183
-  eval_runtime            = 0:00:51.01
-  eval_samples            =       2538
-  eval_samples_per_second =     61.025
-  eval_steps_per_second   =     15.271

 ## Training Details
 **Training Data:**
+The training data includes 67,000 Questions and Answers generated by the [Bonito LLM](https://github.com/BatsResearch/bonito). The dataset is split into 3 sets of data (training, test and validation) to ensure robust model performance.
 **Training Procedure:**
 Maximus was trained using supervised learning with cross-entropy loss and the Adam optimizer. The training involved 1 epoch, a batch size of 4, a learning rate of 5.0e-06, and a cosine learning rate scheduler with gradient checkpointing for memory efficiency.
 **Evaluation Metrics:**
 Maximus was evaluated on the training dataset:
+> epoch                    =         1.0
+  total_flos               = 233585641GF
+  train_loss               =      1.7111
+  train_runtime            =  1:08:52.73
+  train_samples_per_second =       11.41
+  train_steps_per_second   =       2.853
 **Performance:**
 The model achieved the following results on the evaluation dataset:
 > epoch                   =        1.0
+  eval_loss               =     1.4482
+  eval_runtime            = 0:03:24.92
+  eval_samples            =      10773
+  eval_samples_per_second =     57.386
+  eval_steps_per_second   =     14.347
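
The loss and throughput figures in this diff can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes the reported losses are mean per-token cross-entropy values in nats (the PyTorch default), so that perplexity is `exp(loss)`, and it divides samples per second by steps per second to recover the number of samples processed per step. The function names `perplexity` and `samples_per_step` are hypothetical helpers introduced here for illustration.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(cross_entropy_loss)


def samples_per_step(samples_per_second: float, steps_per_second: float) -> float:
    """Samples processed per step, implied by the two throughput figures."""
    return samples_per_second / steps_per_second


# Values copied from the old (-) and new (+) sides of the diff.
old_eval_loss, new_eval_loss = 1.6183, 1.4482

print(f"old eval perplexity: {perplexity(old_eval_loss):.3f}")
print(f"new eval perplexity: {perplexity(new_eval_loss):.3f}")

# Throughput figures from the new (+) side; the README states a batch size of 4.
print(f"train samples/step: {samples_per_step(11.41, 2.853):.2f}")
print(f"eval samples/step:  {samples_per_step(57.386, 14.347):.2f}")
```

Under these assumptions, the lower evaluation loss on the new side corresponds to a lower perplexity, and the implied samples per step come out close to the batch size of 4 stated in the training procedure.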