fbeawels committed d01551f (verified) · 1 parent: 083053b

Update README.md

Files changed (1): README.md (+12 -12)

README.md CHANGED
@@ -40,7 +40,7 @@ The dataset used to train Maximus consists of all the public documents available
 ## Training Details
 
 **Training Data:**
-The training data includes 16,000 Questions and Answers generated by the [Bonito LLM](https://github.com/BatsResearch/bonito). The dataset is split into 3 sets of data (training, test and validation) to ensure robust model performance.
 
 **Training Procedure:**
 Maximus was trained using supervised learning with cross-entropy loss and the Adam optimizer. The training involved 1 epoch, a batch size of 4, a learning rate of 5.0e-06, and a cosine learning rate scheduler with gradient checkpointing for memory efficiency.
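The training procedure above names a cosine learning rate scheduler with a base rate of 5.0e-06. As a minimal sketch of what that schedule does, here is the standard cosine decay in pure Python; the no-warmup shape is an assumption, since the README does not specify one:

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 5.0e-06) -> float:
    """Standard cosine decay from base_lr down to zero.

    The README only names a cosine scheduler with base rate 5.0e-06;
    the absence of a warmup phase here is an assumption.
    """
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At step 0 this yields the full 5.0e-06; halfway through training it has decayed to 2.5e-06, and it reaches zero at the final step.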
@@ -56,22 +56,22 @@ The training was conducted using PyTorch.
 **Evaluation Metrics:**
 Maximus was evaluated on the training dataset:
 
-> epoch = 1.0
-total_flos = 60599604GF
-train_loss = 1.9974
-train_runtime = 0:18:06.31
-train_samples_per_second = 11.261
-train_steps_per_second = 2.816
 
 **Performance:**
 The model achieved the following results on the evaluation dataset:
 
 > epoch = 1.0
-eval_loss = 1.6183
-eval_runtime = 0:00:51.01
-eval_samples = 2538
-eval_samples_per_second = 61.025
-eval_steps_per_second = 15.271
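The runtime metrics on both sides of this commit are H:MM:SS strings. A small helper (hypothetical, not part of the README) converts them to seconds so the before and after runs can be compared directly:

```python
def runtime_to_seconds(runtime: str) -> float:
    """Convert an 'H:MM:SS.ss'-style runtime string to seconds."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Training runtimes reported on the two sides of this commit:
before = runtime_to_seconds("0:18:06.31")  # about 1086 seconds
after = runtime_to_seconds("1:08:52.73")   # about 4133 seconds
```

The roughly 3.8x longer runtime is consistent with the roughly 4x larger training set, given the near-identical samples-per-second throughput reported in both runs.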
 
@@ -40,7 +40,7 @@ The dataset used to train Maximus consists of all the public documents available
 ## Training Details
 
 **Training Data:**
+The training data includes 67,000 Questions and Answers generated by the [Bonito LLM](https://github.com/BatsResearch/bonito). The dataset is split into 3 sets of data (training, test and validation) to ensure robust model performance.
 
 **Training Procedure:**
 Maximus was trained using supervised learning with cross-entropy loss and the Adam optimizer. The training involved 1 epoch, a batch size of 4, a learning rate of 5.0e-06, and a cosine learning rate scheduler with gradient checkpointing for memory efficiency.
@@ -56,22 +56,22 @@ The training was conducted using PyTorch.
 **Evaluation Metrics:**
 Maximus was evaluated on the training dataset:
 
+> epoch = 1.0
+total_flos = 233585641GF
+train_loss = 1.7111
+train_runtime = 1:08:52.73
+train_samples_per_second = 11.41
+train_steps_per_second = 2.853
 
 **Performance:**
 The model achieved the following results on the evaluation dataset:
 
 > epoch = 1.0
+eval_loss = 1.4482
+eval_runtime = 0:03:24.92
+eval_samples = 10773
+eval_samples_per_second = 57.386
+eval_steps_per_second = 14.347
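Since Maximus was trained with cross-entropy loss, the eval_loss values on either side of this commit can also be read as perplexities via exp(loss), assuming the loss is a mean per-token cross-entropy in nats, as is standard for causal language model training. An illustration (perplexity is not a metric the README itself reports):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(mean_cross_entropy)

# eval_loss fell from 1.6183 to 1.4482 in this commit,
# i.e. perplexity improved from roughly 5.04 to roughly 4.26.
```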