Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ The dataset used to train Maximus consists of all the public documents available
 ## Training Details
 
 **Training Data:**
-The training data includes
+The training data includes 67,000 Questions and Answers generated by the [Bonito LLM](https://github.com/BatsResearch/bonito). The dataset is split into 3 sets of data (training, test and validation) to ensure robust model performance.
 
 **Training Procedure:**
 Maximus was trained using supervised learning with cross-entropy loss and the Adam optimizer. The training involved 1 epoch, a batch size of 4, a learning rate of 5.0e-06, and a cosine learning rate scheduler with gradient checkpointing for memory efficiency.
@@ -56,22 +56,22 @@ The training was conducted using PyTorch.
 **Evaluation Metrics:**
 Maximus was evaluated on the training dataset:
 
-> epoch =
-total_flos =
-train_loss =
-train_runtime =
-train_samples_per_second =
-train_steps_per_second =
+> epoch = 1.0
+total_flos = 233585641GF
+train_loss = 1.7111
+train_runtime = 1:08:52.73
+train_samples_per_second = 11.41
+train_steps_per_second = 2.853
 
 **Performance:**
 The model achieved the following results on the evaluation dataset:
 
 > epoch = 1.0
-eval_loss = 1.
-eval_runtime = 0:
-eval_samples =
-eval_samples_per_second =
-eval_steps_per_second =
+eval_loss = 1.4482
+eval_runtime = 0:03:24.92
+eval_samples = 10773
+eval_samples_per_second = 57.386
+eval_steps_per_second = 14.347
 
 
 
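The metrics added in this commit can be sanity-checked with a little arithmetic. The sketch below is not part of the README. It assumes that the reported losses are mean per-token cross-entropy values in nats (the README only says "cross-entropy loss"), in which case `exp(loss)` gives a perplexity, and it checks whether the samples-per-second and steps-per-second figures are consistent with the stated batch size of 4. The helper names `perplexity` and `samples_per_step` are hypothetical.

```python
import math

# Values copied from the metrics introduced by this commit.
train_loss = 1.7111
eval_loss = 1.4482
train_samples_per_second = 11.41
train_steps_per_second = 2.853
eval_samples_per_second = 57.386
eval_steps_per_second = 14.347


def perplexity(loss: float) -> float:
    """Convert a loss to perplexity.

    Assumption: the loss is a mean per-token cross-entropy in nats.
    """
    return math.exp(loss)


def samples_per_step(samples_per_second: float, steps_per_second: float) -> float:
    """Effective number of samples processed per step."""
    return samples_per_second / steps_per_second


if __name__ == "__main__":
    print(f"train perplexity: {perplexity(train_loss):.3f}")
    print(f"eval perplexity:  {perplexity(eval_loss):.3f}")
    print(f"train samples/step: {samples_per_step(train_samples_per_second, train_steps_per_second):.3f}")
    print(f"eval samples/step:  {samples_per_step(eval_samples_per_second, eval_steps_per_second):.3f}")
```

Both ratios come out close to 4, which matches the batch size given under **Training Procedure**. The perplexity figures are only meaningful if the per-token assumption holds.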