Update README.md
README.md
CHANGED
@@ -749,12 +749,20 @@ Thus far, all completed in fp32 (_using nvidia tf32 dtype behind the scenes when
 | SST2 | 90.6% | - | - | - | - | 0.2464 |
 | QNLI | 89.6% | - | - | - | - | 0.2891 |
 | MRPC | 84.07% | 86.59 | - | - | - | 0.3759 |
-| STSB | - | 92.07 |
+| STSB | - | 92.07 | 0.9223 | 0.9192 | - | 0.4103 |
 | MNLI | 82.2% | - | - | - | - | 0.4602 |
-| CoLA | - | - | - | - |
+| CoLA | - | - | - | - | 0.6072 | 0.4569 |
 | RTE | 66.43% | - | - | - | - | 0.6981 |
 | WNLI | 35.21% | - | - | - | - | 0.7425 |
 
+8-layer BERT with standard 512 ctx:
+
+
+| Model | CoLA | SST-2 | MRPC | STS-B | QNLI | WNLI | RTE |
+|-----------------------------|------|-------|--------|-------|------|------|--------|
+| bert_uncased_L-8_H-768_A-12 | 0.54 | 0.91 | 0.88 | 0.93 | 0.90 | 0.34 | 0.67 |
+
+
 ### Observations:
 
 - **Performance Variation**: There's notable variation in model performance across different GLUE tasks. This variation can be attributed to the distinct nature of each task, the complexity of the datasets, and how well the model's architecture and hyperparameters are suited to each task.
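The variation noted in the Observations bullet can be made concrete with the `bert_uncased_L-8_H-768_A-12` row added in this change. The following is a minimal sketch, not part of the README: the `summarize` helper is hypothetical, and the scores are copied by hand from that row. It assumes the columns are comparable enough to rank, which may not hold, since GLUE tasks are commonly reported with different metrics (for example a correlation for CoLA and STS-B, accuracy elsewhere), so the gap is only a rough indication.

```python
# Scores copied from the bert_uncased_L-8_H-768_A-12 row of the table above.
# Assumption: values are on a common 0-1 scale; the underlying metric may
# differ per task, so this ranking is illustrative only.
scores = {
    "CoLA": 0.54,
    "SST-2": 0.91,
    "MRPC": 0.88,
    "STS-B": 0.93,
    "QNLI": 0.90,
    "WNLI": 0.34,
    "RTE": 0.67,
}


def summarize(task_scores):
    """Return (best task, worst task, gap between their scores)."""
    best = max(task_scores, key=task_scores.get)
    worst = min(task_scores, key=task_scores.get)
    gap = round(task_scores[best] - task_scores[worst], 2)
    return best, worst, gap


if __name__ == "__main__":
    best, worst, gap = summarize(scores)
    print(f"best: {best}, worst: {worst}, gap: {gap}")
```

Under these assumptions, the small-data tasks (WNLI, RTE) sit at the bottom of the ranking and the larger tasks (SST-2, QNLI) near the top, which is consistent with the bullet's point about dataset characteristics.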