BEE-spoke-data/bert-plus-L8-4096-v1.0

pszemraj committed on Feb 11, 2024

Commit 6ee299f · verified · 1 parent: 89176c1

Update README.md

Files changed (1)

README.md +10 -2

README.md CHANGED

@@ -749,12 +749,20 @@ Thus far, all completed in fp32 (_using nvidia tf32 dtype behind the scenes when
 | SST2      | 90.6%    | -              | -       | -         | -                    | 0.2464  |
 | QNLI      | 89.6%    | -              | -       | -         | -                    | 0.2891  |
 | MRPC      | 84.07%   | 86.59         | -       | -         | -                    | 0.3759  |
-| STSB      | -        | 92.07         | 92.23  | 91.92    | -                    | 0.4103  |
+| STSB      | -        | 92.07         | 0.9223  | 0.9192    | -                    | 0.4103  |
 | MNLI      | 82.2%    | -              | -       | -         | -                    | 0.4602  |
-| CoLA      | -        | -              | -       | -         | 60.72               | 0.4569  |
+| CoLA      | -        | -              | -       | -         | 0.6072               | 0.4569  |
 | RTE       | 66.43%   | -              | -       | -         | -                    | 0.6981  |
 | WNLI      | 35.21%   | -              | -       | -         | -                    | 0.7425  |
+8-layer BERT with standard 512 ctx:
+| Model                       | CoLA | SST-2 | MRPC   | STS-B | QNLI | WNLI | RTE    |
+|-----------------------------|------|-------|--------|-------|------|------|--------|
+| bert_uncased_L-8_H-768_A-12 | 0.54 | 0.91  | 0.88   | 0.93  | 0.90 | 0.34 | 0.67   |
 ### Observations:
 - **Performance Variation**: There's notable variation in model performance across different GLUE tasks. This variation can be attributed to the distinct nature of each task, the complexity of the datasets, and how well the model's architecture and hyperparameters are suited to each task.
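This commit rescales the STS-B and CoLA entries from a 0-100 scale (92.23, 91.92, 60.72) to a 0-1 scale (0.9223, 0.9192, 0.6072), which puts them on the same footing as the new 512-ctx baseline row. The sketch below illustrates that normalization and a per-task comparison against the baseline. It rests on assumptions: the results table's column headers are outside this hunk, so treating accuracy as the headline metric for most tasks, the third column as the STS-B metric, and the fifth column as the CoLA metric is a guess; the metric behind each baseline cell is also not stated. The names `to_fraction`, `deltas`, `bert_plus`, and `baseline_512` are hypothetical.

```python
# Hypothetical sketch: normalize the GLUE scores shown in this diff to a 0-1
# scale and diff them against the 8-layer, 512-ctx baseline row.
# ASSUMPTION: which column is the "headline" metric per task is a guess,
# because the table's column headers are not part of this hunk.


def to_fraction(value: str) -> float:
    """Normalize '90.6%', '60.72' (0-100 scale) or '0.9223' (0-1 scale) to 0-1."""
    v = value.strip()
    if v.endswith("%"):
        return float(v[:-1]) / 100.0
    x = float(v)
    # ASSUMPTION: a bare number above 1 is on the 0-100 scale used before this commit.
    return x / 100.0 if x > 1.0 else x


# Scores for bert-plus-L8-4096 as they appear after the commit.
bert_plus = {
    "CoLA": "0.6072",   # assumed correlation-style metric (fifth column)
    "SST-2": "90.6%",
    "MRPC": "84.07%",   # first column; the 86.59 value may be a different metric
    "STS-B": "0.9223",  # assumed third column
    "QNLI": "89.6%",
    "WNLI": "35.21%",
    "RTE": "66.43%",
}

# bert_uncased_L-8_H-768_A-12 row (512 ctx); its metric definitions are unstated.
baseline_512 = {
    "CoLA": "0.54", "SST-2": "0.91", "MRPC": "0.88", "STS-B": "0.93",
    "QNLI": "0.90", "WNLI": "0.34", "RTE": "0.67",
}


def deltas(model: dict, baseline: dict) -> dict:
    """Per-task difference (model minus baseline) on the 0-1 scale, rounded to 4 places."""
    return {
        task: round(to_fraction(model[task]) - to_fraction(baseline[task]), 4)
        for task in baseline
    }


if __name__ == "__main__":
    for task, d in deltas(bert_plus, baseline_512).items():
        print(f"{task:6s} {d:+.4f}")
```

Running it prints one signed difference per task; a positive value means the 4096-ctx model scores higher, but only under the metric-alignment assumptions above.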