library_name: transformers
license: mit
base_model: agentlans/snowflake-arctic-embed-xs-zyda-2
tags:
- generated_from_trainer
- text-classification
- grammar-classification
metrics:
- accuracy
model-index:
- name: agentlans/snowflake-arctic-xs-grammar-classifier
  results:
  - task:
      type: text-classification
      name: Grammar Classification
    dataset:
      name: agentlans/grammar-classification
      type: agentlans/grammar-classification
    metrics:
    - type: accuracy
      value: 0.8724
      name: Accuracy
datasets:
- agentlans/grammar-classification
- liweili/c4_200m
language:
- en
pipeline_tag: text-classification
snowflake-arctic-xs-grammar-classifier
This model is a fine-tuned version of agentlans/snowflake-arctic-embed-xs-zyda-2 for grammar classification. It achieves an accuracy of 0.8724 on the evaluation set.
Model description
The snowflake-arctic-xs-grammar-classifier is designed to classify the grammatical correctness of English sentences. It is based on the snowflake-arctic-embed-xs-zyda-2 model and has been fine-tuned on a grammar classification dataset derived from the C4 (Colossal Clean Crawled Corpus).
Intended uses & limitations
This model is intended for classifying the grammatical correctness of English sentences. It can be used in various applications such as writing assistance tools, educational software, or content moderation systems.
Usage example
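from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="agentlans/snowflake-arctic-xs-grammar-classifier",
    device=device,
)

# Classify a single sentence; the result is a list with one dict per input.
text = "I absolutely loved this movie!"
result = classifier(text)
print(result)  # [{'label': 'grammatical', 'score': 0.8963921666145325}]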
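The pipeline output can be post-processed with plain Python. The following is a minimal sketch, not part of the model's API: the helper names `is_grammatical` and `flag_sentences` and the 0.5 threshold are assumptions made for illustration. Only the `grammatical` label appears in the sample output above, so any other label is treated as "not grammatical" rather than assuming its exact name. The predictions are hand-written stand-ins for what `classifier(texts)` would return.

```python
from typing import Dict, List

# Label name taken from the sample output above. Any other label is treated
# as "not grammatical", so the name of the opposite label is not assumed.
GRAMMATICAL_LABEL = "grammatical"


def is_grammatical(prediction: Dict[str, object], threshold: float = 0.5) -> bool:
    """Return True if the prediction is 'grammatical' with score >= threshold."""
    return (
        prediction["label"] == GRAMMATICAL_LABEL
        and float(prediction["score"]) >= threshold
    )


def flag_sentences(
    texts: List[str],
    predictions: List[Dict[str, object]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the texts whose prediction does not pass is_grammatical."""
    return [t for t, p in zip(texts, predictions) if not is_grammatical(p, threshold)]


# Hand-written stand-in for the output of classifier(texts).
texts = ["I absolutely loved this movie!", "How do I shot web?"]
predictions = [
    {"label": "grammatical", "score": 0.8963921666145325},
    {"label": "ungrammatical", "score": 0.91},  # hypothetical label and score
]

print(flag_sentences(texts, predictions))  # ['How do I shot web?']
```

Raising the threshold makes the check stricter: sentences labelled `grammatical` with a low score are flagged as well.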
Example Classifications
Status | Text | Explanation |
---|---|---|
✔️ | I absolutely loved this movie! | Grammatically correct, clear sentence structure |
❌ | How do I shot web? | Grammatically incorrect, improper verb usage |
✔️ | Beware the Jabberwock, my son! | Poetic language, grammatically sound |
✔️ | Colourless green ideas sleep furiously. | Grammatically correct, though semantically nonsensical |
❌ | Has anyone really been far even as decided to use even go want to do look more like? | Completely incoherent and grammatically incorrect |
Limitations
The model's performance is limited by the quality and diversity of its training data. It may not perform well on specialized or domain-specific text, or on languages other than English. Additionally, it may struggle with complex grammatical structures or nuanced language use.
Training and evaluation data
The model was trained on the agentlans/grammar-classification dataset, which contains 600,000 examples for binary classification of grammatical correctness in English. This dataset is derived from a subset of the C4_200M Synthetic Dataset for Grammatical Error Correction.
Training procedure
Training hyperparameters
- Learning rate: 5e-05
- Batch size: 128
- Number of epochs: 10
- Optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- Learning rate scheduler: Linear
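To make the linear scheduler concrete, here is a rough sketch of how the learning rate would decay over the run. It assumes zero warmup steps (the card does not mention warmup) and takes the total of 37,500 steps from the final row of the training results table. The function `linear_lr` is a hypothetical helper written for this illustration, not the trainer's own code.

```python
BASE_LR = 5e-5        # learning rate from the hyperparameter list
TOTAL_STEPS = 37_500  # final step reported in the training results table
WARMUP_STEPS = 0      # assumption: no warmup is mentioned in the card


def linear_lr(
    step: int,
    base_lr: float = BASE_LR,
    total_steps: int = TOTAL_STEPS,
    warmup_steps: int = WARMUP_STEPS,
) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, the midpoint (end of epoch 5), and the end.
for step in (0, 18_750, 37_500):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 5e-05, is halved by the end of epoch 5, and reaches zero at the last step.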
Detailed Training Results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
---|---|---|---|---|---|
0.5192 | 1.0 | 3750 | 0.4722 | 0.7738 | 61,440,000 |
0.4875 | 2.0 | 7500 | 0.4521 | 0.7881 | 122,880,000 |
0.4590 | 3.0 | 11250 | 0.3895 | 0.8227 | 184,320,000 |
0.4351 | 4.0 | 15000 | 0.3981 | 0.8197 | 245,760,000 |
0.4157 | 5.0 | 18750 | 0.3690 | 0.8337 | 307,200,000 |
0.3955 | 6.0 | 22500 | 0.3260 | 0.8585 | 368,640,000 |
0.3788 | 7.0 | 26250 | 0.3267 | 0.8566 | 430,080,000 |
0.3616 | 8.0 | 30000 | 0.3192 | 0.8621 | 491,520,000 |
0.3459 | 9.0 | 33750 | 0.3017 | 0.8707 | 552,960,000 |
0.3382 | 10.0 | 37500 | 0.2971 | 0.8724 | 614,400,000 |
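The table can be cross-checked against the hyperparameters with a little arithmetic. The sketch below assumes that each optimizer step consumes exactly one batch of 128 examples (single device, no gradient accumulation), which the card does not state explicitly.

```python
BATCH_SIZE = 128               # from the hyperparameter list
STEPS_PER_EPOCH = 3_750        # step count at epoch 1.0 in the table
TOKENS_PER_EPOCH = 61_440_000  # input tokens seen at epoch 1.0

# Examples processed in one epoch, assuming one batch per step.
examples_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE

# Average number of input tokens per example.
tokens_per_example = TOKENS_PER_EPOCH / examples_per_epoch

print(examples_per_epoch)   # 480000
print(tokens_per_example)   # 128.0
```

Under that assumption, each epoch covers 480,000 examples at 128 tokens each. This would be consistent with inputs padded or truncated to a fixed length of 128 tokens, and with part of the 600,000 examples being held out for evaluation, although the card does not state either detail.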
Framework versions
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.20.3