# BERT Paraphrase Detection (GLUE MRPC)
This model is fine-tuned for paraphrase detection on the GLUE MRPC dataset. It determines whether two given sentences are paraphrases, i.e., whether they convey the same meaning. This is a binary classification task with the following labels:
- 1: Paraphrase
- 0: Not a paraphrase
## Model Overview
- Developer: Parit Kansal
- Model Type: Sequence Classification (Binary)
- Language(s): English
- Pre-trained Model: BERT (`bert-base-uncased`)
## Intended Use
This model is designed to assess whether two sentences convey the same meaning. It can be applied in various scenarios, including:
- Duplicate Question Detection: Identifying similar questions in QA systems.
- Plagiarism Detection: Detecting if content is copied and rephrased.
- Summarization Alignment: Matching sentences from summaries to the original content.
## Example Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("ParitKansal/BERT_Paraphrase_Detection_GLUE_MRPC")
tokenizer = AutoTokenizer.from_pretrained("ParitKansal/BERT_Paraphrase_Detection_GLUE_MRPC")

def make_prediction(text1, text2):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()
    # Encode the sentence pair as a single input ([CLS] text1 [SEP] text2 [SEP])
    inputs = tokenizer(text1, text2, truncation=True, padding=True, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # 1 = paraphrase, 0 = not a paraphrase
    prediction = torch.argmax(logits, dim=-1).item()
    return prediction

# Example usage
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A fast brown fox leaps over a lazy dog."
prediction = make_prediction(text1, text2)
print(f"Prediction: {prediction}")
```
## Training Details
### Training Data
The model was fine-tuned on the GLUE MRPC dataset, which contains pairs of sentences labeled as either paraphrases or not.
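For reference, MRPC ships as part of GLUE and can be loaded through the Hugging Face `datasets` library (a minimal sketch, assuming `datasets` is installed):

```python
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")
print(mrpc)              # DatasetDict with train/validation/test splits
print(mrpc["train"][0])  # keys: 'sentence1', 'sentence2', 'label', 'idx'
```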
### Training Procedure
- Number of Epochs: 2
- Metrics Used:
  - Accuracy
  - Precision
  - Recall
  - F1 Score
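The card does not include the training script; the following is a minimal sketch of how such a run could look with the Hugging Face `Trainer` API. Only the 2 epochs come from the card; the batch size, output directory, and metric-averaging scheme are illustrative assumptions.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize both sentences of each MRPC pair as a single BERT input
dataset = load_dataset("glue", "mrpc").map(
    lambda batch: tokenizer(batch["sentence1"], batch["sentence2"], truncation=True),
    batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted")  # averaging scheme is an assumption
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

args = TrainingArguments(
    output_dir="bert-mrpc",          # illustrative
    num_train_epochs=2,              # from the card
    per_device_train_batch_size=16,  # illustrative
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```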
### Training Logs (Summary)
| Epoch | Avg Loss | Accuracy | Precision | Recall | F1 Score |
|-------|----------|----------|-----------|--------|----------|
| 1 | 0.5443 | 73.45% | 72.28% | 73.45% | 70.83% |
| 2 | 0.2756 | 89.34% | 89.25% | 89.34% | 89.27% |
## Evaluation
### Performance Metrics
The model's performance was evaluated using the following metrics:
- Accuracy: Percentage of correct predictions.
- Precision: Proportion of positive identifications that were actually correct.
- Recall: Proportion of actual positives that were correctly identified.
- F1 Score: The harmonic mean of Precision and Recall.
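As a concrete reference, these metrics could be computed from model predictions with `scikit-learn` (an assumption about tooling; the reported Precision and Recall tracking Accuracy so closely is consistent with support-weighted averaging):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]  # gold labels (toy example)
y_pred = [1, 0, 1, 0, 0]  # model predictions

accuracy = accuracy_score(y_true, y_pred)
# "weighted" averages per-class scores by class support; an assumption here
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")
print(f"Accuracy={accuracy:.2%} Precision={precision:.2%} "
      f"Recall={recall:.2%} F1={f1:.2%}")
```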
### Test Set Results
| Epoch | Avg Loss | Accuracy | Precision | Recall | F1 Score |
|-------|----------|----------|-----------|--------|----------|
| 1 | 0.3976 | 82.60% | 82.26% | 82.60% | 81.93% |
| 2 | 0.3596 | 84.80% | 84.94% | 84.80% | 84.87% |