FineMath classifier

Model summary

This is a classifier for evaluating mathematical reasoning and deduction in web pages, fine-tuned from intfloat/multilingual-e5-small. It was developed to filter and curate mathematical content from web datasets. It was trained on 1M annotations generated by Llama3-70B-Instruct for web samples from Common Crawl, which were extracted using the OpenWebMath text extraction pipeline. To ensure a balanced dataset, we upsampled pages containing mathematical content in the annotations, using a preliminary math classifier on 5M samples.

We used this classifier to build the FineMath dataset.

How to use in transformers

To load the FineMath classifier, use the following code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the regression-head classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/finemath-classifier")
model = AutoModelForSequenceClassification.from_pretrained("HuggingFaceTB/finemath-classifier")

text = "This is a test sentence."
# Tokenize; inputs longer than the model's maximum length are truncated
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
outputs = model(**inputs)
# The head has a single regression output, so drop the last dimension
logits = outputs.logits.squeeze(-1).float().detach().numpy()
score = logits.item()
result = {
    "text": text,
    "score": score,
    # Clamp to the 0-5 annotation range, then round to the nearest integer
    "int_score": int(round(max(0, min(score, 5)))),
}

print(result)
# {'text': 'This is a test sentence.', 'score': 0.07964489609003067, 'int_score': 0}
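
The `int_score` expression above clamps the raw regression output to the 0 to 5 annotation range and rounds it to the nearest integer. The sketch below isolates that step in a small helper so it can be applied to many scores at once. The name `to_int_score` and the sample scores are ours for illustration and are not part of the model's API. One detail worth knowing: Python's built-in `round` rounds exact halves to the nearest even integer, so a score of exactly 2.5 maps to 2 while 3.5 maps to 4.

```python
def to_int_score(score: float, low: int = 0, high: int = 5) -> int:
    """Clamp a raw regression score to [low, high] and round it to an integer.

    Hypothetical helper that mirrors the `int_score` expression shown above.
    """
    clamped = max(low, min(score, high))
    return int(round(clamped))


# Illustrative raw scores (not real model outputs), including out-of-range values
raw_scores = [-0.3, 0.0796, 2.5, 3.5, 4.49, 7.2]
print([to_int_score(s) for s in raw_scores])
# [0, 0, 2, 4, 4, 5]
```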

Training

The classifier was trained on 1M pairs of web samples and their scores from 0 to 5, generated by Llama3. The samples were annotated based on their usefulness for studying mathematics, with 0 meaning not educational or not containing mathematical content and 5 meaning outstanding for mathematics education.

Below is the prompt used for Llama3 annotations:

Prompt for LLM annotation

We added a classification head with a single regression output to intfloat/multilingual-e5-small and trained the model for 20 epochs with a learning rate of 3e-4. During training, the embedding and encoder layers were frozen to focus on the classification head. The model achieved an F1 score of 87% when converted to a binary classifier using a score threshold of 3.
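
As an illustration of the binary conversion described above, the sketch below keeps a document when its score reaches the threshold of 3. The text does not say whether the threshold is applied to the raw score or to the rounded integer score; this sketch assumes the clamped, rounded score. The function names and the sample scores are made up for illustration.

```python
from typing import Dict, List

THRESHOLD = 3  # score threshold quoted in the text above


def is_math_content(score: float, threshold: int = THRESHOLD) -> bool:
    """Binary decision from a regression score.

    Assumption: the decision uses the clamped, rounded integer score.
    """
    int_score = int(round(max(0, min(score, 5))))
    return int_score >= threshold


def filter_documents(docs: List[Dict], threshold: int = THRESHOLD) -> List[Dict]:
    """Keep only documents whose 'score' field passes the threshold."""
    return [d for d in docs if is_math_content(d["score"], threshold)]


# Made-up scores for illustration; real scores would come from the classifier
docs = [
    {"text": "Celebrity gossip roundup", "score": 0.1},
    {"text": "Worked solution to a quadratic equation", "score": 3.4},
    {"text": "Proof that sqrt(2) is irrational", "score": 4.6},
    {"text": "List of formulas without explanation", "score": 2.2},
]
kept = filter_documents(docs)
print([d["text"] for d in kept])
# ['Worked solution to a quadratic equation', 'Proof that sqrt(2) is irrational']
```

Raising the threshold trades recall for precision: a stricter cutoff keeps fewer pages, but a larger share of them should be strong mathematical content.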

Training Details:

  • Model: intfloat/multilingual-e5-small with a classification head
  • Dataset: 1M samples from Llama3 annotations
  • Epochs: 20
  • Learning Rate: 3e-4
  • Evaluation Metric: F1 score

Evaluation: The model achieves the following results on the evaluation set:

  • Loss: 0.4478
  • Precision: 0.8771
  • Recall: 0.8769
  • F1 Macro: 0.8770
  • Accuracy: 0.8770
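
Macro-averaged metrics such as the ones listed above give each class equal weight regardless of how many samples it has. The following sketch computes macro precision, recall, and F1, plus accuracy, in plain Python for binary labels. The labels are toy data and not the evaluation set, and the evaluation code actually used for this model is not shown here. Macro F1 is taken as the mean of the per-class F1 scores, which is one common convention.

```python
from typing import Dict, List


def macro_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1, plus accuracy, for integer labels."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
    }


# Toy binary labels (1 = score passes the threshold, 0 = it does not)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

In this toy example each class has three true positives, one false positive, and one false negative, so all four metrics come out at 0.75.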