SetFit with BAAI/bge-m3

This is a SetFit model for text classification. It uses BAAI/bge-m3 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
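
As a rough illustration of step 1, the contrastive stage trains on pairs of sentences rather than on single examples: two texts with the same label form a positive pair (target similarity 1.0), and two texts with different labels form a negative pair (target 0.0). The sketch below builds such pairs in plain Python. The helper name `make_pairs` is hypothetical and this is not SetFit's actual sampler; the texts are taken from the label examples on this card.

```python
from itertools import combinations


def make_pairs(texts, labels):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    Hypothetical helper: same-label pairs get target 1.0,
    different-label pairs get target 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Tiny few-shot set: one "lexical" and two "semantic" examples.
texts = [
    "How can one organize a running event in Belgium?",
    "What changes can be made to a channel header?",
    "Who is responsible for managing guarantees and prolongations?",
]
labels = ["lexical", "semantic", "semantic"]

pairs = make_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

Even three labeled examples yield three training pairs, which is why this approach can work with few labeled samples.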

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-m3
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 8192 tokens
  • Number of Classes: 2

Model Labels

Label Examples
lexical
  • "How does Happeo's search AI work to provide answers to user queries?"
  • 'What are the primary areas of focus in the domain of Data Science and Analysis?'
  • 'How can one organize a running event in Belgium?'
semantic
  • 'What changes can be made to a channel header?'
  • 'How can hardware capabilities impact the accuracy of motion and object detections?'
  • 'Who is responsible for managing guarantees and prolongations?'

Evaluation

Metrics

Label Accuracy
all 0.8947

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("yaniseuranova/setfit-rag-hybrid-search-query-router")
# Run inference
preds = model("What is the purpose of setting up a CUPS on a server?")
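
The repository name suggests that the predicted label is meant to route a query to a retrieval strategy in a hybrid-search RAG pipeline. The following is a minimal sketch under that assumption. The `route_query` helper, the strategy names, and the stub classifier are all hypothetical; in practice the loaded model would be passed as `classify`, assuming it returns one of the two label names for a single query.

```python
from typing import Callable

# Hypothetical mapping from predicted label to retrieval strategy.
STRATEGIES = {"lexical": "keyword_search", "semantic": "vector_search"}


def route_query(query: str, classify: Callable[[str], str],
                default: str = "vector_search") -> str:
    """Return the retrieval strategy for a query based on its predicted label."""
    label = str(classify(query))
    # Fall back to the default strategy for any unexpected label.
    return STRATEGIES.get(label, default)


def stub_classifier(query: str) -> str:
    """Stand-in for the SetFit model so the sketch runs offline (arbitrary rule)."""
    return "lexical" if "Happeo" in query else "semantic"


print(route_query("How does Happeo's search AI work to provide answers to user queries?",
                  stub_classifier))
print(route_query("What changes can be made to a channel header?", stub_classifier))
```

Keeping the classifier behind a plain callable makes it easy to swap the stub for the real model or to test the routing logic in isolation.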

Training Details

Training Set Metrics

Training set Min Median Max
Word count 4 13.7407 28

Label Training Sample Count
lexical 44
semantic 118
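
The sample counts above can be reconciled with the step counts in the Training Results table. The arithmetic below is a sketch under two assumptions that the card does not state: same-label pairs are drawn with replacement (so a sample may be paired with itself), and the oversampling strategy repeats the smaller group of pairs until it matches the larger one. The batch size of 8 comes from the hyperparameters listed on this card.

```python
from math import ceil, comb

counts = {"lexical": 44, "semantic": 118}  # training sample counts from this card
batch_size = 8                             # embedding batch size from the hyperparameters

# Assumption: same-label pairs include each sample paired with itself.
positive = sum(comb(n, 2) + n for n in counts.values())
# Different-label pairs: every lexical sample with every semantic sample.
negative = counts["lexical"] * counts["semantic"]

# Assumption: oversampling repeats the smaller group to match the larger one.
pairs_per_epoch = 2 * max(positive, negative)
steps_per_epoch = ceil(pairs_per_epoch / batch_size)

print(positive, negative, pairs_per_epoch, steps_per_epoch)
```

Under these assumptions the result is 2003 steps per epoch, which matches the step at which epoch 1.0 is logged and, multiplied by three epochs, the final step of 6009.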

Training Hyperparameters

  • batch_size: (8, 8)
  • num_epochs: (3, 3)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
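
To make the `loss` setting concrete: a cosine-similarity loss of this kind scores each sentence pair by the squared difference between the cosine similarity of the two embeddings and the pair's target (1.0 for same-label pairs, 0.0 otherwise), averaged over the batch. The plain-Python sketch below uses toy 2-dimensional vectors in place of real embeddings; the function names are illustrative and are not the library's implementation.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and the target of each pair."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy batch of (embedding_a, embedding_b, target) triples.
batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, positive pair: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, negative pair: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # identical vectors, negative pair: error 1
]
loss = cosine_similarity_loss(batch)
print(loss)
```

Minimizing this quantity pulls same-label embeddings together and pushes different-label embeddings toward orthogonality.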

Training Results

Epoch Step Training Loss Validation Loss
0.0005 1 0.257 -
0.0250 50 0.1944 -
0.0499 100 0.2383 -
0.0749 150 0.1279 -
0.0999 200 0.0033 -
0.1248 250 0.0021 -
0.1498 300 0.0012 -
0.1747 350 0.0008 -
0.1997 400 0.0004 -
0.2247 450 0.0006 -
0.2496 500 0.0005 -
0.2746 550 0.0003 -
0.2996 600 0.0003 -
0.3245 650 0.0003 -
0.3495 700 0.0004 -
0.3744 750 0.0005 -
0.3994 800 0.0003 -
0.4244 850 0.0002 -
0.4493 900 0.0002 -
0.4743 950 0.0002 -
0.4993 1000 0.0001 -
0.5242 1050 0.0001 -
0.5492 1100 0.0001 -
0.5741 1150 0.0002 -
0.5991 1200 0.0001 -
0.6241 1250 0.0003 -
0.6490 1300 0.0002 -
0.6740 1350 0.0001 -
0.6990 1400 0.0003 -
0.7239 1450 0.0001 -
0.7489 1500 0.0002 -
0.7738 1550 0.0001 -
0.7988 1600 0.0002 -
0.8238 1650 0.0002 -
0.8487 1700 0.0002 -
0.8737 1750 0.0002 -
0.8987 1800 0.0003 -
0.9236 1850 0.0001 -
0.9486 1900 0.0001 -
0.9735 1950 0.0001 -
0.9985 2000 0.0001 -
1.0 2003 - 0.1735
1.0235 2050 0.0001 -
1.0484 2100 0.0001 -
1.0734 2150 0.0001 -
1.0984 2200 0.0 -
1.1233 2250 0.0001 -
1.1483 2300 0.0001 -
1.1732 2350 0.0001 -
1.1982 2400 0.0002 -
1.2232 2450 0.0001 -
1.2481 2500 0.0 -
1.2731 2550 0.0001 -
1.2981 2600 0.0001 -
1.3230 2650 0.0 -
1.3480 2700 0.0001 -
1.3729 2750 0.0001 -
1.3979 2800 0.0001 -
1.4229 2850 0.0 -
1.4478 2900 0.0001 -
1.4728 2950 0.0001 -
1.4978 3000 0.0001 -
1.5227 3050 0.0001 -
1.5477 3100 0.0 -
1.5726 3150 0.0 -
1.5976 3200 0.0001 -
1.6226 3250 0.0001 -
1.6475 3300 0.0001 -
1.6725 3350 0.0001 -
1.6975 3400 0.0001 -
1.7224 3450 0.0 -
1.7474 3500 0.0002 -
1.7723 3550 0.0001 -
1.7973 3600 0.0 -
1.8223 3650 0.0 -
1.8472 3700 0.0001 -
1.8722 3750 0.0 -
1.8972 3800 0.0001 -
1.9221 3850 0.0 -
1.9471 3900 0.0 -
1.9720 3950 0.0001 -
1.9970 4000 0.0 -
2.0 4006 - 0.2593
2.0220 4050 0.0001 -
2.0469 4100 0.0001 -
2.0719 4150 0.0 -
2.0969 4200 0.0001 -
2.1218 4250 0.0 -
2.1468 4300 0.0001 -
2.1717 4350 0.0001 -
2.1967 4400 0.0001 -
2.2217 4450 0.0001 -
2.2466 4500 0.0001 -
2.2716 4550 0.0 -
2.2966 4600 0.0 -
2.3215 4650 0.0 -
2.3465 4700 0.0001 -
2.3714 4750 0.0001 -
2.3964 4800 0.0002 -
2.4214 4850 0.0001 -
2.4463 4900 0.0001 -
2.4713 4950 0.0 -
2.4963 5000 0.0001 -
2.5212 5050 0.0001 -
2.5462 5100 0.0 -
2.5711 5150 0.0001 -
2.5961 5200 0.0 -
2.6211 5250 0.0 -
2.6460 5300 0.0 -
2.6710 5350 0.0 -
2.6960 5400 0.0 -
2.7209 5450 0.0 -
2.7459 5500 0.0 -
2.7708 5550 0.0 -
2.7958 5600 0.0001 -
2.8208 5650 0.0 -
2.8457 5700 0.0 -
2.8707 5750 0.0 -
2.8957 5800 0.0 -
2.9206 5850 0.0 -
2.9456 5900 0.0001 -
2.9705 5950 0.0 -
2.9955 6000 0.0 -
3.0 6009 - 0.2738
  • The bold row denotes the saved checkpoint.
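
Validation loss is logged only at the end of each epoch, and it rises from 0.1735 to 0.2738 while the training loss falls to roughly zero, a pattern that usually points to overfitting of the embedding body. The sketch below picks the evaluation row with the lowest validation loss. Assuming `load_best_model_at_end` selects on validation loss with lower being better, that row would correspond to the saved checkpoint; this is an assumption, not something the card confirms.

```python
# (epoch, step, validation_loss) rows copied from the table above.
evaluations = [
    (1.0, 2003, 0.1735),
    (2.0, 4006, 0.2593),
    (3.0, 6009, 0.2738),
]

# Pick the row with the smallest validation loss.
best_epoch, best_step, best_loss = min(evaluations, key=lambda row: row[2])
print(best_epoch, best_step, best_loss)
```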

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 2.6.1
  • Transformers: 4.39.0
  • PyTorch: 2.3.1+cu121
  • Datasets: 2.18.0
  • Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 568M params (Safetensors, tensor type F32)