license: mit
language:
  - en

BGE-small-en-v1.5-rag-int8-static

A version of BAAI/BGE-small-en-v1.5 quantized to INT8 with Intel® Neural Compressor and compatible with Optimum-Intel.

The model can be used through the Optimum-Intel API as a standalone model, or as an embedder or ranker module in a fastRAG RAG pipeline.

Technical details

Quantized using post-training static quantization.

Calibration set: qasper (50 random samples)
Quantization tool: Optimum-Intel
Backend: IPEX
Original model: BAAI/BGE-small-en-v1.5

Instructions for reproducing the quantized model can be found here.
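To make "post-training static quantization" concrete, here is a minimal sketch of the general idea in plain Python: value ranges are measured once on a small calibration set, and the resulting scale and zero point are then fixed and reused at inference time. This is an illustration only, not the procedure Intel® Neural Compressor actually runs; the function names and the sample values are hypothetical.

```python
def calibrate(samples):
    """Derive a fixed affine int8 scale and zero point from calibration values."""
    lo = min(min(samples), 0.0)  # the representable range must include zero
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all samples are zero
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to int8, clamping values outside the calibrated range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activation values observed while running calibration samples.
calibration_activations = [-1.0, -0.25, 0.0, 0.5, 1.55]
scale, zero_point = calibrate(calibration_activations)

q = quantize(0.5, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(scale, zero_point, q, restored)
```

Because the parameters are computed ahead of time, values that fall outside the range seen during calibration are clamped, which is why the choice of calibration set matters.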

Evaluation - MTEB

Model performance on the Massive Text Embedding Benchmark (MTEB) retrieval and reranking tasks.

Task       INT8    FP32    % diff
Reranking  0.5826  0.5836  -0.166%
Retrieval  0.5138  0.5168  -0.58%
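The sketch below shows one way to read the "% diff" column, assuming it is the relative change of the INT8 score with respect to the FP32 score. That definition is an assumption, since the card does not state it, and the helper name is hypothetical.

```python
def relative_diff_percent(int8_score, fp32_score):
    """Relative change of the INT8 score versus the FP32 baseline, in percent."""
    return (int8_score - fp32_score) / fp32_score * 100.0


# Scores as printed in the table (already rounded to four decimals).
retrieval_diff = relative_diff_percent(0.5138, 0.5168)
reranking_diff = relative_diff_percent(0.5826, 0.5836)

print(f"Retrieval: {retrieval_diff:.2f}%")
print(f"Reranking: {reranking_diff:.3f}%")
```

Under this assumption the retrieval value matches the table. The reranking value computed from the rounded scores comes out slightly different from the listed -0.166%, which would be consistent with the table having been computed from unrounded scores.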

Usage

Using with Optimum-Intel

See the Optimum-Intel installation page for installation instructions, or run:

pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers

Loading a model:

from optimum.intel import IPEXModel

model = IPEXModel.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")

Running inference:

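import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")

# example input; replace with your own sentences
sentences = ["This is an example sentence.", "Each sentence gets its own embedding."]

# pad and truncate so that sentences of different lengths form one batch
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)
    # get the vector of [CLS]
    embedded = outputs[0][:, 0]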
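Embeddings like these are commonly L2-normalized and then compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. It works on plain Python lists, such as those obtained by calling .tolist() on a row of the embedded tensor; the two short vectors are made-up stand-ins, not real model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Made-up 2-dimensional vectors standing in for two sentence embeddings.
emb_a = [3.0, 4.0]
emb_b = [4.0, 3.0]

sim = cosine_similarity(emb_a, emb_b)
print(sim)
```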

Using with a fastRAG RAG pipeline

To get started, install fastRAG as instructed here.

The example below loads the model into a ranker node, which embeds and re-ranks all the documents it receives as input in a pipeline.

from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("Intel/bge-small-en-v1.5-rag-int8-static")

Then plug it into a pipeline, where retriever is a retriever component that you have already initialized:

from haystack import Pipeline

p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
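Conceptually, a bi-encoder ranker embeds the query and each candidate document separately and then orders the documents by their similarity to the query. The sketch below illustrates only that ordering step, using hand-written toy vectors. It is not fastRAG's implementation, and the rerank function and document ids are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rerank(query_vec, doc_vecs, top_k=None):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps every document


# Toy unit-length "embeddings" (hypothetical values, not model outputs).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = {
    "doc_a": [0.0, 1.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.6, 0.0, 0.8],
}

ranked = rerank(query_vec, doc_vecs)
for doc_id, score in ranked:
    print(doc_id, score)
```

In a real pipeline the retriever supplies the candidate documents and the ranker computes the embeddings itself, so only the final ordering logic resembles this sketch.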

See a more complete example notebook here.