How to test:

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
    logging,
)

model_name = "RohitSahoo/llama-2-7b-chat-hf-math-ft-V1"


# Load the quantization settings: 4-bit NF4 quantization with float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load the fine-tuned model with the 4-bit quantization config
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    # use the gpu
    device_map={"":0}
)

# Disable the KV cache
model.config.use_cache = False

# Load the tokenizer from the model (llama2)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
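
# Optional sanity check (a minimal sketch): get_memory_footprint() is a standard
# transformers helper; with 4-bit NF4 weights the 7B model should report roughly 4 GB.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")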


text = '''Reward Modelling Phase: In the second phase the SFT model is prompted with prompts x to
produce pairs of answers (y1, y2) ∼ πSFT(y | x). These are then presented to human labelers
who express preferences for one answer, denoted as yw ≻ yl | x, where yw and yl denote the
preferred and dispreferred completion amongst (y1, y2) respectively. The preferences are assumed
to be generated by some latent reward model r∗ (y, x), which we do not have access to. There are a
number of approaches used to model preferences, the Bradley-Terry (BT) [5] model being a popular
choice (although more general Plackett-Luce ranking models [30, 21] are also compatible with the
framework if we have access to several ranked answers). The BT model stipulates that the human
preference distribution p∗ can be written as:
p∗(y1 ≻ y2 | x) = exp(r∗(x, y1)) / (exp(r∗(x, y1)) + exp(r∗(x, y2)))    (1)
Assuming access to a static dataset of comparisons D = {(x(i), yw(i), yl(i))} for i = 1, ..., N sampled from p∗, we
can parametrize a reward model rφ (x, y) and estimate the parameters via maximum likelihood.
Framing the problem as a binary classification we have the negative log-likelihood loss:
LR(rφ, D) = −E(x,yw,yl)∼D [log σ(rφ(x, yw) − rφ(x, yl))].

Please explain the math behind this paper by explaining all the variables'''
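
# For intuition on the Bradley-Terry math quoted above, a minimal numeric sketch
# (the reward values below are toy numbers, not produced by any real reward model):
# p∗(y1 ≻ y2 | x) = exp(r1) / (exp(r1) + exp(r2)) = sigmoid(r1 - r2),
# and the reward-modelling loss is -log sigmoid(r_w - r_l) per comparison.
r_w, r_l = torch.tensor(1.5), torch.tensor(0.3)   # toy rewards for the preferred / dispreferred answers
p_prefer = torch.sigmoid(r_w - r_l)               # Bradley-Terry preference probability p(yw > yl | x)
nll = -torch.nn.functional.logsigmoid(r_w - r_l)  # per-example negative log-likelihood LR
print(f"p(yw > yl | x) = {p_prefer.item():.3f}, loss = {nll.item():.3f}")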


import time
start = time.time()

logging.set_verbosity(logging.CRITICAL)

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=1500)
result = pipe(f"<s>[INST] {text} [/INST]")
print(result[0]['generated_text'])

end = time.time()
print(end - start)
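
If the pipeline wrapper gets in the way, the same test can be run with model.generate directly. This is a minimal sketch (max_new_tokens and greedy decoding are illustrative choices, not tuned settings); the tokenizer adds the leading <s> token itself, so it is not included in the prompt string:

inputs = tokenizer(f"[INST] {text} [/INST]", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))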