How to test:
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
    logging,
)
model_name = "RohitSahoo/llama-2-7b-chat-hf-math-ft-V1"
# 4-bit quantization settings: NF4 weights, float16 compute, no double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
# Load the fine-tuned model with the quantization settings above
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    # place the whole model on GPU 0
    device_map={"": 0},
)
# Disable the key/value cache (a setting usually used for training; generation is typically faster with it enabled)
model.config.use_cache = False
# Load the tokenizer that ships with the model (Llama 2)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)
# Llama 2 defines no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
text = '''Reward Modelling Phase: In the second phase the SFT model is prompted with prompts x to
produce pairs of answers (y1, y2 ) ∼ π SFT (y | x). These are then presented to human labelers
who express preferences for one answer, denoted as yw ≻ yl | x where yw and yl denotes the
preferred and dispreferred completion amongst (y1 , y2 ) respectively. The preferences are assumed
to be generated by some latent reward model r∗ (y, x), which we do not have access to. There are a
number of approaches used to model preferences, the Bradley-Terry (BT) [5] model being a popular
choice (although more general Plackett-Luce ranking models [30, 21] are also compatible with the
framework if we have access to several ranked answers). The BT model stipulates that the human
preference distribution p∗ can be written as:
p∗ (y1 ≻ y2 | x) =
exp (r∗ (x, y1 ))
.
exp (r∗
(x, y1 )) + exp (r ∗ (x, y2))
(1)
Assuming access to a static dataset of comparisons D =
x(i), yw (i) , yl (i) N
i=1 sampled from p∗ , we
can parametrize a reward model rφ (x, y) and estimate the parameters via maximum likelihood.
Framing the problem as a binary classification we have the negative log-likelihood loss:
LR(rφ , D) = −E(x,yw ,yl )∼D
log σ(rφ (x, yw ) − rφ (x, yl )).
Please explain the math behind this paper by explaining all the variables'''
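import time

start = time.time()
# Silence transformers log messages during generation
logging.set_verbosity(logging.CRITICAL)
# max_length counts the prompt tokens plus the generated tokens
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=1500)
# Wrap the prompt in the Llama 2 chat instruction tags
result = pipe(f"<s>[INST] {text} [/INST]")
print(result[0]["generated_text"])
end = time.time()
# Elapsed time in seconds, including pipeline construction
print(end - start)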
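The script above builds its prompt by hand as `<s>[INST] {text} [/INST]`. To try other questions, a small helper can keep that wrapping consistent. The sketch below is illustrative only: `build_llama2_prompt`, `system_prompt`, and `include_bos` are hypothetical names that are not part of this model card or of `transformers`, and the optional `<<SYS>>` block follows the commonly documented Llama 2 chat layout, which this fine-tune is assumed (not confirmed) to share.

```python
from typing import Optional


def build_llama2_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    include_bos: bool = True,
) -> str:
    """Hypothetical helper: wrap a message in Llama 2 chat [INST] tags."""
    # The script above writes the literal "<s>" marker, so it is kept by default.
    bos = "<s>" if include_bos else ""
    if system_prompt:
        # Assumed Llama 2 chat layout for an optional system prompt.
        body = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message}"
    else:
        body = user_message
    return f"{bos}[INST] {body} [/INST]"


# Runs without a GPU: this only builds the prompt string.
prompt = build_llama2_prompt("What is the derivative of x**2?")
print(prompt)
```

With the model loaded as above, `pipe(build_llama2_prompt(text))` should send the same prompt as the hand-written f-string.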