How to use gated model in inference

John6666 · September 26, 2024, 11:27pm

You can do it by passing your access token to the pipeline through the token argument, like this.

Sample code

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Actual Code

import transformers
import torch

hf_token = "hf_*********"  # Placeholder. Never hard-code a real token in code you upload or share!

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=hf_token,
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

You can create as many tokens as you want on the Access Tokens page of your Hugging Face account settings. Give each token a name that makes it easy to tell which one is which.
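
Since a real token should never be written directly into code you upload, one option is to read it from an environment variable at run time. This is only a minimal sketch and was not in the original reply: the helper names `get_hf_token` and `build_pipeline` are made up for illustration, and it assumes you have already stored the token in an environment variable called `HF_TOKEN`.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in the given environment variable."""
    # Assumption: the token was exported under this name before running the script.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token


def build_pipeline(model_id: str = "meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Create the same text-generation pipeline, without a hard-coded token."""
    # Imported lazily so the token helper works without loading these libraries.
    import torch
    import transformers

    return transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
        token=get_hf_token(),  # read from the environment instead of the source file
    )
```

With the variable set in your shell (for example `export HF_TOKEN=hf_...` on Linux or macOS), calling `build_pipeline()` should be able to replace the `transformers.pipeline(...)` call in the Actual Code, and the rest of the script stays the same.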
