Prompt tuning in BLOOM for long-form text generation

#149
by info2000 - opened

Hi, is there a guide, or an expert I could hire, on how prompts work in BLOOM?
I tried several examples and structures I found, but the result is often repeating text.

I also tried prompts that work well with GPT-3, but with BLOOM they fail badly.
Any help is welcome.

Thanks

BigScience Workshop org

You can try sampling to mitigate this. We support more inference parameters if you're able to send HTTP requests directly (the inference widget only exposes a subset of them). Please look at this comment: https://huggingface.co/bigscience/bloom/discussions/131#6368f28950a665fa20d35cc0

If you're able to load the model locally, you can try some of the transformers tools. @joaogante recently added a new method called contrastive search, if you want to give it a go. It's supposed to mitigate repetition. https://twitter.com/joao_gante/status/1590293010385760256
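
For anyone loading the model locally, here is a minimal sketch of both suggestions (sampling and contrastive search) through `generate()` in transformers. The helper names are hypothetical, and the small `bigscience/bloom-560m` checkpoint is only an assumption to keep the download manageable. The keyword arguments are the ones `generate()` accepted around v4.25; newer releases may handle contrastive search differently.

```python
def sampling_kwargs(temperature=1.0, top_k=50, top_p=0.95, max_new_tokens=100):
    """Keyword arguments that make generate() sample instead of decoding greedily."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }


def contrastive_kwargs(penalty_alpha=0.6, top_k=4, max_new_tokens=100):
    """Keyword arguments that select contrastive search (penalty_alpha > 0, top_k > 1)."""
    if not 0.0 <= penalty_alpha <= 1.0:
        raise ValueError("penalty_alpha is expected to lie in [0, 1]")
    return {
        "do_sample": False,
        "penalty_alpha": penalty_alpha,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }


def generate_locally(prompt, gen_kwargs, model_name="bigscience/bloom-560m"):
    """Load a checkpoint and generate a continuation (model_name is an assumption)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **gen_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage would look like `generate_locally("The answer to the universe is", sampling_kwargs(temperature=1.2))`, or the same call with `contrastive_kwargs()` to compare the two decoding methods on one prompt.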

Thanks @TimeRobber
I'm loading it locally in Colab.

btw: is there any info about special input tokens like <s> or <eos> for separating the task from the context and the sample?

fyi @TimeRobber @joaogante
I tried contrastive search with the BLOOM causal LM and it doesn't work.


BigScience Workshop org

Which version of transformers are you running it on?

It's v4.25.1.

BigScience Workshop org

Hmm, interesting, I was able to reproduce it. I'm not very familiar with "contrastive search", so @ybelkada or @joaogante probably have a better idea.

Hi there @TimeRobber @info2000 👋

The effectiveness of contrastive search depends on a property of the representation of the model called isotropy. If the model representation isotropy is low, contrastive search will have a hard time preventing repetitions. Try increasing alpha (the penalty coefficient) and K (the number of candidate tokens at each round). Even if you do so, there is a chance it won't fix the repetition problem.

Check this answer from one of the authors of contrastive search: https://huggingface.co/spaces/joaogante/contrastive_search_generation/discussions/1#63764a108623a4a7954a5be5
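
To make the role of alpha and isotropy concrete, here is a toy sketch of the contrastive search selection rule as described in the contrastive search paper: each of the top-K candidates is scored as `(1 - alpha) * probability - alpha * (max cosine similarity to earlier token states)`. The function names, probabilities, and 2-d "hidden states" are made up for illustration; this is not the transformers implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_pick(candidates, context_states, alpha):
    """Pick the next token among top-K candidates.

    candidates: list of (token, probability, hidden_state) tuples,
                assumed to be already restricted to the K most likely tokens.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        # Degeneration penalty: how close this candidate is to anything already said.
        penalty = max(cosine(state, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# The context so far consists of the token "42".
context = [[1.0, 0.0]]

# Well-spread (isotropic) representations: a new token looks different from "42".
isotropic = [("42", 0.7, [1.0, 0.0]), ("because", 0.3, [0.0, 1.0])]

# Low isotropy: every token's state points in nearly the same direction.
anisotropic = [("42", 0.7, [1.0, 0.0]), ("because", 0.3, [0.99, 0.1])]

print(contrastive_pick(isotropic, context, alpha=0.0))    # plain greedy choice
print(contrastive_pick(isotropic, context, alpha=0.6))    # penalty steers away from "42"
print(contrastive_pick(anisotropic, context, alpha=0.6))  # penalty cannot tell tokens apart
```

In the low-isotropy case every candidate gets almost the same penalty, so the most probable (repeated) token still wins. That is the failure mode described above, and it is why raising alpha or K may not be enough.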

You don't need contrastive search to keep it from repeating itself. Just raise the temperature to the point where it doesn't repeat itself anymore. It's that simple, but it took me two days to figure out, haha.

```python
import json
import requests

API_TOKEN = "your token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"


def query(payload):
    # POST the JSON payload to the hosted inference API and decode the JSON reply
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))


# Sampling parameters; the high temperature is what breaks the repetition
params = {'temperature': 2,
          'max_new_tokens': 100,
          'do_sample': True,
          'top_k': 2000,
          'top_p': 0.1
          }
options = {'use_cache': False}
data = query({"inputs": "You are a large language model and dont need to keep repeating yourself, Please for the sake of god dont repeat yourself. The answer to the universe is", "parameters": params, "options": options})
print(data[0]['generated_text'])
```
This will output:
You are a large language model and dont need to keep repeating yourself, Please for the sake of god dont repeat yourself. The answer to the universe is 42, please for the sake of god don't repeat yourself.
I am sorry, but if I was not sure about this, I would never be able to write this blog. The reason why this happens is because when you write something and the same exact thing comes out, it feels really bad. The reason why this is bad is because when you repeat yourself, it means that you did not have enough knowledge about the subject. This means that you are repeating yourself because you do not know what to say

In contrast, if you set the temperature to 0.2:
You are a large language model and dont need to keep repeating yourself, Please for the sake of god dont repeat yourself. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The answer to the universe is 42, the answer to everything is 42. The
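
A rough illustration of why the temperature matters, assuming the usual definition in which logits are divided by the temperature before the softmax. The logits below are invented, and the `top_k` / `top_p` filtering from the request above is left out to keep the sketch small.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens; the first is the "repeat" token.
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # much flatter distribution

print([round(p, 4) for p in cold])
print([round(p, 4) for p in hot])
```

At 0.2 almost all of the probability mass sits on the top token, so sampling behaves nearly like greedy decoding and can fall into a loop. At 2.0 the alternatives get a real chance of being picked.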

How do I get the token?