Trying to use the model with training on trlx

#1
by birgermoell - opened
AI Sweden Model Hub org
edited Jan 1, 2023

I'm trying out loading a model with trlx and getting the following error.

Example code

import trlx
trainer = trlx.train('AI-Sweden/gpt-sw3-126m', dataset=[('dolphins', 'geese'), (1.0, 100.0)])
print("trainer: ", trainer)

OSError: AI-Sweden/gpt-sw3-126m does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Steps to reproduce.

Install trlx following the documentation here and then run the code.
https://github.com/CarperAI/trlx

If you compare it to https://huggingface.co/EleutherAI/gpt-j-6B/tree/main, there are additional files in their model folder.

Screenshot 2023-01-01 at 15.43.31.png

Files in the gpt-sw3-126m repo
Screenshot 2023-01-01 at 15.44.41.png

AI Sweden Model Hub org

Hey, without having looked at it closely, I think it could simply be a problem with the repository/model path. Try
AI-Sweden-Models/gpt-sw3-126m

AI Sweden Model Hub org

Great, that actually solved the issue, but I got a new one. It might be related to trlx, so I'm not sure whether it has anything to do with the model.

AI Sweden Model Hub org

It seems to be related to the dimensions of the output, but it could be caused by trlx.
Screenshot 2023-01-01 at 15.55.04.png

AI Sweden Model Hub org

RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [1536, 2] but got: [1536, 1].
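
For readers unfamiliar with this message: it is the kind of shape check that batched matrix multiplication (for example torch.bmm) performs, where the first operand has shape (b, n, m) and the second must have shape (b, m, p). The following is a minimal pure-Python sketch of that rule, not trlx or PyTorch code. The helper name check_bmm_shapes and the values 4 and 64 are made up for illustration; only 1536, 2, and 1 come from the error above, and the sketch says nothing about why the shapes diverged in this run.

```python
def check_bmm_shapes(batch1_shape, batch2_shape):
    """Return the output shape of a batched matmul, or raise ValueError.

    Hypothetical helper mirroring the shape rule of a batched matrix
    multiply: batch1 is (b, n, m), so batch2 must be (b, m, p).
    """
    if len(batch1_shape) != 3 or len(batch2_shape) != 3:
        raise ValueError("both operands must be 3-D")
    b, n, m = batch1_shape
    # The first two dimensions of batch2 must match (b, m) from batch1.
    if tuple(batch2_shape[:2]) != (b, m):
        raise ValueError(
            f"expected first two dimensions of batch2 to be {[b, m]} "
            f"but got {list(batch2_shape[:2])}"
        )
    return (b, n, batch2_shape[2])


# Compatible shapes (4 and 64 are placeholder values, not from the thread).
ok_shape = check_bmm_shapes((1536, 4, 2), (1536, 2, 64))

# Shapes analogous to the reported error: batch2 has 1 where 2 is expected.
try:
    check_bmm_shapes((1536, 4, 2), (1536, 1, 64))
except ValueError as err:
    mismatch_message = str(err)
```

Printing the shapes of the tensors that reach the failing call is usually the quickest way to see which operand is off.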

AI Sweden Model Hub org
edited Jan 1, 2023

Btw @JoeyOhman, the README says you should load it like this:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Initialize Variables
model_name = "AI-Sweden/gpt-sw3-126m"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"

# Initialize Tokenizer & Model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)

Maybe just update the README to:

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Initialize Variables
model_name = "AI-Sweden-Models/gpt-sw3-126m"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"

# Initialize Tokenizer & Model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)

AI Sweden Model Hub org

Yeah, we recently changed the repository name and will fix this as soon as possible. Thanks!

I don't know why you get that error, and I won't have access to my computer for a few days (maybe someone else will jump in here before that). However, something that concerns me is that it seems to use GPT2TokenizerFast and not the GPTSw3Tokenizer. It might not be related to the error, but it will probably give you unexpected behavior later on. The config file does point to the correct tokenizer, so please let us know if you think the problem could be on our end!
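
One way to catch this early is to check the class of the tokenizer object that was actually loaded. The sketch below is only an illustration: tokenizer_class_matches is a hypothetical helper, and the two stand-in classes exist purely so the example runs without downloading anything. The class names themselves are the ones mentioned above.

```python
def tokenizer_class_matches(tokenizer, expected_name="GPTSw3Tokenizer"):
    """Return True if the loaded tokenizer's class has the expected name.

    Hypothetical helper: it compares class names only and does not
    inspect the vocabulary or any tokenizer files.
    """
    return type(tokenizer).__name__ == expected_name


# Stand-in classes so the sketch runs offline; in real use the object
# would come from AutoTokenizer.from_pretrained(...).
class GPTSw3Tokenizer:
    pass


class GPT2TokenizerFast:
    pass


right = tokenizer_class_matches(GPTSw3Tokenizer())
wrong = tokenizer_class_matches(GPT2TokenizerFast())

# Intended use (requires transformers and access to the model repo):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-126m")
#   if not tokenizer_class_matches(tok):
#       print("Unexpected tokenizer class:", type(tok).__name__)
```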

AI Sweden Model Hub org

@JoeyOhman There is no tokenizer.json or tokenizer_config.json in this repository. Could it be that this makes huggingface use a default tokenizer that somehow breaks things?

AI Sweden Model Hub org

Installing transformers from source and installing sentencepiece resolved the issue. After that, I could load the tokenizer with the following code.
self.tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-126m", use_auth_token=True)
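
A quick pre-flight check for the two packages mentioned above could look like the sketch below. It uses only the standard library, and missing_packages is a hypothetical helper name. It only tells you whether a package can be imported at all. It does not verify that transformers was installed from source or that the installed version includes the GPT-SW3 tokenizer.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two packages named in this thread.
missing = missing_packages(["transformers", "sentencepiece"])
if missing:
    print("Missing packages:", ", ".join(missing))
```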

birgermoell changed discussion status to closed
birgermoell changed discussion status to open
AI Sweden Model Hub org

Sorry for the delayed answer, and great that you solved it!

The READMEs now have the correct repository path and a note about how to use the access token. Installing from source is required right now, but not for much longer, as GPTSw3 should be included in the next official release of HF Transformers.
