Tokenizer loading issue

#23
by Tanishq3232 - opened

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
gives the following error:
Distant resource does not have an ETag, we won't be able to reliably ensure reproducibility.


OSError Traceback (most recent call last)
File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/configuration_utils.py:566, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
564 try:
565 # Load from URL or cache if already cached
--> 566 resolved_config_file = cached_path(
567 config_file,
568 cache_dir=cache_dir,
569 force_download=force_download,
570 proxies=proxies,
571 resume_download=resume_download,
572 local_files_only=local_files_only,
573 use_auth_token=use_auth_token,
574 user_agent=user_agent,
575 )
576 # Load config dict

File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/file_utils.py:1625, in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
1623 if is_remote_url(url_or_filename):
1624 # URL, so get it from the cache (downloading if necessary)
-> 1625 output_path = get_from_cache(
1626 url_or_filename,
1627 cache_dir=cache_dir,
1628 force_download=force_download,
1629 proxies=proxies,
1630 resume_download=resume_download,
1631 user_agent=user_agent,
1632 use_auth_token=use_auth_token,
1633 local_files_only=local_files_only,
1634 )
1635 elif os.path.exists(url_or_filename):
1636 # File, and it exists.

File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/file_utils.py:1803, in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
1802 if etag is None:
-> 1803 raise OSError(
1804 "Distant resource does not have an ETag, we won't be able to reliably ensure reproducibility."
1805 )
1806 # In case of a redirect,
1807 # save an extra redirect on the request.get call,
1808 # and ensure we download the exact atomic version even if it changed
1809 # between the HEAD and the GET (unlikely, but hey).

OSError: Distant resource does not have an ETag, we won't be able to reliably ensure reproducibility.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
Cell In[3], line 3
1 from transformers import AutoTokenizer, AutoModelForCausalLM
----> 3 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py:487, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
485 if config_tokenizer_class is None:
486 if not isinstance(config, PretrainedConfig):
--> 487 config = AutoConfig.from_pretrained(
488 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
489 )
490 config_tokenizer_class = config.tokenizer_class
491 if hasattr(config, "auto_map") and "AutoTokenizer" in config.auto_map:

File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py:580, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
578 kwargs["name_or_path"] = pretrained_model_name_or_path
579 trust_remote_code = kwargs.pop("trust_remote_code", False)
--> 580 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
581 if "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]:
582 if not trust_remote_code:

File ~/venvs/stgTenderTagging/lib/python3.9/site-packages/transformers/configuration_utils.py:591, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
588 if revision is not None:
589 msg += f"- or '{revision}' is a valid git identifier (branch name, a tag name, or a commit id) that exists for this model name as listed on its model page on 'https://huggingface.co/models'\n\n"
--> 591 raise EnvironmentError(msg)
593 except (json.JSONDecodeError, UnicodeDecodeError):
594 msg = (
595 f"Couldn't reach server at '{config_file}' to download configuration file or "
596 "configuration file is not a valid JSON file. "
597 f"Please check network or file content here: {resolved_config_file}."
598 )

OSError: Can't load config for 'EleutherAI/gpt-j-6B'. Make sure that:

  • 'EleutherAI/gpt-j-6B' is a correct model identifier listed on 'https://huggingface.co/models'
    (make sure 'EleutherAI/gpt-j-6B' is not a path to a local directory with something else, in that case)

  • or 'EleutherAI/gpt-j-6B' is the correct path to a directory containing a config.json file
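
The second bullet in that message points at one thing that is cheap to rule out: a local directory whose relative path happens to match the model ID. A minimal sketch of that check, using only the standard library. The function name `describe_local_path` is made up for illustration and is not part of transformers.

```python
import os

def describe_local_path(name):
    """Classify `name` relative to the current working directory."""
    if not os.path.isdir(name):
        # No such directory, so the string can only be meant as a Hub model ID.
        return "not_local"
    if os.path.isfile(os.path.join(name, "config.json")):
        # A local directory that does contain a config.json.
        return "local_with_config"
    # A local directory with no config.json directly inside it.
    return "local_without_config"

print(describe_local_path("EleutherAI/gpt-j-6B"))
```

If this prints `local_without_config`, a folder in the working directory is probably shadowing the model ID, and running from a different directory or renaming the folder is worth a try.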

I've hit the same issue and have been unable to find a fix.

(Edit: This has now been solved for me)

@johncookds, how were you able to solve the problem? I am running into the same issue.

@resz I believe it was a network connection issue on my end that lasted a while. I originally suspected it had something to do with the Hugging Face space files, but I no longer believe that was the cause. The issue cleared up after a few days, so there's a chance something was fixed on the HF side, but I expect it was a network issue.
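
If the cause really is a flaky connection, one way to ride it out is to retry the load a few times before giving up. This is only a sketch under that assumption: `load_with_retry` and `load_tokenizer` are hypothetical helper names, and the attempt count and delay are arbitrary defaults, not values recommended by transformers.

```python
import time

def load_with_retry(loader, attempts=3, delay_seconds=5.0):
    """Call `loader()` until it succeeds or `attempts` calls have raised OSError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except OSError as err:  # the exception type shown in the traceback
            last_error = err
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(delay_seconds * attempt)
    raise last_error

def load_tokenizer():
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# Usage (needs transformers and network access):
# tokenizer = load_with_retry(load_tokenizer)
```

If the files were downloaded successfully at some earlier point, passing `local_files_only=True` to `from_pretrained` (the parameter is visible in the traceback) should load them from the local cache without contacting the server.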

I was getting this with transformers version 4.13; upgrading solved it for me.
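
To see whether an environment is on the version reported here, the installed version can be read with the standard library. A rough sketch: the helper names are made up for illustration, and treating everything at or below 4.13 as suspect is an assumption, since this thread only reports 4.13 itself.

```python
import re
from importlib import metadata

REPORTED_VERSION = (4, 13)  # the version mentioned in this thread

def major_minor(version):
    """Reduce a version string such as '4.13.0' or '4.13.0.dev0' to (4, 13)."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:2]]
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)

def at_or_below_reported(version, reported=REPORTED_VERSION):
    """True if `version` is the reported version or older (an assumed cutoff)."""
    return major_minor(version) <= reported

def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

found = installed_version()
if found is None:
    print("transformers is not installed in this environment")
elif at_or_below_reported(found):
    print(f"transformers {found} is at or below 4.13; upgrading may be worth trying")
else:
    print(f"transformers {found} is newer than 4.13")
```

Upgrading itself is the usual `pip install --upgrade transformers` inside the affected virtual environment.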

I fixed it by using the model ID EleutherAI/gpt-j-6b instead of EleutherAI/gpt-j-6B, that is, with a lowercase b rather than an uppercase B.
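
For anyone who wants to try both spellings automatically, here is a small sketch that attempts the ID as given and then a variant with the model name lowercased. Whether the Hub treats IDs case-sensitively is not confirmed in this thread, so this is a workaround to try, not an explanation. `candidate_ids` and `first_loadable` are hypothetical helper names.

```python
def candidate_ids(model_id):
    """Return the ID as given, then a variant with the model name lowercased."""
    owner, sep, name = model_id.partition("/")
    if sep:
        variants = [model_id, f"{owner}/{name.lower()}"]
    else:
        variants = [model_id, model_id.lower()]
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(variants))

def first_loadable(model_id, loader):
    """Try each candidate ID with `loader`; return (id_used, loaded_object)."""
    tried = []
    for candidate in candidate_ids(model_id):
        try:
            return candidate, loader(candidate)
        except OSError:  # the exception type raised in the traceback
            tried.append(candidate)
    raise OSError(f"none of {tried} could be loaded")

# Usage (needs transformers and network access):
# from transformers import AutoTokenizer
# used_id, tokenizer = first_loadable("EleutherAI/gpt-j-6B", AutoTokenizer.from_pretrained)
```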
