403 Forbidden error when accessing the model
from langchain_huggingface import HuggingFaceEndpoint

model_id = "elyza/ELYZA-japanese-Llama-2-7b-instruct"
llm_hub = HuggingFaceEndpoint(
    repo_id=model_id,
    temperature=0.1,
    max_new_tokens=600,
    model_kwargs={"max_length": 600},
)
I am using the above code to load the model. Since the model is larger than my RAM, I guess it won't be possible to load it locally, so I want to use the Inference API instead.
I am setting the Hugging Face token via os.environ["HUGGINGFACEHUB_API_TOKEN"], but I am getting the error below:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: /static-proxy?url=https%3A%2F%2Fapi-inference.huggingface.co%2Fmodels%2Felyza%2FELYZA-japanese-Llama-2-7b-instruct

The same code works for other heavy models. I even tried changing the access token type from Inference to Read & Write, but that did not work. Does this have something to do with the Hugging Face plan?
Can anyone please help me with this?