Could not load Llama model from path

#5
by rahul07 - opened

Getting this error while loading the model:
Could not load Llama model from path: /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin. Received error fileno (type=value_error)

How are you trying to load it? Using what client/library?

I'm loading the model with this code:

# Load the model
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=False,
)

I'm trying to pass a PDF and query it with your model, using this notebook: https://github.com/MuhammadMoinFaisal/LargeLanguageModelsProjects/blob/main/QA%20Book%20PDF%20LangChain%20Llama%202/Final_Llama_CPP_Ask_Question_from_book_PDF_Llama.ipynb


I also got the same error, have you found the solution?


I am facing the same error. How can I resolve it? Please help me :-(
Could not load Llama model from path: /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin. Received error fileno (type=value_error)

Please report this to whatever client provides that code. Nothing has changed with my models.

Thanks a lot, TheBloke, for your immense work. I am running both the Llama 2 7B and 13B q8 models successfully in koboldcpp, but I can't get either of them to work with LlamaCpp; I get a ValueError and an AssertionError. Do you have any suggestions I can try? Thanks.

@shodhi llama.cpp no longer supports GGML models as of August 21st. GGML has been replaced by a new format called GGUF.

I will soon be providing GGUF models for all my existing GGML repos, but I'm waiting until they fix a bug with GGUF models. I will also soon update the READMEs on all my GGML models to mention this.

For now, please downgrade llama.cpp to commit dadbed99e65252d79f81101a392d0d6497b86caa and rebuild it, and it will work fine with these and all other GGML files. If you're using llama-cpp-python, please use version v0.1.78 or earlier.
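As a rough check against the cutoff above, the sketch below compares an installed llama-cpp-python version with v0.1.78. It assumes plain dotted numeric versions such as "0.1.85" (suffixes like ".post1" would need real version parsing), and the helper names are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Last llama-cpp-python release that, per the comment above, still reads GGML files.
GGML_LAST_VERSION = (0, 1, 78)


def parse_version(text):
    """Turn '0.1.78' into (0, 1, 78); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def supports_ggml(version_text):
    """True if this llama-cpp-python version is at or below the GGML cutoff."""
    # Tuples compare element-wise, so (0, 1, 9) < (0, 1, 78) as intended,
    # unlike a plain string comparison of "0.1.9" and "0.1.78".
    return parse_version(version_text) <= GGML_LAST_VERSION


if __name__ == "__main__":
    try:
        installed = version("llama-cpp-python")
        print(installed, "can load GGML" if supports_ggml(installed) else "needs GGUF")
    except PackageNotFoundError:
        print("llama-cpp-python is not installed")
```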

Thank you so much for the prompt response. I will do as suggested and post an update here. I was also thinking of trying the model with ctransformers in addition to llama.cpp, and will post those results here too. Regards

Unfortunately, it doesn't work with llama-cpp-python v0.1.78, 0.1.77 or 0.1.76.
Mostly, I want it to work with LangChain's LlamaCpp.
No luck with ctransformers either.
Any recommendations? Thanks

Please take a look, I am facing this error.

Use this Colab code as a starting point.
At this time I do not know whether some parameters in LlamaCpp() are ignored or need to go into some sort of metadata file as input to the conversion, but at least the model should work.

!pip install -qq langchain wget 
!pip install gguf  #https://github.com/ggerganov/llama.cpp/tree/master/gguf-py
!git clone https://github.com/ggerganov/llama.cpp
!pip -qq install git+https://github.com/huggingface/transformers
#Assuming you are using a GPU
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

from huggingface_hub import hf_hub_download
repo_id="TheBloke/Llama-2-13B-GGML"; filename="llama-2-13b.ggmlv3.q5_1.bin"
hf_hub_download(
    repo_id=repo_id, filename=filename,
    local_dir="/content"
)

!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf

filename=filename+".gguf"


n_gpu_layers = 32  
n_batch = 512  
n_threads=4
llm = LlamaCpp(
    model_path="/content/"+filename,
    n_threads=n_threads,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=2048,
    temperature=0.8,
    repeat_penalty=1.18,
    top_p=1,
    top_k=3,
    max_tokens=256,
    streaming=True,
    #verbose=True,
)

I tried this, but I'm still facing the same error.

Make sure the quotes on the conversion line are backticks (`), not single quotes.
In your code it looks like you removed them.
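If the backticks keep getting mangled (for example when copying between a notebook and a forum post), one option is to do the same `ls -t ... | head -1` selection in Python and call the script without a shell. This is only a sketch: the helper names are hypothetical, and the script path you pass in is the one from the notebook above, which may differ in your llama.cpp checkout.

```python
import subprocess
import sys
from pathlib import Path


def newest_ggml_file(directory):
    """Python version of `ls -t <directory>/*ggmlv3*.bin | head -1`."""
    candidates = list(Path(directory).glob("*ggmlv3*.bin"))
    if not candidates:
        raise FileNotFoundError(f"no *ggmlv3*.bin file in {directory}")
    # ls -t lists the most recently modified file first.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def gguf_name_for(input_path):
    """Same naming as the notebook: append '.gguf' to the GGML file name."""
    input_path = Path(input_path)
    return input_path.with_name(input_path.name + ".gguf")


def build_convert_command(script, input_path, extra_args=()):
    """Argument list for the conversion script (no shell involved)."""
    return [sys.executable, str(script),
            "--input", str(input_path),
            "--output", str(gguf_name_for(input_path)),
            *extra_args]


def convert_newest(directory, script, extra_args=()):
    """Convert the newest GGMLv3 file in `directory` and return the GGUF path."""
    src = newest_ggml_file(directory)
    # A list argument is passed straight to the process: no backticks or quoting.
    subprocess.run(build_convert_command(script, src, extra_args), check=True)
    return gguf_name_for(src)
```

In Colab, the equivalent of the !python line would be convert_newest("/content", "/content/llama.cpp/convert-llama-ggmlv3-to-gguf.py"), with any Llama 2 flags (for example ["-c", "4096", "--eps", "1e-5"]) passed as extra_args.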

So far, my results with this show dropped words. I'm not sure whether that's due to the conversion or to the beta status of the new llama.cpp.
I will be going back to v0.1.78 but will keep an eye on the cutting edge to see how this works out

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python==0.1.78 --no-cache-dir

More information on the conversion script:
It looks like -c 4096 and --eps 1e-5 should be used for Llama 2.

!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py -c 4096 --eps 1e-5 --input `ls -tr /content/*ggmlv3*.bin | head -1` --output `ls -tr /content/*ggmlv3*.bin | head -1`.gguf

Convert GGMLv3 models to GGUF

--input, -i: input GGMLv3 filename
--output, -o: output GGUF filename
--name: set model name
--desc: set model description
--gqa (default 1): grouped-query attention factor (use 8 for LLaMA2 70B)
--eps (default 5.0e-06): RMS norm eps (use 1e-6 for LLaMA1 and OpenLLaMA, 1e-5 for LLaMA2)
--context-length, -c (default 2048): default max context length (LLaMA1 is typically 2048, LLaMA2 is typically 4096)
--model-metadata-dir, -m: load HuggingFace/.pth vocab and metadata from the specified directory
--vocab-dir: directory containing tokenizer.model, if separate from the model file (only meaningful with --model-metadata-dir)
--vocabtype {spm, bpe} (default spm): vocab format (only meaningful with --model-metadata-dir and/or --vocab-dir)
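Going by the defaults and hints in that help text, the model-specific flags could be picked with a small helper like the one below. The family labels are made up for this sketch, and the values are copied from the help text (LLaMA1/OpenLLaMA: eps 1e-6, context typically 2048; LLaMA2: eps 1e-5, context typically 4096, plus --gqa 8 for 70B), not from an independent source.

```python
def conversion_flags(family, size_b):
    """Model-specific flags for the GGMLv3 -> GGUF script, per its help text.

    family: "llama1", "openllama" or "llama2" (labels invented for this sketch).
    size_b: parameter count in billions, e.g. 7, 13 or 70.
    """
    family = family.lower()
    if family == "llama2":
        flags = ["--eps", "1e-5", "--context-length", "4096"]
        if size_b == 70:
            # Grouped-query attention factor: the help text says 8 for LLaMA2 70B.
            flags += ["--gqa", "8"]
        return flags
    if family in ("llama1", "openllama"):
        return ["--eps", "1e-6", "--context-length", "2048"]
    raise ValueError(f"unknown model family: {family!r}")


# Flags for a Llama 2 70B conversion, to append after --input/--output.
print(conversion_flags("llama2", 70))
```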

Fix for "Could not load Llama model from path":

Download a GGUF model from this link:
https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF

Code Example:

from huggingface_hub import hf_hub_download

model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

Then change "verbose=False" to "verbose=True", as in the following code:

llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=True,
)
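Why would verbose=True help? One plausible explanation, not confirmed in this thread: the first error here says "Received error fileno", and with verbose=False llama-cpp-python tries to silence llama.cpp's C-level logging by redirecting stdout/stderr at the file-descriptor level, which needs sys.stdout.fileno(). In Colab/Jupyter, sys.stdout is a replacement stream that may not have a usable file descriptor, so that call can raise. The sketch below only tests for that condition; choose_verbose is a hypothetical helper, not part of LangChain or llama-cpp-python.

```python
import io
import sys


def stdout_has_fileno(stream=None):
    """True if the stream (default: sys.stdout) exposes a real OS file descriptor."""
    stream = sys.stdout if stream is None else stream
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation, ValueError):
        # e.g. io.StringIO().fileno() raises io.UnsupportedOperation("fileno");
        # some notebook output streams may behave the same way.
        return False
    return True


def choose_verbose():
    """Hypothetical helper: use verbose=True when stdout cannot be redirected."""
    return not stdout_has_fileno()
```

It could then be passed as LlamaCpp(..., verbose=choose_verbose()).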

I used to get the same error; then I included these lines and it worked!

!pip install gguf #https://github.com/ggerganov/llama.cpp/tree/master/gguf-py
!git clone https://github.com/ggerganov/llama.cpp

model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"

Thanks @AbdelrahmanAhmed and @actionpace!

@TheBloke thanks for the great work:)

Any input on 70B?
I initially loaded the GGML version by mistake instead of GGUF and discovered that llama.cpp no longer supports GGML. I then converted it to GGUF using the llama.cpp repository, but I'm still encountering the same error (70B versions).
Still getting the same errors.

modelq2gguf = '/media/iiit/Karvalo/zuhair/llama/llama70b_q2/llama-2-70b.gguf.q2_K.bin'
llm = LlamaCpp(
    model_path=modelq2gguf,
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,
)
ValidationError: 1 validation error for LlamaCpp
root
[Could not load Llama model from path: /media/iiit/Karvalo/zuhair/llama/llama70b_q2/llama-2-70b.gguf.q2_K.bin. Received error (type=value_error)]
(ValidationError: 1 validation error for LlamaCpp
root
Could not load Llama model from path: /media/iiit/Karvalo/zuhair/llama/llama-2-70b.ggmlv3.q4_1.bin. Received error Model path does not exist: /media/iiit/Karvalo/zuhair/llama/llama-2-70b.ggmlv3.q4_1.bin )
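Since the file above is named llama-2-70b.gguf.q2_K.bin, it may be worth checking what it actually contains: the loader goes by the file header, not the name. GGUF files begin with the four ASCII bytes GGUF, while the older GGML-family files start with a little-endian 32-bit magic (llama.cpp's 'ggjt', 'ggmf' and 'ggml' constants), so a ggmlv3 ('ggjt') file begins with the bytes tjgg on disk. A minimal sketch, with a hypothetical function name:

```python
import struct

# First four bytes of each container format.
MAGICS = {
    b"GGUF": "gguf",
    struct.pack("<I", 0x67676A74): "ggml (ggjt, e.g. ggmlv3)",   # b"tjgg"
    struct.pack("<I", 0x67676D66): "ggml (ggmf)",                # b"fmgg"
    struct.pack("<I", 0x67676D6C): "ggml (unversioned)",         # b"lmgg"
}


def detect_model_format(path):
    """Guess the container format from the header, ignoring the file extension."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    return MAGICS.get(magic, "unknown")
```

If this reports a GGML format for a file you believed you had converted, the conversion output was probably not the file being loaded.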

For those wanting a fully-baked way with macOS, that works as of Sept 12 2023:

# requirements.txt
huggingface-hub==0.17.1
llama-cpp-python==0.1.85

Please make sure your Python 3.11 is a build that supports arm64 (not amd64/x86_64), then run:

python -m venv venv
source venv/bin/activate
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 python -m pip install -r requirements.txt --no-cache-dir
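To confirm the interpreter in the venv is a native arm64 build rather than an x86_64 one running under Rosetta 2 (as I understand the llama-cpp-python install notes, Apple Silicon needs an arm64 Python for the Metal build), platform.machine() can be checked. The helper name is hypothetical:

```python
import platform


def is_native_arm64(machine=None):
    """True when the interpreter runs natively on Apple Silicon."""
    machine = platform.machine() if machine is None else machine
    # Native Apple Silicon Python reports "arm64"; an Intel build running
    # under Rosetta 2 reports "x86_64" even on an M1/M2 Mac.
    return machine == "arm64"


if __name__ == "__main__":
    print(platform.machine(), "native arm64" if is_native_arm64() else "not native arm64")
```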

Then this Python code:

import pathlib

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

HF_REPO_NAME = "TheBloke/Llama-2-13B-chat-GGUF"
HF_MODEL_NAME = "llama-2-13b-chat.Q4_K_S.gguf"
REPO_MODELS_FOLDER = pathlib.Path(__file__).parent / "models"

REPO_MODELS_FOLDER.mkdir(exist_ok=True)
model_path = hf_hub_download(
    repo_id=HF_REPO_NAME, filename=HF_MODEL_NAME, local_dir=REPO_MODELS_FOLDER
)

llm = Llama(model_path=model_path, n_gpu_layers=1)  # n_gpu_layers uses macOS Metal GPU
llm("What is the capital of China?")

Can anyone explain gpu_layers to me?
I have 3 NVIDIA Tesla V100 cards with 32 GB each. How many GPU layers should I pass as the attribute here:
llm = AutoModelForCausalLM.from_pretrained('/media/iiit/Karvalo/zuhair/llama/llama70b_q2/genz-70b.Q2_K.gguf', model_type='llama', gpu_layers=gpu_layers)
It accepts any value in [0, infinity): I've checked it with 100000 and it still accepts it.

@zuhashaik I think your question is outside this discussion's scope, but check these links for info:

If you have further questions, I think it's worth a designated discussion thread somewhere else

Setting "verbose=True" worked for me.
Thanks

Thank you all!
But I'm still confused about gpu_layers: what does its value mean in llama.cpp?
It works with gpu_layers = 0, 10 or 100000000000.
What does the value indicate? A percentage?

@dorike Hi, I've tried your code, but I'm still facing the same error:
Could not load Llama model from path: TheBloke/CodeLlama-13B-Python-GGUF/codellama-13b-python.Q5_K_M.gguf. Received error Model path does not exist: TheBloke/CodeLlama-13B-Python-GGUF/codellama-13b-python.Q5_K_M.gguf (type=value_error)
Do you have any solutions for this? Thanks

GPU layers set how many layers of the model are offloaded to the GPU. I think around 50 to 70 should be enough?
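In llama.cpp, n_gpu_layers (gpu_layers in ctransformers) is a number of layers to offload to the GPU, not a percentage; values larger than the model's layer count are accepted and simply mean "offload every layer", which is why 100000 does not fail. For reference, Llama 2 7B, 13B and 70B have 32, 40 and 80 layers. A rough way to choose a value is to treat the layers as equal-sized slices of the model file and see how many fit in free VRAM. This is a heuristic assumption, not llama.cpp's own logic: it ignores the KV cache, scratch buffers and non-layer tensors, so the 1.5 GiB reserve below is an arbitrary safety margin.

```python
import math

GIB = 1024 ** 3


def estimate_gpu_layers(model_bytes, n_layers, free_vram_bytes,
                        reserve_bytes=int(1.5 * GIB)):
    """Rough guess at how many layers fit on the GPU (heuristic only)."""
    per_layer = model_bytes / n_layers           # assume equal-sized layers
    usable = max(0, free_vram_bytes - reserve_bytes)
    # Never suggest more layers than the model has.
    return min(n_layers, math.floor(usable / per_layer))
```

For a hypothetical 8 GiB file with 40 layers and 4 GiB of free VRAM, estimate_gpu_layers(8 * GIB, 40, 4 * GIB) gives 12; with plenty of VRAM it simply returns the layer count.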

Also, the reason it's not working is that you have to replace it with your own local model path. To solve it, find the downloaded file and copy it somewhere convenient:

!ls /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin
!cp /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin /content/
model_path = "/content/llama-2-13b-chat.ggmlv3.q5_1.bin"
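The "Model path does not exist" error above comes from passing the Hugging Face repo id and file name as if they were a local path. A small pre-check (a hypothetical helper, not part of LangChain) can catch that before LlamaCpp(...) is constructed:

```python
import os


def check_model_path(model_path):
    """Fail early with a clearer message than LlamaCpp's validation error."""
    if not os.path.isfile(model_path):
        msg = f"Model file not found: {model_path!r}."
        if not os.path.isabs(model_path):
            # e.g. "TheBloke/CodeLlama-13B-Python-GGUF/codellama-13b-python.Q5_K_M.gguf"
            msg += (" If this is a Hugging Face repo id plus file name, download it"
                    " first and pass the local path that hf_hub_download() returns.")
        raise FileNotFoundError(msg)
    return os.path.abspath(model_path)
```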

Can I use the same process for reading tables from a PDF?


If someone is still trying the @actionpace starter notebook given above and getting the same error, check the paths. For example, I couldn't find the conversion script at the path used in the following command, or at least the file name wasn't correct:

!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf

Go to the llama.cpp folder (or the directory you cloned into in the previous line), find the conversion script manually, and paste its path into the command above. For me, the changed command was:

!python /content/llama.cpp/convert-llama-ggml-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf
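Since the script name differs between llama.cpp checkouts (both convert-llama-ggmlv3-to-gguf.py and convert-llama-ggml-to-gguf.py appear in this thread), here is a sketch that searches the clone for a matching file instead of hard-coding the name. The glob pattern is an assumption that happens to match both names seen here:

```python
from pathlib import Path


def find_conversion_script(llama_cpp_dir):
    """Locate the GGML -> GGUF converter, whatever it is called in this checkout."""
    # Matches e.g. convert-llama-ggmlv3-to-gguf.py and convert-llama-ggml-to-gguf.py.
    matches = sorted(Path(llama_cpp_dir).rglob("convert*ggml*to*gguf*.py"))
    if not matches:
        raise FileNotFoundError(
            f"No GGML-to-GGUF conversion script found under {llama_cpp_dir}")
    return matches[0]
```

Its result, for example find_conversion_script("/content/llama.cpp"), can replace the hard-coded script path in the !python line.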


I tried to convert llama-2-13b.ggmlv3.q5_1.bin into a GGUF file with the above code, like this:
!python /content/llama.cpp/convert-llama-ggmlv3-to-gguf.py --input `ls -t /content/*ggmlv3*.bin | head -1` --output `ls -t /content/*ggmlv3*.bin | head -1`.gguf

Unfortunately, it doesn't work for me. The error message is:

raise ValueError(f"Quantized tensor bytes per row ({shape[-1]}) is not a multiple of {quant_type.name} type size ({type_size})")
AttributeError: 'int' object has no attribute 'name'

How can I fix this issue if I want to convert a GGML model to GGUF?
