---
library_name: peft
base_model: meta-llama/Llama-2-13b-chat-hf
license: mit
language:
- en
pipeline_tag: text2text-generation
---

# Chadgpt Llama2 13b

## Colab Example

https://colab.research.google.com/drive/1esMSQUSPyQtOY_3DedyQFKBlTrE9A2vM?usp=sharing

## Install Prerequisites

```bash
!pip install -q git+https://github.com/huggingface/peft.git
!pip install transformers
!pip install -U accelerate
!pip install bitsandbytes  # bitsandbytes is required for 8-bit inference
```

## Log In Using a Hugging Face Token

```bash
# You need a Hugging Face token with access to Llama 2
!huggingface-cli login
```

## Download Model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "danjie/Chadgpt-Llama2-13b"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit and let accelerate place it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=True,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
```

## Inference

```python
def talk_with_llm(tweet: str) -> str:
    # Tokenize the input and move the tensors to the model's device
    encoded_input = tokenizer(tweet, return_tensors='pt')
    encoded_input = {k: v.to(model.device) for k, v in encoded_input.items()}

    # Generate up to 64 new tokens and decode the full sequence
    output = model.generate(**encoded_input, max_new_tokens=64)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response

talk_with_llm(" Your sentence \n")
```
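
For a causal language model, the sequence returned by `model.generate` normally starts with the prompt tokens, so the string returned by `talk_with_llm` will usually contain the input sentence followed by the reply. The snippet below is a minimal sketch of one way to keep only the reply. `strip_prompt` is a hypothetical helper, not part of this model or of `transformers`, and it assumes the decoded text reproduces the prompt verbatim apart from surrounding whitespace; if the tokenizer changes the prompt text during the round trip, the full string is returned unchanged.

```python
def strip_prompt(prompt: str, decoded: str) -> str:
    """Return the part of `decoded` that follows `prompt`.

    Hypothetical helper: assumes the decoded output begins with the
    prompt text, ignoring leading and trailing whitespace.
    """
    prompt_clean = prompt.strip()
    decoded_clean = decoded.strip()
    if prompt_clean and decoded_clean.startswith(prompt_clean):
        # Drop the echoed prompt and any whitespace that follows it
        return decoded_clean[len(prompt_clean):].strip()
    # Prompt not found at the start: leave the output untouched
    return decoded_clean
```

It could be used as `strip_prompt(tweet, talk_with_llm(tweet))`. Slicing the generated token IDs by the prompt length before decoding is another option that avoids string matching.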