---
license: cc-by-nc-4.0
language:
- en
tags:
- text-generation
datasets:
- stanford_alpaca
pipeline_tag: text-generation
---
LLM generation models trained by the Jina AI Finetuner team.

This repo contains the LoRA weights (8bit) for Falcon-7b fit on the [Code Alpaca](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) dataset.

## Reproduction

This version of the weights was trained with the following hyperparameters:

- Epochs: 6
- Batch size: 128
- Micro batch size: 8
- Learning rate: 3e-4
- LoRA _r_: 8
- LoRA target modules: query_key_value

You can reproduce the training using this repository:

https://github.com/jina-ai/jerboa

Make sure you install the requirements, then finetune using the following command:

```
python finetune.py \
--base-model tiiuae/falcon-7b --lora-target-modules query_key_value \
--data-path sahil2801/CodeAlpaca-20k --output-dir ./lora-alpaca-code \
--batch-size 128 --micro-batch-size 8 --eval-limit 45 \
--eval-file code_eval.jsonl --wandb-project jerboa --wandb-log-model \
--wandb-watch gradients --num-epochs 6
```

## Inference

```Python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

TOKENIZER_SOURCE = 'tiiuae/falcon-7b'
BASE_MODEL = 'tiiuae/falcon-7b'
LORA_REPO = 'jinaai/falcon-7b-code-alpaca-lora'
DEVICE = "cuda"

PROMPT = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Write a for loop in python

### Input:

### Response:
"""

# Load the base Falcon-7b model in half precision.
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map='auto',
)

# Apply the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(
    model=model,
    model_id=LORA_REPO,
)
model.eval()

# Falcon has no dedicated pad token, so the EOS token is reused for padding.
tokenizer = AutoTokenizer.from_pretrained(
    TOKENIZER_SOURCE,
    trust_remote_code=True,
    padding_side='left',
)
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].to(DEVICE)
input_attention_mask = inputs["attention_mask"].to(DEVICE)

# Generate up to 32 new tokens without tracking gradients.
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        attention_mask=input_attention_mask,
        return_dict_in_generate=True,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode the first (and only) generated sequence, prompt included.
generation_output = generation_output.sequences[0]
output = tokenizer.decode(generation_output, skip_special_tokens=True)
print(output)
```

## Contact

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
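
The inference example hard-codes a single instruction into `PROMPT`. Below is a minimal sketch of a helper that fills the same instruction/input/response layout for other requests. `build_prompt` is a hypothetical name that does not come from this repo, and the exact line breaks are an assumption, so compare them with the prompt template in the jerboa repository before relying on them.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the instruction/input/response layout (hypothetical helper)."""
    header = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request."
    )
    # Assumed layout: one section per "###" marker, ending with an open
    # "### Response:" section for the model to complete.
    return (
        f"\n{header}\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        f"### Response:\n"
    )


prompt = build_prompt("Write a for loop in python")
print(prompt)
```

The returned string can be passed to the tokenizer in place of `PROMPT`; leaving `input_text` empty mirrors the empty `### Input:` section of the example above.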