---
base_model: HuggingFaceM4/Idefics3-8B-Llama3
library_name: peft
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: idefics3-llama-gui-dense-descriptions
  results: []
datasets:
- Agent-Eval-Refine/GUI-Dense-Descriptions
language:
- en
---

# idefics3-llama-gui-dense-descriptions

This model is a fine-tuned version of [HuggingFaceM4/Idefics3-8B-Llama3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) on the [Agent-Eval-Refine/GUI-Dense-Descriptions](https://huggingface.co/datasets/Agent-Eval-Refine/GUI-Dense-Descriptions) dataset.

## Intended usage

```python
from peft import PeftModel
from transformers import AutoProcessor, Idefics3ForConditionalGeneration
from transformers.image_utils import load_image
import torch

adapter_path = "Maverick17/idefics3-llama-gui-dense-descriptions"
base_model_id = "HuggingFaceM4/Idefics3-8B-Llama3"

# Load the base model
model = Idefics3ForConditionalGeneration.from_pretrained(
    base_model_id,
    _attn_implementation="flash_attention_2",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Merge the LoRA adapter into the base model
peft_model = PeftModel.from_pretrained(model, adapter_path)
merged_model = peft_model.merge_and_unload()

processor = AutoProcessor.from_pretrained(base_model_id)

image = load_image("path/to/ui/image.png")

# Create inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {
                "type": "text",
                "text": "Provide a detailed description of the image.",
            },
        ],
    },
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}

generation_args = {
    "max_new_tokens": 1024,
    "repetition_penalty": 1.0,
    "do_sample": False,
}
generation_args.update(inputs)

# Generate, then decode only the newly generated tokens
generated_ids = merged_model.generate(**generation_args)
generated_texts = processor.batch_decode(
    generated_ids[:, generation_args["input_ids"].size(1):],
    skip_special_tokens=True,
)
print(generated_texts[0].strip())
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch based on these values appears after the framework versions below):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1

### Framework versions

- PEFT 0.13.0
- Transformers 4.44.0.dev0
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.19.1
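
### Training configuration sketch

The hyperparameters listed above map onto a `transformers` `TrainingArguments` object roughly as follows. This is a minimal sketch, not the exact training script: the output directory is a hypothetical placeholder, the optimizer is assumed to be the default AdamW variant (whose defaults are betas=(0.9, 0.999) and epsilon=1e-08), and LoRA-specific settings such as rank, alpha, and target modules are not stated in this card and are therefore omitted.

```python
from transformers import TrainingArguments

# Minimal sketch of TrainingArguments matching the hyperparameters above.
# output_dir, optim, and bf16 are assumptions, not values taken from this card.
training_args = TrainingArguments(
    output_dir="idefics3-llama-gui-dense-descriptions",  # hypothetical path
    num_train_epochs=1,
    learning_rate=1e-4,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # 2 * 8 = total_train_batch_size of 16
    lr_scheduler_type="linear",
    warmup_steps=50,
    seed=42,
    optim="adamw_torch",             # assumption: Adam-style optimizer with default betas/epsilon
    bf16=True,                       # assumption: matches the bfloat16 dtype used at inference
)
```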