🚀 al-baka-llama3-8b (Main Model)

Al Baka is a fine-tuned model based on the newly released LLAMA3-8B model. It was trained on Yasbok/Alpaca_arabic_instruct, the Arabic version of the Stanford Alpaca dataset.

Model Summary

Model Details

  • The model was fine-tuned and merged in 16-bit precision using unsloth.

How to Get Started with the Model

Setup

%%capture
# Install packages (the %%capture cell magic must be the first line of the notebook cell)
import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for newer GPUs with compute capability 8.0+: Ampere, Ada, Hopper (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
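
The install cell branches on the GPU's CUDA compute capability. As a minimal sketch for use outside a notebook, the same decision can be written as a plain function. `packages_for_gpu` is a hypothetical helper name, not part of unsloth, and the package lists are copied from the cell above.

```python
# Hypothetical helper mirroring the install cell above; not part of unsloth.
BASE_PACKAGES = ["xformers", "trl", "peft", "accelerate", "bitsandbytes"]
AMPERE_EXTRAS = ["packaging", "ninja", "einops", "flash-attn"]


def packages_for_gpu(major_version: int) -> list[str]:
    """Return the extra pip packages the install cell picks for a GPU.

    major_version is the first element of torch.cuda.get_device_capability().
    Compute capability 8.x and above covers Ampere and newer GPUs.
    """
    if major_version >= 8:
        return AMPERE_EXTRAS + BASE_PACKAGES
    return list(BASE_PACKAGES)


if __name__ == "__main__":
    # A Tesla T4 reports compute capability (7, 5); an A100 reports (8, 0).
    print(packages_for_gpu(7))
    print(packages_for_gpu(8))
```

The returned list could then be passed to `pip install --no-deps`, as the cell does.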

First, Load the Model

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
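
To see why `load_in_4bit` matters, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: it assumes the 8.03B parameter count reported for this model and ignores activations, the KV cache, and the small overhead that 4-bit quantization adds. `weight_memory_gb` is a hypothetical helper name.

```python
# Rough weight-memory estimate; ignores activations, KV cache, and quantization overhead.
PARAM_COUNT = 8.03e9  # parameter count reported for this model


def weight_memory_gb(param_count: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return param_count * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"16-bit weights: about {weight_memory_gb(PARAM_COUNT, 16):.1f} GB")
    print(f" 4-bit weights: about {weight_memory_gb(PARAM_COUNT, 4):.1f} GB")
```

Under these assumptions the 16-bit weights need roughly 16 GB and the 4-bit weights roughly 4 GB, so 4-bit loading is what lets the model fit on a 16 GB card such as a Tesla T4.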

Second, Try the model

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
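
The template has three positional slots: instruction, input, and response. A small wrapper, sketched below, fills the first two and always leaves the response slot empty so the model generates it. `build_prompt` is a hypothetical helper name. Passing an empty input when a task has no extra context is an assumption, not something the model card specifies.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_PROMPT.format(instruction, input_text, "")


if __name__ == "__main__":
    print(build_prompt("Use the given data to compute the median.", "[2, 3, 7, 8, 10]"))
```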

# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "استخدم البيانات المعطاة لحساب الوسيط.", # instruction ("Use the given data to compute the median.")
        "[2 ، 3 ، 7 ، 8 ، 10]", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
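
`batch_decode` returns the full sequence, so each string contains the prompt followed by the generated text. The sketch below keeps only the part after the `### Response:` marker. `extract_response` is a hypothetical helper name, and the default end-of-sequence string is an assumption based on the Llama 3 base tokenizer. In practice, pass `tokenizer.eos_token` instead.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Return only the text that follows the response marker."""
    # The prompt contains the marker exactly once, so split on its first occurrence.
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    # Fall back to the whole string if the marker is missing.
    text = tail if found else decoded
    # Drop the end-of-sequence token if the model emitted one.
    return text.replace(eos_token, "").strip()


if __name__ == "__main__":
    sample = "### Instruction:\n...\n\n### Response:\nThe median is 7.<|end_of_text|>"
    print(extract_response(sample))
```

With the objects from the cell above this would be called as `extract_response(tokenizer.batch_decode(outputs)[0], tokenizer.eos_token)`. Alternatively, `batch_decode(outputs, skip_special_tokens=True)` strips special tokens at decode time.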

Recommendations

  • Use unsloth for fine-tuning: training runs roughly 2x faster, and the fine-tuned model can be exported to other formats or uploaded to Hugging Face.

Safetensors · Model size: 8.03B params · Tensor type: BF16