This model is the result of continued pre-training of mistralai/Mistral-Nemo-Instruct-2407 on the Kurdish wiki dataset with unsloth. The pre-training was meant to improve Kurdish language understanding, so the model should be fine-tuned further before it is used for a task. It's quantized with bitsandbytes so that it uses less memory. See the bitsandbytes documentation.
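
To give a rough sense of the memory saving, here is a back-of-the-envelope sketch of the weight footprint at different precisions. It uses the 12.2B parameter count reported for this model. The 4-bit width is an assumption (this card doesn't state which bitsandbytes mode was used), and `weight_memory_gb` is a hypothetical helper, not part of any library. Activations, KV cache, and quantization overhead are ignored.

```python
# Rough estimate of memory needed for the weights only.
NUM_PARAMS = 12.2e9  # parameter count reported for this model


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized BF16 weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # assumption: 4-bit quantization

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the weights shrink by about a factor of four, which is the difference between needing a large data-center GPU and fitting on a much smaller card.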

I couldn't find a standard, or even a good, Kurdish metric to evaluate the model. My next project will be to create an evaluation so that there's a reproducible baseline for Kurdish.

I will look into a multi-GPU training setup so I don't have to wait all day for results. I would also like to train it on both Kurmanji and Sorani.

Use

The model should be fine-tuned further for a specific task. See the instruction fine-tuned model nazimali/Mistral-Nemo-Kurdish-Instruct.

Training

Transformers 4.44.2
1 NVIDIA A100 80GB PCIe
Duration 6h 31m 4s

{
  "total_flos": 4121524790259794000,
  "train/epoch": 1,
  "train/global_step": 1960,
  "train/grad_norm": 3.1958093643188477,
  "train/learning_rate": 0,
  "train/loss": 1.2108,
  "train_loss": 1.256846008738693,
  "train_runtime": 23227.1752,
  "train_samples_per_second": 2.7,
  "train_steps_per_second": 0.084
}
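
As a sanity check on the log above, the throughput figures can be re-derived from the step count, the runtime, and the number of rows used for training (62,720, from the data section). This is a small sketch, not the code that produced the log. Reading "rows per step" as the effective batch size is an inference that assumes one row per training sample (no packing).

```python
# Values copied from the training log and the data section of this card.
metrics = {
    "train/epoch": 1,
    "train/global_step": 1960,
    "train_runtime": 23227.1752,  # seconds
}
rows_used = 62_720  # number of rows used for training

# Rows consumed per optimizer step (assumes one row per sample, no packing).
rows_per_step = rows_used * metrics["train/epoch"] / metrics["train/global_step"]

# Throughput re-derived from the raw counts.
samples_per_second = rows_used / metrics["train_runtime"]
steps_per_second = metrics["train/global_step"] / metrics["train_runtime"]

# Convert the runtime to hours, minutes, and seconds.
hours, remainder = divmod(int(metrics["train_runtime"]), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"rows per step:  {rows_per_step:.0f}")
print(f"samples/second: {samples_per_second:.2f}")
print(f"steps/second:   {steps_per_second:.3f}")
print(f"train_runtime:  {hours}h {minutes}m {seconds}s")
```

For the logged values this gives 32 rows per step, 2.70 samples per second, 0.084 steps per second, and a train_runtime of 6h 27m 7s. The reported Duration of 6h 31m 4s is a few minutes longer, which may reflect setup time outside the training loop, though that is a guess.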

Pre-training data:

  • nazimali/kurdish-wikipedia-articles
    • Number of rows in the dataset: 63,076
    • Filtered on the columns title and text
      • Each must have at least 1 character
  • Number of rows used for training: 62,720
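
The filter described above can be sketched in plain Python. This is a minimal illustration, not the script used for this model: `has_title_and_text` is a hypothetical helper and the sample rows are made up. With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`.

```python
def has_title_and_text(row: dict) -> bool:
    """Keep a row only if both title and text have at least 1 character."""
    title = row.get("title") or ""
    text = row.get("text") or ""
    return len(title) >= 1 and len(text) >= 1


# Made-up sample rows, for illustration only.
sample_rows = [
    {"title": "Kurdistan", "text": "Kurdistan herêmek e."},
    {"title": "", "text": "Gotarek bê sernav."},  # dropped: empty title
    {"title": "Amed", "text": ""},                # dropped: empty text
    {"title": "Hewlêr", "text": None},            # dropped: missing text
]

kept_rows = [row for row in sample_rows if has_title_and_text(row)]
print(f"kept {len(kept_rows)} of {len(sample_rows)} rows")

# With the `datasets` library, the equivalent call would be:
#   dataset = dataset.filter(has_title_and_text)
```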

Training prompt format:

training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""
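
A short sketch of how the template above might be filled in for each article. The two `{}` slots take the title ("Sernav") and the article body ("Gotar"). Appending an end-of-sequence token is a common practice in continued pre-training, but this card doesn't confirm it was done here. The `"</s>"` value and the `format_example` helper are assumptions for illustration. In a real run the token would normally come from `tokenizer.eos_token`.

```python
training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""

# Assumption: placeholder EOS token; normally taken from tokenizer.eos_token.
EOS_TOKEN = "</s>"


def format_example(title: str, text: str, eos_token: str = EOS_TOKEN) -> str:
    """Fill the template with one article and append the EOS token."""
    return training_prompt.format(title, text) + eos_token


# Made-up article, for illustration only.
example = format_example("Kurdistan", "Kurdistan herêmek e.")
print(example)
```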

Model size: 12.2B params (Safetensors, tensor type BF16)