[GGUF]

merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the SLERP merge method.
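
For readers unfamiliar with SLERP, the sketch below shows the interpolation on two flattened weight vectors. It is a minimal illustration of the formula, not mergekit's implementation; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel vectors are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns v0 and t = 1 returns v1 (hypothetical helper for illustration).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped so acos stays defined.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel (or anti-parallel) vectors: use plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a straight average, the interpolated vector follows the arc between the two inputs, so its norm does not shrink when the inputs point in different directions.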

Models Merged

The following models were included in the merge:

- bamec66557/mergekit-slerp-xjlmywj
- bamec66557/MISCHIEVOUS-12B-Mix_0.3v

Configuration

The following YAML configuration was used to produce this model:

slices:
  - sources:
      - model: bamec66557/mergekit-slerp-xjlmywj
        layer_range: [0, 20]
      - model: bamec66557/MISCHIEVOUS-12B-Mix_0.3v
        layer_range: [0, 20]
    parameters:
      t:
        - value: 0.8

  - sources:
      - model: bamec66557/mergekit-slerp-xjlmywj
        layer_range: [20, 40]
      - model: bamec66557/MISCHIEVOUS-12B-Mix_0.3v
        layer_range: [20, 40]
    parameters:
      t:
        - value: 1.0
        - filter: self_attn
          value: [0.8, 0.9, 1.0, 1.1, 1.2]

merge_method: slerp  # Preserve merge method

base_model: bamec66557/mergekit-slerp-xjlmywj  # Base model

dtype: bfloat16  # Data types for fast merges

# Additional options
regularization:
  - method: weight_clipping
    clip_range: [-0.1, 0.1]

postprocessing:
  - operation: gaussian_smoothing
    sigma: 1.5  # Gaussian smoothing intensity
  - operation: smoothing
    parameters:
      adaptive: true
      range: [0.8, 1.2]  # Adaptively adjust
      kernel_size: 5  # Smoothing larger ranges with increased kernel size
  - operation: normalize  # Normalise after merge
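
The list-valued `t` under the `self_attn` filter reads as a gradient: mergekit's documentation describes a list of numbers as values interpolated across the layers of a slice. The sketch below expands that list over the 20 layers of the second slice, assuming evenly spaced anchor points and linear interpolation between them. `expand_gradient` is a hypothetical helper written for this example, not a mergekit function, and the exact spacing mergekit uses may differ.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate evenly spaced anchor values across num_layers layers."""
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, in [0, len(anchors) - 1].
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values


# Gradient from the second slice, which covers 20 layers (layer_range [20, 40]).
self_attn_t = expand_gradient([0.8, 0.9, 1.0, 1.1, 1.2], 20)

# Entries above 1.0 fall outside the usual [0, 1] interpolation range.
above_one = [t for t in self_attn_t if t > 1.0]
```

In the SLERP formula, a factor above 1.0 extrapolates past the second model rather than blending between the two. Whether mergekit clamps such values is not stated in this card.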

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 26.31 |
| IFEval (0-Shot) | 65.08 |
| BBH (3-Shot) | 30.60 |
| MATH Lvl 5 (4-Shot) | 11.48 |
| GPQA (0-shot) | 8.95 |
| MuSR (0-shot) | 11.97 |
| MMLU-PRO (5-shot) | 29.81 |
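
The reported average appears to be the plain mean of the six benchmark scores. A quick check, assuming equal weighting (the variable names are illustrative):

```python
# Benchmark scores as listed in the results above.
scores = {
    "IFEval (0-Shot)": 65.08,
    "BBH (3-Shot)": 30.60,
    "MATH Lvl 5 (4-Shot)": 11.48,
    "GPQA (0-shot)": 8.95,
    "MuSR (0-shot)": 11.97,
    "MMLU-PRO (5-shot)": 29.81,
}

# Unweighted mean over the six benchmarks.
mean_score = sum(scores.values()) / len(scores)

# The mean agrees with the reported Avg. of 26.31 to within rounding.
matches_reported = abs(mean_score - 26.31) < 0.01
```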

Model: bamec66557/MISCHIEVOUS-12B-Mix_0.4v
Model size: 12.2B params (Safetensors), tensor type BF16