CultriX/Qwen2.5-14B-Hyperionv7

This is a merge of pre-trained language models created using mergekit.
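
For reference, here is a minimal usage sketch with the Hugging Face transformers library. It assumes the repo id CultriX/Qwen2.5-14B-Hyperionv7 shown for this card and the bfloat16 dtype from the configuration below; it is an illustrative example, not an official one.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Hyperionv7"  # assumed repo id for this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the DARE TIES merge method in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))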

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
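
As a rough intuition for what DARE TIES does to each tensor, the sketch below shows the drop-and-rescale (DARE) step followed by TIES-style sign election. It is a minimal per-tensor illustration in PyTorch, not mergekit's actual implementation, and the function name and arguments are assumptions for this example.

import torch

def dare_ties_tensor(base, finetuned, weights, density, lam=1.0):
    # base: base-model tensor; finetuned: list of tensors from the models being merged
    deltas = []
    for ft in finetuned:
        delta = ft - base                                    # task vector relative to the base
        keep = (torch.rand_like(delta) < density).to(delta.dtype)
        deltas.append(keep * delta / density)                # DARE: drop entries, rescale the survivors
    stacked = torch.stack([w * d for w, d in zip(weights, deltas)])
    elected = torch.sign(stacked.sum(dim=0))                 # TIES: elect a per-entry majority sign
    agree = (torch.sign(stacked) == elected).to(stacked.dtype)
    merged_delta = (stacked * agree).sum(dim=0)              # keep only sign-consistent contributions
    return base + lam * merged_delta                         # lambda scales the merged task vector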

Models Merged

The following models were included in the merge:

- djuna/Q2.5-Veltha-14B-0.5
- allknowingroger/QwenSlerp6-14B
- hotmailuser/QwenSlerp2-14B
- CultriX/Qwen2.5-14B-Hyperionv3
- CultriX/Qwen2.5-14B-Brocav7
- CultriX/SeQwence-14B-EvolMerge
- qingy2024/Fusion4-14B-Instruct
- CultriX/Qwen2.5-14B-Emerged
- sometimesanotion/Lamarck-14B-v0.6

Configuration

The following YAML configuration was used to produce this model:

merge_method: dare_ties  # Merge method for dynamic, task-aware parameter blending.
base_model: CultriX/Qwen2.5-14B-Wernickev3  # Main backbone for parameter alignment.
dtype: bfloat16  # Half-precision (bfloat16) to keep memory usage down during merging.
out_dtype: bfloat16  # Output data type to maintain consistency and efficiency.

parameters:
  epsilon: 0.010   # Fine-tuned scaling for precise parameter adjustments.
  lambda: 2.0   # Emphasizes high-impact parameters for improved task performance.
  normalize: true  # Ensures parameter normalization for stability during merging.
  rescale: true  # Rescales parameters across models for better integration.
  int8_mask: false  # Disables int8 masking to preserve full precision.

adaptive_merge_parameters:
  task_weights:  # Weight prioritization for tasks.
    tinyArc: 1.6  # Balanced focus on logical reasoning.
    tinyHellaswag: 1.5  # Moderate priority for contextual reasoning.
    tinyMMLU: 1.8  # High priority for multi-domain knowledge tasks.
    tinyTruthfulQA: 2.2  # High emphasis on factual QA accuracy.
    tinyTruthfulQA_mc1: 1.8  # High priority for multiple-choice factual QA.
    tinyWinogrande: 1.75  # Moderate-high priority for commonsense coreference resolution.
    IFEval: 2.5  # Maximum priority for instruction-following tasks.
    BBH: 2.2  # High priority for complex reasoning tasks.
    MATH: 2.8  # Maximum priority for mathematical reasoning.
    GPQA: 2.2  # High priority for graduate-level QA tasks.
    MUSR: 2.2  # High priority for multi-step reasoning.
    MMLU-PRO: 2.0  # High priority for multitask, domain-specific knowledge.
  smoothing_factor: 0.03  # Precise blending of task-specific contributions.

gradient_clipping:  # Gradient clipping for stability during merging.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89  # Stability for the base model.
  djuna/Q2.5-Veltha-14B-0.5: 0.91  # Stability for reasoning contributions.
  CultriX/SeQwence-14B-EvolMerge: 0.87  # Stabilized for multitask performance.
  qingy2024/Fusion4-14B-Instruct: 0.93  # High stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.89  # Stability for multitask contributions.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Stability for multi-step reasoning.
  allknowingroger/QwenSlerp6-14B: 0.90  # Stability for general reasoning and multitask tasks.
  hotmailuser/QwenSlerp2-14B: 0.91  # Stabilized for instruction following.
  CultriX/Qwen2.5-14B-Hyperionv3: 0.90  # Stability for this model's general performance.
  CultriX/Qwen2.5-14B-Brocav7: 0.90  # Stability for specific task contributions.

models:  # Definition of models and their weights/densities.
  - model: CultriX/Qwen2.5-14B-Wernickev3  # Base generalist model.
    parameters:
      weight: 0.28  # Balanced weight for a strong backbone.
      density: 0.78  # Slightly reduced to balance smaller contributors.

  - model: djuna/Q2.5-Veltha-14B-0.5  # Reasoning-focused model.
    parameters:
      weight: 0.27  # Slightly reduced for better balance.
      density: 0.77  # Balanced density to ensure nuanced reasoning contributions.

  - model: allknowingroger/QwenSlerp6-14B  # Strong multitask performer.
    parameters:
      weight: 0.15  # Balanced weight for generalist capabilities.
      density: 0.76  # Balanced density to maintain stable contributions.

  - model: hotmailuser/QwenSlerp2-14B  # High IFEval performer.
    parameters:
      weight: 0.12  # Maintains stable contributions for instruction-following tasks.
      density: 0.70  # Increased density to enhance integration.

  - model: CultriX/Qwen2.5-14B-Hyperionv3  # Generalist model with solid performance.
    parameters:
      weight: 0.10  # Increased for balanced general contributions.
      density: 0.75  # Balanced density for stable integration.

  - model: CultriX/Qwen2.5-14B-Brocav7  # Model for specific tasks like reasoning.
    parameters:
      weight: 0.10  # Increased weight to strengthen specific contributions.
      density: 0.76  # Increased density for better parameter preservation.

  - model: CultriX/SeQwence-14B-EvolMerge  # Multitask generalist.
    parameters:
      weight: 0.08  # Balanced weight for broader coverage.
      density: 0.68  # Slight increase for better integration.

  - model: qingy2024/Fusion4-14B-Instruct  # Specialist in mathematical reasoning.
    parameters:
      weight: 0.08  # Balanced weight for MATH tasks.
      density: 0.78  # Increased density to enhance task-specific integration.

  - model: CultriX/Qwen2.5-14B-Emerged  # General multitask model.
    parameters:
      weight: 0.08  # Balanced for multitask contributions.
      density: 0.72  # Increased density for better parameter alignment.

  - model: sometimesanotion/Lamarck-14B-v0.6  # Multi-step reasoning focus.
    parameters:
      weight: 0.05  # Slightly increased to improve its contributions.
      density: 0.65  # Increased for better parameter blending.
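
To reproduce the merge from this configuration, the usual route is mergekit's CLI (mergekit-yaml config.yaml ./output-directory). The sketch below uses mergekit's documented Python entry points instead; the config filename and output directory are assumptions, and option names may vary between mergekit versions.

import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# The YAML shown above, saved to a file (filename assumed for this example).
with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./Qwen2.5-14B-Hyperionv7",                     # output directory (name assumed)
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)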