pythontestmerge

This is a merge of pre-trained language models created using mergekit.

Catastrophic forgetting test results:

Initial evaluation loss on a 1k subset of the HuggingFaceTB/cosmopedia-100k dataset was 1.038. (I'm impressed.)

Over 100 steps of LISA training this loss doesn't fall strictly over time; it trends down but jumps around a bit. It might already be converged to within that method's margin of error; cosmo-1b itself jumped 0.02 points with LISA training.

Comparison to control: cosmo-1b started out with 1.003 loss on (a different subset of) the dataset, increasing to 1.024 at 100 steps.

Method by method comparison, initial evaluation loss on Cosmopedia data:

  • Full tuning (aka continued pretraining), batch 8: 1.615
  • LISA fine-tuning, 4 layers switching every 10 steps, batch 8: 1.217
  • QLoRA with DoRA (otherwise like below): 1.105
  • QLoRA fine-tuning, rank 256, scale factor 1, batch 8: 1.102
  • GaLore tuning, rank 256, scale factor 1, batch 8: 1.182
  • This Model Stock merge of all 4 training methods: 1.038
  • Model Stock 3/4 Methods (all except full tuning): 1.021
  • Control (cosmo-1b): 1.003
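
To make the comparison easier to read, here is a small sketch that turns the losses listed above into "forgetting" deltas against the control. The `forgetting` helper and the dictionary labels are just illustrative names, and since the control loss was measured on a different subset of the dataset, the deltas should be read as approximate.

```python
# Initial evaluation losses on the Cosmopedia subset, copied from the list above.
CONTROL = 1.003  # cosmo-1b (measured on a different subset, so only roughly comparable)

losses = {
    "full tuning": 1.615,
    "LISA": 1.217,
    "GaLore": 1.182,
    "QLoRA + DoRA": 1.105,
    "QLoRA": 1.102,
    "Model Stock (all 4)": 1.038,
    "Model Stock (3/4)": 1.021,
}


def forgetting(loss, control=CONTROL):
    """Increase in loss over the control; larger means more was forgotten."""
    return round(loss - control, 3)


deltas = {name: forgetting(loss) for name, loss in losses.items()}

# Print from least to most forgetting.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} +{delta:.3f}")
```

On these numbers the two merges sit closest to the control, and full tuning is furthest from it.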

Training set validation results:

  • Cosmo-1b Starting Eval Loss: ~0.65
  • Model Stock 3/4 Loss: 0.451
  • Model Stock Loss: 0.40211
  • LISA Loss: 0.2534
  • GaLore Loss: 0.2426
  • QLoRA Loss: 0.2078
  • QLoRA with DoRA Loss: 0.2055 (almost identical training graph)
  • Full Tune Loss: 0.2049

Overall, I'm not sure what to make of this, beyond that high-rank QLoRA is doing something particularly impressive while using only about 6 GB of VRAM. The Model Stock merge of the 4 different tuning methods clearly recovered a lot of the original knowledge, at the cost of something like half the adaptation to the new data. Of course, cosmo-1b was already pretty good at predicting the new data, narrow and task-focused as it was.
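
The "something like half the adaptation" estimate can be checked with a bit of arithmetic on the training set validation losses. This is only a rough sketch: the starting loss is given as "~0.65", the full tune is used as the reference for "fully adapted", and `adaptation_retained` is an illustrative name rather than an established metric.

```python
# Training set validation losses, copied from the list above.
BASE = 0.65    # cosmo-1b starting eval loss (approximate, "~0.65" in the results)
BEST = 0.2049  # full tune, taken here as the reference for full adaptation

train_eval = {
    "Model Stock (3/4)": 0.451,
    "Model Stock (all 4)": 0.40211,
    "LISA": 0.2534,
    "GaLore": 0.2426,
    "QLoRA": 0.2078,
    "QLoRA + DoRA": 0.2055,
    "full tuning": 0.2049,
}


def adaptation_retained(loss, base=BASE, best=BEST):
    """Fraction of the base-to-best loss improvement that a model keeps."""
    return (base - loss) / (base - best)


retained = {name: adaptation_retained(loss) for name, loss in train_eval.items()}

for name, frac in retained.items():
    print(f"{name:22s} {frac:.2f}")
```

Under these assumptions the 4-way merge keeps roughly 0.56 of the improvement and the 3/4 merge roughly 0.45, which is in line with "something like half".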

Merge Details

Merge Method

This model was merged using the Model Stock merge method, with HuggingFaceTB/cosmo-1b as the base.
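
For readers unfamiliar with Model Stock, here is a minimal pure-Python sketch of the idea for a single layer, as I understand it from the Model Stock paper: average the fine-tuned weights, then interpolate back toward the base by a ratio derived from the angle between the fine-tuned models' deltas. The function name `model_stock_layer` and the flat-list weight representation are assumptions for illustration; this is not mergekit's actual code, which works on tensors and may differ in details.

```python
import math


def model_stock_layer(base, tuned):
    """Merge one layer. `base` is a flat list of floats; `tuned` is a list of
    same-length lists, one per fine-tuned model (at least two)."""
    k = len(tuned)
    if k < 2:
        raise ValueError("need at least two fine-tuned models")

    # Deltas of each fine-tuned model relative to the base weights.
    deltas = [[w - b for w, b in zip(t, base)] for t in tuned]

    # Average pairwise cosine similarity between the deltas.
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            dot = sum(a * b for a, b in zip(deltas[i], deltas[j]))
            norm = math.sqrt(sum(a * a for a in deltas[i])) * math.sqrt(
                sum(b * b for b in deltas[j])
            )
            cosines.append(dot / norm if norm > 0 else 0.0)
    cos_theta = sum(cosines) / len(cosines)

    # Interpolation ratio: t = k*cos / (1 + (k - 1)*cos).
    # No guard here for the degenerate case where the denominator is zero.
    t = k * cos_theta / (1 + (k - 1) * cos_theta)

    # Blend the average of the fine-tuned weights with the base weights.
    avg = [sum(col) / k for col in zip(*tuned)]
    return [t * a + (1 - t) * b for a, b in zip(avg, base)]
```

Two limiting cases show the behaviour: if the deltas are orthogonal, the ratio is 0 and the layer falls back to the base weights; if they all point the same way, the ratio is 1 and the layer is the plain average. That pull toward the base when the tuning methods disagree fits the recovery of original knowledge seen in the results above.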

Models Merged

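The following models were included in the merge:

  • Lambent/cosmo-1b-lisa-pythontest
  • Lambent/cosmo-1b-qlora-pythontest
  • Lambent/cosmo-1b-galore-pythontest
  • Lambent/cosmo-1b-tune-pythontest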

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: Lambent/cosmo-1b-lisa-pythontest
  - model: Lambent/cosmo-1b-qlora-pythontest
  - model: Lambent/cosmo-1b-galore-pythontest
  - model: Lambent/cosmo-1b-tune-pythontest
base_model: HuggingFaceTB/cosmo-1b
merge_method: model_stock
parameters:
  filter_wise: false
dtype: float16