This is a "continued pre-training" model: Llama 3.2 1B trained further on the Finnish Wikipedia dataset. I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that every frontend already supports. I've seen Japanese models perform pretty well with that kind of continued pre-training, yet Finnish models are still trained from scratch, which means they end up far worse. Compared to Llama 3 or Gemma 2 they fall well short, and they can't even match Mistral 7B, a model from last year. Stop wasting money on training models from scratch: use these better models as a base and train them on all the closed-source data I don't have access to. Thank you.

LoRA: mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B

Trained with regular LoRA (not quantized, so not QLoRA), with LoRA rank 128 and alpha 32. Training ran for 1 epoch on an RTX 4090 and took about 12.5 hours.
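To put the rank and alpha in context, here is a minimal sketch of what they imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / rank. The function names are hypothetical, and the 2048 x 2048 weight shape is an assumption for illustration only: the card does not say which modules were adapted.

```python
def lora_scale(rank: int, alpha: float) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted weight: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 128, 32  # values stated in this model card
HIDDEN = 2048          # ASSUMPTION: illustrative square weight shape, not taken from the card

print(lora_scale(RANK, ALPHA))                 # 0.25
print(lora_param_count(HIDDEN, HIDDEN, RANK))  # 524288
```

With alpha smaller than the rank, the adapter's update is scaled down by a factor of 0.25 rather than amplified.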

The model does have some issues. I could try training on Gemma 2 2B instead and see whether that is a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more datasets containing Finnish.

Evaluation

| Model | Size | Type | FIN-bench (score) | Without math |
|---|---|---|---|---|
| mpasila/Llama-3.2-Finnish-Wikipedia-1B | 1B | Base | 0.3170 | 0.4062 |
| unsloth/Llama-3.2-1B | 1B | Base | 0.4029 | 0.3881 |
| Finnish-NLP/llama-7b-finnish | 7B | Base | 0.2350 | 0.4203 |
| LumiOpen/Viking-7B (1000B) | 7B | Base | 0.3721 | 0.4453 |
| HPLT/gpt-7b-nordic-prerelease | 7B | Base | 0.3169 | 0.4524 |
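As a quick reading aid, the sketch below subtracts the base model's scores from this model's scores, using only the numbers in the table above. The dictionary keys are hypothetical labels, and no significance test is implied.

```python
# (FIN-bench score, score without math) copied from the evaluation table
base = {"all": 0.4029, "no_math": 0.3881}   # unsloth/Llama-3.2-1B
tuned = {"all": 0.3170, "no_math": 0.4062}  # mpasila/Llama-3.2-Finnish-Wikipedia-1B

# Positive means the continued pre-trained model scored higher than its base
delta = {k: round(tuned[k] - base[k], 4) for k in base}
print(delta)
```

On these figures the overall score is lower than the base model's, while the score without the math tasks is slightly higher.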

Source

FIN-bench scores:

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± 0.0440 |
| bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± 0.0171 |
| bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± 0.0435 |
| bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± 0.0141 |
| bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± 0.0256 |
| bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± 0.0416 |
| bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± 0.0273 |
| bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± 0.0429 |
| bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± 0.0386 |
| bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± 0.0473 |
| bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± 0.0409 |
| bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± 0.0500 |
| bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± 0.0451 |
| bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± 0.0435 |
| bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± 0.0359 |
| bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± 0.0499 |
| bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± 0.0502 |
| bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± 0.0465 |
| bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± 0.0492 |
| bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± 0.0696 |
| bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± 0.0690 |
| bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± 0.0704 |
| bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± 0.0313 |
| bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± 0.0480 |
| bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± 0.0535 |
| bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± 0.0648 |
| bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± 0.0635 |
| bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± 0.0622 |
| bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± 0.0766 |
| bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± 0.0110 |
| bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± 0.0430 |
| bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± 0.0354 |
| bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± 0.0651 |
| bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± 0.0567 |
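The per-task values above appear consistent with the two headline numbers in the Evaluation section if each is read as an unweighted mean: over all 36 tasks for the FIN-bench score, and over the 16 non-arithmetic tasks for "Without math". That reading is an assumption checked against these numbers only, not a statement of how the evaluation harness aggregates. A minimal sketch of the check:

```python
from statistics import mean

# multiple_choice_grade values copied from the per-task FIN-bench results
scores = {
    "bigbench_analogies": 0.4846,
    "bigbench_arithmetic_1_digit_addition": 0.0300,
    "bigbench_arithmetic_1_digit_division": 0.0435,
    "bigbench_arithmetic_1_digit_multiplication": 0.0200,
    "bigbench_arithmetic_1_digit_subtraction": 0.0700,
    "bigbench_arithmetic_2_digit_addition": 0.2200,
    "bigbench_arithmetic_2_digit_division": 0.0800,
    "bigbench_arithmetic_2_digit_multiplication": 0.2400,
    "bigbench_arithmetic_2_digit_subtraction": 0.1800,
    "bigbench_arithmetic_3_digit_addition": 0.3300,
    "bigbench_arithmetic_3_digit_division": 0.2100,
    "bigbench_arithmetic_3_digit_multiplication": 0.3000,
    "bigbench_arithmetic_3_digit_subtraction": 0.5500,
    "bigbench_arithmetic_4_digit_addition": 0.2800,
    "bigbench_arithmetic_4_digit_division": 0.2500,
    "bigbench_arithmetic_4_digit_multiplication": 0.1500,
    "bigbench_arithmetic_4_digit_subtraction": 0.4400,
    "bigbench_arithmetic_5_digit_addition": 0.5100,
    "bigbench_arithmetic_5_digit_division": 0.3000,
    "bigbench_arithmetic_5_digit_multiplication": 0.3100,
    "bigbench_arithmetic_5_digit_subtraction": 0.4000,
    "bigbench_cause_and_effect_one_sentence": 0.5882,
    "bigbench_cause_and_effect_one_sentence_no_prompt": 0.3922,
    "bigbench_cause_and_effect_two_sentences": 0.4510,
    "bigbench_emotions": 0.1938,
    "bigbench_empirical_judgments": 0.3434,
    "bigbench_general_knowledge": 0.2714,
    "bigbench_hhh_alignment_harmless": 0.3966,
    "bigbench_hhh_alignment_helpful": 0.3729,
    "bigbench_hhh_alignment_honest": 0.3390,
    "bigbench_hhh_alignment_other": 0.5581,
    "bigbench_intent_recognition": 0.0925,
    "bigbench_misconceptions": 0.4403,
    "bigbench_paraphrase": 0.5000,
    "bigbench_sentence_ambiguity": 0.4833,
    "bigbench_similarities_abstraction": 0.5921,
}


def is_math(task: str) -> bool:
    """Treat every bigbench_arithmetic_* task as a math task."""
    return task.startswith("bigbench_arithmetic_")


overall = mean(scores.values())
without_math = mean(v for t, v in scores.items() if not is_math(t))

print(f"all tasks:    {overall:.4f}")       # 0.3170
print(f"without math: {without_math:.4f}")  # 0.4062
```

The arithmetic tasks pull the average down sharply, which is why the two columns differ so much for this model.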

Uploaded Llama-3.2-Finnish-Wikipedia-1B model

  • Developed by: mpasila
  • License: Llama 3.2 Community License Agreement
  • Finetuned from model: unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
