---
license: apache-2.0
language:
- 'no'
- en
widget:
  - text: >-
      <extra_id_0> hver uke samles Regjeringens medlemmer til Statsråd på
      <extra_id_1>. Dette organet er øverste <extra_id_2> i Norge. For at møtet
      skal være <extra_id_3>, må over halvparten av regjeringens <extra_id_4>
      være til stede.
  - text: >-
      At <extra_id_0> there are countless paintings and <extra_id_1>, some are even <extra_id_2>
      the romans.
---

This is a pruned version of the `google/mt5-large` model. The input and output embeddings are pruned to support a greatly reduced vocabulary.
The chosen vocabulary has 30K Norwegian, English and special tokens, about 12% of the original vocabulary size. This reduces the model size by roughly 37%.
The model still works reasonably well on closely related languages such as German and Danish, but very different languages such as Arabic are no longer a good fit.
This model is intended as a starting point for fine-tuning mT5 for Norwegian applications.
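
The size figures above can be sanity-checked with a rough calculation. The sketch below is only an estimate under stated assumptions: the original vocabulary size (250,112), the hidden size (1024), the total parameter count (about 1.23 billion) and the untied input and output embeddings are assumed properties of `google/mt5-large`, not values given in this card, and "30K" is taken as exactly 30,000.

```python
# Back-of-the-envelope check of the vocabulary and size figures.
# Every constant here is an assumption for illustration, not a value
# stated in this model card.
ORIGINAL_VOCAB = 250_112        # assumed vocab size of google/mt5-large
PRUNED_VOCAB = 30_000           # "30K" from the card, taken as exactly 30,000
D_MODEL = 1024                  # assumed hidden size of mt5-large
TOTAL_PARAMS = 1_230_000_000    # assumed approximate parameter count


def embedding_params(vocab_size: int, d_model: int = D_MODEL) -> int:
    """Parameters in the input embedding plus an untied output projection."""
    return 2 * vocab_size * d_model


def estimate_reduction() -> dict:
    """Estimate how much of the model is removed by shrinking the vocabulary."""
    removed = embedding_params(ORIGINAL_VOCAB) - embedding_params(PRUNED_VOCAB)
    return {
        "vocab_fraction_kept": PRUNED_VOCAB / ORIGINAL_VOCAB,
        "params_removed": removed,
        "size_reduction": removed / TOTAL_PARAMS,
    }


if __name__ == "__main__":
    for name, value in estimate_reduction().items():
        print(name, value)
```

Under these assumptions the kept vocabulary fraction and the relative size reduction land close to the 12% and 37% quoted above, which suggests that almost all of the savings come from the two embedding matrices.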