xls-r-greek-aivaliot

Aivaliot is a variety of Greek that was spoken until the beginning of the 20th century in Aivali (known as Ayvalık in Turkish), on the Edremit Gulf in western Turkey. After the war between Greece and Turkey (1919–1922) and the defeat of the Greek army, the Aivaliots who survived fled to Greece, principally to the nearby island of Lesbos, where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects. According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek dialects and share the deletion of unstressed /i/ and /u/ and the raising of unstressed /o/ and /e/. Aivaliot morphology and lexicon are influenced by Turkish, because of the long Ottoman domination, as well as by Italo-Romance, due to the pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b). However, there are no Turkish or Italo-Romance influences on phonology or syntax. In 2002, a handful of first-generation Aivaliot speakers could still be found in Lesbos, elsewhere in Greece, and abroad, where they still remembered and practiced their mother tongue (Ralli, 2019). Today the dialect is on the way to extinction: second-generation speakers either have only a passive knowledge of it or, if they live in Lesbos, mix their own dialectal variety with the parent Lesbian.

This is the first automatic speech recognition (ASR) model for Aivaliot. To train the model, we fine-tuned a Greek XLS-R model (jonatasgrosman/wav2vec2-large-xlsr-53-greek) on the Aivaliot resources.

Resources

We used recordings from the Asia Minor Archive (AMiGre) to train the model. AMiGre was compiled within the framework of two research projects that ran in the periods 2002-2005 and 2012-2016. We obtained permission to use it from the studies' authors. It consists of narratives elicited from 18 elderly speakers (5 male, 13 female), all refugees from Aivali who had settled in different villages on the island of Lesbos. The data were collected in 2002-2003, after obtaining the written consent of the informants and the approval of the Ethics Committee of the University of Patras. The corpus has a total duration of almost 14 hours. It was transcribed and annotated by two native speakers of the dialect, using a transcription system based on the Greek alphabet and orthography and adapted according to SAMPA. The annotations include metadata such as the source of the data, the identity and background of the informants, and the conditions of the data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of the University of Patras and is freely accessible online.

To prepare the dataset, the texts were normalized (see greek_dialects_asr/ for scripts), and all audio files were converted into a 16 kHz mono format. We split the Praat annotations into audio-transcription segments, which resulted in a dataset of a total duration of 10h 14m 44s. Note that the removal of music, long pauses, and non-transcribed segments leads to a reduction of the total audio duration (compared to the initial 14h recordings).
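The effect of dropping non-transcribed intervals on the total duration can be illustrated with a minimal sketch. This is not the code from greek_dialects_asr/; the `(start, end, text)` tuple layout and the function names `total_duration` and `format_hms` are hypothetical, chosen only for illustration.

```python
def total_duration(segments):
    """Sum the durations (in seconds) of segments with a non-empty transcription.

    `segments` is a hypothetical list of (start, end, text) tuples, such as
    one might extract from an annotation tier; empty-text intervals
    (music, long pauses, non-transcribed speech) are skipped.
    """
    return sum(end - start for start, end, text in segments if text.strip())


def format_hms(seconds):
    """Format a duration in seconds as 'Hh Mm Ss'."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


# Hypothetical example: the middle interval has no transcription and is dropped.
example = [(0.0, 2.5, "text"), (2.5, 6.0, ""), (6.0, 9.0, "more text")]
print(total_duration(example))   # 5.5 seconds kept out of 9.0
print(format_hms(36884))         # 10h 14m 44s
```

Under these assumptions, only the transcribed intervals contribute to the dataset duration, which is why the prepared dataset is shorter than the raw recordings.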

Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings.

| Model | CER | WER |
|---|---|---|
| pre-trained | 104.80% | 113.67% |
| fine-tuned | 39.55% | 73.83% |
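Error rates above 100% are possible because both metrics divide an edit distance by the length of the reference, and inserted tokens count as errors. The sketch below uses the standard Levenshtein-based definitions of WER and CER; it is an illustration only, and whether the reported scores were computed per utterance or over the whole test set, and with which library, is not stated here. The example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical strings: a hypothesis longer than the reference pushes WER above 100%.
print(wer("a b", "x y z"))  # 1.5, i.e. 150%
print(cer("abc", "abc"))    # 0.0
```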

Training hyperparameters

We fine-tuned the baseline model (wav2vec2-large-xlsr-53-greek) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg | value |
|---|---|
| per_device_train_batch_size | 8 |
| gradient_accumulation_steps | 2 |
| num_train_epochs | 35 |
| learning_rate | 3e-4 |
| warmup_steps | 500 |
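As a rough illustration of how these values combine, the sketch below collects them in a plain dictionary and computes the effective batch size per optimizer step. The argument names match fields of `transformers.TrainingArguments`, but the exact training setup is not given here; the helper `effective_batch_size` is hypothetical, and `num_devices=1` is an assumption based on the single GPU mentioned above.

```python
# Values copied from the hyperparameter table.
hyperparameters = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}


def effective_batch_size(params, num_devices=1):
    """Number of examples contributing to each optimizer step.

    num_devices=1 is an assumption (a single GPU).
    """
    return (params["per_device_train_batch_size"]
            * params["gradient_accumulation_steps"]
            * num_devices)


print(effective_batch_size(hyperparameters))  # 16
```

With gradient accumulation over 2 steps, each weight update would thus see 16 examples while only 8 need to fit in GPU memory at once.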

Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, A. Anastasopoulos, Speech Recognition for Greek Dialects: A Challenging Benchmark, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.

Model size: 315M params (Safetensors, tensor type F32)