Hi!
I’m planning on fine-tuning XLSR-Wav2Vec2 on Arabic using Common Voice. I’ve already made changes to the transformers examples that should help with Arabic (e.g., Buckwalter transliteration): transformers/examples/research_projects/wav2vec2 on the master branch of huggingface/transformers (GitHub)
You may also start with my pre-trained model, elgeish/wav2vec2-large-xlsr-53-arabic on Hugging Face. It should be easy to tweak the sample script to transliterate the transcripts into Buckwalter so they match the model’s vocab. I’m happy to help and collaborate. Good luck and happy fine-tuning!