Out of curiosity, is it possible to use wav2vec2-xlsr without finetuning, provided we pass in a tokenizer for the language of choice? All the usage examples I have seen are on finetuned models, and I was wondering whether there is a reason for always finetuning, and whether it even makes sense to use wav2vec2-xlsr without it.
Thanks in advance!