I’m trying to replicate this blog post on fine-tuning XLSR (Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers) and I’m running into CUDA out-of-memory errors. I’m training on a machine with multiple NVIDIA Titan V GPUs (12 GB memory each), and even when I:
- reduce batch size to 1
- remove all clips longer than 5 seconds (I even lowered this threshold to 2 seconds)
- use Adafactor instead of AdamW (as suggested here: Performance and Scalability: How To Fit a Bigger Model and Train It Faster)
I still run out of memory. I’m not sure whether this points to a bug in my code or whether I simply don’t have enough GPU memory for this model; any advice would be appreciated!
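
In case it helps to see concretely what I’m doing, here is a minimal sketch of how I’m applying those three changes. It assumes the blog post’s preprocessing (each example has an `input_values` column holding the raw 16 kHz waveform) and a transformers version recent enough that `TrainingArguments` accepts `optim="adafactor"`; the helper name and output path are just placeholders:

```python
from transformers import TrainingArguments

SAMPLING_RATE = 16_000   # XLSR-Wav2Vec2 expects 16 kHz audio
MAX_SECONDS = 5.0        # I also tried 2.0

def drop_long_clips(dataset, max_seconds=MAX_SECONDS):
    # Assumes each example has an "input_values" column with the raw
    # waveform, as produced by the blog post's preprocessing step.
    max_len = int(max_seconds * SAMPLING_RATE)
    return dataset.filter(lambda ex: len(ex["input_values"]) < max_len)

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-demo",  # placeholder path
    per_device_train_batch_size=1,            # already at the minimum
    optim="adafactor",                        # Adafactor instead of AdamW
    group_by_length=True,                     # batch similar-length clips, as in the blog post
)
```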