I am experimenting with some of the time-series forecasting transformer models. In the example blog post, I noticed that the data transformation step does not include any normalization of the inputs.
Link to blog: Probabilistic Time Series Forecasting with 🤗 Transformers
I’m quite new to ML, and from what I’ve read so far, it is usually good practice to normalize the inputs to a neural net to avoid problems such as exploding gradients.
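To be concrete about what I mean by normalization, here is a minimal sketch of per-series standardization in plain Python. The function names, the `eps` guard, and the sample values are my own illustration and are not taken from the blog post:

```python
from statistics import mean, pstdev


def standardize(series, eps=1e-8):
    """Scale one series to roughly zero mean and unit variance.

    Returns the scaled values plus the statistics needed to undo the scaling.
    `eps` (an assumed small constant) avoids division by zero for flat series.
    """
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation
    scaled = [(x - mu) / (sigma + eps) for x in series]
    return scaled, mu, sigma


def destandardize(scaled, mu, sigma, eps=1e-8):
    """Map scaled values (e.g. model outputs) back to the original units."""
    return [x * (sigma + eps) + mu for x in scaled]


# Hypothetical sample values, for illustration only.
values = [100.0, 120.0, 90.0, 110.0]
scaled, mu, sigma = standardize(values)
restored = destandardize(scaled, mu, sigma)
```

The statistics are returned alongside the scaled values because any forecast made in the scaled space has to be mapped back to the original units before it is useful.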
Is it necessary to normalize the inputs to a time series transformer? Why/why not?