Since yesterday, we’ve been having issues with dedicated inference endpoints. We have two models continuously deployed on dedicated endpoints on an organization account.
Yesterday at 10.00 AM UTC the endpoints were suddenly paused, with the latest update attributed to “admin”. At the same time, the endpoints WebUI displayed the error: “There was an error while fetching your endpoints.”
I tried logging out and back in, at which point I was presented with an authorization screen, asking me to permit Inference Endpoints to access my (and the organization’s) HuggingFace account.
After granting access (~10.30 AM UTC) I was able to see the endpoints, and manually restart them. Both the WebUI and the Endpoints’ API continued to work fine yesterday.
The WebUI was again rendered inaccessible today (seen at ~8.20 AM UTC), displaying the “There was an error while fetching your endpoints.” error on the organization’s dedicated endpoints. The issue seems to have resolved at ~8.50. This time, the endpoints were not paused but continued to work through the WebUI outage.
Additionally, each time I log out and log in again I’m always asked to authorize Inference Endpoints’ access to HuggingFace account data. Not sure how relevant the authorization issue is to the WebUI errors, could be just a coincidence.