winglian
bump deepspeed for fix for grad norm compute putting tensors on different devices (#1699)
851ccb1 unverified