Ah I see, sorry! For the cancellation part, this paragraph explains it:
My understanding is that if the indirect effect of a component is ~equal to the direct one but with the opposite sign, then its net contribution should be ~0, but the estimate risks being non-zero due to approximation errors on the indirect path (when the true value is ~0, even very tiny errors picked up going through nonlinearities get blown up in relative terms). With GradDrop, they basically handle this situation by avoiding taking the difference, and instead estimating the effects of the direct and indirect paths separately.
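
To make the cancellation point concrete, here is a toy sketch. It is not GradDrop and not taken from the paper: the functions `direct` and `indirect`, the tanh nonlinearity, and the values 0.3 and 0.0 are all made up for illustration. It compares the true effect of changing an input with a first-order (gradient-based) estimate, for each path alone and for their sum.

```python
import math

# Hypothetical toy model: the output is direct(x) + indirect(x).
# The direct path is linear; the indirect path goes through a nonlinearity
# and nearly cancels the direct one.
def direct(x):
    return 2.0 * x

def indirect(x):
    return -2.0 * math.tanh(x)

def rel_err(estimate, truth):
    return abs(estimate - truth) / abs(truth)

x_clean, x_patched = 0.3, 0.0   # made-up values
dx = x_patched - x_clean

# True effect of the change along each path.
true_direct = direct(x_patched) - direct(x_clean)
true_indirect = indirect(x_patched) - indirect(x_clean)
true_total = true_direct + true_indirect

# First-order estimates, linearised at x_clean.
grad_direct = 2.0
grad_indirect = -2.0 * (1.0 - math.tanh(x_clean) ** 2)
est_direct = grad_direct * dx        # exact, because the path is linear
est_indirect = grad_indirect * dx    # slightly off, because of tanh
est_total = est_direct + est_indirect

err_direct = rel_err(est_direct, true_direct)
err_indirect = rel_err(est_indirect, true_indirect)
err_total = rel_err(est_total, true_total)

print(f"direct path relative error:   {err_direct:.1%}")
print(f"indirect path relative error: {err_indirect:.1%}")
print(f"summed (net) relative error:  {err_total:.1%}")
```

In this toy setup each path on its own is estimated to within a few percent, but the net effect is off by more than 100%, because the same absolute error is now measured against a value close to zero. That is the situation where looking at the direct and indirect paths separately is more informative than looking at their difference.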