Can I obtain finer-grained information when encoding images? For example, the attention score for each patch?