Hi there!
I converted a model to ONNX using python -m transformers.onnx --model=dslim/bert-large-NER onnx
and loaded it in Java using the onnxruntime library, but it gives different results compared to running the same model with Hugging Face Transformers.
I found this issue: [Bug] Attention and QAttention don't work properly in some cases · Issue #14363 · microsoft/onnxruntime · GitHub, and wanted to check whether there is a known solution or workaround I could use.
Thanks!