Specify library_name metadata
Hello @avsolatorio!
Pull Request overview
- Specify library_name metadata
Details
Hugging Face can automatically add buttons such as "Use in Sentence Transformers" to your model page if library_name is set to sentence-transformers.
Edit: It seems that the online README metadata editor doesn't like floating-point values ending in .0, nor that end-of-file marker. Apologies that this made the PR a bit bigger than I had intended.
- Tom Aarsen
Thanks! :D
Hello!
I see now that the pipeline can't be automatically inferred anymore:
We can resolve this by setting the pipeline_tag to either sentence-similarity or feature-extraction. Most people also add both to the tags, so the model is easier to find in search.
```yaml
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
```
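As a rough sketch, this metadata lives in the YAML front matter between the --- lines at the top of the model's README.md, alongside library_name. The helper render_front_matter below is hypothetical (not part of any library): it only handles flat strings and lists of strings and does no YAML quoting or escaping. For editing metadata on the Hub itself, huggingface_hub's metadata_update is likely the safer route.

```python
def render_front_matter(metadata: dict) -> str:
    """Render flat metadata (strings or lists of strings) as README.md YAML front matter.

    Hypothetical helper for illustration: no YAML quoting/escaping is performed.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Block-style sequence, matching the tags layout used in this thread
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


metadata = {
    "library_name": "sentence-transformers",
    "pipeline_tag": "sentence-similarity",
    "tags": ["sentence-transformers", "feature-extraction", "sentence-similarity"],
}
front_matter = render_front_matter(metadata)
print(front_matter)
```

Swapping "sentence-similarity" for "feature-extraction" in pipeline_tag is the only change needed to pick the other default pipeline.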
For example, bge-m3 uses Sentence Similarity, while intfloat/multilingual-e5-large uses Feature Extraction.
This also determines the default pipeline used by the free serverless Inference Endpoints; see, for example: https://huggingface.co/intfloat/multilingual-e5-large?inference_api=true
So, in short, I would recommend choosing the one you prefer and adding it to the metadata :)
- Tom Aarsen
Hello Tom,
Great suggestion! I tried changing the default pipeline to sentence similarity, but it did not work. 😅
At first, it complained that no pytorch_model.bin existed in the model directory. I attempted to fix this by uploading a pytorch_model.bin file created with torch.save(model, 'pytorch_model.bin'), but this seems incorrect, since I now get a new error saying 'SentenceTransformer' object has no attribute 'keys'.
Do you have any suggestions on how I might address this correctly?
Thank you!
Best,
Aivin
Oh, that makes sense, actually: Sentence Transformers only very recently got model.safetensors support, so the pipeline code probably still uses an older version.
Saving a model in the old pytorch_model.bin format is actually a bit tricky with Sentence Transformers:
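```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("avsolatorio/GIST-Embedding-v0")
# model[0].auto_model is the underlying transformers model; safe_serialization=False writes pytorch_model.bin
model[0].auto_model.save_pretrained("tmp", safe_serialization=False)
```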
This saves the underlying transformers model, which keeps the file compatible with core transformers.
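A minimal sketch of why the torch.save(model, 'pytorch_model.bin') attempt most likely produced the 'keys' error: torch.save(model, ...) pickles the whole module object, whereas code loading pytorch_model.bin expects a state dict, i.e. a mapping from parameter names to tensors (which is what save_pretrained(..., safe_serialization=False) writes). The tiny nn.Linear here is only a stand-in for the real model, and the weights_only argument assumes PyTorch 1.13 or newer.

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in module; the same distinction applies to a SentenceTransformer
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    whole_path = os.path.join(tmp_dir, "whole_model.bin")
    state_path = os.path.join(tmp_dir, "state_dict.bin")

    # Pickles the entire module object (class reference + attributes)
    torch.save(model, whole_path)
    # Stores only a mapping of parameter names to tensors
    torch.save(model.state_dict(), state_path)

    # weights_only=False is needed to unpickle a full module on recent PyTorch
    loaded_module = torch.load(whole_path, weights_only=False)
    loaded_state = torch.load(state_path)

# A module is not a mapping, so code expecting a state dict fails on .keys()
print(hasattr(loaded_module, "keys"))  # False
print(sorted(loaded_state.keys()))  # ['bias', 'weight']
```

Even a manual torch.save(model.state_dict(), ...) on the full SentenceTransformer would likely not match what transformers expects, since save_pretrained also writes config.json and uses the transformers model's own parameter names; that is why going through auto_model is the simpler route.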
I'll make you a PR!
- Tom Aarsen