Specify library_name metadata

#1
by tomaarsen HF staff - opened

Hello @avsolatorio !

Pull Request overview

  • Specify library_name metadata

Details

Hugging Face will be able to automatically add "Use in Sentence Transformers" etc. buttons to your model if the library_name is specified as Sentence Transformers.

Edit: It seems that the online README metadata editor doesn't like floating point values ending in .0, nor the end-of-file marker. Apologies that this made the PR a bit bigger than I had intended.

  • Tom Aarsen

Thanks! :D

avsolatorio changed pull request status to merged

Hello!

I see now that the pipeline can't be automatically inferred anymore.

We can resolve this by setting the pipeline_tag to either sentence-similarity or feature-extraction. Most people also list both in the tags, so the model is easier to find in search.

pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity

This also affects the default pipeline used in the free serverless Inference Endpoints, e.g. see this link: https://huggingface.co/intfloat/multilingual-e5-large?inference_api=true

So, in short, I would recommend choosing the one that you prefer and adding it in the metadata :)
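If you would rather script the change than use the online editor, here is a minimal sketch of the idea. The helper names (build_card_metadata, push_metadata) are made up for illustration, and I'm assuming the library_name value is spelled sentence-transformers. The push step relies on metadata_update from huggingface_hub and needs network access and a token, so it's kept separate from the pure merging logic.

```python
from copy import deepcopy

# Values taken from the suggestion above. The "sentence-transformers"
# spelling of library_name is an assumption.
LIBRARY_NAME = "sentence-transformers"
PIPELINE_TAGS = ("sentence-similarity", "feature-extraction")


def build_card_metadata(existing, pipeline_tag="sentence-similarity"):
    """Return a copy of `existing` with library_name, pipeline_tag and tags set."""
    if pipeline_tag not in PIPELINE_TAGS:
        raise ValueError(f"pipeline_tag must be one of {PIPELINE_TAGS}")
    merged = deepcopy(existing)
    merged["library_name"] = LIBRARY_NAME
    merged["pipeline_tag"] = pipeline_tag
    # Keep any tags already present and append the missing ones in order.
    tags = list(merged.get("tags") or [])
    for tag in (LIBRARY_NAME, "feature-extraction", "sentence-similarity"):
        if tag not in tags:
            tags.append(tag)
    merged["tags"] = tags
    return merged


def push_metadata(repo_id, metadata):
    """Open a PR with the metadata change (hypothetical helper, needs a token)."""
    from huggingface_hub import metadata_update  # imported lazily on purpose

    return metadata_update(repo_id, metadata, overwrite=True, create_pr=True)
```

Calling build_card_metadata({"tags": ["mteb"]}) (the mteb tag is only a placeholder) gives a dict you can inspect first, and push_metadata("avsolatorio/GIST-Embedding-v0", metadata) would then open a PR rather than committing directly.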

  • Tom Aarsen

Hello Tom,

Great suggestion! I tried changing the default pipeline to sentence similarity, but it did not work. 😅

At first, it complained that no pytorch_model.bin existed in the model directory. I attempted to fix this by uploading a pytorch_model.bin file created using torch.save(model, 'pytorch_model.bin'), but this seems incorrect, since I am now getting a new error saying the 'SentenceTransformer' object has no attribute 'keys'.

Do you have any suggestions on how I might address this correctly?

Thank you!

Best,
Aivin

Oh, that makes sense actually - Sentence Transformers only very recently got model.safetensors support, so the pipeline code probably still uses an older version.
Saving a model in the old pytorch_model.bin format is a bit tricky with Sentence Transformers:

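from sentence_transformers import SentenceTransformer
model = SentenceTransformer("avsolatorio/GIST-Embedding-v0")
# safe_serialization=False writes pytorch_model.bin instead of model.safetensors
model[0].auto_model.save_pretrained("tmp", safe_serialization=False)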

This gets the underlying transformers model, so the saved weights stay compatible with core transformers.
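As for the 'keys' error: presumably the loader expects pytorch_model.bin to hold a state dict (a mapping from parameter names to tensors) and calls .keys() on it, whereas torch.save(model, ...) pickles the whole SentenceTransformer object. A rough sketch of a sanity check plus the export step, with both helper names being hypothetical:

```python
from collections.abc import Mapping
from pathlib import Path


def is_state_dict_like(obj):
    """True if `obj` is a mapping, i.e. it has the .keys() a loader would call."""
    return isinstance(obj, Mapping)


def export_legacy_weights(model_id, out_dir):
    """Save the underlying transformers model in the pytorch_model.bin format."""
    # Lazy import: heavy dependency, and loading downloads the model.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    # model[0].auto_model is the underlying transformers model.
    model[0].auto_model.save_pretrained(out_dir, safe_serialization=False)
    # Return the file names that were written so they can be inspected.
    return sorted(p.name for p in Path(out_dir).iterdir())
```

A whole model object fails the is_state_dict_like check while a plain dict of weights passes it, which matches the error message above. export_legacy_weights("avsolatorio/GIST-Embedding-v0", "tmp") wraps the two lines from earlier and returns the written file names.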

I'll make you a PR!

  • Tom Aarsen
