[`refactor`]: Tab & URL syncing; parameter counts as model size; filtering; search
Hello!
Pull Request overview
- Compute model size based on the number of parameters instead of the weight file size.
- Refactor Gradio initialization: now based on a nested data structure that is looped over to dynamically create Tabs.
- Tabs & URL syncing for easier sharing, e.g. selecting a tab adds ?task=overall&language=english to the URL, and opening such a URL opens those tabs.
- Add search bar.
- Add filtering options: Open vs API and based on model sizes.
- Show the model size in all tabs.
Details
Most of the changes in this PR are centered around the refactor that allows for dynamically creating Tabs. For example, the Tab & URL syncing requires some code surrounding each gr.Tabs, which was infeasible before this refactor. Due to the size of the PR, perhaps it makes sense to review the commits separately.
Model size based on # of parameters (commit)
I've introduced a utility function that computes the number of parameters by 1) reading safetensors or 2) estimating based on file size (assuming fp32 for all estimated models). I've also added a KNOWN_BYTES_PER_PARAM mapping from model names to the number of bytes per parameter (e.g. 4 for fp32 and 2 for fp16), in case we find a model that 1) does not use safetensors and 2) stored weights in fp16.
Beyond that, I updated the external model sizes from GB to Million Parameters.
Sadly this does cost another request for each model, so it's a bit slower to refresh.
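As a rough illustration of the two strategies above, here is a minimal sketch. It assumes the safetensors file layout (an 8-byte little-endian header length followed by a JSON header listing each tensor's shape); the function names and the example entry in KNOWN_BYTES_PER_PARAM are hypothetical and not taken from the PR's code.

```python
import json
import struct

# Hypothetical example entry; the PR's real mapping may differ.
KNOWN_BYTES_PER_PARAM = {"example-org/fp16-model": 2}


def count_params_from_safetensors_header(raw: bytes) -> int:
    """Count parameters from the leading bytes of a safetensors file.

    Layout: 8 bytes (little-endian u64) giving the JSON header length,
    followed by the JSON header mapping tensor names to dtype/shape/offsets.
    """
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len])
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        count = 1
        for dim in info["shape"]:
            count *= dim
        total += count
    return total


def estimate_params_from_file_size(model_name: str, file_size_bytes: int) -> int:
    """Fallback: divide the weight file size by bytes per parameter (fp32 by default)."""
    bytes_per_param = KNOWN_BYTES_PER_PARAM.get(model_name, 4)
    return file_size_bytes // bytes_per_param
```

Only the header needs to be read for the first strategy, so the full weight file never has to be downloaded.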
Refactor Tabs initialization (commit)
This commit cuts ~400 lines from app.py by using a new data structure:
data = {
    "Overall": {
        "metric": "Various, refer to task tabs",
        "data": [
            {
                "language": "English",
                "description": "**Overall MTEB English leaderboard** 🔮",
                "data": DATA_OVERALL,
                "refresh": get_mteb_average,
            },
            ...
and then looping over this to dynamically create the Tabs. The functionality before & after this commit should be identical.
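The looping itself could look roughly like the sketch below. It deliberately leaves Gradio out and only walks the nested structure; the helper name iter_tab_specs, the id scheme, and the second sample entry are assumptions for illustration. In the app, the outer loop would presumably open a gr.Tabs/gr.TabItem per task and the inner loop one per language.

```python
def iter_tab_specs(data):
    """Yield one (task_id, language_id, item) triple per leaf tab."""
    for task, task_info in data.items():
        # Outer level: one tab per task (e.g. "Overall").
        task_id = task.lower().replace(" ", "-")
        for item in task_info["data"]:
            # Inner level: one tab per language within that task.
            language_id = item["language"].lower().replace(" ", "-")
            yield task_id, language_id, item


# Hypothetical sample mirroring the shape shown above (placeholders for data/refresh).
sample_data = {
    "Overall": {
        "metric": "Various, refer to task tabs",
        "data": [
            {"language": "English", "description": "Overall English", "data": None, "refresh": None},
            {"language": "Chinese", "description": "Overall Chinese", "data": None, "refresh": None},
        ],
    },
}

for task_id, language_id, item in iter_tab_specs(sample_data):
    print(task_id, language_id, item["description"])
```

Having stable ids per tab is also what makes the URL syncing described next practical.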
Tabs <-> URL syncing (commit)
This is fairly hacky I'm afraid, as we have to call a JavaScript function to update the current URL. This is possible with Gradio, but you have to provide e.g. gr.JSON(); you can't just provide a normal Python dictionary. So, we use invisible JSON instances:
# Store the current task and language for updating the URL. This is a bit hacky, but it works
# for passing the current task and language to the JavaScript function via Gradio
current_task_language = gr.JSON(value=dict(), visible=False)
language_per_task = gr.JSON(value=dict(), visible=False)
Then, every time a tab is selected, we 1) update those gr.JSON instances and 2) call the JS function.
That's the Tabs -> URL step done. To do the opposite, we use the set_tabs_on_load function. Upon loading, it will observe the request URL and set the selected tabs accordingly. This is only run when the leaderboard is loaded fresh for a user.
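The two directions of the mapping can be sketched without Gradio, using only the standard library. The function names and the default tab values below are assumptions for illustration; the PR's set_tabs_on_load and its JavaScript counterpart are not reproduced here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tabs_to_query(task: str, language: str) -> str:
    """Tabs -> URL: build the query string appended to the page URL."""
    return "?" + urlencode({"task": task, "language": language})


def query_to_tabs(url: str, default_task: str = "overall", default_language: str = "english"):
    """URL -> Tabs: read the selected task and language, falling back to defaults."""
    params = parse_qs(urlsplit(url).query)
    task = params.get("task", [default_task])[0]
    language = params.get("language", [default_language])[0]
    return task, language


print(tabs_to_query("overall", "english"))  # ?task=overall&language=english
```

Falling back to defaults keeps a URL without query parameters working as before.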
Search & Filtering (commit)
This is fairly standard; I've added filtering for Proprietary vs Open models & on the number of parameters. For convenience, you can also filter directly for models that are compatible with Sentence Transformers. I've also made it so that the model size is always shown, on all tabs. I think this is a very important piece of information that should not just be shown in the Overall tab, and it also simplifies the filtering heavily. As a result, the PR looks a bit messy, but it's fairly simple. I add a filter_data function that gets all dataframes as well as all filtering/search options, and returns the filtered dataframes again.
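To make the filtering logic concrete, here is a simplified sketch that works on plain lists of dicts rather than dataframes. The function name filter_rows, its parameters, the row keys, and the example open models are hypothetical; treating models of unknown size as passing the size filter is also an assumption, not necessarily what filter_data does.

```python
# Models treated as proprietary, determined by an explicit list.
PROPRIETARY_MODELS = {"voyage-lite-02-instruct", "Cohere-embed-english-v3.0"}


def filter_rows(rows, query="", show_open=True, show_proprietary=True,
                min_params=0, max_params=float("inf")):
    """Return the rows that match the search query, model type and size range."""
    kept = []
    for row in rows:
        is_proprietary = row["model"] in PROPRIETARY_MODELS
        if is_proprietary and not show_proprietary:
            continue
        if not is_proprietary and not show_open:
            continue
        if query.lower() not in row["model"].lower():
            continue
        size = row.get("params_millions")
        # Assumption: a model with unknown size is never removed by the size filter.
        if size is not None and not (min_params <= size <= max_params):
            continue
        kept.append(row)
    return kept


rows = [
    {"model": "example/small-open-model", "params_millions": 110},
    {"model": "example/large-open-model", "params_millions": 7000},
    {"model": "voyage-lite-02-instruct", "params_millions": 1220},
    {"model": "Cohere-embed-english-v3.0", "params_millions": None},
]
print([r["model"] for r in filter_rows(rows, show_proprietary=False, max_params=1000)])
```

Deciding the model type from a list, independently of whether a size is known, keeps the two filters from interfering with each other.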
Fix embedding dimensions if Dense module exists (commit)
For models that use Dense layers, such as https://huggingface.co/aspire/acge_text_embedding, the embedding dimension was not computed correctly. This is now fixed by also accounting for \d+_Dense/config.json configuration files.
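A minimal sketch of how such files could be picked out of a repository's file listing, using the pattern above. The helper name and the example file list are hypothetical; reading the final embedding dimension from the last Dense module's config (for instance an out_features field) is an assumption about what the fix does.

```python
import re

# Matches e.g. "2_Dense/config.json" but not "1_Pooling/config.json".
DENSE_CONFIG_PATTERN = re.compile(r"\d+_Dense/config\.json")


def find_dense_configs(filenames):
    """Return Dense module config paths, ordered by their numeric module index."""
    matches = [name for name in filenames if DENSE_CONFIG_PATTERN.fullmatch(name)]
    return sorted(matches, key=lambda name: int(name.split("_", 1)[0]))


files = [
    "config.json",
    "1_Pooling/config.json",
    "2_Dense/config.json",
    "model.safetensors",
]
print(find_dense_configs(files))  # ['2_Dense/config.json']
```

Sorting by the numeric prefix means the last entry corresponds to the final Dense module when a model has several.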
- Tom Aarsen
@Muennighoff
Should be ready for review now! Apologies for the sheer size of this one. I'm excited for these changes to come through.
- Tom Aarsen
This looks amazing. Can we allow a number of parameters for API models? For voyage they explicitly asked if we could show their number of parameters as it's useful to know for some people and said it would be 1.22B parameters / 2.45GB?
Certainly! The PR currently labels models as "Proprietary" only if there is no known # of parameters, but I'll update that to "if there is no known # of parameters OR the model is in a specific list of exceptions" which will initially only contain the one Voyage model. In truth, I thought that perhaps the voyage model size was an error/oversight 😄 Out of curiosity, is it correct that only the model size for voyage-lite-02-instruct should be public? I couldn't find any info on model sizes on https://docs.voyageai.com/docs/embeddings (and voyage-lite-02-instruct is listed as a Deprecated model in https://docs.voyageai.com/docs/pricing ?). cc @voyageai01
Edit: I'll actually determine the proprietary models exclusively through a list of models - otherwise gated models will be listed as proprietary.
I'll probably make the change I mentioned on ~Tuesday.
- Tom Aarsen
They only shared voyage-lite-02-instruct sizes with us but maybe they also want to share the other model sizes? cc @Shuang59 @hongliu9903
Also two more notes:
- When unselecting a parameter range & then reselecting it, proprietary models are gone even though I'd expect it to be back to the start
- If selecting only Proprietary Models, some Open Models like udever remain in the ranking, I guess because we cannot grab their model size
Looks really really cool!
Thanks for the details & for testing this out! I'll include fixes for these in the coming days.
@Muennighoff I've addressed all the comments.
- 56136076 fixes e.g. udever remaining in the ranking (I can still grab its model size though?) and voyage's model size is now listed again.
- 485f27b4 stops the proprietary models from disappearing when toggling the model size.
Let me know if you need anything else from me here!
- Tom Aarsen
I've also incremented the Gradio SDK version. This fixes an issue where the DataFrame header & table will separate when scrolling on Firefox.
- Tom Aarsen
Nice though if I deactivate Proprietary the voyage model is still shown even though it is proprietary 🤔
Also interesting that e5-mistral & echo-mistral differ by 1 million parameters despite both stemming from the same model
Also do you think it is worth keeping the model size in GB tab in addition? I don't have a strong opinion but maybe it's useful to some people
> Nice though if I deactivate Proprietary the voyage model is still shown even though it is proprietary 🤔
Apologies, this was an oversight. I based only the Proprietary models on the PROPRIETARY_MODELS list, whereas the Open models were still based on the existence of the model size. I fixed this in 2db25dc3.
> Also interesting that e5-mistral & echo-mistral differ by 1 million parameters despite both stemming from the same model
I looked into this: e5-mistral-7b-instruct is listed as an external model, for which I estimated the number of parameters based on the model size. That explains the small difference. I've updated that to match SFR-Embedding-Mistral.
I appreciate the detailed reviews.
- Tom Aarsen
We could put something nicer than empty space when no model matches
I'll look into that actually!
That also reminds me that I can add n/a as the model size for the proprietary models.
And I forgot to address your comment here:
> Also do you think it is worth keeping the model size in GB tab in addition? I don't have a strong opinion but maybe it's useful to some people
Hmm, I'm not 100% confident what's best here. I think it's somewhat redundant, but the size in GB also roughly reflects the memory requirements for inference, I believe. I'll ask around a bit to get people's thoughts here.
I've merged the Law & Gecko changes into this PR and I've added the memory usage to all tables:
I'm considering renaming the column to just Memory Usage (GB). I've verified the correctness using this script:
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer("intfloat/multilingual-e5-large-instruct", device="cuda")
print(f"{torch.cuda.max_memory_allocated() / 1024**3:.2f}GB in use after loading model")
2.09GB in use after loading model
and for mixedbread-ai/mxbai-embed-large-v1:
1.25GB in use after loading model
I can't find a convenient way to add a hint/warning when the filtering is too restrictive and results in an empty table, nor does the "n/a" in the model size for proprietary models work (it messes up the column sorting). In other words, I think this might be ready for final review & to be merged. @Muennighoff
- Tom Aarsen
Looks amazing, final two points from my side:
- Cohere-embed-english-v3.0 is also proprietary but still there if unselecting prop
- Can we add 1200 Million parameters for the Gecko models? We can probably also add the GB based on what they gave us for the current lb (2.29) - I think you just multiplied it by 2 for voyage? I think that's fine & if it's wrong they can open an issue / let us know
Sorry, added one more model, voyage-2-law 😅
Haha, all good. Resolved the merge conflict & marked voyage-2-law as a proprietary model. It should now respond correctly to the filtering options. I also marked Cohere-embed-english-v3.0 as proprietary.
For all models I use the memory usage with the weights in fp32, so given a number of parameters I can accurately compute the VRAM usage. I've done that for Gecko and also Voyage.
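The conversion described here is simple enough to sketch. It assumes 4 bytes per parameter (fp32) and GB measured as 1024**3 bytes, matching the verification script earlier in the thread; the function name is hypothetical.

```python
def fp32_memory_gb(num_params: int) -> float:
    """Approximate memory needed to hold the weights in fp32, in GB (1024**3 bytes)."""
    bytes_per_param = 4  # fp32
    return num_params * bytes_per_param / 1024**3


# 1200 million parameters, the figure mentioned above for the Gecko models.
print(f"{fp32_memory_gb(1_200_000_000):.2f}GB")  # about 4.47GB
```

This only covers the weights; activations and framework overhead during inference come on top.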
Cool feel free to merge!
Seems like the URL syncing doesn't work, I'm guessing that is blocked by HF Spaces or something for security reasons.
I've opened an issue on that here: https://github.com/gradio-app/gradio/issues/7957
I appreciate it, I'll add a bit of extra context there.