Allow submitting models with remote code

#58
by tju01 - opened

When submitting a falcon-based model such as OpenAssistant/falcon-40b-sft-mix-1226 or nomic-ai/gpt4all-falcon, I get an error like Model "OpenAssistant/falcon-40b-sft-mix-1226"was not found on hub!. This seems to be because the is_model_on_hub function tries to load the model config with AutoConfig.from_pretrained. That works for most models, but falcon-based models ship custom code and can only be loaded with trust_remote_code=True, which is currently not set. This PR fixes that part by switching to PretrainedConfig.get_config_dict, which only reads the config file and therefore accepts those models without needing to trust remote code.
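
To make the idea concrete, here is a minimal sketch of such a check. It is not the actual code from this Space: the function body, the default revision, the injectable load_config_dict parameter and the choice to catch OSError are all assumptions made for illustration. The only library call it relies on is PretrainedConfig.get_config_dict, which returns the raw config dictionary plus the unused keyword arguments.

```python
from typing import Callable


def _load_config_dict(model_name: str, revision: str) -> dict:
    """Fetch the raw config.json contents without running any repo code."""
    # Imported lazily so the rest of the sketch does not need transformers.
    from transformers import PretrainedConfig

    # get_config_dict returns (config_dict, unused_kwargs); it only parses
    # the config file, so trust_remote_code is not involved.
    config_dict, _unused_kwargs = PretrainedConfig.get_config_dict(
        model_name, revision=revision
    )
    return config_dict


def is_model_on_hub(
    model_name: str,
    revision: str = "main",  # assumed default, for illustration only
    load_config_dict: Callable[[str, str], dict] = _load_config_dict,
) -> bool:
    """Return True if the model's config can be read, False otherwise."""
    try:
        load_config_dict(model_name, revision)
        return True
    except OSError as err:
        # Assumption: a missing repo or config file surfaces as OSError.
        print(f"Could not get the model config from the hub: {err}")
        return False
```

The loader is passed in as a parameter only so that the check can be exercised without network access; calling is_model_on_hub("OpenAssistant/falcon-40b-sft-mix-1226") with the default loader would query the hub.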

Important: This repository only seems to upload the model information to HuggingFaceH4/lmeh_evaluations and the actual evaluation seems to happen over there or somewhere else. In order to actually evaluate the model after it was submitted to the queue, another change might be required to the evaluation code to use trust_remote_code=True when loading the model, possibly in a sandbox. However, this code doesn't seem to be public, so I can't submit a PR there.
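
For example, the submission side could at least record whether a model would need trust_remote_code=True later on. The sketch below is a heuristic, not something this PR implements: it assumes that repositories shipping custom modeling code declare it under the auto_map key of their config, and the helper name requires_remote_code and the sample dictionaries are made up for illustration.

```python
def requires_remote_code(config_dict: dict) -> bool:
    """Heuristic: a non-empty auto_map entry points at custom code in the repo."""
    # Assumption: custom-code models list their classes under "auto_map".
    return bool(config_dict.get("auto_map"))


# Illustrative config fragments, not copied from any real repository.
custom_code_config = {
    "model_type": "RefinedWeb",
    "auto_map": {"AutoConfig": "configuration_RW.RWConfig"},
}
builtin_config = {"model_type": "llama"}

print(requires_remote_code(custom_code_config))
print(requires_remote_code(builtin_config))
```

A flag like this would only mark a submission for special handling; it says nothing about whether the custom code is safe to run.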

tju01 changed pull request title from Enable trust_remote_code to Allow submitting models with remote code

Agreed. I wonder how falcon got there in the first place, when at least a dozen models were ahead of it in the queue.

> Agreed. I wonder how falcon got there in the first place, when at least a dozen models were ahead of it in the queue.

that's a good question!

Open LLM Leaderboard org

Hi!
Thanks for your PR and detailed message 🤗

We don't plan on allowing an automatic execution of models with trust_remote_code=True for the moment for safety reasons, as it would require us to manually examine the source code of every new model added to ensure no malicious code is executed on our cluster, which would sadly be too time consuming.

@clefourrier How did Falcon get on the leaderboard then?

> We don't plan on allowing an automatic execution of models with trust_remote_code=True for the moment for safety reasons, as it would require us to manually examine the source code of every new model added to ensure no malicious code is executed on our cluster, which would sadly be too time consuming.

I see. I thought that the code could be executed in a reasonably isolated environment or that a sandbox could be used. Alternatively, one could add support specifically for falcon-based models, but I don't think that's easier. Either way, it would be nice to have falcon-based models on the leaderboard since they would probably take all of the top spots, but I can see your reason for not allowing them. I guess this PR can then also be closed, though I would be happy if it were reconsidered in the future.

tju01 changed pull request status to closed

PR is now merged/closed. The ephemeral Space has been deleted.

(This is an automated message.)
