Inference API Results: Similar Values for Arousal, Valence, and Dominance

#11
by shivamisb - opened

Hi,
I am using the Inference API and noticed that the results for arousal, valence, and dominance are very similar, with values ranging between 3.2 and 3.4.
Can someone help me understand why?

Thanks.

audEERING GmbH org

The model should return values in the range of 0 to 1 for all three dimensions.

What do you mean by "inference API"?

I see a similar thing when I run the model with the Hugging Face audio-classification pipeline, even when I set function_to_apply to none (rather than running a softmax over the results). My best guess right now is that the audio-classification pipeline doesn't add the correct classification head, but I am not sure.
https://huggingface.co/docs/transformers/v4.47.1/en/main_classes/pipelines#transformers.AudioClassificationPipeline.__call__.function_to_apply

audmax changed discussion status to closed
audmax changed discussion status to open
audEERING GmbH org

The classification head used in the audio-classification pipeline is different from the one used for this model. In the pipeline, the projection layer comes before the pooling, while in this repository the pooling is applied first. The layers also differ in their dimensionalities and names.

Thus, you will need to use the code example given in the model card in order to run inference correctly.
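To see why the ordering described above matters, here is a toy sketch (the dimensions, layer names, and tensors are made up for illustration and are not the real model's): as soon as the head contains a nonlinearity, pooling before projection and projecting before pooling give different results.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size, num_labels = 8, 3             # toy sizes, not the model's
frames = torch.randn(1, 50, hidden_size)   # fake (batch, time, hidden) encoder output

# A small head with a nonlinearity between two linear layers
dense = nn.Linear(hidden_size, hidden_size)
out_proj = nn.Linear(hidden_size, num_labels)

def head(x):
    return out_proj(torch.tanh(dense(x)))

# Pipeline-style ordering: project each frame, then pool over time
project_then_pool = head(frames).mean(dim=1)

# This repository's ordering: pool over time first, then project
pool_then_project = head(frames.mean(dim=1))

# The two orderings generally disagree once a nonlinearity is involved
print(torch.allclose(project_then_pool, pool_then_project))
```

With a purely linear head the two orderings would coincide, which is why the nonlinearity (and the differing layer names and sizes) is what breaks interchangeability here.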

Appreciate it! I had found this to be the case as well, but was investigating other issues I was noticing in the audio-classification pipeline. It seems that even if I fix those issues, the naming conventions / pipeline AutoModel configuration cause an essentially complete loss of "classification" ability. Not sure whether Hugging Face should eventually standardize the naming so that the pipelines can work with arbitrary classification heads, but I appreciate the help all the same! Very cool work y'all have put together!

audEERING GmbH org

Yes, layers are newly initialized if they are not found (or are named differently) in the model file. If the architecture of the head is exactly the same as in the pipeline, renaming the weights in the safetensors or pytorch_model.bin file, respectively, might solve the issue.
Otherwise, transformers prints only a message such as "Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at [...]" and no explicit warning or error.
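As a sketch of the renaming idea: the key names and tensor shapes below are purely hypothetical, and the real mapping would have to be read off the two architectures' state dicts, but the mechanics are just a dictionary rewrite before saving the checkpoint.

```python
import torch

# Toy checkpoint: key names and shapes are illustrative only
state = {
    "wav2vec2.encoder.layers.0.final_layer_norm.weight": torch.ones(4),
    "classifier.dense.weight": torch.zeros(4, 4),    # head name in one repo
    "classifier.out_proj.weight": torch.zeros(3, 4),
}

# Hypothetical mapping from the checkpoint's head names to the names
# the target architecture expects; unmapped keys pass through unchanged
rename = {
    "classifier.dense.weight": "projector.weight",
    "classifier.out_proj.weight": "classifier.weight",
}
renamed = {rename.get(k, k): v for k, v in state.items()}

# Save the renamed state dict so transformers can match the keys on load
torch.save(renamed, "pytorch_model.bin")
```

Note that renaming only helps if the layer shapes match as well; otherwise the weights will still be discarded and re-initialized.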
