---
language:
- en
arxiv: 2412.15838
license: cc-by-nc-4.0
tags:
- any-to-any
---

# AnyRewardModel

The All-Modality Generation benchmark evaluates a model's ability to follow instructions, automatically select appropriate modalities, and create synergistic outputs across different modalities (text, visual, audio) while avoiding redundancy.

[🏠 Homepage](https://github.com/PKU-Alignment/align-anything) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/align-anything)

[🤗 All-Modality Understanding Benchmark](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-AMU)

[🤗 All-Modality Generation Benchmark (Instruction Following Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-InstructionFollowing)

[🤗 All-Modality Generation Benchmark (Modality Selection and Synergy Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-Selection_Synergy)

[🤗 All-Modality Generation Reward Model](https://huggingface.co/PKU-Alignment/AnyRewardModel)

## Data Example
## Usage

Load the model and processor with `trust_remote_code=True`, since the reward model ships custom modeling code:

```python
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("PKU-Alignment/AnyRewardModel", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("PKU-Alignment/AnyRewardModel", trust_remote_code=True)
```

For Image-Audio Modality Synergy scoring:

```python
import math

user_prompt: str = 'USER: {input}'
assistant_prompt: str = '\nASSISTANT:\n{modality}{text_response}'

def sigmoid(x):
    # Map the raw reward logit to a score in (0, 1).
    return 1 / (1 + math.exp(-x))

def process_ia(prompt, image_path, audio_path):
    image_pixel_values = processor(data_paths=image_path, modality="image").pixel_values
    audio_pixel_values = processor(data_paths=audio_path, modality="audio").pixel_values
    text_input = processor(
        text=user_prompt.format(input=prompt) + \
            assistant_prompt.format(modality = "