[FEEDBACK] Daily Papers
Note that this is not a post for adding new papers; it's about feedback on the Daily Papers community update feature.
How do you submit a paper to the Daily Papers, like @akhaliq (AK)?
- Submitting is available to paper authors
- Only recent papers (less than 7 days old) can be featured on the Daily
- Drop the arXiv ID in the form at https://huggingface.co/papers/submit
- Add media (images, videos) to the paper when relevant
- You can start a discussion to engage with the community
Please check out the documentation.
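If it helps, here is a small, purely illustrative Python sketch (stdlib only, not an official tool) that checks the 7-day window via the public arXiv API before you open the submission form; the function name and the exact date handling are illustrative assumptions, only the form URL and the 7-day rule come from the instructions above.

```python
# Illustrative sketch: check whether an arXiv paper is still within the ~7-day
# window before submitting it at https://huggingface.co/papers/submit.
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API

def is_recent(arxiv_id: str, max_age_days: int = 7) -> bool:
    """Query the public arXiv API and compare the submission date to today."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    published = datetime.strptime(
        entry.find(f"{ATOM}published").text, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) - published <= timedelta(days=max_age_days)

print(is_recent("2405.20797"))  # e.g. the Ovis paper shared below
```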
We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".
Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co/AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co/datasets/AIDC-AI/Ovis-dataset
@Yiwen-ntu for now we support only videos as paper covers in the Daily.
We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644
I hope you're doing well! I would like to kindly request your assistance in verifying my authorship claim for this paper: https://huggingface.co/papers/2411.18478. Today marks the 6th day, and I would appreciate it if you could help expedite the verification process so that the paper can be featured on the Daily Papers.
Thank you so much for your help!
Best regards,
Jinyang Wu
Self-Supervised Unified Generation with Universal Editing: https://arxiv.org/pdf/2412.02114
Dear AK and HF team,
I would like to kindly request your assistance in verifying my authorship claim for this paper: https://huggingface.co/papers/2411.18478. Today marks the 7th day, and I would appreciate it if you could help expedite the verification process so that the paper can be featured on the Daily Papers.
Thank you so much for your help!
Best regards,
Mingyu Xu
@akhaliq
@kramp
Dear AK and HF team,
We would like to kindly request your assistance in sharing our latest research paper, "Golden Noise for Diffusion Models: A Learning Framework", posted less than a month ago (Nov. 14). We believe it may be of significant interest for the HF Daily Papers.
First, we identify a new concept termed noise prompt, which aims at turning a random noise into a golden noise by adding a small desirable perturbation derived from the text prompt. The golden noise perturbation can be considered a kind of prompt for the noise, as it is rich in semantic information and tailored to the given text prompt. Building on this concept, we formulate a noise prompt learning framework that learns "prompted" golden noises associated with text prompts for diffusion models.
Second, to implement this framework, we propose a training dataset, the noise prompt dataset (NPD), and a learning model, the noise prompt network (NPNet). Specifically, we design a noise prompt data collection pipeline via re-denoise sampling, a way to produce noise pairs, and we incorporate AI-driven feedback mechanisms to ensure that the noise pairs are highly valuable. This pipeline enables us to collect a large-scale training dataset for noise prompt learning, so the trained NPNet can directly transform a random Gaussian noise into a golden noise that boosts the performance of the T2I diffusion model.
Third, we conduct extensive experiments across various mainstream diffusion models, including StableDiffusion-xl (SDXL), DreamShaper-xl-v2-turbo, and Hunyuan-DiT, with 7 different samplers on 4 different datasets. We evaluate our model using 6 human preference metrics: Human Preference Score v2 (HPSv2), PickScore, Aesthetic Score (AES), ImageReward, CLIPScore, and Multi-dimensional Preference Score (MPS). As illustrated in Fig. 1, by leveraging the learned golden noises, not only are the overall quality and aesthetic style of the synthesized images visually enhanced, but all metrics also show significant improvements, demonstrating the effectiveness and generalization ability of our NPNet. For instance, on GenEval, our NPNet improves SDXL on the classical evaluation metric HPSv2 by 18% (24.04 → 28.41), which even surpasses the recent, much stronger DiT-based diffusion model Hunyuan-DiT (27.78). Furthermore, NPNet is a compact and efficient neural network that functions as a plug-and-play module, introducing only about 3% extra inference time per image and requiring only about 3% of the memory of the standard pipeline. This efficiency underscores the practical applicability of NPNet in real-world scenarios.
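As a rough illustration of the plug-and-play usage described above (a conceptual sketch only, not the exact interface or weights of the released NPNet), the idea is to refine the initial Gaussian latent with a small, text-conditioned perturbation and hand the result to an unmodified diffusers SDXL pipeline via its `latents=` argument:

```python
# Conceptual sketch: `ToyNPNet` is a stand-in illustrating the interface
# (random noise + prompt embedding -> golden noise), not the released model.
import torch
from diffusers import StableDiffusionXLPipeline

class ToyNPNet(torch.nn.Module):
    """Refines an initial Gaussian latent with a small, text-conditioned perturbation."""
    def __init__(self, latent_channels: int = 4, text_dim: int = 2048):
        super().__init__()
        self.refine = torch.nn.Conv2d(latent_channels, latent_channels, 3, padding=1)
        self.text_proj = torch.nn.Linear(text_dim, latent_channels)

    def forward(self, noise: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
        # golden noise = random noise + small perturbation derived from the prompt
        bias = self.text_proj(prompt_emb.mean(dim=1))[:, :, None, None]
        return noise + 0.1 * (self.refine(noise) + bias)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a corgi astronaut floating above Earth, cinematic lighting"
noise = torch.randn(1, 4, 128, 128, dtype=torch.float16, device="cuda")

with torch.no_grad():
    # Any prompt embedding would do here; we reuse the pipeline's own text encoders.
    prompt_emb, *_ = pipe.encode_prompt(
        prompt, device="cuda", num_images_per_prompt=1, do_classifier_free_guidance=False
    )
    npnet = ToyNPNet(text_dim=prompt_emb.shape[-1]).to(device="cuda", dtype=torch.float16)
    golden = npnet(noise, prompt_emb)

# The rest of the pipeline is untouched: only the initial latent is swapped.
image = pipe(prompt, latents=golden).images[0]
image.save("golden_noise_demo.png")
```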
Paper: https://arxiv.org/abs/2411.09502
Project Page: https://github.com/xie-lab-ml/Golden-Noise-for-Diffusion-Models
We would greatly appreciate your assistance and consideration of our paper for inclusion.
Best regards,
Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie
@akhaliq
@kramp
Dear AK and HF team,
We would like to kindly request your assistance in sharing our latest research paper, "Bringing Objects to Life: 4D generation from 3D objects".
We believe it may be of significant interest for the HF Daily Papers.
Recent advancements in generative modeling now enable the creation of 4D content (moving 3D objects) controlled with text prompts.
4D generation has large potential in applications like virtual worlds, media, and gaming, but existing methods provide limited control over the appearance and geometry of generated content.
In this work, we introduce a method for animating user-provided 3D objects by conditioning on textual prompts to guide 4D generation, enabling custom animations while maintaining the identity of the original object.
We first convert a 3D mesh into a "static" 4D Neural Radiance Field (NeRF) that preserves the visual attributes of the input object. Then, we animate the object using an Image-to-Video diffusion model driven by text. To improve motion realism, we introduce an incremental viewpoint selection protocol for sampling perspectives to promote lifelike movement and a masked Score Distillation Sampling (SDS) loss, which leverages attention maps to focus optimization on relevant regions.
We evaluate our method in terms of temporal coherence, prompt adherence, and visual fidelity and find that it outperforms baselines based on other approaches, achieving up to threefold improvements in identity preservation (measured using LPIPS) while effectively balancing visual quality with dynamic content.
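As a conceptual illustration of the masked SDS objective mentioned above (a sketch with placeholder names, not the actual implementation), the attention-derived mask simply gates the standard SDS gradient so optimization concentrates on the regions that should move:

```python
# Sketch of a masked SDS step; `mask`, `noise_pred`, and `w_t` are placeholders
# supplied by the surrounding training loop, not the paper's code.
import torch
import torch.nn.functional as F

def masked_sds_loss(latents: torch.Tensor,
                    noise_pred: torch.Tensor,
                    noise: torch.Tensor,
                    mask: torch.Tensor,
                    w_t: torch.Tensor) -> torch.Tensor:
    """latents: rendered frames encoded to latent space (requires grad);
    noise_pred: the video diffusion model's prediction for the noised latents;
    noise: the Gaussian noise that was added;
    mask: attention-derived weights in [0, 1];
    w_t: timestep-dependent SDS weighting."""
    grad = w_t * mask * (noise_pred - noise)   # masked SDS gradient
    target = (latents - grad).detach()         # surrogate-loss trick so that
    return 0.5 * F.mse_loss(latents, target, reduction="sum")  # d(loss)/d(latents) == grad
```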
Paper: https://arxiv.org/abs/2412.20422
Project Page: https://3-to-4d.github.io/3-to-4d/
We would greatly appreciate your assistance and consideration of our paper for inclusion.