
	---
	datasets:
	- google/MusicCaps
	language:
	- en
	library_name: diffusers
	tags:
	- music
	pipeline_tag: text-to-audio
	---
	# AudioLDM 2 Music for Zalo AI Challenge 2023


	This checkpoint was obtained by fine-tuning AudioLDM 2 Music (https://huggingface.co/cvssp/audioldm2-music) on the Zalo AI Challenge 2023 dataset combined with MusicCaps (https://www.kaggle.com/datasets/googleai/musiccaps).

	## Uses

	First, install the required packages:

	```
	pip install --upgrade diffusers transformers accelerate
	```

	### Text-to-Audio

	```python
	from diffusers import AudioLDM2Pipeline
	import torch

	repo_id = "vtrungnhan9/audioldm2-music-zac2023"
	pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
	pipe = pipe.to("cuda")

	prompt = "This music is instrumental. The tempo is medium with synthesiser arrangements, digital drums and electronic music. The music is upbeat, pulsating, youthful, buoyant, exciting, punchy, psychedelic and has propulsive beats with a dance groove. This music is Techno Pop/EDM."
	neg_prompt = "bad quality"
	audio = pipe(prompt, negative_prompt=neg_prompt, num_inference_steps=200, audio_length_in_s=10.0, guidance_scale=10).audios[0]
	```
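As a quick sanity check on the call above, the requested `audio_length_in_s` and the 16 kHz rate used in this card's snippets determine how many samples the returned array should contain. The sketch below is only arithmetic; the helper names are hypothetical and not part of `diffusers`.

```python
SAMPLE_RATE = 16000  # rate used by the save and playback snippets in this card


def expected_num_samples(audio_length_in_s, sample_rate=SAMPLE_RATE):
    """Number of samples a clip of the given duration should contain."""
    return int(audio_length_in_s * sample_rate)


def duration_in_s(num_samples, sample_rate=SAMPLE_RATE):
    """Duration in seconds of a clip with the given number of samples."""
    return num_samples / sample_rate


print(expected_num_samples(10.0))  # 160000
print(duration_in_s(160000))       # 10.0
```

With the pipeline output in hand, comparing `len(audio)` against `expected_num_samples(10.0)` is a cheap way to confirm that the clip has the requested length.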

	The resulting audio output can be saved as a .wav file:
	```python
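	# Import the submodule explicitly; a bare `import scipy` does not
	# reliably expose `scipy.io` on older SciPy releases.
	from scipy.io import wavfile

	wavfile.write("techno.wav", rate=16000, data=audio)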
	```

	Or displayed in a Jupyter Notebook / Google Colab:
	```python
	from IPython.display import Audio

	Audio(audio, rate=16000)
	```
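SciPy stores a floating-point array as a floating-point WAV file, which some players and tools do not read. If that turns out to be a problem, one option is to write 16-bit PCM instead. The following is a minimal sketch using only the Python standard library; `write_wav_pcm16` is a hypothetical helper, and it assumes `audio` is a mono sequence of floats roughly in the range [-1.0, 1.0].

```python
import array
import sys
import wave


def write_wav_pcm16(path, samples, rate=16000):
    """Write mono float samples as a 16-bit PCM WAV file.

    Values outside [-1.0, 1.0] are clipped before conversion.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip to the valid range
        pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(rate)
        f.writeframes(pcm.tobytes())


# Usage with the pipeline output from the Text-to-Audio example:
# write_wav_pcm16("techno_pcm16.wav", audio, rate=16000)
```

The loop is plain Python, so it is slower than a vectorized conversion, but for a 10-second clip at 16 kHz the cost is negligible.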

	## Training Details

	### Training Data

	* The challenge dataset is available at https://challenge.zalo.ai/portal/background-music-generation
	* MusicCaps is available at https://www.kaggle.com/datasets/googleai/musiccaps


	### Training Procedure

	Please refer to https://github.com/declare-lab/tango/blob/master/train.py for the training procedure.
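MusicCaps is distributed as a CSV of YouTube IDs, timestamps, and captions rather than audio files, so the clips have to be fetched separately and paired with their captions before training. The sketch below shows one way to build a JSON-lines manifest from the CSV. It is not the author's preprocessing: the `captions` and `location` key names and the `<audio_dir>/<ytid>.wav` file layout are assumptions, and should be adjusted to whatever the training script and your download step actually use.

```python
import csv
import json


def musiccaps_to_jsonl(csv_file, audio_dir, text_key="captions", audio_key="location"):
    """Turn MusicCaps CSV rows into JSON-lines manifest entries.

    `csv_file` is an open text file (or any iterable of CSV lines) that
    includes the `ytid` and `caption` columns of the MusicCaps CSV.
    """
    lines = []
    for row in csv.DictReader(csv_file):
        # Assumed layout: one WAV per clip, named after its YouTube ID.
        wav_path = f"{audio_dir}/{row['ytid']}.wav"
        entry = {audio_key: wav_path, text_key: row["caption"]}
        lines.append(json.dumps(entry))
    return lines


# Usage (paths are placeholders):
# with open("musiccaps-public.csv", newline="", encoding="utf-8") as f:
#     manifest = musiccaps_to_jsonl(f, "data/musiccaps")
# with open("train.json", "w", encoding="utf-8") as out:
#     out.write("\n".join(manifest) + "\n")
```

Keeping the key names as parameters makes it easy to match the manifest to the column names the training script is configured to read.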

	## Citation

	BibTeX:

	```
	@article{liu2023audioldm2,
	title={AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining},
	author={Haohe Liu and Qiao Tian and Yi Yuan and Xubo Liu and Xinhao Mei and Qiuqiang Kong and Yuping Wang and Wenwu Wang and Yuxuan Wang and Mark D. Plumbley},
	journal={arXiv preprint arXiv:2308.05734},
	year={2023}
	}
	```

	## Model Card Contact

	[email protected]