Model Card for musicgen-songstarter-v0.2
musicgen-songstarter-v0.2 is a fine-tuned version of musicgen-stereo-melody-large, trained on a dataset of melody loops from my Splice sample library. It's intended to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.
Update: I wrote a blog post detailing how and why I trained this model, including training details, the dataset, Weights & Biases logs, etc.
Compared to musicgen-songstarter-v0.1, this new version:
- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- is twice the size, bumped up from a medium ➡️ large transformer LM
Usage
Install audiocraft:
```bash
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```
Then you should be able to load this model just like any other MusicGen checkpoint on the Hugging Face Hub:
```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.

wav = model.generate_unconditional(4)  # generates 4 unconditional audio samples

descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

# e.g. the example clip from the audiocraft repo's assets/ folder
melody, sr = torchaudio.load('./assets/bach.mp3')
# melody is (channels, samples); [None].expand(3, -1, -1) reuses it for all 3 descriptions.
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
Prompt Format
Prompts should follow this format:
```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```
For example:
```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```
For some example tags, see the prompt format section of musicgen-songstarter-v0.1's README. The tags there come from the smaller v0.1 dataset, but they should give you an idea of what the model saw.
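To keep prompts consistent with the format above, it can help to build them programmatically. The sketch below is a hypothetical pair of helpers (`build_prompt` and `parse_prompt` are not part of audiocraft or the training repo). It assumes that the key is always written as a note name plus "major" or "minor" (e.g. "G# minor" or "d minor"), which matches the examples in this card but may not cover every tag in the dataset.

```python
import re

# Hypothetical helpers (not part of audiocraft or this repo) for the
# "{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm" prompt format.
# Assumption: the key is a note name (A-G, optional # or b) plus major/minor.
PROMPT_PATTERN = re.compile(
    r"^(?P<tags>.+), (?P<key>[A-G][#b]? (?:major|minor)), (?P<bpm>\d+) bpm$",
    re.IGNORECASE,  # the examples use both "D minor" and "d minor"
)


def build_prompt(tags: list[str], key: str, bpm: int) -> str:
    """Join tags, key, and tempo into one comma-separated description."""
    if not tags:
        raise ValueError("at least one tag is required")
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return ", ".join([*tags, key, f"{bpm} bpm"])


def parse_prompt(prompt: str) -> tuple[list[str], str, int]:
    """Split a description back into (tags, key, bpm); raise if malformed."""
    match = PROMPT_PATTERN.match(prompt)
    if match is None:
        raise ValueError(f"prompt does not match the expected format: {prompt!r}")
    # Everything before the key is the comma-separated tag list.
    tags = [tag.strip() for tag in match["tags"].split(",")]
    return tags, match["key"], int(match["bpm"])
```

Strings produced by `build_prompt` can be passed in the `descriptions` list given to `model.generate(...)` or `model.generate_with_chroma(...)` in the usage example, and `parse_prompt` offers a quick way to catch a prompt that is missing its key or tempo.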
Samples
| Audio Prompt | Text Prompt | Output |
|---|---|---|
| | trap, synthesizer, songstarters, dark, G# minor, 140 bpm | |
| | acoustic, guitar, melody, trap, D minor, 90 bpm | |
Training Details
For more details, see the blog post.
- code:
  - The repo is here. It's an undocumented fork of facebookresearch/audiocraft in which I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
- data:
  - Around 1,700-1,800 samples that I manually listened to and purchased via my personal Splice account, totaling about 7-8 hours of audio.
  - Given the licensing terms, I cannot share the data.
- hardware:
  - 8x A100 (40 GB) instance from Lambda Labs
- procedure:
  - Trained for 10k steps, which took about 6 hours.
  - Reduced the segment duration at train time to 15 seconds.
- hparams/logs:
  - See the Weights & Biases (wandb) run, which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used to run the training script.
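As a rough back-of-the-envelope check, the training notes above translate into a few derived figures. This is a minimal sketch that only does arithmetic on the numbers stated in this card; using the midpoints of the stated ranges (1,750 samples, 7.5 hours of audio) is an assumption, and none of these values come from the actual logs.

```python
# Back-of-the-envelope figures derived from the training notes above.
# Assumption: midpoints of the stated ranges; these are not logged values.

TRAIN_STEPS = 10_000   # "trained for 10k steps"
TRAIN_HOURS = 6        # "about 6 hours"
NUM_GPUS = 8           # 8x A100 (40 GB)
NUM_SAMPLES = 1_750    # midpoint of "around 1,700-1,800 samples"
AUDIO_HOURS = 7.5      # midpoint of "about 7-8 hours of audio"


def seconds_per_step(steps: int, hours: float) -> float:
    """Average wall-clock time per training step."""
    return hours * 3600 / steps


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total accelerator time consumed by the run."""
    return num_gpus * hours


def mean_clip_seconds(num_samples: int, audio_hours: float) -> float:
    """Average length of one sample in the dataset."""
    return audio_hours * 3600 / num_samples


if __name__ == "__main__":
    print(f"~{seconds_per_step(TRAIN_STEPS, TRAIN_HOURS):.2f} s per step")
    print(f"~{gpu_hours(NUM_GPUS, TRAIN_HOURS):.0f} A100-hours")
    print(f"~{mean_clip_seconds(NUM_SAMPLES, AUDIO_HOURS):.1f} s per sample")
```

Under these assumptions, the run works out to roughly 2.16 s per step, 48 A100-hours in total, and about 15.4 s of audio per sample on average.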
Acknowledgements
This work would not have been possible without:
- Lambda Labs, for subsidizing larger training runs by providing some compute credits
- Replicate, for early development compute resources
Thank you ❤️