RoyalCities committed
Commit 3610af9 · verified · 1 Parent(s): a16674b

Update README.md

Files changed (1):
  1. README.md +2 -1
README.md CHANGED
@@ -38,6 +38,7 @@ This finetuned Stable Audio Open model specializes in Vocal / Operatic Chord Pro
 
 - **Multiple Types of Stem Generation:** Outputs three types of voicings with a focus on Chord Progressions only,
 - **Tonal Versatility:** Generates stems in any key across the 12-tone chromatic scale, in both major and minor scales.
+- **Audio-To-Audio:** Generates interesting vocal timbres when paired with vocal stems.
 - **Simplified Scale Notation:** Scales are written using <b><i>sharps only</i></b> in the following format:
 
 <pre>
@@ -222,7 +223,7 @@ See config file for further details.
 
 The Model has high accuracy when it comes to staying in key due to the balance in the dataset. The metadata however was designed in such a way that the model is mainly designed to generate chord progressions only - as opposed to the piano model which also generated melodies. This is due to Vocal choir samples often having very long attacks so often sit at the back of a mix or to fill out the frequency space.
 
-I have noticed some light artifacting in the outputs - in particular the ensemble progressions. It almost sounds like the model is trying to add other instrumentation in. I think this may be due to the base model primarily being trained on music stems rather than vocals but I cannot say for certain - this will need to be corrected in a future model or when there is far more vocal data.
+I have noticed some light noise in the outputs - in particular the ensemble progressions. It almost sounds like the model is trying to add other instrumentation in. I think this may be due to the base model primarily being trained on music stems rather than vocals but I cannot say for certain - this will need to be corrected in a future model or when there is far more vocal data.
 
 Best use case may be to add some light reverb or post-processing for best results / use in a song.
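The README's "Simplified Scale Notation" bullet says keys are written with sharps only across the 12-tone chromatic scale, in both major and minor. As a minimal sketch of what that key space looks like, the snippet below enumerates every sharps-only key name; the exact prompt template the model expects is in the README's `<pre>` block (not shown in this diff excerpt), so the `"<note> <mode>"` formatting here is an assumption for illustration.

```python
# Sharps-only spelling of the 12-tone chromatic scale (no flats such as Db or Eb),
# per the model card's "Simplified Scale Notation" convention.
SHARP_NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def all_keys():
    """Return every supported key (12 notes x major/minor), sharps only.

    The "<note> <mode>" string format is a hypothetical stand-in for the
    README's actual prompt format.
    """
    return [f"{note} {mode}" for note in SHARP_NOTES for mode in ("Major", "Minor")]

keys = all_keys()
print(len(keys))   # 24 keys in total
print(keys[:4])
```

A list like this is handy for batch-generating stems in every key, or for validating that a user-supplied key string uses the sharps-only spelling before prompting the model.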