adefossez committed on
Commit 10b60b3 · 2 Parent(s): 2471bc3 bffb181

Merge branch 'main' of github.com:facebookresearch/audiocraft

Files changed (3)
  1. MODEL_CARD.md +2 -2
  2. README.md +9 -3
  3. requirements.txt +1 -0
MODEL_CARD.md CHANGED
@@ -52,7 +52,7 @@ The model was evaluated on the [MusicCaps benchmark](https://www.kaggle.com/data
 
 ## Training datasets
 
-The model was trained using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
+The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
 
 ## Quantitative analysis
 
@@ -62,7 +62,7 @@ More information can be found in the paper [Simple and Controllable Music Genera
 
 **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 
-**Mitigations:** All vocals have been removed from the data source using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs). The model is therefore not able to produce vocals.
+**Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
 
 **Limitations:**
 
README.md CHANGED
@@ -8,7 +8,7 @@ Audiocraft is a PyTorch library for deep learning research on audio generation.
 ## MusicGen
 
 Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
-Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't not require a self-supervised semantic representation, and it generates
+Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
 all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
 them in parallel, thus having only 50 auto-regressive steps per second of audio.
 Check out our [sample page][musicgen_samples] or test the available demo!
@@ -21,6 +21,8 @@ Check out our [sample page][musicgen_samples] or test the available demo!
 </a>
 <br>
 
+We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
+
 ## Installation
 Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:
 
@@ -35,7 +37,11 @@ pip install -e . # or if you cloned the repo locally
 ```
 
 ## Usage
-You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing). Finally, a demo is also available on the [`facebook/MusiGen` HugginFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+We offer a number of way to interact with MusicGen:
+1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
+2. You can use the gradio demo locally by running `python app.py`.
+3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+4. Finally, @camenduru did a great notebook that combines [the MusicGen Gradio demo with Google Colab](https://github.com/camenduru/MusicGen-colab)
 
 ## API
 
@@ -52,7 +58,7 @@ GPUs will be able to generate short sequences, or longer sequences with the `sma
 **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer version of `torchaudio`.
 You can install it with:
 ```
-apt get install ffmpeg
+apt-get install ffmpeg
 ```
 
 See after a quick example for using the API.
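
The README text touched by this diff says that a small delay between the 4 codebooks lets them be predicted in parallel, at 50 auto-regressive steps per second of audio. The sketch below illustrates that arithmetic only. It assumes codebook `k` is shifted right by `k` steps, which this diff does not state, and `PAD`, `apply_delay`, and `undo_delay` are hypothetical names for illustration, not part of the Audiocraft API.

```python
PAD = -1  # hypothetical placeholder token, not an Audiocraft constant


def apply_delay(codes, pad=PAD):
    """Shift codebook k right by k steps (assumed delay of one step per codebook).

    codes: list of K equal-length token lists, one per codebook.
    Returns K lists of length T + K - 1, padded so all rows stay aligned.
    """
    num_codebooks = len(codes)
    return [
        [pad] * k + list(row) + [pad] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def undo_delay(delayed):
    """Invert apply_delay and recover the original K x T layout."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# One second of audio: 50 frames, each with 4 codebook tokens.
frame_rate, num_codebooks = 50, 4
codes = [[100 * k + t for t in range(frame_rate)] for k in range(num_codebooks)]
delayed = apply_delay(codes)

# Predicting every token one at a time would take 4 * 50 steps.
flattened_steps = frame_rate * num_codebooks
# With the delay, each step emits one token per codebook.
delayed_steps = len(delayed[0])

print(flattened_steps)  # 200
print(delayed_steps)    # 53
```

Under this assumption the delay costs only `K - 1 = 3` extra steps per generated sequence, regardless of its length, so the amortized rate stays at 50 steps per second of audio.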
requirements.txt CHANGED
@@ -17,3 +17,4 @@ transformers
 xformers
 demucs
 librosa
+gradio
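
This commit adds `gradio` to the requirements and corrects the `ffmpeg` install command in the README. A quick way to confirm both are present before launching the local demo could look like the sketch below. `check_environment` is a hypothetical helper written for this example and is not part of Audiocraft; it relies only on the standard library.

```python
import importlib.util
import shutil


def check_environment(executables=("ffmpeg",), modules=("gradio",)):
    """Report which executables are on PATH and which top-level modules can be found.

    Returns a dict mapping each name to True (found) or False (missing).
    """
    report = {}
    for name in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The helper only checks that the names can be located; it does not verify versions or that `torchaudio` can actually use the installed `ffmpeg`.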