Spaces:
Running
on
Zero
Running
on
Zero
# Changelog | |
All notable changes to this project will be documented in this file. | |
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). | |
## [1.4.0a1] - 2024-06-03 | |
Adding new metric PesqMetric ([Perceptual Evaluation of Speech Quality](https://doi.org/10.5281/zenodo.6549559)) | |
Adding multiple audio augmentation functions: generating pink noises, up-/downsampling, low-/highpass filtering, banpass filtering, smoothing, duck masking, boosting. All are wrapped in the `audiocraft.utils.audio_effects.AudioEffects` and can be called with the API `audiocraft.utils.audio_effects.select_audio_effects`. | |
Add training code for AudioSeal (https://arxiv.org/abs/2401.17264) along with the [hf checkpoints]( https://huggingface.co/facebook/audioseal). | |
## [1.3.0] - 2024-05-02 | |
Adding the MAGNeT model (https://arxiv.org/abs/2401.04577) along with hf checkpoints and a gradio demo app. | |
Typo fixes. | |
Fixing setup.py to install only audiocraft, not the unit tests and scripts. | |
Fix FSDP support with PyTorch 2.1.0. | |
## [1.2.0] - 2024-01-11 | |
Adding stereo models. | |
Fixed the commitment loss, which was until now only applied to the first RVQ layer. | |
Removed compression model state from the LM checkpoints, for consistency, it | |
should always be loaded from the original `compression_model_checkpoint`. | |
## [1.1.0] - 2023-11-06 | |
Not using torchaudio anymore when writing audio files, relying instead directly on the commandline ffmpeg. Also not using it anymore for reading audio files, for similar reasons. | |
Fixed DAC support with non default number of codebooks. | |
Fixed bug when `two_step_cfg` was overriden when calling `generate()`. | |
Fixed samples being always prompted with audio, rather than having both prompted and unprompted. | |
**Backward incompatible change:** A `torch.no_grad` around the computation of the conditioning made its way in the public release. | |
The released models were trained without this. Those impact linear layers applied to the output of the T5 or melody conditioners. | |
We removed it, so you might need to retrain models. | |
**Backward incompatible change:** Fixing wrong sample rate in CLAP (WARNING if you trained model with CLAP before). | |
**Backward incompatible change:** Renamed VALLEPattern to CoarseFirstPattern, as it was wrongly named. Probably no one | |
retrained a model with this pattern, so hopefully this won't impact you! | |
## [1.0.0] - 2023-09-07 | |
Major revision, added training code for EnCodec, AudioGen, MusicGen, and MultiBandDiffusion. | |
Added pretrained model for AudioGen and MultiBandDiffusion. | |
## [0.0.2] - 2023-08-01 | |
Improved demo, fixed top p (thanks @jnordberg). | |
Compressor tanh on output to avoid clipping with some style (especially piano). | |
Now repeating the conditioning periodically if it is too short. | |
More options when launching Gradio app locally (thanks @ashleykleynhans). | |
Testing out PyTorch 2.0 memory efficient attention. | |
Added extended generation (infinite length) by slowly moving the windows. | |
Note that other implementations exist: https://github.com/camenduru/MusicGen-colab. | |
## [0.0.1] - 2023-06-09 | |
Initial release, with model evaluation only. | |