Code for "Unsupervised Source Separation via Bayesian inference in the latent domain" - Link
| | GT Compressed | Separated |
|---|---|---|
| Drums | GT Compressed Drums | Separated Drums |
| Bass | GT Compressed Bass | Separated Bass |
| Mix | GT Compressed Mix | Separated Mix |
Separation is performed in a latent domain compressed by a factor of 64. The results can be upsampled with the Jukebox upsamplers to improve perceptual quality (WIP).
https://user-images.githubusercontent.com/7619886/151544181-482e687d-538a-4886-b690-999dd4d64c42.mp4
https://user-images.githubusercontent.com/7619886/151544196-d43b0341-4214-4083-8b52-46574c02d800.mp4
https://user-images.githubusercontent.com/7619886/151544205-442793a6-b886-4c66-bc1a-6108abe260a7.mp4
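To get a feel for the x64 compression mentioned above, the snippet below does the arithmetic for the values quoted in this README (22050 Hz audio, 64 raw samples per latent code, 3-second separations, `--sample_length=131072`). It is a rough illustration only: the helper names are hypothetical, and the actual code may pad or crop chunks differently.

```python
# Rough arithmetic only. The rate and compression factor are the values quoted
# in this README; the helper names are made up for illustration.
SAMPLE_RATE = 22050  # Hz
COMPRESSION = 64     # raw samples per latent code at the topmost level


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Number of raw audio samples in `seconds` of audio."""
    return int(round(seconds * sample_rate))


def num_latent_codes(num_samples, compression=COMPRESSION):
    """Number of whole latent codes needed to cover `num_samples` raw samples."""
    return num_samples // compression


if __name__ == "__main__":
    chunk = seconds_to_samples(3.0)
    print(chunk, num_latent_codes(chunk))  # 66150 1033
    print(num_latent_codes(131072))        # 2048
```

So a 3-second chunk corresponds to roughly a thousand latent codes, and a training excerpt of 131072 samples (about 5.9 seconds) to 2048 codes.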
Install the conda package manager from https://docs.conda.io/en/latest/miniconda.html
conda create --name lqvae-separation python=3.7.5
conda activate lqvae-separation
sudo apt install mpich
conda install mpi4py==3.0.3
pip install ffmpeg-python==0.2.0
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
(If this fails, try: pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2)
pip install -r requirements.txt
pip install -e .
# for training
pip install av==7.0.1
pip install ./tensorboardX
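After installing, a quick import check can save a failed run later. The script below is a minimal sketch, not part of the repository: it tries to import each package by its import name (for example, `ffmpeg-python` is imported as `ffmpeg`) and prints the version it finds. The function name and the module list are assumptions; `av` is only needed for training.

```python
import importlib

# Import names of the packages installed above (assumed list, adjust as needed).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "mpi4py", "ffmpeg", "av"]


def check_modules(names):
    """Map each module name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    for name, version in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {version if version is not None else 'MISSING'}")
```

Run it inside the activated `lqvae-separation` environment; any line marked `MISSING` points at a package that still needs to be installed.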
Enter the `script/` folder and create the folders `checkpoints/` and `results/`. Place the model checkpoints inside `checkpoints/`.
The command below generates `bs` separations of 3 seconds each, starting at second `shift` of the mixture created from the sources in `path_1` and `path_2`. The sources must be WAV files sampled at 22 kHz (22050 Hz).
PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs
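Since the sources must be WAV files at 22050 Hz and the separation starts `shift` seconds in, it can help to validate the inputs before launching the command above. The following is a minimal sketch using only the standard library, not a script from this repository. It assumes PCM WAV files (the `wave` module does not read floating-point WAVs) and assumes the 3-second chunk must lie fully inside the file; `check_source` is a hypothetical helper name.

```python
import wave

EXPECTED_RATE = 22050  # Hz, as required for the sources
CHUNK_SECONDS = 3      # length of each separation


def check_source(path, shift):
    """Return (ok, message) for one source file and a start offset in seconds."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if rate != EXPECTED_RATE:
        return False, f"{path}: sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz"
    if shift + CHUNK_SECONDS > duration:
        needed = shift + CHUNK_SECONDS
        return False, f"{path}: {duration:.1f} s long, need at least {needed} s"
    return True, f"{path}: OK ({duration:.1f} s at {rate} Hz)"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_sources.py SHIFT path_1 path_2
    shift = float(sys.argv[1])
    for source_path in sys.argv[2:]:
        print(check_source(source_path, shift)[1])
```

Files at another sample rate need to be resampled to 22050 Hz before they are passed as `path_1` or `path_2`.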
The default value of `bs` is 64, which fits on an RTX3080 with 16 GB of VRAM. Lower the value if you get `CUDA: out of memory`.
The `vqvae/vqvae.py` file of Jukebox has been modified to include the linearization loss of the LQ-VAE (it is computed at all levels of the hierarchical VQ-VAE, but only the topmost level matters, since separation is performed there). To train a new LQ-VAE on custom data (here `data/train` for training and `data/test` for testing), run the following from the root of the project:
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae --sample_length=131072 --bs=8 \
--audio_files_dir=data/train/ --labels=False --train --test --aug_shift --aug_blend --name=lq_vae --test_audio_files_dir=data/test
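To illustrate what a linearization loss is meant to encourage, here is a toy sketch in plain Python. It is a generic linearity penalty written for illustration, not the loss implemented in `vqvae/vqvae.py`: it ignores quantization, uses made-up encoders, and assumes equal mixing weights of 0.5. The penalty is zero when encoding the mixture gives the same result as mixing the encodings.

```python
def mix(x1, x2, alpha=(0.5, 0.5)):
    """Weighted sum of two equal-length sequences."""
    return [alpha[0] * a + alpha[1] * b for a, b in zip(x1, x2)]


def linearity_penalty(encode, x1, x2, alpha=(0.5, 0.5)):
    """Mean squared gap between encode(mix) and the mix of the encodings."""
    z_of_mix = encode(mix(x1, x2, alpha))
    mix_of_z = mix(encode(x1), encode(x2), alpha)
    return sum((a - b) ** 2 for a, b in zip(z_of_mix, mix_of_z)) / len(z_of_mix)


# Two toy "encoders" (hypothetical) that halve the length of the signal.
def linear_encoder(x):
    return [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]


def nonlinear_encoder(x):
    return [abs(x[i]) + abs(x[i + 1]) for i in range(0, len(x) - 1, 2)]


if __name__ == "__main__":
    drums = [1.0, -2.0, 0.5, 4.0]
    bass = [-3.0, 1.0, 2.0, -1.0]
    print(linearity_penalty(linear_encoder, drums, bass))     # 0.0
    print(linearity_penalty(nonlinear_encoder, drums, bass))  # 2.5
```

A real encoder is nonlinear, so a term of this kind in the training objective pushes it toward latents where sums of sources behave like sums of codes.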
Training uses the `vqvae` hyperparameters in `hparams.py`, so to change the levels or downsampling factors you have to modify them there. Since `L_lin` enforces the sum operation on the latent domain, you can train on the data of both sources together (or on any other audio data). Checkpoints are saved in `logs/lq_vae` (`lq_vae` is the `name` parameter).
To train a prior on the data of a single source, using the trained LQ-VAE, run:
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pior_source \
--audio_files_dir=data/source/train --test_audio_files_dir=data/source/test --labels=False --train --test --aug_shift \
--aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --min_duration=24 --sample_length=1048576 \
--bs=16 --n_ctx=8192 --sample=True --sample_iters=1000 --restore_vqvae=logs/lq_vae/checkpoint_lq_vae.pth.tar
Here the source data is located in `data/source/train` and `data/source/test`, and we assume the LQ-VAE has 3 levels (topmost level = 2). The prior is defined by the `small_prior` hyperparameters in `hparams.py` and uses a context of `n_ctx=8192` codes. The LQ-VAE checkpoint is passed via `--restore_vqvae`. Checkpoints are saved in `logs/pior_source` (`pior_source` is the `name` parameter).
Before separation, precompute the codebook sums with `codebook_precalc.py` in the `script` folder:
PYTHONPATH=.. python codebook_precalc.py --save_path=checkpoints/codebook_sum_precalc.pt \
--restore_vqvae=../logs/lq_vae/checkpoint_lq_vae.pth.tar --raw_to_tokens=64 --l_bins=2048 \
--sample_rate=22050 --alpha="[0.5, 0.5]" --downs_t="(2, 2, 2)" --commit=1.0 --emb_width=64
Once the checkpoints are trained and the codebook sums are precomputed, run separation with `bayesian_inference.py` as follows:
PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs --restore_vqvae=checkpoints/checkpoint_step_60001_latent.pth.tar \
--restore_priors 'checkpoints/checkpoint_drums_22050_latent_78_19k.pth.tar' 'checkpoints/checkpoint_latest.pth.tar' --sum_codebook=checkpoints/codebook_precalc_22050_latent.pt
`restore_priors` accepts two paths: the checkpoints of the first and second priors.
To evaluate, run `bayesian_test.py` after placing the full Slakh bass and drums validation splits inside `data/bass/validation` and `data/drums/validation` respectively.
If you find the code useful for your research, please consider citing:
@article{mancusi2021unsupervised,
title={Unsupervised Source Separation via Bayesian Inference in the Latent Domain},
author={Mancusi, Michele and Postolache, Emilian and Fumero, Marco and Santilli, Andrea and Cosmo, Luca and Rodol{\`a}, Emanuele},
journal={arXiv preprint arXiv:2110.05313},
year={2021}
}
as well as the Jukebox baseline: