ViSQOL (Virtual Speech Quality Objective Listener) is an objective, full-reference metric for perceived audio quality. It uses a spectro-temporal measure of similarity between a reference and a test speech signal to produce a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score. MOS-LQO scores range from 1 (the worst) to 5 (the best).
ViSQOL can be run from the command line, or integrated into a project and used through its C++ or Python APIs. Whether used from the command line or through the API, ViSQOL can run in two modes: audio mode and speech mode.
ViSQOL was trained with data from subjective tests that roughly follow industry standards, such as ITU-T Rec. P.863. As a result, certain assumptions are made, and your input to ViSQOL should probably have these properties:
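Linux/Mac build:
Install Bazel. Tested with Bazel version 5.1.0.
Install NumPy: pip install numpy
Build ViSQOL from the project root: bazel build :visqol -c opt
Windows build:
Install Bazel. Tested with Bazel version 5.1.0.
Install git. git for Windows can be obtained from the official git website. During installation, allow git to be accessed from the system shells.
Install the tensorflow build dependencies for Windows.
Build ViSQOL from the project root: bazel build :visqol -c opt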
--reference_file
--degraded_file
--batch_input_csv
Used to specify a path to a CSV file with the format:
reference,degraded
ref1.wav,deg1.wav
ref2.wav,deg2.wav
If the batch_input_csv flag is used, the reference_file and degraded_file flags will be ignored.
--results_csv
Used to specify a path that the similarity score results will be output to. This will be a CSV file with the format:
reference,degraded,moslqo
ref1.wav,deg1.wav,3.4
ref2.wav,deg2.wav,4.1
--verbose
--output_debug
--similarity_to_quality_model
--use_speech_mode
--use_unscaled_speech_mos_mapping
--use_lattice_model
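As a rough illustration of the batch_input_csv layout described above, the following Python sketch builds the CSV text from a list of file pairs. The function name format_batch_csv is hypothetical and not part of ViSQOL; only the header row and column order come from the format shown above.

```python
import csv
import io


def format_batch_csv(pairs):
    """Return CSV text in the batch_input_csv layout for (reference, degraded) pairs.

    Hypothetical helper for illustration; not part of ViSQOL.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row, matching the format documented for --batch_input_csv.
    writer.writerow(["reference", "degraded"])
    for reference, degraded in pairs:
        writer.writerow([reference, degraded])
    return buffer.getvalue()


print(format_batch_csv([("ref1.wav", "deg1.wav"), ("ref2.wav", "deg2.wav")]), end="")
```

The returned text can be written to a file and that file's path passed to --batch_input_csv.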
To compare two files and output their similarity to the console:
Linux/Mac:
./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --verbose
Windows:
bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --verbose
To compare all reference-degraded file pairs in a CSV file, outputting the results to another file and also outputting additional "debug" information:
Linux/Mac:
./bazel-bin/visqol --batch_input_csv input.csv --results_csv results.csv --output_debug debug.json
Windows:
bazel-bin\visqol.exe --batch_input_csv "input.csv" --results_csv "results.csv" --output_debug "debug.json"
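Once a batch run has produced a results file, it may be convenient to summarize it. The sketch below parses text in the results_csv layout (columns reference, degraded, moslqo) and averages the scores. The helper name read_results is hypothetical, and the sample text simply reuses the example rows from the flag description.

```python
import csv
import io


def read_results(csv_text):
    """Parse results_csv text into (reference, degraded, moslqo) tuples.

    Hypothetical helper for illustration; not part of ViSQOL.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["reference"], row["degraded"], float(row["moslqo"])) for row in rows]


# Example rows copied from the results_csv format description.
sample = "reference,degraded,moslqo\nref1.wav,deg1.wav,3.4\nref2.wav,deg2.wav,4.1\n"
results = read_results(sample)
mean_moslqo = sum(score for _, _, score in results) / len(results)
print(f"{len(results)} pairs, mean MOS-LQO {mean_moslqo:.2f}")
```

For a real run, the text would come from reading the file given to --results_csv.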
To compare two files using scaled speech mode and output their similarity to the console:
Linux/Mac:
./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --verbose
Windows:
bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --verbose
To compare two files using unscaled speech mode and output their similarity to the console:
Linux/Mac:
./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --use_unscaled_speech_mos_mapping --verbose
Windows:
bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --use_unscaled_speech_mos_mapping --verbose
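When the command line tool is driven from a script, the examples above can be assembled programmatically. The following sketch builds the argument list for a single comparison. The function build_visqol_command and its parameters are hypothetical; the flags and the default binary path are the ones used in the examples above. As a design choice in this sketch, the unscaled mapping flag is only added together with speech mode, since that is the only combination shown above.

```python
def build_visqol_command(reference, degraded, binary="./bazel-bin/visqol",
                         speech_mode=False, unscaled=False, verbose=True):
    """Assemble the argument list for one reference/degraded comparison.

    Hypothetical helper for illustration; not part of ViSQOL.
    """
    command = [binary, "--reference_file", reference, "--degraded_file", degraded]
    if speech_mode:
        command.append("--use_speech_mode")
        # Assumption: the unscaled mapping is only meaningful in speech mode.
        if unscaled:
            command.append("--use_unscaled_speech_mos_mapping")
    if verbose:
        command.append("--verbose")
    return command


print(" ".join(build_visqol_command("ref1.wav", "deg1.wav", speech_mode=True)))
```

The resulting list could then be passed to something like subprocess.run(command, check=True), assuming the binary has already been built.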
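To integrate ViSQOL with your Bazel project:
Add ViSQOL to your WORKSPACE file as a local repository:
local_repository (
    name = "visqol",
    path = "/path/to/visqol",
)
Then add the ViSQOL library to the deps of the relevant target in your BUILD file:
deps = ["@visqol//:visqol_lib"],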
// Note: the #include lines for the ViSQOL API headers are omitted here, and
// reference_signal / degraded_signal hold the audio samples to compare
// (loading them is not shown).
int main(int argc, char **argv) {
  // Create an instance of the ViSQOL API configuration class.
  Visqol::VisqolConfig config;

  // Set the sample rate of the signals that are to be compared.
  // Both signals must have the same sample rate.
  config.mutable_audio()->set_sample_rate(48000);

  // When running in audio mode, a sample rate of 48k is recommended for the
  // input signals. Using non-48k input will very likely negatively affect the
  // comparison result. If, however, API users wish to run with non-48k input,
  // set this to true.
  config.mutable_options()->set_allow_unsupported_sample_rates(false);

  // Optionally, set the location of the model file to use.
  // If not set, the default model file will be used.
  config.mutable_options()->set_svr_model_path("visqol/model/libsvm_nu_svr_model.txt");

  // ViSQOL will run in audio mode comparison by default.
  // If speech mode comparison is desired, set to true.
  config.mutable_options()->set_use_speech_scoring(false);

  // Speech mode will scale the MOS mapping by default. This means that a
  // perfect NSIM score of 1.0 will be mapped to a perfect MOS-LQO of 5.0.
  // Set to true to use unscaled speech mode. This means that a perfect
  // NSIM score will instead be mapped to a MOS-LQO of ~4.x.
  config.mutable_options()->set_use_unscaled_speech_mos_mapping(false);

  // Create an instance of the ViSQOL API.
  Visqol::VisqolApi visqol;
  absl::Status status = visqol.Create(config);

  // Ensure that the creation succeeded.
  if (!status.ok()) {
    std::cout << status.ToString() << std::endl;
    return -1;
  }

  // Perform the comparison.
  absl::StatusOr<Visqol::SimilarityResultMsg> comparison_status_or =
      visqol.Measure(reference_signal, degraded_signal);

  // Ensure that the comparison succeeded.
  if (!comparison_status_or.ok()) {
    std::cout << comparison_status_or.status().ToString() << std::endl;
    return -1;
  }

  // Extract the comparison result from the StatusOr.
  Visqol::SimilarityResultMsg similarity_result = comparison_status_or.value();

  // Get the "Mean Opinion Score - Listening Quality Objective" for the degraded
  // signal, following the comparison to the reference signal.
  double moslqo = similarity_result.moslqo();

  // Get the similarity results for each frequency band.
  google::protobuf::RepeatedField<double> fvnsim = similarity_result.fvnsim();

  // Get the center frequency bands that the above FVNSIM results correspond to.
  google::protobuf::RepeatedField<double> cfb = similarity_result.center_freq_bands();

  // Get the mean of the FVNSIM values (the VNSIM).
  double vnsim = similarity_result.vnsim();

  // Get the comparison results for each patch that was compared.
  google::protobuf::RepeatedPtrField<Visqol::SimilarityResultMsg_PatchSimilarityMsg> patch_sims =
      similarity_result.patch_sims();

  for (const Visqol::SimilarityResultMsg_PatchSimilarityMsg& each_patch : patch_sims) {
    // Get the similarity score for this patch.
    double patch_similarity = each_patch.similarity();

    // Get the similarity results for each frequency band for this patch.
    // The center frequencies that these values correspond to are the
    // same as those that are returned in the parent center_freq_bands().
    google::protobuf::RepeatedField<double> patch_fvnsim = each_patch.freq_band_means();

    // Get the time (in sec) where this patch starts in the reference signal.
    double ref_patch_start_time = each_patch.ref_patch_start_time();

    // Get the time (in sec) where this patch ends in the reference signal.
    double ref_patch_end_time = each_patch.ref_patch_end_time();

    // Get the time (in sec) where this patch starts in the degraded signal.
    double deg_patch_start_time = each_patch.deg_patch_start_time();

    // Get the time (in sec) where this patch ends in the degraded signal.
    double deg_patch_end_time = each_patch.deg_patch_end_time();
  }

  return 0;
}
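The per-patch results can help locate where in a signal the degradation is concentrated. The sketch below, written in Python for brevity, picks the patch with the lowest similarity score and reports its time range in the reference signal. PatchSummary is a hypothetical stand-in with the same field names as the patch accessors above, not a ViSQOL type, and the numbers are made-up illustrative inputs rather than real ViSQOL output.

```python
from dataclasses import dataclass


@dataclass
class PatchSummary:
    """Hypothetical stand-in for one entry of patch_sims (not a ViSQOL type)."""
    similarity: float
    ref_patch_start_time: float
    ref_patch_end_time: float


def lowest_similarity_patch(patches):
    """Return the patch with the smallest similarity score."""
    return min(patches, key=lambda patch: patch.similarity)


# Made-up values for illustration only.
patches = [
    PatchSummary(0.92, 0.0, 0.5),
    PatchSummary(0.61, 0.5, 1.0),
    PatchSummary(0.88, 1.0, 1.5),
]
worst = lowest_similarity_patch(patches)
print(f"lowest similarity {worst.similarity} between "
      f"{worst.ref_patch_start_time}s and {worst.ref_patch_end_time}s")
```

With real results, the same selection could be applied directly to the entries of patch_sims.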
From within the root directory, install ViSQOL using pip:
pip install .
import os
from visqol import visqol_lib_py
from visqol.pb2 import visqol_config_pb2
from visqol.pb2 import similarity_result_pb2
config = visqol_config_pb2.VisqolConfig()
mode = "audio"
if mode == "audio":
    config.audio.sample_rate = 48000
    config.options.use_speech_scoring = False
    svr_model_path = "libsvm_nu_svr_model.txt"
elif mode == "speech":
    config.audio.sample_rate = 16000
    config.options.use_speech_scoring = True
    svr_model_path = "lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite"
else:
    raise ValueError(f"Unrecognized mode: {mode}")
config.options.svr_model_path = os.path.join(
    os.path.dirname(visqol_lib_py.__file__), "model", svr_model_path)
api = visqol_lib_py.VisqolApi()
api.Create(config)
# reference and degraded hold the audio samples to compare; loading them is not shown here.
similarity_result = api.Measure(reference, degraded)
print(similarity_result.moslqo)
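Because the configured sample rate has to match the input signals, it can be useful to check the files before calling the API. The sketch below reads the sample rate from each WAV header with Python's standard wave module and compares it with the rate used for each mode in the example above (48000 for audio, 16000 for speech). The names EXPECTED_SAMPLE_RATE, wav_sample_rate and check_inputs are hypothetical, and the wave module is assumed to be sufficient, which holds for plain PCM WAV files.

```python
import wave

# Sample rates taken from the Python example above.
EXPECTED_SAMPLE_RATE = {"audio": 48000, "speech": 16000}


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_inputs(reference_path, degraded_path, mode):
    """Raise ValueError if either file's sample rate does not match the mode.

    Hypothetical helper for illustration; not part of ViSQOL.
    """
    if mode not in EXPECTED_SAMPLE_RATE:
        raise ValueError(f"Unrecognized mode: {mode}")
    expected = EXPECTED_SAMPLE_RATE[mode]
    for path in (reference_path, degraded_path):
        rate = wav_sample_rate(path)
        if rate != expected:
            raise ValueError(
                f"{path}: sample rate {rate}, expected {expected} for {mode} mode")
    return expected
```

Calling check_inputs("ref1.wav", "deg1.wav", mode) before api.Create(config) would surface a mismatch early; the returned value could be assigned to config.audio.sample_rate.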
Armadillo - http://arma.sourceforge.net/
Libsvm - http://www.csie.ntu.edu.tw/~cjlin/libsvm/
PFFFT - https://bitbucket.org/jpommier/pffft
Boost - https://www.boost.org/
Using the libsvm codebase, you can train a model specific to your data. The procedure is as follows:
Currently, SVR is only supported for audio mode.
Use of this source code is governed by an Apache v2.0 license that can be found in the LICENSE file.
There have been several papers that describe the design of the ViSQOL algorithm and compare it to other metrics. These three should serve as an overview:
ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric (2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX))
ViSQOL: an objective speech quality model (2015 EURASIP Journal on Audio, Speech, and Music Processing)
Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio (2017 IEEE Transactions on Broadcasting)
This may have to do with bazel being out of sync. You may need to run bazel clean --expunge and rebuild.
There are a number of possible explanations. Here are the most common ones:
In addition to the contributions visible on the repository history, Colm Sloan and Feargus O'Gorman have significantly contributed to the codebase in the collaboration between Andrew Hines and Google.