It was developed by Moore, Tan and others in the paper "Predicting the Perceived Quality of Nonlinearly Distorted Music and Speech Signals".
I have implemented this metric in Matlab. Both it and the LUFS implementation can be found in the linked folder.
In outline, it:
- Models the filtering effects of the outer and middle ear on both versions of the track (original and distorted).
- Filters both versions through an ERB-based gammatone filter array to simulate the basilar membrane.
- Creates a set of weighting coefficients for each packet of a 30 ms sliding window, based on comparing the levels in the distorted version against its maximum level.
- Finds the maximum normalised cross-correlation between the two versions in each packet, searching over lags from -10 ms to +10 ms.
- Weights these maximum cross-correlations, with the coefficients in use scaled so that they sum to 1.
- Sums the weighted values across the 40 filtered versions of the signal to give one value per packet.
- Takes the mean of the per-packet values.
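
As a rough illustration of the correlation, weighting and averaging steps only, here is a minimal Python sketch. It is not the Matlab implementation from the linked folder. It assumes the ear filtering and gammatone filtering have already been done, so each input is a list of band signals. The function names are made up for this example, the packets are taken as non-overlapping 30 ms frames rather than a true sliding window, and the weights are a placeholder (the RMS level of the distorted band in each packet), not the level-based weighting described in the paper.

```python
import math

def max_normalised_xcorr(ref, dist, max_lag):
    """Largest normalised cross-correlation between two equal-length
    frames over lags from -max_lag to +max_lag samples."""
    n = len(ref)
    # Normalise by the geometric mean of the two frame energies.
    energy = math.sqrt(sum(x * x for x in ref) * sum(y * y for y in dist))
    if energy == 0.0:
        return 0.0  # a silent frame carries no correlation information
    best = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += ref[i] * dist[j]
        best = max(best, acc / energy)
    return best

def rnonlin_sketch(ref_bands, dist_bands, fs, frame_ms=30.0, lag_ms=10.0):
    """Weighted, packet-averaged correlation score for pre-filtered bands.

    ref_bands / dist_bands: lists of band signals (one list of samples
    per filter), original and distorted. fs is the sample rate in Hz.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    max_lag = int(round(fs * lag_ms / 1000.0))
    n_samples = min(len(b) for b in list(ref_bands) + list(dist_bands))
    packet_scores = []
    # Assumption: non-overlapping frames stand in for the sliding window.
    for start in range(0, n_samples - frame_len + 1, frame_len):
        stop = start + frame_len
        corrs, raw_weights = [], []
        for ref, dist in zip(ref_bands, dist_bands):
            r, d = ref[start:stop], dist[start:stop]
            corrs.append(max_normalised_xcorr(r, d, max_lag))
            # Placeholder weight: RMS level of the distorted band.
            raw_weights.append(math.sqrt(sum(y * y for y in d) / frame_len))
        total = sum(raw_weights)
        if total == 0.0:
            continue  # skip packets where every band is silent
        # Scale the weights to sum to 1, then sum across bands.
        packet_scores.append(
            sum(w / total * c for w, c in zip(raw_weights, corrs)))
    if not packet_scores:
        raise ValueError("no non-silent packets to score")
    # Mean across packets gives the final score.
    return sum(packet_scores) / len(packet_scores)
```

With identical inputs the score comes out at 1 (up to rounding), and it drops below 1 as the distorted version departs from the original, for example when the signal is hard clipped.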
Studies such as http://projekter.aau.dk/projekter/files/9852082/07gr1061_Thesis.pdf show that it correlates very well with listeners' ratings of perceived distortion.
I will use this to estimate how much perceptible distortion a track contains. This matters because I am trying to stay on the fringe of distortion perceptibility, so that listeners respond to the particular character of the distortion itself rather than to the mere fact that it is loud and obviously present.
https://onedrive.live.com/redir?resid=563e4881c8b0b60!6782&authkey=!ALzVdJrzlx7dS68&ithint=folder%2c