Re: FIR filters
Micheal,
Thanks for going over this. I really appreciate the dialog.
I fail to see any advantage in averaging the impulse responses (IRs) directly. Averaging the IRs should be equivalent to averaging the complex frequency responses, since averaging is just addition plus a scale factor and the Fourier transform is a linear operator. This would result in a dip in the magnitude of the average at any frequency where the individual phases differ, which seems perceptually wrong. A listener at either position would not hear the dip.
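To make the point above concrete, here is a toy sketch (plain Python with a small direct DFT, not taken from any actual measurement code). The two "measurements" are hypothetical: the same unit impulse, once at sample 0 and once delayed by 4 samples. Each has a perfectly flat magnitude, yet the average of the IRs has nulls wherever the two phases are opposite.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, fine for a toy example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 8
h1 = [1.0] + [0.0] * (N - 1)          # impulse at sample 0
h2 = [0.0] * 4 + [1.0] + [0.0] * 3    # same impulse, 4 samples later

H1, H2 = dft(h1), dft(h2)

# Average the IRs, then transform ...
H_of_avg = dft([(a + b) / 2 for a, b in zip(h1, h2)])
# ... versus transform, then average the complex responses.
avg_of_H = [(a + b) / 2 for a, b in zip(H1, H2)]
# Averaging magnitudes instead ignores the phase difference.
mag_avg = [(abs(a) + abs(b)) / 2 for a, b in zip(H1, H2)]

for k in range(N):
    print(k, round(abs(H_of_avg[k]), 6), round(mag_avg[k], 6))
```

The first two results agree bin for bin (linearity), and the odd-numbered bins of the complex average drop to zero even though both individual responses are flat there.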
I found it useful to pre-smooth the log magnitude before averaging (1/24th octave seems to work well) to get generally less chaotic behavior (and graphs that are easier to read), and to address outliers by using the trimmed mean. The trimmed mean is a statistic that can range from the mean to the median in accordance with a parameter, usually called alpha. If the pre-smoothed log magnitude has troublesome outliers, you just crank up alpha until they stop influencing the statistic. Going away from the mean does introduce discontinuities in the derivative of the log magnitude, however, so subsequent smoothing is necessary. Another thing I'd like to look at is giving precedence to peaks over dips, which goes with perception. But perhaps this is overkill and it's better just to get more measurements to average.
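For readers who have not met the trimmed mean, a minimal sketch follows. It assumes one common parametrization, which may differ from the one used in the code discussed here: alpha is the fraction of samples discarded from each end of the sorted data, so alpha = 0 gives the mean and alpha = 0.5 gives the median. The function names and the dB values are made up for illustration.

```python
def trimmed_mean(values, alpha):
    """Mean after dropping a fraction alpha from each end (0 <= alpha <= 0.5)."""
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must be in [0, 0.5]")
    s = sorted(values)
    n = len(s)
    k = int(alpha * n)                # samples trimmed from each end
    if 2 * k >= n:                    # everything trimmed: fall back to median
        mid = n // 2
        return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])
    kept = s[k:n - k]
    return sum(kept) / len(kept)

def trimmed_mean_per_bin(log_mags, alpha):
    """Apply the trimmed mean across measurements, one frequency bin at a time."""
    return [trimmed_mean(bin_values, alpha) for bin_values in zip(*log_mags)]

# Hypothetical log magnitudes (dB) at one bin from five measurements,
# one of which is an outlier.
bin_db = [-1.0, 0.0, 0.5, 1.0, 12.0]
print(trimmed_mean(bin_db, 0.0))   # plain mean, pulled up by the outlier
print(trimmed_mean(bin_db, 0.2))   # outlier (and the lowest value) dropped
print(trimmed_mean(bin_db, 0.5))   # median
```

Because the set of samples that survive trimming can change from one bin to the next, the result can have kinks along the frequency axis, which is why smoothing afterwards is needed.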
For the excess-phase all-pass part, too, it makes more sense to me to average the actual characteristic that we're trying to fix, which is phase or group delay. Averaging phase is troublesome, as I think you've observed, which is why I work with group delay. Also, the normalized average group delay is not affected by the time alignment of the individual measurements. The group delay of a later measurement, for example, is just shifted up by a constant (more delay), so its contribution to the normalized average is unchanged.
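A small sketch of the time-alignment point, under two assumptions that are not spelled out above: "normalized" is taken to mean that the mean over frequency is subtracted, and group delay is computed from the IR with the standard identity tau = Re{DFT(n*h[n]) / DFT(h[n])}. The two short IRs are invented for the example.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def group_delay(h):
    """Group delay in samples at each DFT bin: Re{DFT(n*h[n]) / DFT(h[n])}."""
    H = dft(h)
    R = dft([n * v for n, v in enumerate(h)])
    return [(r / hk).real for r, hk in zip(R, H)]

def normalized_average(delays):
    """Average across measurements, then remove the mean over frequency
    (assumed meaning of 'normalized')."""
    avg = [sum(col) / len(col) for col in zip(*delays)]
    mean = sum(avg) / len(avg)
    return [a - mean for a in avg]

h      = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
g      = [1.0, -0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
g_late = [0.0, 0.0, 1.0, -0.3, 0.1, 0.0, 0.0, 0.0]   # g arriving 2 samples later

tau_h, tau_g, tau_g_late = group_delay(h), group_delay(g), group_delay(g_late)

norm_aligned = normalized_average([tau_h, tau_g])
norm_shifted = normalized_average([tau_h, tau_g_late])
print(max(abs(a - b) for a, b in zip(norm_aligned, norm_shifted)))
```

The late measurement's group delay is the original plus 2 samples at every bin, so the printed difference is zero to within rounding.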
Regardless, I need to study IR time alignment more for when you really want to see the average (total) phase. Oversampling (using a sin(x)/x interpolation kernel) and aligning the peaks makes sense but I wonder if it really is physically correct or optimal. What I've been doing is matching the group delay of multiple measurements over a narrow range of frequencies where the phase is well behaved, but that seems kind of ad hoc. One can imagine other, better metrics.
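For reference, the oversample-and-align-peaks idea can be sketched as below. This is a toy version with a truncated sin(x)/x kernel and an arbitrary oversampling factor of 4; the function names and test signals are hypothetical. It only shows how a sub-sample peak offset is found, and says nothing about whether peak alignment is the physically right criterion.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def peak_time(ir, factor=4):
    """Time (in samples) of the largest |value| on a grid 'factor' times finer,
    using direct sin(x)/x interpolation over the available samples."""
    best_t, best_v = 0.0, -1.0
    for i in range((len(ir) - 1) * factor + 1):
        t = i / factor
        v = abs(sum(x * sinc(t - n) for n, x in enumerate(ir)))
        if v > best_v:
            best_t, best_v = t, v
    return best_t

# Two band-limited test "IRs": the same pulse centered at 5.25 and 7.0 samples.
a = [sinc(n - 5.25) for n in range(16)]
b = [sinc(n - 7.0) for n in range(16)]

offset = peak_time(b) - peak_time(a)   # delay of b relative to a, in samples
print(peak_time(a), peak_time(b), offset)
```

At the original sample rate the peak of the first signal would be found at sample 5; the finer grid recovers the quarter-sample position.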
There's nothing like trying to write down some of this stuff to clarify my thinking. Before I could write the above I had to go back and make several tests on the code. I hope that by keeping this open discussion going we all can learn something. Perhaps this should move to DIY audio?
Best,
--Frank