Rethinking ReplayGain
I’m a luddite and an audiophile and I dislike the idea of subscription music services, so my music collection is a disk of FLACs which gets transcoded to other formats as necessary. I recently got around to including ReplayGain in this system, so now everything can be played at the same loudness. This is a massive improvement on having a chaos of different volumes, but… it turns out that it’s still not quite right.
If you go from, say, an acoustic track to a metal track – both perfectly ReplayGained – the metal track will sound weirdly muted in comparison. But supposedly they’re at the same perceived loudness, so what’s going on? My way of thinking about this is that there’s an expected loudness that doesn’t necessarily match the perceived loudness. A metal track should feel louder than a gentle acoustic track.
So it turns out that equal perceived loudness isn’t what I want as a listener. I want the metal tracks to hit as hard as I feel they should (perhaps subjectively, but probably not uniquely subjectively). So what I’m aiming for is a measure of how loud a track should be.
You could throw up your hands at this stage and say there’s nothing we can do about this. We can automatically calculate the perceived loudness, but there’s no way to know what loudness a track should have. This is an artistic decision, not something that we can automate. And maybe that’s true at some level, but I reckon we don’t have to give up entirely.
Here’s a simple model of why a metal track’s expected loudness is systematically higher than its perceived loudness, relative to an acoustic track: there’s a lot of high-frequency content in metal. The guitar distortion, the cymbals, the vocal rasps and so on add a lot of high-frequency energy that’s just not present in acoustic tracks. That energy counts towards the measured loudness, so normalising turns the whole track down to compensate. Perhaps electronic bass music does a similar thing at the low end: there’s a big sub-bass that should shake the room. And again, it should be louder than the gentle acoustic tracks.
The parts of a metal track that sound muted with ReplayGain are some quite specific elements: the body of the drums, guitar, and vocals. This is all in the middle of the frequency range.
If my hypothesis is correct, there’s a simple way to correct for it: just apply a highpass and a lowpass filter and run the loudness calculation on the mid-range alone. So what you’d end up with is a music collection where the mid-range has been normalised, but the highs and the lows (which still contribute to perceived loudness) are left to float free.
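As a rough illustration of the idea (not the code in my rsgain fork), here is a minimal Python sketch that band-limits a signal and measures what is left. Everything in it is an assumption made for the example: the function names are hypothetical, the filters are second-order sections in the well-known RBJ "cookbook" form, and the level is plain RMS in dB rather than a proper gated, K-weighted LUFS measurement.

```python
import math


def biquad_coeffs(kind, cutoff_hz, sample_rate, q=math.sqrt(0.5)):
    """Normalised second-order low/high-pass coefficients (RBJ cookbook form)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b0 = (1.0 - cos_w0) / 2.0
        b1 = 1.0 - cos_w0
    elif kind == "highpass":
        b0 = (1.0 + cos_w0) / 2.0
        b1 = -(1.0 + cos_w0)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    b2 = b0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


def biquad_filter(samples, coeffs):
    """Run one biquad section over a list of samples (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms_db(samples):
    """Plain RMS level in dB relative to full scale (NOT a LUFS measurement)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def midrange_level_db(samples, sample_rate, low_hz=100.0, high_hz=1000.0):
    """Highpass at low_hz, lowpass at high_hz, then measure the remainder."""
    highpass = biquad_coeffs("highpass", low_hz, sample_rate)
    lowpass = biquad_coeffs("lowpass", high_hz, sample_rate)
    return rms_db(biquad_filter(biquad_filter(samples, highpass), lowpass))


def sine(freq_hz, sample_rate=48000, seconds=1.0):
    """Full-scale test tone, used only to exercise the sketch."""
    count = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


if __name__ == "__main__":
    for freq in (30.0, 300.0, 8000.0):
        print(freq, round(midrange_level_db(sine(freq), 48000), 1))
```

A 300 Hz tone comes through almost untouched, while 30 Hz and 8 kHz tones barely register, which is the point: sub-bass and cymbal energy stop influencing the number used for normalisation.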
Way back when I was mastering Empire and Dust, I actually used something like this approach: I put on a high-pass and a low-pass filter so that I was only listening to the mid-range, then volume matched the tracks like that, and only after that considered whether the tracks were too bright or dark and whether they had the right amount of low-end.
In the olden days (a couple of years ago) this would have remained a nice thought experiment and I wouldn’t have done anything about it. But these days I just ask Claude Code to build things for me.
I was already using rsgain in my transcoding scripts, so I just forked it and added some options.
Stock rsgain calculates the perceived loudness of the track and the album in LUFS (with some extra handling for silence detection, discarding outlier sections and so on), then adds tags saying how to adjust the volume to get to the target level.
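The arithmetic behind those tags is simple enough to sketch in Python. This is an illustration rather than rsgain's actual code: the function names are hypothetical, and the -18 LUFS target is my assumption based on the ReplayGain 2.0 reference level, so check it against your own settings.

```python
TARGET_LUFS = -18.0  # assumed target level (ReplayGain 2.0 reference)


def replaygain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB that moves a track from its measured loudness to the target."""
    return target_lufs - measured_lufs


def format_gain_tag(gain_db):
    """Render a gain the way ReplayGain tags usually look, e.g. '-8.50 dB'."""
    return f"{gain_db:.2f} dB"


if __name__ == "__main__":
    # A loud master measured at -9.5 LUFS gets turned down;
    # a quiet one at -23 LUFS gets turned up.
    for lufs in (-9.5, -23.0):
        print(lufs, "->", format_gain_tag(replaygain_db(lufs)))
```

The result is what ends up in a tag such as REPLAYGAIN_TRACK_GAIN, and the player applies it at playback time without touching the audio data.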
My new mode (MidrangeFilter=true) calculates the perceived loudness of the midrange frequencies in the same way, then adds a fixed dB offset (MidrangeOffset) so that a “typical piece of music” gets roughly the same level as it would with the original mode. You can then blend that number with the original (MidrangeBlend, where 0.0 is none and 1.0 is full), and you can set the low and high filter cutoffs (MidrangeLow and MidrangeHigh).
Why have the blending? I enjoy some fairly experimental music, and I guess I’m a bit wary that I might have some track that just consists of high-frequency whistling that then gets boosted to infinity and suddenly destroys my ears.
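The offset and blend described above can be sketched like this in Python. I'm assuming here that the blend is a straight linear mix of the two loudness figures in dB and that the target is -18 LUFS; the function names are hypothetical and the example numbers are invented, so treat it as a picture of the idea rather than the fork's exact arithmetic.

```python
def blended_loudness(full_lufs, mid_lufs, offset_db=0.7, blend=0.8):
    """Mix full-band and offset-corrected midrange loudness (assumed linear in dB).

    blend=0.0 reproduces the stock full-band figure; blend=1.0 uses only
    the midrange figure plus the offset.
    """
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must be between 0.0 and 1.0")
    corrected_mid = mid_lufs + offset_db
    return (1.0 - blend) * full_lufs + blend * corrected_mid


def gain_db(loudness_lufs, target_lufs=-18.0):
    """Gain needed to reach the (assumed) target level."""
    return target_lufs - loudness_lufs


if __name__ == "__main__":
    # Invented example: a whistling track that is -20 LUFS full-band
    # but has almost nothing in the midrange (-60 LUFS).
    for blend in (0.0, 0.8, 1.0):
        loudness = blended_loudness(-20.0, -60.0, blend=blend)
        print(f"blend={blend}: gain {gain_db(loudness):+.2f} dB")
```

On those invented numbers, a blend below 1.0 pulls the result back towards the stock figure, but it only softens the boost rather than capping it, so a track with a nearly empty midrange would still get a large gain.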
After some very basic tweaking (which you could certainly do more carefully and systematically than I’ve done), this is the preset I’ve ended up with:
[Global]
MidrangeFilter=true
MidrangeBlend=0.8
MidrangeLow=100
MidrangeHigh=1000
MidrangeOffset=0.7
And you can run that with:
rsgain easy -p midrange.preset <files>
I’m not sure that it’s quite right yet, but I think it’s at least slightly less bad than it was.