Mixing Audio without Clipping in iOS:
Limiters and Other Techniques

If you’ve ever tinkered around with low-level audio in iOS, you’ve likely come across this common situation: Sound 1 plays perfectly. Sound 2 plays perfectly. Maybe you even get away with Sound 1 + Sound 2 playing perfectly together. However, at some point, as you increase the “polyphony” (the number of simultaneous sounds) for your synthesizer, loop mixer or virtual instrument, you begin to notice a certain crackling sound. You desperately try to pretend that it isn’t really there, or that it was just a one-off fluke, but in reality you’ve discovered “clipping”.

The Issues: Clipping & Consumer Devices

Clipping is when you max out the digital circuitry. It means your audio data has hit the ceiling of its sample format, e.g. +/- INT16_MAX for 16-bit integer samples or +/- 1.0 for floating point. Audiophiles know it as “zero decibels full scale” (0 dBFS). Beyond this, any additions will simply be truncated to that ceiling. Waveforms which would have gracefully swung beyond this limit are abruptly flattened, leaving sharp corners. These corners create all kinds of additional frequencies, known as harmonic distortion, which add noise and interfere with existing frequencies. The end result is an overall degradation in sound quality and a very irritating static sound, the musical equivalent of a stain on your clean white shirt.

Time plot and frequency spectrum of 100 Hz and 161 Hz sine waves combined without considering clipping

Clipping introduces unpleasant harmonic distortion. 1.0 is the floating point equivalent of 0 dBFS. Amplitude on the second graph is exaggerated to make the added higher frequencies visible.
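To make the problem concrete, here is a minimal Python sketch (not iOS code) that sums the same two sine waves and hard-clips the result at 1.0. The 44,100 Hz sample rate and the helper names `sine` and `hard_clip` are assumptions chosen for illustration only.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed for this sketch)

def sine(freq_hz, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave as floats in [-amplitude, amplitude]."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def hard_clip(samples, ceiling=1.0):
    """Flatten everything beyond +/- ceiling, as a saturating output stage would."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

a = sine(100.0, SAMPLE_RATE)           # one second of 100 Hz
b = sine(161.0, SAMPLE_RATE)           # one second of 161 Hz
mixed = [x + y for x, y in zip(a, b)]  # plain sum, can reach almost +/- 2.0
clipped = hard_clip(mixed)

peak = max(abs(s) for s in mixed)
n_flattened = sum(1 for s in mixed if abs(s) > 1.0)
print(f"peak of plain sum: {peak:.3f}")
print(f"samples flattened by clipping: {n_flattened} of {len(mixed)}")
```

Every flattened sample is a spot where the waveform gets one of those sharp corners.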

Mobile devices present an additional challenge. Normally, when working with audio in a digital audio workstation (DAW), such as Logic or Reaper, we curtail this problem by simply turning down the volume fader on the audio channel and turning up the speaker or headphone amplifier to compensate. It’s called creating headroom. As we’ll see below, this is indeed a valid solution for iOS apps. However, since mobile devices are consumer gadgets, the amplifier (controlled by the volume buttons) has a fairly low maximum. Additionally, users are generally accustomed to the Music player, whose tracks are likely normalised to near 0 dBFS, so sounds played significantly below this will seem comparatively quiet.

Solution 1: Quick and Dirty Maths

This solution was originally discussed in Viktor Toth’s article, “Mixing Digital Audio”, and illustrated with code in A Tasty Pixel’s post. With this technique we simply add the audio when there is no chance of clipping, i.e. when the samples from waveforms A and B have different signs. When they have the same sign and clipping might result, we combine them with a specialised averaging equation tweaked to eliminate certain problems discussed in Viktor’s article.

Piecewise Equation for Quick and Dirty Audio Mixing
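As a rough sketch of this piecewise rule in Python, assuming floating point samples in [-1.0, 1.0]: the function name `mix_toth` is made up for illustration, and the float form shown here is my assumption of how the equation translates from integer samples, so treat it as a starting point rather than the definitive version.

```python
def mix_toth(a, b):
    """Mix two float samples in [-1.0, 1.0] so the result stays in that range."""
    if a > 0.0 and b > 0.0:
        return a + b - a * b   # both positive: squeeze the sum towards +1.0
    if a < 0.0 and b < 0.0:
        return a + b + a * b   # both negative: squeeze the sum towards -1.0
    return a + b               # opposite signs (or a zero): the plain sum is safe

print(mix_toth(0.5, 0.5))      # 0.75
print(mix_toth(-0.5, -0.5))    # -0.75
print(mix_toth(0.5, -0.25))    # 0.25
print(mix_toth(1.0, 1.0))      # 1.0
```

Note the switch between branches whenever one input crosses zero. That switch is where the corners in the mixed waveform come from.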

The maths are simple and fairly kind to your CPU for low polyphonies – hence the “quick” description.  The “dirty” stems from the fact that this piecewise equation introduces its own harmonic distortion.  You can see this in the graph below as corners at the points when the waveforms switch from having different signs to the same sign.  As in the clipping graph above, the amplitude scale on the frequency spectrum is slightly exaggerated to illustrate the high frequencies.  The graph confirms that this method introduces less distortion than clipping, but it’s still far from ideal.

The quick and dirty Toth method for adding two signals also creates harmonic distortion but may be acceptable for certain cases

Despite this visual depiction, for certain applications the ears apparently fail to detect any issues. Applications which involve a low polyphony and whose sources are complex or noisy sounds (e.g. snare drums, rock/pop tracks) fare quite well. A Tasty Pixel’s Loopy app is used by professional musicians and successfully employs this technique. However, for sounds which are simple waveforms or natural instruments, especially where the polyphony count adds up, this trick might pose problems.

Solution 2: Turn Down the Volume

The audio engine in the Sound Wand app presents an extreme example of this: its natural tones are sampled from a live string instrument and have very long sustain. The nature of the instrument requires a max polyphony of at least 20. This means the app typically demands simultaneous playback of 20 audio files, which are all normalised to 0 dBFS. Given these extremes and the harmonic distortion issue, the previous technique is unsuitable. After much experimenting with compression and limiter algorithms, the most feasible technique turned out to be simply scaling down the volume of the audio data.

So how much did it need to be scaled down, 1/20th? Thankfully not. Due to waveform interference and the short spike inherent in plucked string sounds, it is highly unlikely that any more than a few waveforms will hit positive/negative 0 dBFS at the exact same time. The end result is that even with fervent playback of the app, pre-multiplying by 0.3 is enough to prevent any noticeable clipping.
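For a sense of scale, here is a small Python sketch of the pre-multiplication together with the decibel arithmetic. The helper names `mix_with_headroom` and `gain_to_db` are hypothetical, and the voices are assumed to be equal-length lists of float samples in [-1.0, 1.0].

```python
import math

PRE_GAIN = 0.3  # the pre-multiplier discussed above

def mix_with_headroom(voices, gain=PRE_GAIN):
    """Sum equal-length voices sample by sample, then scale the sum by gain."""
    return [gain * sum(frame) for frame in zip(*voices)]

def gain_to_db(gain):
    """Convert a linear gain factor to decibels."""
    return 20.0 * math.log10(gain)

print(f"{gain_to_db(PRE_GAIN):.1f} dB")  # -10.5 dB
print(f"{gain_to_db(1 / 20):.1f} dB")    # -26.0 dB
```

With a gain of 0.3, the raw sum of the voices has to exceed 1/0.3, or about 3.33, before the mix clips. That costs roughly 10.5 dB of level, compared with the 26 dB that a worst-case 1/20 scaling would give away.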

Incidentally, the idea for this came from studying OpenAL, which I believe uses this technique.  The first incarnation of the audio engine was written in OpenAL and I couldn’t understand why the sound level was so quiet, even at maximum volume.  It turns out that pre-multiplying by 0.3 gives just about the same level as an OpenAL setup with 20 playback sources allocated.

So what are the drawbacks? On the whole, it works well and gives the motion-based instrument a very sensitive feel with soft notes which are quiet and a volume swell with more intense playback – much like a classical instrument. However, this large dynamic range doesn’t necessarily translate well to small sound systems, which sometimes distort louder notes when you crank the volume enough to hear the soft notes satisfactorily. The overall pre-multiplication of roughly 1/3 also makes it a bit quiet when heard via the iPhone’s speaker in a noisy environment.

Solution 3:  The Elusive (Lookahead) Limiter

Ideally, an optional limiter would be available in the settings for those difficult cases. Chances are that you too have overheard someone nonchalantly dictating “you need a limiter”. When I began down this route, I thought it would be fairly simple. I’ve got limiters galore in my DAW, so surely their commonplace nature means there would be plenty of open algorithms available to borrow and tweak…

Turns out there are not. Plus there are other issues: only a very special type of limiter will actually prevent clipping.

First I tried an envelope follower limiter, as is kindly detailed in this C++ code. This type of limiter is equivalent to a compressor with a very high (infinite) ratio. It has an attack time, which I set quite low, and a release time (see Wikipedia for a review of all this jargon). The problem here is that the attack time, by design, makes the limiter slow to react, meaning that some clipping has already happened by the time the limiter’s volume reduction has fully kicked in.

Next I thought, “well, just make the attack time zero”, which amounts to a brickwall limiter. Here we have a similar problem, one that in hindsight is a bit of a “duh” moment: the act of “immediately reducing the volume by the amount needed to bring the audio down to 0 dBFS” is the exact same thing as “clipping”! Actually, this is only true during the rise portion of the audio waveform. During the waning portion, the release time will keep the limiter’s volume reduction unnecessarily high, which adds a bit of smoothness (and harmonic distortion) to the second half of the otherwise-would-be-clipped peak. The end result is that only half the clipping is fixed. This equates to the sound I observed: slightly improved but still evident.

Brickwall limiting without "lookahead" kicks in too late and only prevents half the clip
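Both failure modes are easy to reproduce in a few lines. The following is a simplified Python sketch of an envelope follower limiter, not the C++ code referred to above. The coefficient formula exp(-1/(T*F)), the 200 Hz test burst and all names are assumptions made for illustration.

```python
import math

def smoothing_coeff(time_s, sample_rate):
    """One-pole coefficient for a time constant; 0.0 means react instantly."""
    return math.exp(-1.0 / (time_s * sample_rate)) if time_s > 0.0 else 0.0

def envelope_limiter(samples, sample_rate, attack_s, release_s, ceiling=1.0):
    """Limit with an envelope follower; there is no lookahead and no delay."""
    attack = smoothing_coeff(attack_s, sample_rate)
    release = smoothing_coeff(release_s, sample_rate)
    env = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Follow rising levels with the attack time, falling ones with the release.
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        # Only reduce the gain once the envelope itself is above the ceiling.
        out.append(s * ceiling / env if env > ceiling else s)
    return out

SAMPLE_RATE = 44_100
burst = [1.5 * math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE)
         for n in range(882)]  # 20 ms of a 200 Hz sine peaking at 1.5

slow = envelope_limiter(burst, SAMPLE_RATE, attack_s=0.001, release_s=0.05)
instant = envelope_limiter(burst, SAMPLE_RATE, attack_s=0.0, release_s=0.05)
print("peak with 1 ms attack:", max(abs(s) for s in slow))
print("peak with zero attack:", max(abs(s) for s in instant))
```

With the 1 ms attack the envelope lags behind the first peak, so the output still overshoots 1.0. With zero attack the output stays at the ceiling, but on the rising side of each peak it is sample for sample identical to hard clipping.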

What’s the solution then? Are you ready? Introducing…the lookahead brickwall limiter! According to the manual for Waves’ industry standard version, the legendary L1-Ultramaximizer, a lookahead limiter “avoids the possibility of overshoot by utilizing a lookahead technique that allows the system to anticipate and reshape signal peaks in a way that produces the bare minimum of audible artifacts”.

Sweet. Let’s just dial that into Google to get our algorithm. Turns out there isn’t any. This outline on MusicDSP.org is the best we get. I speculate that the reason for this glaring hole in our open-source collective mind is that optimised lookahead techniques are quite tricky and varied, and are therefore trade secrets of those companies who’ve done the work to build them. This PDF by OmniaAudio explains the details as to why it’s not as trivial a technique as you might at first think.

Assuming we can even make our own, the technique is not without drawbacks. The first big challenge, particularly on iOS, is efficiency. As a result of the technique, each sample suddenly becomes dependent on all the samples which come after it within the lookahead time, as well as all the samples which precede it within the release time. This convolving creates an algorithmic overhead, and its complexity means we won’t be able to use the high-speed, machine code optimised commands of the Accelerate framework. What about Accelerate’s convolution functions? Given that the effect of a nearby sample depends on its distance, this is not a standard, textbook convolution, though I still maintain hope that someone with more maths than I will step forward and explain how it might be adapted.

The other drawback to this technique is a potential deal breaker depending on your application. In order to anticipate future clipping, you need to delay the output by the lookahead time. This increases the latency, the time between the user’s action and when the result is heard. For high-performance musical instruments this can be a party killer if it’s too high.
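The added delay is simple to estimate. Here is a tiny Python sketch, where the 5 ms lookahead and the 44,100 Hz sample rate are example values only and `lookahead_latency` is a hypothetical helper:

```python
import math

def lookahead_latency(lookahead_s, sample_rate):
    """Return (delay in whole samples, resulting delay in milliseconds)."""
    delay_samples = math.ceil(lookahead_s * sample_rate)
    return delay_samples, 1000.0 * delay_samples / sample_rate

delay, ms = lookahead_latency(0.005, 44_100)
print(delay, f"{ms:.2f} ms")  # 221 5.01 ms
```

This comes on top of whatever latency the audio buffers already introduce.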

A Plan for a Lookahead Brickwall Limiter

I have yet to implement this technique, so no code unfortunately, but the challenge continues to intrigue me.  Here’s my current thinking:

  1. Pad a ring buffer by the lookahead amount before you begin.
  2. In the render callback round, copy the audio onto the end of the ring buffer.
  3. Next, we want to create a gain reduction buffer which is the amount by which we need to multiply the original audio to limit the volume. To accomplish this, first copy out of the beginning of the ring buffer (copy – don’t consume – as we’ll need it again later) one audio buffer size worth of data. It is delayed by the lookahead time and you should have an additional lookahead time’s worth of data afterwards in the ring buffer which we’ll use the next time around.
  4. In the copied buffer, take the magnitude of each sample and “clamp” all “safe” values, i.e. those below 1.0, up to 1.0 (assuming floating point and max=1.0).  See Accelerate’s vDSP Clip, Threshold and Limit functions.
  5. Next, apply the “release” on the peaks:  Loop forward through the samples, and only if you come across a value which has decreased compared to its previous neighbour, smooth it with the iterative decay function:
    S'(n) = (1-a)*S(n) + a * S'(n-1)

    Iterative Decay Function and Time Constant

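    Here a is the smoothing coefficient, set by a time constant T in seconds and the sample rate F in samples per second (e.g. 44,100). For a one-pole smoother like this, the usual relation is a = exp(-1/(T*F)). You might need a denormal cut-off below which you just set the value to 1.0.  See the first graph in the figure below.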

  6. Now, apply the “attack” leading up to the peaks:  Loop backwards through the samples and perform the same operation (decreases only this time too).  Alpha can be different as here it is related to the lookahead/attack time. See the second graph in the figure below.
  7. Invert (1/x) the values.  This has no effect on “safe” samples (1/1.0=1.0), but converts others to a reduction multiplier rather than a divisor.
  8. Now consume the original (and shifted) audio samples from the beginning of the ring buffer and multiply them by the processed copy.  The result is the third graph below.

    The result of the two easing passes on the peak data and the final waveform after applying the lookahead limiting
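Short of a real implementation, steps 4 to 8 can at least be sketched offline. The Python below works on one whole buffer at a time and leaves out the ring buffer and delay bookkeeping of steps 1 to 3. The coefficient formula a = exp(-1/(T*F)), the optional `pre_gain` parameter and all function names are assumptions for illustration, so treat it as a sketch to experiment with rather than tested iOS code.

```python
import math

def decay_coeff(time_s, sample_rate):
    """Coefficient a in S'(n) = (1-a)*S(n) + a*S'(n-1), assuming a = exp(-1/(T*F))."""
    return math.exp(-1.0 / (time_s * sample_rate)) if time_s > 0.0 else 0.0

def limit_block(samples, sample_rate, attack_s, release_s, pre_gain=1.0):
    """Limit float samples to [-1.0, 1.0] with two easing passes over the peaks."""
    audio = [pre_gain * s for s in samples]
    # Step 4: take magnitudes and clamp the "safe" ones up to 1.0.
    peaks = [max(abs(s), 1.0) for s in audio]
    # Step 5: release. Walk forwards and ease every decrease.
    a = decay_coeff(release_s, sample_rate)
    for n in range(1, len(peaks)):
        if peaks[n] < peaks[n - 1]:
            peaks[n] = (1.0 - a) * peaks[n] + a * peaks[n - 1]
    # Step 6: attack. Walk backwards and ease every decrease in that direction.
    a = decay_coeff(attack_s, sample_rate)
    for n in range(len(peaks) - 2, -1, -1):
        if peaks[n] < peaks[n + 1]:
            peaks[n] = (1.0 - a) * peaks[n] + a * peaks[n + 1]
    # Steps 7 and 8: dividing by the eased peak curve is the same as
    # multiplying by its inverse.
    return [s / p for s, p in zip(audio, peaks)]

SAMPLE_RATE = 44_100
mixed = [math.sin(2.0 * math.pi * 100.0 * n / SAMPLE_RATE)
         + math.sin(2.0 * math.pi * 161.0 * n / SAMPLE_RATE)
         for n in range(2205)]  # 50 ms of the two summed sine waves
limited = limit_block(mixed, SAMPLE_RATE, attack_s=0.001, release_s=0.05)
print("peak before:", max(abs(s) for s in mixed))
print("peak after: ", max(abs(s) for s in limited))
```

Both passes can only raise the peak curve, never lower it, so the curve always stays at or above the magnitude of each sample and the output stays within 1.0 (give or take floating point rounding). In a real-time version the backward pass could only start from the end of the lookahead window, so the attack ramp would be cut short at that point.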

Optional: To have the limiter increase the overall loudness of the audio, even when it is not clipping, pre-multiply the audio data by a certain gain > 1. Then apply the above technique. In other words, the gain pushes the loudest components into clipping territory, where the limiter pulls them back down, while the quieter ones, untouched by the limiting, keep the full boost and so come out louder by comparison.

Will It Work?

As the OmniaAudio PDF discussed, limiting is not without its own harmonic distortion, but the graphs below do show that it is substantially less (and different) than clipping…hooray!

Frequency spectrums of clipping and the two summing methods show relative levels of harmonic distortion added by each

As you might have guessed, it also turns out that the harmonic distortion is less when the lookahead time is longer.  The trade-off here is increased latency.

Open Trade Secrets

Applying this method shouldn’t require too much code or CPU overhead.  The iterative decay is far less intensive than calculating exponentials directly.  Hopefully, I’ll get some proper iOS code in here soon, but if you beat me to the punch please do share so we can plug this knowledge gap and help support all the well-deserving audio apps and indie devs out there.