The vocoder. It's undeniably attention-grabbing. And, for many, it is also completely off-putting. It is absolutely the anchovies-on-pizza of the synthesizer world. I like anchovies. But would I like a vocoder? I needed to find out, so I built my own!
What is a Vocoder? A vocoder is an effect that makes a synthesizer sound like your voice. You sing or talk into a microphone while you play your synthesizer, and the vocoder transfers certain qualities of your voice onto the synthesizer's sound. Like a wah pedal, equalizer, or filter, a vocoder doesn't make its own sound; it modifies the sound coming from another device, such as a synthesizer.
Setup. As shown below, a vocoder needs a microphone and a synthesizer. Both plug into the vocoder, and the vocoder's output goes to your PA or DAW. Pretty easy. The magic is in what's inside the vocoder.
Tracking Your Voice's Formants. To add more voice-like qualities, a real vocoder measures the loudness of many different frequency regions of your voice. These regions are chosen to cover the frequencies that distinguish one vowel sound from another. As you make the different vowel sounds, you change the shape of your mouth (and nasal passages), which enhances or attenuates different harmonics contained in your voice. This shaping is why an "A" sounds different from an "E" and from an "O": the peaks in the frequency response (the "formants") are very different for each vowel. A vocoder detects this frequency shaping and applies the same shaping to the synthesizer's audio, which gives the synth a voice-like quality.
Multiband Vocoder. Our 1-band vocoder changed the loudness of the overall signal; all frequencies were affected equally. To make the synth more voice-like, we want to sense the frequency shaping (the formants) as it changes dynamically from one vowel to the next. Therefore, we need to split up the audio so that we can control the loudness of individual slices of the frequency spectrum. To get this finer control, we copy our 1-band vocoder many times to build a multiband vocoder. The illustration below expands our 1-band vocoder into a 3-band vocoder.
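For reference, here is a minimal C++ sketch of the 1-band vocoder that gets copied. It is not the author's Teensy code: the names `RmsFollower` and `vocode1Band` and the 10 ms smoothing time are hypothetical, and it assumes (following the description of the RMS and gain blocks) that the gain block scales the synth so that its loudness matches the voice's. Because a single gain multiplies the whole synth signal, every frequency goes up or down together.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Envelope follower ("RMS block"): a one-pole smoother on x^2, then sqrt.
class RmsFollower {
public:
    RmsFollower(float sampleRate, float timeConstantSec) {
        assert(sampleRate > 0.0f && timeConstantSec > 0.0f);
        coeff_ = std::exp(-1.0f / (sampleRate * timeConstantSec));
    }
    float process(float x) {
        meanSquare_ = coeff_ * meanSquare_ + (1.0f - coeff_) * x * x;
        return std::sqrt(meanSquare_);
    }
private:
    float coeff_ = 0.0f;
    float meanSquare_ = 0.0f;
};

// 1-band vocoder: one gain ("VCA") scales the whole synth signal so that
// its RMS level tracks the voice's RMS level. All frequencies move together.
std::vector<float> vocode1Band(const std::vector<float>& voice,
                               const std::vector<float>& synth,
                               float sampleRate) {
    assert(voice.size() == synth.size());
    RmsFollower voiceLevel(sampleRate, 0.010f);  // hypothetical 10 ms time constant
    RmsFollower synthLevel(sampleRate, 0.010f);
    const float eps = 1e-6f;                     // avoids dividing by zero in silence
    std::vector<float> out(synth.size());
    for (std::size_t n = 0; n < synth.size(); ++n) {
        const float gain = voiceLevel.process(voice[n]) /
                           (synthLevel.process(synth[n]) + eps);
        out[n] = gain * synth[n];
    }
    return out;
}
```

With a steady voice at half the synth's level, the gain settles near 0.5; when the voice goes silent, the gain (and the output) drops to zero.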
Breaking Up the Audio into Frequency Bands. The core of this system is still the green RMS blocks (envelope followers) and orange gain blocks (VCAs) discussed before. The system still controls loudness, but now each channel is preceded by a bandpass filter, so each channel controls the loudness of just one frequency region. In this simple 3-band example, the first filter might isolate the low frequencies so that the loudness of the low frequencies is made the same for the synth and the voice. The middle and last filters would do the same for the middle and high frequencies. Now, we are impressing more of the voice's qualities onto the synth. A real vocoder uses 8-16 of these channels, which gives it much finer frequency resolution and makes the output even more voice-like.
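The per-band structure can be sketched the same way. This C++ sketch illustrates the idea under assumptions and is not the author's code: it swaps in RBJ "Audio EQ Cookbook" biquad bandpass filters, uses hypothetical names (`Bandpass`, `RmsFollower`, `Channel`, `Vocoder`) and a hypothetical 10 ms envelope time, and, following the text, sets each band's gain so that the synth band's RMS level matches the voice band's. Each channel filters both the voice and the synth, so that the voice's level in a band is compared against the synth's level in that same band.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bandpass biquad from the RBJ Audio EQ Cookbook (0 dB gain at the center),
// run in transposed direct form II.
class Bandpass {
public:
    Bandpass(float sampleRate, float centerHz, float q) {
        assert(centerHz > 0.0f && centerHz < 0.5f * sampleRate && q > 0.0f);
        const double w0 = 2.0 * 3.141592653589793 * centerHz / sampleRate;
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        b0_ = static_cast<float>(alpha / a0);
        b2_ = -b0_;
        a1_ = static_cast<float>(-2.0 * std::cos(w0) / a0);
        a2_ = static_cast<float>((1.0 - alpha) / a0);
    }
    float process(float x) {
        const float y = b0_ * x + z1_;
        z1_ = -a1_ * y + z2_;  // b1 is zero for this bandpass
        z2_ = b2_ * x - a2_ * y;
        return y;
    }
private:
    float b0_ = 0.0f, b2_ = 0.0f, a1_ = 0.0f, a2_ = 0.0f;
    float z1_ = 0.0f, z2_ = 0.0f;
};

// Envelope follower ("RMS block"): one-pole smoother on x^2, then sqrt.
class RmsFollower {
public:
    RmsFollower(float sampleRate, float timeConstantSec)
        : coeff_(std::exp(-1.0f / (sampleRate * timeConstantSec))) {}
    float process(float x) {
        meanSquare_ = coeff_ * meanSquare_ + (1.0f - coeff_) * x * x;
        return std::sqrt(meanSquare_);
    }
private:
    float coeff_;
    float meanSquare_ = 0.0f;
};

// One channel: bandpass the voice and the synth, then scale the synth band
// so its RMS level matches the voice band's RMS level (the "VCA").
class Channel {
public:
    Channel(float fs, float centerHz, float q)
        : voiceFilter_(fs, centerHz, q), synthFilter_(fs, centerHz, q),
          voiceLevel_(fs, 0.010f), synthLevel_(fs, 0.010f) {}  // hypothetical 10 ms
    float process(float voice, float synth) {
        const float v = voiceFilter_.process(voice);
        const float s = synthFilter_.process(synth);
        const float gain = voiceLevel_.process(v) / (synthLevel_.process(s) + 1e-6f);
        return gain * s;
    }
private:
    Bandpass voiceFilter_, synthFilter_;
    RmsFollower voiceLevel_, synthLevel_;
};

// Multiband vocoder: the 1-band idea copied once per band, outputs summed.
class Vocoder {
public:
    Vocoder(float fs, const std::vector<float>& centersHz, float q) {
        for (float fc : centersHz) channels_.emplace_back(fs, fc, q);
    }
    float process(float voice, float synth) {
        float sum = 0.0f;
        for (Channel& ch : channels_) sum += ch.process(voice, synth);
        return sum;
    }
private:
    std::vector<Channel> channels_;
};
```

For example, `Vocoder vocoder(48000.0f, {250.0f, 1000.0f, 4000.0f}, 2.0f)` builds a hypothetical 3-band version of the illustration, and `vocoder.process(voiceSample, synthSample)` is then called once per sample. If the voice has energy only near 1000 Hz, the synth's middle band is boosted while its low and high bands are pulled down.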
Making My Own Vocoder. In the old days, the bandpass filters, RMS blocks, and gain blocks would all have been implemented as actual electronic circuits. Today, however, it's much easier to write signal processing software to perform these functions. That's what I did. I used an open-source digital audio device from Blackaddr (the "Teensy Guitar Audio Pro", TGA) and wrote software to implement the vocoder. The TGA uses a Teensy 3.6 as its processor, which can be programmed in the Arduino IDE. This is great, because Arduino is what I use for many of my other synth hacks. I've shared my Teensy/Arduino vocoder software on my GitHub.
Filtering and Processing Speed. With up to 16 channels and at least two filters per channel (one for the voice and one for the synth), a vocoder needs a lot of filters. So your choice of filter is important, because the filters can consume much of the available processing power. I used the multimode filter that comes with the Teensy Audio Library, which implements the filter as a time-domain IIR filter using fixed-point operations. I ended up putting two filters in series to sharpen the filter's response (though I'm not sure that was really necessary). Even with the burden of the extra filtering, the Teensy 3.6 was fast enough to run a 16-band vocoder. With the double-filtering, that's 64 filters in total (16 bands × 2 signals × 2 stages)! I was pleased.
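To see what putting two filters in series does, here is a minimal C++ sketch. It swaps in an RBJ "Audio EQ Cookbook" biquad bandpass rather than reproducing the Teensy library's own multimode filter, and it works in floating point rather than fixed point; `Bandpass`, `CascadedBandpass`, and `measureGain` are hypothetical names. The helper feeds a sine wave through a filter and reports the output/input RMS ratio after the start-up transient has died away.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.141592653589793;

// RBJ cookbook bandpass biquad (0 dB gain at the center), transposed direct form II.
class Bandpass {
public:
    Bandpass(float sampleRate, float centerHz, float q) {
        assert(centerHz > 0.0f && centerHz < 0.5f * sampleRate && q > 0.0f);
        const double w0 = 2.0 * kPi * centerHz / sampleRate;
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        b0_ = static_cast<float>(alpha / a0);
        b2_ = -b0_;
        a1_ = static_cast<float>(-2.0 * std::cos(w0) / a0);
        a2_ = static_cast<float>((1.0 - alpha) / a0);
    }
    float process(float x) {
        const float y = b0_ * x + z1_;
        z1_ = -a1_ * y + z2_;  // b1 is zero for this bandpass
        z2_ = b2_ * x - a2_ * y;
        return y;
    }
private:
    float b0_ = 0.0f, b2_ = 0.0f, a1_ = 0.0f, a2_ = 0.0f;
    float z1_ = 0.0f, z2_ = 0.0f;
};

// Two identical bandpass stages in series: steeper skirts, narrower passband.
class CascadedBandpass {
public:
    CascadedBandpass(float fs, float centerHz, float q)
        : first_(fs, centerHz, q), second_(fs, centerHz, q) {}
    float process(float x) { return second_.process(first_.process(x)); }
private:
    Bandpass first_, second_;
};

// Feeds 1 s of sine at `hz` through `filter`; returns the output/input RMS
// ratio measured over the second half, after the start-up transient.
template <typename Filter>
float measureGain(Filter& filter, float sampleRate, float hz) {
    const int total = static_cast<int>(sampleRate);
    double inSq = 0.0, outSq = 0.0;
    for (int n = 0; n < total; ++n) {
        const float x = static_cast<float>(std::sin(2.0 * kPi * hz * n / sampleRate));
        const float y = filter.process(x);
        if (n >= total / 2) {
            inSq += double(x) * x;
            outSq += double(y) * y;
        }
    }
    return static_cast<float>(std::sqrt(outSq / inSq));
}
```

Because the two stages are identical, the cascade's gain at any frequency is the square of a single stage's gain: still unity at the center, but much lower away from it. The passband narrows too, so a lower per-stage Q may be needed to keep neighboring bands overlapping.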
Filter Frequencies. An important choice is the set of frequency bands for your vocoder's filters. I knew that the frequencies should be tailored to the human voice, but I had no specific guidance on which frequencies to use. After a bit of trial and error, I chose to center my first filter at 125 Hz and then step the frequency upward by a factor of 1.319x for each subsequent filter. This seemingly bizarre value falls between half-octave steps (1.414x) and third-octave steps (1.260x). It is very nearly 2^0.4 ≈ 1.3195, a 0.4-octave step, which covers the six octaves from 125 Hz to 8000 Hz in exactly 15 steps. Using this step size, my 16 filters end up centered at: 125 Hz, 165 Hz, 218 Hz, ..., 4595 Hz, 6063 Hz, and 8000 Hz. To me, this felt like a good span for the human voice.
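The center frequencies follow directly from the first frequency, the last frequency, and the number of bands. The C++ sketch below (the function name `centerFrequencies` is hypothetical) spaces the bands geometrically. With 125 Hz, 8000 Hz, and 16 bands, the ratio works out to 64^(1/15) = 2^0.4 ≈ 1.3195, and the rounded results match the list above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns numBands center frequencies spaced by a constant ratio,
// starting at lowHz and ending at highHz.
std::vector<float> centerFrequencies(float lowHz, float highHz, int numBands) {
    assert(numBands >= 2 && lowHz > 0.0f && highHz > lowHz);
    const double ratio = std::pow(double(highHz) / lowHz, 1.0 / (numBands - 1));
    std::vector<float> centers;
    centers.reserve(static_cast<std::size_t>(numBands));
    for (int k = 0; k < numBands; ++k) {
        // Computing each value from lowHz avoids accumulating rounding error.
        centers.push_back(static_cast<float>(lowHz * std::pow(ratio, k)));
    }
    return centers;
}
```

Calling `centerFrequencies(125.0f, 8000.0f, 16)` gives 125, 165, 218, ..., 4595, 6063, and 8000 Hz after rounding. Changing the endpoints or the band count is an easy way to experiment with other spacings.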
Make Your Own! To be clear, when I wrote this vocoder, many of my choices were arbitrary. Don't be afraid to make your own choices! Choose a different type of bandpass filter. Choose different filter frequencies. Use FFTs instead of time-domain filters. That's the beauty of hacking using open technology: you can try things out for yourself! Go and have fun!