Digital Sound Synthesis
Before I delve into describing different types of synthesis, I should start with a disclaimer: I’m coming at this mainly from the angle of how old computers (and video game systems) used to synthesise sound rather than talking about music synthesisers, because that’s where most of my knowledge is. Although I have owned various keyboards, I don’t have a deep knowledge of exactly how they work as I’m more of a pianist than a keyboard player really. There is quite a bit of overlap between methods used in computers and methods used in musical instruments though, especially more recently.
To illustrate the different synthesis methods, I’m going to be using the same example sound over and over again, synthesised in different ways. It’s the glockenspiel part from the opening of Sonic Triangle’s sort-of Christmas song “It Could Be Different”. For comparison to the synthesised versions, here it is played (not particularly well, but you should get the idea!) on a real glockenspiel:
(In fact, in the original recording of the song, it isn’t a real glockenspiel. It’s the sample-based synthesis of my Casio keyboard… there’ll be more about that sort of synthesis later).
If you have trouble hearing the sounds in this post, try right clicking the links, saving them to your hard drive and opening them from there. Seriously, I can’t believe that in 2013 there still isn’t an easy way of putting sounds on web pages that works on all major browsers. Grrrr!
As we saw last time, digital sound recordings (which include CDs, DVDs, and any music files on a computer) are just very long lists of numbers that were created by feeding a sound wave into an analogue-to-digital converter. To play them back, we feed the numbers into a digital-to-analogue converter and then play back the resulting sound using a loudspeaker. But what if, instead of using a list of numbers that was previously recorded, we used a computer program to generate a list of numbers and then played them back in the same way? This is the basis of digital sound synthesis – creating entirely new sounds that never existed in reality.
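To make the “list of numbers” idea concrete, here is a minimal Python sketch that calculates the numbers for a plain tone and packs them into a WAV file in memory. The function names, the 44,100 samples-per-second rate and the 16-bit sample size are all assumptions chosen for illustration, not anything from a particular machine.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second; an assumed, CD-style value


def generate_samples(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Calculate (rather than record) a list of 16-bit numbers for a sine tone."""
    count = round(duration_s * sample_rate)
    half_scale = 32767 * 0.5  # half of the 16-bit maximum, to leave headroom
    return [
        int(half_scale * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    ]


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Pack the numbers into a mono, 16-bit WAV file held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()
```

Writing the result of `to_wav_bytes(generate_samples(440, 0.5))` to a file ending in `.wav` should give half a second of the A above middle C that any ordinary media player can open.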
Very old (1980s) home computers and games consoles could generally only produce very primitive, “beepy” sounding music. This was because they generated basic sound wave shapes that aren’t like anything you’d get from a real musical instrument. The simplest of all, used by a lot of early computers, is the square wave:
Another option is the triangle wave, with a slightly softer sound:
The sound could be improved by giving each note a “shape” (known as its envelope), so that a glockenspiel sound, for example, would start loud and then die away, like a real glockenspiel does:
None of these methods sound particularly nice, and it’s hard to imagine any musician using them now unless they were deliberately going for a retro electronic sort of effect. But they have the advantage of being very easy to synthesise, requiring only a simple electronic circuit or a few lines of program code. (I wrote a program to generate the sound samples in this section from scratch in about half an hour.) The square wave, for example, only has two possible levels, so all the computer has to do is keep track of how long to go before switching to the other level. The length of time spent on each level determines the pitch of the sound produced (the shorter the time, the higher the note), and the difference in height between the levels determines the volume.
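As a rough illustration of how little code this takes, here is a minimal Python sketch of a square wave with a simple decaying envelope. It is not the program used for the samples in this post. The function names, the sample rate and the exponential shape of the decay are assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # samples per second; an assumed value


def square_wave(frequency_hz, duration_s, volume=0.5, sample_rate=SAMPLE_RATE):
    """Two-level wave: hold one level for half a cycle, then switch to the other."""
    half_period = sample_rate / (2 * frequency_hz)  # samples spent on each level
    samples = []
    level = volume               # the gap between +volume and -volume sets loudness
    next_switch = half_period    # sample position of the next level change
    for n in range(round(duration_s * sample_rate)):
        if n >= next_switch:
            level = -level
            next_switch += half_period
        samples.append(level)
    return samples


def apply_decay(samples, decay_s=0.3, sample_rate=SAMPLE_RATE):
    """Envelope: start at full volume and die away exponentially."""
    return [
        s * math.exp(-n / (decay_s * sample_rate))
        for n, s in enumerate(samples)
    ]
```

For example, `apply_decay(square_wave(880, 0.5))` gives a note that starts loud and fades. Halving `half_period` (by doubling the frequency) raises the note by an octave.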
I remember being very excited when we upgraded from our old ZX Spectrum +3, which could only do square wave synthesis, to a PC and a Sega Megadrive that were capable of FM (Frequency Modulation) Synthesis. They could actually produce the sounds of different instruments! Looking back now, they didn’t sound very much like the instruments they were supposed to, but it was still a big improvement on square waves.
FM synthesis involves combining two (or sometimes more) waves together to produce a single, more complex wave. The waves are generally sine waves and the combination process is called frequency modulation – it means the frequency of one wave (the “carrier”) is altered over time in a way that depends on the other wave (the “modulator”) to produce the final sound wave. So, at low points on the modulator wave, the carrier wave’s peaks will be spread out with a longer distance between them, while at the high points of the modulator they will be bunched up closer together, like this:
Some FM synthesisers combine more than two waves, arranged in various ways, to give a richer range of possible sounds.
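The two-wave case can be sketched in a few lines of Python. This uses the usual textbook formula, in which the modulator is added to the carrier’s phase. It is a simplified illustration under that assumption, not a description of how any particular chip does it, and the function and parameter names are made up for the example.

```python
import math

SAMPLE_RATE = 44100  # samples per second; an assumed value


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=SAMPLE_RATE):
    """Two-wave FM: a sine carrier whose phase is pushed around by a sine modulator.

    `index` controls how strongly the modulator bends the carrier;
    an index of 0 leaves a plain sine wave.
    """
    samples = []
    for n in range(round(duration_s * sample_rate)):
        t = n / sample_rate  # time in seconds
        modulator = math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(2 * math.pi * carrier_hz * t + index * modulator))
    return samples
```

Something like `fm_tone(880, 880 * 1.4, 2.0, 0.5)` is a reasonable starting point to experiment with: carrier and modulator frequencies that are not whole-number multiples of each other tend to give clangy, bell-like tones.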
Here’s our glockenspiel snippet synthesised in FM:
(In case you’re curious, this was done using DOSBox, which emulates the Yamaha OPL2 FM synthesiser chip used in the old AdLib and Sound Blaster sound cards common in DOS PCs, and the Allegro MIDI player example program. Describing how to get an ancient version of Allegro up and running on a modern computer would make a whole blog post in itself, but probably not a very interesting one.)
It’s certainly a step up from the square wave and triangle wave versions. But it still sounds unnatural; you would be unlikely to mistake it for a real glockenspiel.
FM synthesis is a lot more complicated to perform than the older primitive methods, but by the 90s FM synthesiser chips were cheap enough to put in games consoles and add-in sound cards for PCs. Contrary to popular belief, they are not analogue (or hybrid analogue-digital) synths; they are fully digital devices apart from the final conversion to analogue at the end of the process.
In case you were wondering, this is pretty much the same “frequency modulation” process that is used in FM radio. The main difference between the two is that in FM radio, you have a modulator wave that is an audio signal, but the carrier wave is a very high frequency radio wave (up in the megahertz, millions-of-hertz range). In FM synthesis, both the carrier and modulator are audio frequency waves.
Today, when you hear decent synthesised sound coming from a computer or a music keyboard, it’s very likely to be using sample-based methods. (This is often referred to as “wavetable synthesis”, but strictly speaking that term covers only a fairly specific subset of sample-based methods.) Sample-based synthesis is not really true synthesis in the same way that the other methods I’ve talked about are – it’s more a clever mixture of recording and synthesis.
Sample-based synthesis works by using short recordings of real instruments and manipulating and combining them to generate the final sound. For example, a synth might contain a recording of someone playing middle C on a grand piano. When it needs to play back a middle C, it can play back the recording unchanged. If it needs the note below, it will “stretch out” the sample slightly to increase its wavelength and lower its frequency. Similarly, for the note above it can “compress” the sample so that its frequency increases. It can also adjust the volume if the desired note is louder or quieter than the original recording. If a chord needs to be played, several instances of the sample can be played back simultaneously, adjusted to different pitches.
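The stretching and compressing can be sketched in Python as stepping through the recording at a different speed and filling in the gaps between recorded values. This is a deliberately simple version with made-up function names, using straight-line (linear) interpolation and equal-temperament semitone ratios as assumptions. Real synths are usually cleverer about it.

```python
def play_note(recording, semitones, volume=1.0):
    """Re-pitch a recorded note by reading through it faster or slower.

    Positive `semitones` compresses the sample (higher pitch, shorter note);
    negative values stretch it out (lower pitch, longer note).
    """
    step = 2 ** (semitones / 12)  # frequency ratio for equal temperament
    output = []
    position = 0.0
    while position < len(recording) - 1:
        i = int(position)
        fraction = position - i
        # Straight-line interpolation between the two nearest recorded values.
        value = recording[i] * (1 - fraction) + recording[i + 1] * fraction
        output.append(volume * value)
        position += step
    return output


def mix(*voices):
    """Play several notes at once by averaging them sample by sample."""
    length = max(len(v) for v in voices)
    return [
        sum(v[n] for v in voices if n < len(v)) / len(voices)
        for n in range(length)
    ]
```

With a middle C recording in `recording`, a C major chord would be something like `mix(play_note(recording, 0), play_note(recording, 4), play_note(recording, 7))`. Note that the higher notes come out shorter than the lower ones, which is one of the side effects of this simple approach.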
This synthesis method is not too computationally intensive; sound cards capable of sample-based synthesis (such as the Gravis Ultrasound and the SoundBlaster AWE 32/64) became affordable in the mid 90s and today’s computers can easily do it in software. Windows, for example, has a built-in sample-based synthesiser that is used to play back MIDI sound if there isn’t a hardware synth connected. Sound quality varies by instrument: it is typically very good for percussion instruments, reasonable for ensemble sounds (like a whole string section or a choir), and not so good for solo string and wind instruments. The quality also depends on how good the samples themselves are and how intelligent the synth is at combining them.
Here’s the glockenspiel phrase played on a sample-based synth (namely my Casio keyboard):
This is a big step up from the other synths – this time we have something that might even be mistaken for a real glockenspiel! But it’s not perfect… if you listen carefully, you’ll notice that all of the notes sound suspiciously similar to each other, unlike the real glockenspiel recording where they are noticeably different.
Next time I’ll talk about the limitations of the methods I’ve described in this post, and what can be done about them.