Sound Synthesis II: Digital Recording

Digital Recording

Things changed with the advent of compact discs, and later DVDs and MP3s as well. Instead of storing the continuously changing shape of the sound wave, these store the sound digitally.

What do we mean by digitally? It means the sound is stored as a collection of numbers. In fact, the numbers are binary, which means only two digits are allowed – 0 and 1. The music on a CD, or in an MP3 file, is nothing more than a very long string of 0s and 1s.

How do you get from the shape of the sound to a string of numbers? After all, the sound wave graphics we saw last time look very different from 1000110111011011010111011000100. First of all, you sample the sound signal. That means you note its value at certain individual points in time, and ignore it the rest of the time. Imagine drawing the shape of a sound wave on a piece of graph paper like this:


To sample this signal, we can look at where the signal is each time it crosses one of the vertical lines. We don’t care what it’s doing the rest of the time – only its intersections with the lines matter now. Here’s the same sound, but instead of showing the full wave, we just show the samples (as Xs):


To simplify things further so we can stick to dealing with whole numbers, we’re also going to move each sample to the nearest horizontal grid line. This means that all the samples will be exactly on an intersection where two of the grid lines cross:


So far, so good. We have a scattering of Xs across the graph paper. Hopefully you can see that they still form the shape of the original sound wave quite well. From here, it’s easy to turn our sound wave into a stream of numbers, one for each sample. We just look at each vertical grid line and note the number of the horizontal grid line where our sample is:


The wave we started with is now in digital form: 5, 9, 5, 6, 7, 1, 2, 6, 4, 6. It’s still in ordinary decimal numbers, but we could convert it to binary if we wanted to. (I won’t go into details of how to convert to binary here, but if you’re curious, there are plenty of explanations of binary online). We can record this stream of numbers in a file on a computer disk, on a CD, etc. When we want to play it back, we can reverse the process we went through above to get back the original sound wave. First we plot the positions of the samples onto graph paper:


And now we draw the sound wave – all we have to do is join up our samples:


Voila! All ready to be played back again.
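For the programmers reading, here is a rough Python sketch of the graph paper process: sampling, snapping to the nearest grid line, writing the numbers out in binary, and joining the samples up again. The function names and the choice of 4 bits per sample are my own assumptions for illustration. Joining the dots with straight lines simply mirrors the drawing above, and isn't meant to show how a real converter works.

```python
import math


def sample_and_quantise(signal, duration, num_samples, num_levels):
    """Read signal(t) at evenly spaced times and snap each reading to the
    nearest whole-number level in 0..num_levels-1.

    Assumes signal(t) returns values between -1 and 1.
    """
    levels = []
    for i in range(num_samples):
        t = duration * i / num_samples               # one "vertical grid line"
        value = signal(t)
        scaled = (value + 1) / 2 * (num_levels - 1)  # map -1..1 onto 0..num_levels-1
        levels.append(round(scaled))                 # nearest "horizontal grid line"
    return levels


def to_binary(samples, bits):
    """Write each sample as a fixed-width binary number and join them up."""
    return "".join(format(s, f"0{bits}b") for s in samples)


def from_binary(bitstring, bits):
    """Split the string back into fixed-width chunks and read each as a number."""
    return [int(bitstring[i:i + bits], 2) for i in range(0, len(bitstring), bits)]


def join_up(samples, points_between):
    """Rebuild a wave by drawing straight lines between neighbouring samples."""
    wave = []
    for a, b in zip(samples, samples[1:]):
        for k in range(points_between):
            wave.append(a + (b - a) * k / points_between)
    wave.append(samples[-1])
    return wave


# The ten samples read off the graph paper in the article.
article_samples = [5, 9, 5, 6, 7, 1, 2, 6, 4, 6]
encoded = to_binary(article_samples, bits=4)   # 4 bits covers the levels 0..15
decoded = from_binary(encoded, bits=4)
rebuilt = join_up(decoded, points_between=4)

# Sampling a made-up one-second sine wave on the same kind of coarse grid.
sine_samples = sample_and_quantise(
    lambda t: math.sin(2 * math.pi * t), duration=1.0, num_samples=10, num_levels=11
)

if __name__ == "__main__":
    print(encoded)
    print(decoded)
    print(sine_samples)
```

The decoded list comes back identical to the list we started with. That matters later on, when we get to copying.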

This might look very spiky and different from the original smooth sound wave. That’s because I’ve used a widely spaced grid with only a few points here so you can see what’s going on. In real digital audio applications, very fine grids and lots of samples are used so that the reconstructed wave is very, very close to the original – to show just one second of CD quality sound, you would need a grid with 65,536 horizontal and 44,100 vertical lines!
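Those two grid sizes weren't picked at random. The short calculation below shows where they come from, assuming the standard CD format of 16 bits per sample and 44,100 samples per second (the constant names are mine).

```python
BITS_PER_SAMPLE = 16          # CD audio stores each sample as a 16-bit number
SAMPLES_PER_SECOND = 44_100   # CD audio takes 44,100 samples every second
CHANNELS = 2                  # CDs are stereo: one stream each for left and right

# Number of distinct values a 16-bit number can take: the "horizontal lines".
levels = 2 ** BITS_PER_SAMPLE

# Raw data rate for one channel, and for the stereo pair.
bits_per_second_mono = BITS_PER_SAMPLE * SAMPLES_PER_SECOND
bits_per_second_stereo = bits_per_second_mono * CHANNELS

if __name__ == "__main__":
    print(levels)                   # 65536
    print(bits_per_second_mono)     # 705600
    print(bits_per_second_stereo)   # 1411200
```

So one second of stereo CD sound is a string of well over a million 0s and 1s.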

(In electronics, the device that turns an analogue sound wave into samples is called an analogue to digital converter, and its cousin that performs the inverse task is a digital to analogue converter. As you probably guessed, it’s not really done using graph paper).

But why?

At this point you may be wondering, why bother with digital recording? It seems like we just went through a complicated process and gained nothing – in fact, we actually lost some detail in the sound wave, which doesn’t look quite the same after what it’s been through! There are several advantages to digital recording:

  • Digital recordings can be easily manipulated and edited using a computer. Computers (at least all the ones in common use today) can only deal with digital information – anything analogue, such as sounds and pictures, has to be digitised before they will work with it. This opens up a huge range of possibilities, allowing much more sophisticated effects and editing techniques than could be accomplished in the analogue domain. It also allows us to do clever things like compressing the information so it takes up less space while still sounding much the same (this is what the famous MP3 files do).
  • I noted above that we lost a bit of detail in our sound wave when we converted it to digital and then converted it back. However, in real life situations digital recordings generally give much better sound quality than analogue recordings. This is because the small inaccuracies introduced in the digitisation process are usually much smaller and less noticeable than the background noise that inevitably gets into analogue recording and playback equipment no matter how careful you are. Digital is more or less immune to background noise for reasons I’ll explain shortly.
  • Digital recordings can be copied an unlimited number of times without losing any quality. This is closely related to the point above about sound quality. If you’re old enough to have copied records or cassettes onto blank tapes, or taped songs off the radio, you may have noticed this in action. The copy always sounds worse than the original, with more background noise. If you make another copy from that copy instead of from the original, it will be worse still. But it isn’t like that with digital recording – if you copy a CD to another CD, or copy an MP3 file from one computer to another, there is no loss of quality – the copy sounds exactly like the original, and if you make another copy from the copy, it will also sound exactly like the original. (This isn’t just a case of the loss in quality being so small you can’t hear it – there genuinely is no loss whatsoever. The copies are absolutely identical!).
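The copying point is easy to try out in code. The sketch below is a toy model and not a simulation of any real tape deck. I've assumed that each analogue copy adds a small amount of random noise, while each digital copy just duplicates the numbers. The function names and the noise level are made up for illustration.

```python
import random


def analogue_copy(signal, noise_level, rng):
    """Toy model of an analogue copy: every value picks up a little random noise."""
    return [value + rng.gauss(0, noise_level) for value in signal]


def digital_copy(samples):
    """A digital copy just duplicates the numbers."""
    return list(samples)


def rms_error(copy, original):
    """Root-mean-square difference between a copy and the original."""
    return (sum((c - o) ** 2 for c, o in zip(copy, original)) / len(original)) ** 0.5


rng = random.Random(0)                      # fixed seed so runs are repeatable
original = [5, 9, 5, 6, 7, 1, 2, 6, 4, 6]   # the samples from the graph paper example

analogue = [float(v) for v in original]
digital = list(original)
analogue_errors = []

# Make a copy of a copy of a copy... five generations deep.
for generation in range(5):
    analogue = analogue_copy(analogue, noise_level=0.1, rng=rng)
    digital = digital_copy(digital)
    analogue_errors.append(rms_error(analogue, original))

if __name__ == "__main__":
    print("analogue error per generation:", analogue_errors)
    print("digital copy identical to original:", digital == original)
```

In this model the analogue error tends to build up from one generation to the next, because each copy inherits the noise of the one before it. The fifth-generation digital copy is still exactly the original list.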

Notes on background noise

I mentioned above that digital recordings are more or less immune to background noise and that’s one of their big advantages. But first of all, what is background noise, where does it come from, and what does it do to our sound signals?

Background noise is any unwanted interference that gets into the sound signal at some point during the recording or playback process. It can come from several different sources – if the electrical signal is weak (like the signal from a microphone or from a record player’s pick-up), it can be affected by electromagnetic interference from power lines or other devices in the area. If there is dust or dirt on the surface of a record or tape, this will also distort the signal that’s read back from it.

There is no getting away from background noise: it will always appear from somewhere. If we have a vinyl record with a sound signal recorded onto it that looks like this:


by the time it gets played back through the speakers, noise from various sources will have been added to the original signal and it might look more like this:


Once the noise is there, it’s very difficult or impossible to get rid of it again, mainly because there’s no reliable way to tell it apart from the original signal. So ideally we want to minimise its chances of getting there in the first place. This is where digital recording comes in. Let’s say we have the same sound signal recorded onto a CD instead of a vinyl record. Because it’s in digital form, it will be all 0s and 1s instead of a continuously varying wave like on the vinyl. So the information on the CD will look something like this:


This time there are only two levels, one representing binary 0 and the other binary 1.

There will still be some noise added to the signal when it gets read back from the CD – maybe there is dust on the disc’s surface or electrical interference getting to the laser pick-up. So the signal we get back will look more like this:


But this time the noise doesn’t matter. As long as we can still tell what is meant to be a 0 and what is a 1, small variations don’t make any difference. In this case it’s very obvious that the original signal shape was meant to be this:


So, despite the noise, we recovered exactly the original set of samples. We can pass them through the digital to analogue converter (DAC) and get back this:


a much more accurate version of the original sound wave than we got from the analogue playback. Although the noise still got into the signal we read from the CD, it’s disappeared as if by magic and doesn’t affect what we hear like it did with the record.
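The "as if by magic" step is really just a comparison against a threshold. Here is a minimal sketch, assuming a 0 is stored as a level of 0.0, a 1 as a level of 1.0, and the reader treats anything above 0.5 as a 1. These values and the function names are illustrative, and not how a real CD player is built.

```python
import random


def read_with_noise(bits, noise_level, rng):
    """Turn each bit into a level (0.0 or 1.0) and add a random wobble to it."""
    return [float(b) + rng.uniform(-noise_level, noise_level) for b in bits]


def recover_bits(levels, threshold=0.5):
    """Decide what each noisy level was meant to be: above the threshold is a 1."""
    return [1 if level > threshold else 0 for level in levels]


rng = random.Random(1)                       # fixed seed so runs are repeatable
stored = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

noisy = read_with_noise(stored, noise_level=0.2, rng=rng)
recovered = recover_bits(noisy)

if __name__ == "__main__":
    print([round(level, 2) for level in noisy])
    print(recovered == stored)
```

As long as the wobble stays below 0.5 in either direction, every level lands on the correct side of the threshold and the recovered bits match the stored ones exactly.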

(Of course, digital recording isn’t completely immune to noise. If the noise level was so high that we could no longer tell what was meant to be a 0 and what was a 1, the system would break down, but it’s normally easy enough to stop this from happening. Also, we can’t prevent noise from getting into the signal after it’s converted back to analogue form, but again this is a relatively small problem as the majority of the recording and playback system works in digital form).

Does digital recording really sound better?

Not everyone thinks so. A lot of people say they prefer the sound of analogue recordings, often saying they have a “warmer” sound compared with the “colder” sound of digital. In my opinion, yes there is a difference, but digital is more faithful to the original sound – the “warmth” people talk about is actually distortion introduced by the less accurate recording method! It’s absolutely fine to prefer that sound, in the same way that it’s absolutely fine to prefer black and white photography or impressionist paintings even though they’re less realistic than a colour photograph or a painting with lots of fine detail.

“Ah”, you might say. “But surely a perfect analogue recording would have to be better than a digital recording? Because you’re recording everything rather than just samples of it”. Technically this is true… but in reality (a) there’s no such thing as a perfect analogue recording because there are so many ways for noise to get in, and (b) at CD quality or better, the loss of information from digitising the sound is minuscule, too small for anyone to be able to hear. Double-blind tests have been done where audio experts listened to sounds and had to determine whether the sound had been converted to digital and back or not. No-one was able to reliably tell.

Phew! That was longer than I meant it to be. That’s the background… next time I really will start on actual sound synthesis, I promise!


Sound Synthesis I: How Sound Recording Works

Hello, and welcome to the first of several blog entries inspired by one of my projects, Project Noah. Actually that’s just my code name – its real name is Next Generation Sound Synthesis. I actually get paid for working on this one!

As the name suggests, the project is about new ways of synthesising sound, creating more realistic sounding digital instruments and better acoustic models. I think it’s a pretty interesting area, and the approach being taken (physical modelling synthesis) shows a lot of promise. But before I get onto that, I’d like to go back to basics a bit and talk about computer synthesis in general, giving some examples of the different ways it’s been done over the years, what they sound like, their strengths and weaknesses, etc. I’ll be talking mainly from a computer programmer’s perspective rather than a musician’s, so my examples will draw mainly from the sound chips and software used to make sounds in computers and games consoles rather than from music synthesizers. (Although I do play music keyboards, I don’t know a great deal about the technical side of them, especially not the earlier ones).

In fact, before I even start on that, I’m going to go even further back to basics and talk about how sound recording works in this first entry. (If it’s not immediately clear how that’s relevant to synthesis, I hope it will become clearer by the end).

Recording Sound

Sound is vibrations in the air that we can hear when they are picked up by our ear drums. To record sound and be able to play it back later, we need some means of capturing and storing the shape of those vibrations as they happen – and also a means of turning the stored shapes back into vibrations again so that they can be heard.

The earliest methods of sound recording didn’t even rely on any electronics – they were entirely mechanical. A diaphragm would pick up the vibrations from the air, then a needle connected to the diaphragm would etch out a groove in some soft medium – initially wax cylinders, later flat discs. The cylinder or disc would be turned by hand or by clockwork. The groove’s shape would correspond to the shape of the sound waves as they changed over time.


This isn’t actually a mechanical gramophone, but it is the oldest one I could easily get hold of. It used to be my Granny’s.

To play back the sound, the process was reversed; the needle was made to run along the groove, transmitting its shape to the diaphragm, which would vibrate in the right way to recreate the original sound (more or less – the quality of these early systems left a lot to be desired).

It’s worth pausing for a moment to say something about how the shape of a sound wave relates to what it actually sounds like to us. First of all, and maybe not surprisingly, a sound wave with bigger variations (larger peaks and troughs) sounds louder than one with smaller variations. So this:


sounds exactly the same as this:


except the first one is a lot louder.

So the height of the peaks in the wave (often called the amplitude) determines the loudness, more or less. The pitch (that is, whether the sound is high like a flute or low like a tuba) depends on how close together the peaks are. When there are a lot of peaks in quick succession like this:


the sound is high pitched. When there aren’t so many, like this:


the sound will be deeper. The rate at which the peaks arrive is called the frequency of the sound wave. Frequency is measured in hertz (abbreviated to Hz), which is simply the number of peaks the wave has per second. Humans can hear frequencies in the range of about 20Hz up to 20,000Hz, but are much better at hearing sounds around 1,000Hz than sounds at either of those extremes. Also, ability to hear high frequencies tends to tail off quite dramatically with age, so it’s unlikely adults will be able to hear all the way up to 20,000Hz (20kHz).
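To make amplitude and frequency a bit more concrete, here is a small Python sketch that builds simple sine waves and counts their peaks. The function names and the 1,000-points-per-second resolution are just choices I've made for the example.

```python
import math


def sine_wave(frequency_hz, amplitude, duration_s, points_per_second):
    """Build a list of values tracing a sine wave of the given pitch and loudness."""
    count = int(duration_s * points_per_second)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / points_per_second)
        for i in range(count)
    ]


def count_peaks(wave):
    """Count the points that are higher than the one before and not lower
    than the one after, i.e. the tops of the wave."""
    return sum(
        1
        for prev, cur, nxt in zip(wave, wave[1:], wave[2:])
        if prev < cur >= nxt
    )


# Same pitch, different loudness: only the height of the peaks changes.
loud = sine_wave(frequency_hz=5, amplitude=1.0, duration_s=1.0, points_per_second=1000)
quiet = sine_wave(frequency_hz=5, amplitude=0.25, duration_s=1.0, points_per_second=1000)

# Same loudness, higher pitch: more peaks squeezed into the same second.
high = sine_wave(frequency_hz=20, amplitude=1.0, duration_s=1.0, points_per_second=1000)

if __name__ == "__main__":
    print(max(loud), max(quiet))                   # peak heights (amplitude)
    print(count_peaks(loud), count_peaks(high))    # peaks in one second (frequency)
```

Counting the peaks in one second of each wave gives back the frequency in Hz that it was built with.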

Real sound waves (such as speech, music, everyday noises) are usually more complex than the ones I’ve shown above and are made up of a whole mixture of different frequencies and amplitudes, which also vary over time. This makes things more interesting from the perspective of synthesising sounds.

Electronic Recording

The simple mechanical recording system was improved with the advent of electronics. Electronic recording was more complex but resulted in much better sound quality. In the electronic system, a microphone is used to turn the sound vibrations into electrical signals whose voltage varies over time in the same shape as the sound waves. Having the sound in electronic form opens up lots more possibilities – for example, it can be boosted by electronic amplifiers, allowing a stronger signal to be stored, and the sound to be played back at much louder volumes. It can also be much more easily mixed with other sound signals, very useful for editing recordings.


The first electronic systems still stored the sound as a groove cut in a disc, just as the original mechanical systems had. And as in the mechanical systems, the groove was the same shape as the original sound waves – there was no fancy encoding or conversion going on. Later, sound was also stored as a varying magnetic field on magnetic tape. The variations in magnetic field strength, like the shape of the grooves, corresponded exactly to the shape of the sound being recorded. This is known as analogue recording.


Tune in next time for lots of information about the next big innovation in sound recording: digital recording!