Sound Synthesis IV: Next Generation Sound Synthesis

Last time we looked at (and listened to!) various methods of digital sound synthesis, beginning with the very primitive systems used by early computers, and ending with the sample-based methods in widespread use today. This time I’m going to talk about a new and very promising method currently in development.

What’s wrong with sample-based synthesis?

Our glockenspiel test sound already sounded pretty good using the sample-based method… do we really need a more advanced method? The answer is that although sample-based synthesis does work very well for certain instruments under certain conditions, it doesn’t work well in every situation.

Although it’s based on samples of real instruments, it’s still not fully realistic. Often the same sample will be used for different notes and different volumes, with the synth altering the frequency and amplitude of the sample as needed. But on a real piano (for example), the notes will all sound subtly different. A high C won’t sound exactly the same as a low C with its frequency increased, and pressing a key hard will result in a very different sound from pressing the same key softly – it won’t just be louder. Some of the better synths will use a larger number of samples in an attempt to capture these nuances, but the amount of data can become unmanageable. And that’s just considering one note at a time. In a real piano, when multiple notes are being played at the same time, the vibrations in all the different strings will influence each other in quite complex ways to create the overall sound.

It gets even worse for string and brass instruments. For example, changing from one note to another on a trumpet can sound totally different depending on how fast the player opens and closes the valves, and it is unlikely that a sample-based system will be able to reproduce all the possibilities properly without recording an unrealistically large number of samples. In some genres of music, the player may do things with the instrument that were never intended, such as playing it with a valve only part way open. A sample-based system would have no way of dealing with such unforeseen cases – if no-one recorded a sample for that behaviour, it can’t synthesise it.

The other problem with many of the synthesis methods is one of control. Even if it were possible to get them to generate the desired sound, it’s not always very obvious how to do it. FM synthesisers, for example, take a bewildering array of parameters, many of which can seem like “magic numbers” that don’t bear any obvious relation to the sound being generated. To play a note, sound envelopes and frequencies need to be set for every operator, the waveforms can be adjusted, and the overall configuration of the operators also needs to be set. Hardly intuitive stuff for people accustomed to thinking in terms of instruments and notes.

Physical Modelling Synthesis

A newer synthesis method has the potential to solve both the realism problem and the control problem, giving musicians virtual instruments that not only sound more realistic but are much easier to “play” and will correctly handle all situations, even ones that weren’t envisaged when the synth was designed. This is called Physical Modelling Synthesis, and it’s the basis for the project I’m working on just now.

The basic idea is that instead of doing something abstract that just happens to give the result you want (like FM synthesis, for example), or “cheating” with recordings to give a better sounding result (like sample-based synthesis), you simulate exactly how a real instrument would behave. This means building a mathematical model of the entire instrument as well as anything else that’s relevant (the surrounding air, for example). Real instruments create sound because they vibrate in a certain audible way when they are played – whether that’s by hitting them, bowing them, plucking their strings, blowing into them, or whatever. Physical modelling synthesis works by calculating exactly how the materials that make up the instrument would vibrate given certain inputs.

How do we model an instrument mathematically? It can get very complex, especially for instruments that are made up of lots of different parts (for example, a piano has hundreds of strings, a sound board, and a box filled with air surrounding them all). But let’s start by looking at something simpler: a metal bar that could be, for example, one note of a glockenspiel.

glockdiagram1

To simulate the behaviour of the bar, we can divide it into pieces called elements. Then for each element we store a number, which will represent the movement of that part of the bar as it vibrates. To begin with, the bar will be still and not vibrating, so all these numbers will be zero:

glockdiagram2

We also need something else in this setup – we need a way to hear what’s going on; otherwise the whole exercise would be a bit pointless. So, we’ll take an output from towards the right-hand end of the bar:

glockdiagram3

Think of this like a sort of “virtual microphone” that can be placed anywhere on our instrument model. All it does is take the number from the element it’s placed on – it doesn’t care about any of the other elements at all. At the moment the number (like all the others) is stuck at zero, which means the microphone will be picking up silence. As it should be, because a static, non-moving bar doesn’t make any sound.

Now we need to make the bar vibrate so that it does generate some sound. To do this, we will simulate hitting the bar with a beater near its left-hand end:

glockdiagram4

What happens when the beater hits the bar? Essentially, it just makes the bar move slightly. So now, instead of all zeroes in our element numbers, we have a non-zero value in the element that’s just been hit by the beater, to represent this movement:

glockdiagram5

But the movement of the bar won’t stay confined to this little section nearest where the beater hit. Over time, it will propagate along the whole length of the bar, causing it to vibrate at its resonant frequency. After some short length of time, the bar might look like this:

glockdiagram6

and then like this:

glockdiagram7

then this:

glockdiagram8

As you can see, the value from the beater strike has “spread out” along the bar so now the majority of the bar is displaced in one direction or another. The details of how this is done depend on the material and exactly how the bar is modelled, but basically each time the computer updates the bar, the number in each box is calculated based on the previous numbers in all the surrounding boxes. (The values that were in those boxes immediately before the update are the most influential, but for some models numbers from longer ago come into play as well). Sometimes the boxes at the ends of the bar are treated differently from the other boxes – in fact, they are different, because unlike the boxes in the middle they only have a neighbouring box on one side of them, not both. There are various different ways of treating the edge boxes, and these are referred to as the model’s boundary conditions. They can get quite complex so I won’t say more about them here.
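
To make this a bit more concrete, here’s a minimal sketch of the kind of update loop described above, written in Python. It uses the simple one-dimensional wave equation, which is easier to follow than the stiffer model a real glockenspiel bar would actually need, and all the names and values are illustrative ones of my own rather than anything from the project:

    # Bar divided into 50 elements; all zero to start with (bar at rest).
    N = 50
    courant = 0.5       # scheme parameter (must be <= 1 for stability)
    mic_pos = 40        # "virtual microphone" element, near the right-hand end

    u_prev = [0.0] * N  # the bar one timestep ago
    u_now = [0.0] * N   # the bar now

    u_now[5] = 1.0      # the beater strike: displace an element near the left end

    output = []
    for step in range(100):                # one loop iteration = one timestep
        u_next = [0.0] * N
        for i in range(1, N - 1):          # interior elements only
            # Each new value is calculated from the neighbouring boxes one
            # step ago and from this box's own value two steps ago (the
            # "numbers from longer ago" mentioned above).
            u_next[i] = (2 * u_now[i] - u_prev[i]
                         + courant ** 2 * (u_now[i + 1] - 2 * u_now[i] + u_now[i - 1]))
        # Simplest possible boundary condition: the end elements stay at zero.
        u_prev, u_now = u_now, u_next
        output.append(u_now[mic_pos])      # the microphone reads one element

    print(output)   # one audio sample per timestep, e.g. 44100 of them per second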

Above I said “some short length of time”, but that’s quite vague. We actually want to wait a very specific length of time, called the timestep, between updates to the bar. The timestep is generally chosen to match the sampling rate of the audio being output, so that the microphone can just pick up one value each time the bar is updated and output it. So, for a CD quality sample rate of 44100Hz, a timestep lasts 1/44100th of a second, or 0.0000226757 seconds.

If the model is working properly, the result of all this will be that the bar vibrates at its resonant frequency – just like the bar of a real glockenspiel. Every timestep, the “microphone” will pick up a value, and when this sequence of values is played back through speakers, it should sound like a metal bar being hit by a beater.

Here are the first 20 values picked up by the microphone: 0, 0, 0.022, -0.174, -0.260, 0.111, 0.255, 0.123, 0.426, 0.705, 0.495, 0.342, 0.293, 0.116, 0.016, 0.009, 0.033, -0.033, -0.312, -0.321, -0.030

and here’s a graph showing the wave produced by them:

pmgraph

To simulate a whole glockenspiel, we can model several of these bars, each one a slightly different length so as to produce a different note, and take audio outputs from all of them. Then if we hit them with our virtual beater at the right times, we can hear our test sample, this time generated by physical modelling synthesis:

pmsynth

I used a very primitive version of physical modelling synthesis to generate this sample, so it doesn’t sound amazing. I also used a bit of trial and error tweaking to get the bar lengths I wanted, so the tuning isn’t perfect. Both the project and my knowledge of this type of synthesis are still in fairly early stages just now! In the next section I’ll talk about what we can do to improve the accuracy of the models, and therefore also the quality of the sound produced.

Accuracy and model complexity

In our project we are mainly going for quality rather than speed. We want to try and generate the best quality of sound that we can from these models; if it takes a minute (or even an hour) of computer time to generate a second of audio, we don’t see that as a huge problem. But obviously we’d like things to run as fast as possible, and if it’s taking days or weeks to generate short audio samples, that is a problem. So I’ll say a bit about how we’re trying to improve the quality of the models, as well as how we hope to keep the compute time from becoming unmanageable.

A long thin metal bar is one of the simplest things to model and we can get away with using a one-dimensional row of elements (as demonstrated above) for this. But for other instruments (or parts of instruments), more complex models may be required. To model a cymbal, for example, we will need a two-dimensional grid of elements spaced across the surface of the cymbal. And for something big and complicated like a whole piano, we would most likely need individual 1D models for each string, a 2D model for the sound board, and a 3D model for the air surrounding everything, all connected and interacting with each other in order to get an accurate synthesis. In fact, any instrument model can generally be improved by embedding it in a 3D space model, so that it is affected by the acoustics of the room it is in.
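
To give an idea of what changes in higher dimensions, here’s a hedged sketch of a single update step on a 2D grid, in the same style as the 1D example above. It’s still the simple wave equation; a real cymbal model would need stiffness and non-linear terms on top of this:

    # Each interior grid point now has four immediate neighbours instead of two.
    def update_2d(u_prev, u_now, courant):
        rows, cols = len(u_now), len(u_now[0])
        u_next = [[0.0] * cols for _ in range(rows)]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                neighbours = (u_now[i + 1][j] + u_now[i - 1][j]
                              + u_now[i][j + 1] + u_now[i][j - 1] - 4 * u_now[i][j])
                u_next[i][j] = 2 * u_now[i][j] - u_prev[i][j] + courant ** 2 * neighbours
        return u_next   # edge points stay at zero: fixed boundaries, as before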

There are also different ways of updating the model’s elements each timestep. Simple linear models are very easy and fast to compute and are sufficient for many purposes (for example, modelling the vibration of air in a room). Non-linear models are much more complicated to update and need more compute time, but may be necessary in order to get accurate sound from gongs, brass instruments, and others.

Inputs (for example, striking, bowing, blowing the model instruments) and how they are modelled can have an effect as well. The simplest way to model a strike is to add a number to one of the elements of the model for just a single timestep as shown in the example above, but it’s more realistic to add a force that gradually increases and then gradually decreases again across several timesteps. Bowing and blowing are more complicated. With most of these there is some kind of trade-off between the accuracy of the input and the amount of computational resources needed to model it.
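
For example, a smoother strike can be modelled as a “raised cosine”: a force that rises and falls over several timesteps. Here’s a sketch, with made-up duration and amplitude values:

    import math

    # Force applied at a given timestep, for a strike lasting `duration` steps.
    # Rises smoothly from zero up to `amplitude` and back down to zero again.
    def strike_force(step, duration=32, amplitude=1.0):
        if 0 <= step < duration:
            return 0.5 * amplitude * (1 - math.cos(2 * math.pi * step / duration))
        return 0.0

    # In the main loop, instead of setting one element's value once, you would
    # add strike_force(step) to the struck element every timestep.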

2D models and especially 3D models can consume a lot of memory and take a huge number of calculations to update. For CD quality audio, quite a finely spaced grid is required and even a moderately sized 3D room model can easily max out the memory available on most current computers. Accurately modelling the acoustics of a larger room, such as a concert hall, using this method is currently not realistic due to lack of memory, but should become feasible within a few years.
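
To see why the memory runs out so quickly, here’s a rough back-of-the-envelope calculation of my own (it assumes the standard stability condition for simple 3D finite difference schemes, which ties the grid spacing to the timestep):

    # Stability roughly requires h = c * sqrt(3) / sample_rate for this scheme.
    c = 344.0                 # speed of sound in air, m/s
    fs = 44100                # CD-quality sample rate, Hz
    h = c * 3 ** 0.5 / fs     # grid spacing: about 13.5 mm

    room_volume = 10 * 8 * 5                    # a fairly large room, cubic metres
    points = room_volume / h ** 3               # about 160 million grid points
    bytes_needed = points * 2 * 8               # two timesteps of state, 8-byte floats
    print(f"{bytes_needed / 2 ** 30:.1f} GiB")  # roughly 2.4 GiB for one room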

The number of calculations required to update large models is also a challenge, but not an insurmountable one. Especially for the 3D acoustic models, the largest ones, we usually want to do the same (or very similar) calculations again and again and again on a massive number of points. Fortunately, there is a type of computer hardware that is very good at doing exactly this: the GPU.

GPU stands for graphics processing unit, and these processors were indeed originally designed for generating graphics, where the same relatively simple calculations need to be applied to every polygon or every pixel on the screen many, many times. In the last few years there has been a lot of interest in using GPUs for other sorts of calculations, for example scientific simulations, and now many of the world’s most powerful supercomputers contain GPUs. They are ideal for much of the processing in our synthesis project where the simple calculations being applied to every point in a 3D room model closely parallel the calculations being applied to every pixel on the screen when rendering an image.

Advantages of Physical Modelling Synthesis

You might wonder, when sample-based synthesis is getting so good and is so much easier to perform, why bother with physical modelling synthesis? There are three main reasons:

  • Sound quality. With a good enough model, physical modelling synthesis can theoretically sound just as good as a real instrument. Even with simpler models, certain instrument types (e.g. brass) can sound a lot better than sample-based synthesis.
  • Flexibility. If you want to do something more unusual, for example hitting the strings of a violin with the wooden side of the bow instead of bowing them with the hair, or playing a wind instrument with the valves half-open, you are probably going to be out of luck with a sample-based synthesiser. Unless whoever designed the synthesiser foresaw exactly what you want and included samples of it, there will be no way to do it. But physical modelling synthesis can – you can use the same instrument model and just modify the inputs however you want.
  • Ease of control. I mentioned at the beginning that older types of synthesiser can be hard to control – although they may theoretically be able to generate the sound you want, it might not be at all obvious how to get them to do it, because the input parameters don’t bear much obvious relation to things in the “real world”. FM is particularly bad for this – to play a note you might have to do something like: “Set the frequency of operator 1 to 1000Hz, set its waveform type to full sine wave, set its attack rate to 32, its decay rate to 18, its sustain level to 5 and its release rate to 4. Now set operator 2’s frequency to 200Hz, its attack rate to 50, decay rate 2, sustain level 14, release rate 3. Now chain the operators together so that 2 is modulating 1”. (In reality the quoted text would be some kind of programming language rather than English, but you get the idea). Your only options for getting the sound you want are likely to be trial and error, or using a library of existing sounds that someone else came up with by trial and error.

Contrast this with how you might play a note on a physical modelling synthesiser: “Hit the left hand bar of my glockenspiel model with the virtual beater 10mm from its front end, with a force of 10N”. Much better, isn’t it? You might still use a bit of trial and error to find the optimum location and force for the hit, but the model’s input parameters are a lot closer to things we understand from the real world, so it will be a lot less like groping around in the dark. This is because we are trying to model the real world as accurately as possible, unlike FM and sample-based synthesisers which are abstract systems attempting to generate sound as simply as possible.

Here’s a link to the Next Generation Sound Synthesis project website. The project’s been running for a year and has four years still to go. We’re investigating several different areas, including how to make good quality mathematical models for various types of instruments, how to get them to run as fast as possible, and also how to make them effective and easy to use for musicians.

Of course, whatever happens I doubt we will be able to synthesise the bassoon ;).

Sound Synthesis III: Early Synthesis Methods

Digital Sound Synthesis

Before I delve into describing different types of synthesis, I should start with a disclaimer: I’m coming at this mainly from the angle of how old computers (and video game systems) used to synthesise sound rather than talking about music synthesisers, because that’s where most of my knowledge is. Although I have owned various keyboards, I don’t have a deep knowledge of exactly how they work as I’m more of a pianist than a keyboard player really. There is quite a bit of overlap between methods used in computers and methods used in musical instruments though, especially more recently.

To illustrate the different synthesis methods, I’m going to be using the same example sound over and over again, synthesised in different ways. It’s the glockenspiel part from the opening of Sonic Triangle’s sort-of Christmas song “It Could Be Different”. For comparison to the synthesised versions, here it is played (not particularly well, but you should get the idea!) on a real glockenspiel:

glockenspiel

(In fact, in the original recording of the song, it isn’t a real glockenspiel. It’s the sample-based synthesis of my Casio keyboard… there’ll be more about that sort of synthesis later).

If you have trouble hearing the sounds in this post, try right clicking the links, saving them to your hard drive and opening them from there. Seriously, I can’t believe that in 2013 there still isn’t an easy way of putting sounds on web pages that works on all major browsers. Grrrr!

Primitive Methods

As we saw last time, digital sound recordings (which include CDs, DVDs, and any music files on a computer) are just very long lists of numbers that were created by feeding a sound wave into an analogue-to-digital converter. To play them back, we feed the numbers into a digital-to-analogue converter and then play back the resulting sound using a loudspeaker. But what if, instead of using a list of numbers that was previously recorded, we used a computer program to generate a list of numbers and then played them back in the same way? This is the basis of digital sound synthesis – creating entirely new sounds that never existed in reality.

Very old (1980s) home computers and games consoles tended to only be able to generate very primitive, “beepy” sounding music. This was because they were generating basic sound wave shapes that aren’t like anything you’d get from a real musical instrument. The simplest of all, used by a lot of early computers, is a square wave:

synth3_1

square wave sound

Another option is the triangle wave, with a slightly softer sound:

synth3_2

triangle wave sound

The sound could be improved by giving each note a “shape” (known as its envelope), so that a glockenspiel sound, for example, would start loud and then die away, like a real glockenspiel does:

synth3_3

triangle wave with envelope sound

None of these methods sound particularly nice, and it’s hard to imagine any musician using them now unless they were deliberately going for a retro electronic sort of effect. But they have the advantage of being very easy to synthesise, requiring only a simple electronic circuit or a few lines of program code. (I wrote a program to generate the sound samples in this section from scratch in about half an hour). The square wave, for example, only has two possible levels, so all the computer has to do is keep track of how long to go before switching to the other level. The length of time spent on each level determines the pitch of the sound produced, and the difference in height between the levels determines the volume.
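
To give a flavour of just how simple it is, here’s a minimal square wave generator along the lines just described (an illustrative sketch, not the actual program I wrote):

    # Generate `duration` seconds of square wave at the given frequency and volume.
    def square_wave(freq, volume, duration, sample_rate=44100):
        samples = []
        half_period = sample_rate / (2 * freq)   # samples to wait before switching
        counter, level = 0.0, volume
        for _ in range(int(duration * sample_rate)):
            samples.append(level)                # an envelope would scale this value
            counter += 1
            if counter >= half_period:           # time to flip to the other level
                counter -= half_period
                level = -level
        return samples

    note = square_wave(freq=440, volume=0.5, duration=1.0)   # one second of A440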

FM Synthesis

I remember being very excited when we upgraded from our old ZX Spectrum +3, which could only do square wave synthesis, to a PC and a Sega Megadrive that were capable of FM (Frequency Modulation) Synthesis. They could actually produce the sounds of different instruments! Looking back now, they didn’t sound very much like the instruments they were supposed to, but it was still a big improvement on square waves.

FM synthesis involves combining two (or sometimes more) waves together to produce a single, more complex wave. The waves are generally sine waves and the combination process is called frequency modulation – it means the frequency of one wave (the “carrier”) is altered over time in a way that depends on the other wave (the “modulator”) to produce the final sound wave. So, at low points on the modulator wave, the carrier wave’s peaks will be spread out with a longer distance between them, while at the high points of the modulator they will be bunched up closer together, like this:

synth3_4

Some FM synthesisers can combine more than two waves together in various ways to give a richer range of possible sounds.
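
In code, the core of the process fits in a few lines. Strictly speaking the sketch below varies the carrier’s phase rather than its frequency, which is how FM synths are usually implemented in practice; the frequencies and modulation depth here are illustrative:

    import math

    # Two-operator FM: a modulator sine wave varies the phase of a carrier sine.
    def fm_tone(carrier_freq, mod_freq, mod_depth, duration, sample_rate=44100):
        samples = []
        for n in range(int(duration * sample_rate)):
            t = n / sample_rate
            modulator = math.sin(2 * math.pi * mod_freq * t)
            samples.append(math.sin(2 * math.pi * carrier_freq * t
                                    + mod_depth * modulator))
        return samples

    tone = fm_tone(carrier_freq=440, mod_freq=220, mod_depth=5.0, duration=1.0)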

Here’s our glockenspiel snippet synthesised in FM:

fm sound

(In case you’re curious, this was done using DOSBox, which emulates the Yamaha OPL-2 FM synthesiser chip used in the old Adlib and SoundBlaster sound cards common in DOS PCs, and the Allegro MIDI player example program. Describing how to get an ancient version of Allegro up and running on a modern computer would make a whole blog post in itself, but probably not a very interesting one).

It’s certainly a step up from the square wave and triangle wave versions. But it still sounds unnatural; you would be unlikely to mistake it for a real glockenspiel.

FM synthesis is a lot more complicated to perform than the older primitive methods, but by the 90s FM synthesiser chips were cheap enough to put in games consoles and add-in sound cards for PCs. Contrary to popular belief, they are not analogue (or hybrid analogue-digital) synths; they are fully digital devices apart from the final conversion to analogue at the end of the process.

In case you were wondering, this is pretty much the same “frequency modulation” process that is used in FM radio. The main difference between the two is that in FM radio, you have a modulator wave that is an audio signal, but the carrier wave is a very high frequency radio wave (up in the megahertz, millions-of-hertz range). In FM synthesis, both the carrier and modulator are audio frequency waves.

Sample-based Synthesis

Today, when you hear decent synthesised sound coming from a computer or a music keyboard, it’s very likely to be using sample-based methods. (This is often referred to as “wavetable synthesis”, but strictly speaking that term refers only to a quite specific subset of the sample-based methods). Sample-based synthesis is not really true synthesis in the same way that the other methods I’ve talked about are – it’s more a clever mixture of recording and synthesis.

Sample-based synthesis works by using short recordings of real instruments and manipulating and combining them to generate the final sound. For example, it might contain a recording of someone playing middle C on a grand piano. When it needs to play back a middle C, it can play back the recording unchanged. If it needs the note below, it will “stretch out” the sample slightly to increase its wavelength and lower its frequency. Similarly, for the note above it can “compress” the sample so that its frequency increases. It can also adjust the volume if the desired note is louder or quieter than the original recording. If a chord needs to be played, several instances of the sample can be played back simultaneously, adjusted to different pitches.
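
Here’s a hedged sketch of that “stretching” and “compressing” step: shifting the pitch of a recorded note by resampling it with linear interpolation (the recording named in the usage comment is hypothetical):

    # `sample` is a list of audio samples; `ratio` is the frequency ratio
    # (2 ** (1/12) per semitone). Stepping through the recording faster than
    # 1.0 raises the pitch; stepping slower lowers it.
    def pitch_shift(sample, ratio):
        out = []
        pos = 0.0
        while pos < len(sample) - 1:
            i = int(pos)
            frac = pos - i
            # interpolate between neighbouring samples of the original recording
            out.append((1 - frac) * sample[i] + frac * sample[i + 1])
            pos += ratio
        return out

    # e.g. one semitone above the recorded note:
    # higher = pitch_shift(middle_c_recording, 2 ** (1 / 12))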

This synthesis method is not too computationally intensive; sound cards capable of sample-based synthesis (such as the Gravis Ultrasound and the SoundBlaster AWE 32/64) became affordable in the mid 90s and today’s computers can easily do it in software. Windows, for example, has a built-in sample-based synthesiser that is used to play back MIDI sound if there isn’t a hardware synth connected. Sound quality can be very good for some instruments – it is typically very good for percussion instruments, reasonable for ensemble sounds (like a whole string section or a choir), and not so good for solo string and wind instruments. The quality also depends on how good the samples themselves are and how intelligent the synth is at combining them.

Here’s the glockenspiel phrase played on a sample-based synth (namely my Casio keyboard):

sample based

This is a big step up from the other synths – this time we have something that might even be mistaken for a real glockenspiel! But it’s not perfect… if you listen carefully, you’ll notice that all of the notes sound suspiciously similar to each other, unlike the real glockenspiel recording where they are noticeably different.

Next time I’ll talk about the limitations of the methods I’ve described in this post, and what can be done about them.

 

Sound Synthesis II: Digital Recording

Digital Recording

Things changed with the advent of compact discs, and later DVDs and MP3s as well. Instead of storing the continuously changing shape of the sound wave, these store the sound digitally.

What do we mean by digitally? It means the sound is stored as a collection of numbers. In fact, the numbers are binary, which means only two digits are allowed – 0 and 1. The music on a CD, or in an MP3 file, is nothing more than a very long string of 0s and 1s.

How do you get from the shape of the sound to a string of numbers? After all, the sound wave graphs we saw last time look very different from 1000110111011011010111011000100. First of all, you sample the sound signal. That means you look at where it is at certain individual points in time, and ignore it the rest of the time. Imagine drawing the shape of a sound wave on a piece of graph paper like this:

digital1

To sample this signal, we can look at where the signal is each time it crosses one of the vertical lines. We don’t care what it’s doing the rest of the time – only its intersections with the lines matter now. Here’s the same sound, but instead of showing the full wave, we just show the samples (as Xs):

digital2

To simplify things further so we can stick to dealing with whole numbers, we’re also going to move each sample to the nearest horizontal grid line. This means that all the samples will be exactly on an intersection where two of the grid lines cross:

digital3

So far, so good. We have a scattering of Xs across the graph paper. Hopefully you can see that they still form the shape of the original sound wave quite well. From here, it’s easy to turn our sound wave into a stream of numbers, one for each sample. We just look at each vertical grid line and note the number of the horizontal grid line where our sample is:

digital4

The wave we started with is now in digital form: 5, 9, 5, 6, 7, 1, 2, 6, 4, 6. It’s still in ordinary decimal numbers, but we could convert it to binary if we wanted to. (I won’t go into details of how to convert to binary here, but if you’re curious, there are plenty of explanations of binary online – here’s one). We can record this stream of numbers in a file on a computer disk, on a CD, etc. When we want to play it back, we can reverse the process we went through above to get back the original sound wave. First we plot the positions of the samples onto graph paper:

digital3

And now we draw the sound wave – all we have to do is join up our samples:

digital5

Voila! All ready to be played back again.

This might look very spiky and different from the original smooth sound wave. That’s because I’ve used a widely spaced grid with only a few points here so you can see what’s going on. In real digital audio applications, very fine grids and lots of samples are used so that the reconstructed wave is very, very close to the original – to show just one second of CD quality sound, you would need a grid with 65,536 horizontal and 44,100 vertical lines!

(In electronics, the device that turns an analogue sound wave into samples is called an analogue to digital converter, and its cousin that performs the inverse task is a digital to analogue converter. As you probably guessed, it’s not really done using graph paper).
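
In code, the whole graph paper procedure boils down to a few lines: sample the signal at regular intervals, then round each sample to the nearest grid level. Here’s a toy sketch with a made-up wave:

    import math

    # Sample a continuous signal at regular points and snap each sample to the
    # nearest of `levels` evenly spaced grid lines.
    def digitise(signal, num_samples, levels):
        samples = []
        for n in range(num_samples):
            value = signal(n / num_samples)              # read off a vertical grid line
            samples.append(round(value * (levels - 1)))  # nearest horizontal line
        return samples

    wave = lambda t: 0.5 + 0.4 * math.sin(2 * math.pi * 2 * t)
    print(digitise(wave, num_samples=10, levels=10))  # ten whole-number samples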

But why?

At this point you may be wondering, why bother with digital recording? It seems like we just went through a complicated process and gained nothing – in fact, we actually lost some detail in the sound wave, which doesn’t look quite the same after what it’s been through! There are several advantages to digital recording:

  • Digital recordings can be easily manipulated and edited using a computer. Computers (at least all the ones in common use today) can only deal with digital information – anything analogue, such as sounds and pictures, has to be digitised before a computer can work with it. This opens up a huge range of possibilities, allowing much more sophisticated effects and editing techniques than could be accomplished in the analogue domain. It also allows us to do clever things like compressing the information so it takes up less space while still sounding much the same (this is what the famous MP3 files do).
  • I noted above that we lost a bit of detail in our sound wave when we converted it to digital and then converted it back. However, in real life situations digital recordings generally give much better sound quality than analogue recordings. This is because the small inaccuracies introduced in the digitisation process are usually much smaller and less noticeable than the background noise that inevitably gets into analogue recording and playback equipment no matter how careful you are. Digital is more or less immune to background noise for reasons I’ll explain shortly.
  • Digital recordings can be copied an unlimited number of times without losing any quality. This is closely related to the point above about sound quality. If you’re old enough to have copied records or cassettes onto blank tapes, or taped songs off the radio, you may have noticed this in action. The copy always sounds worse than the original, with more background noise. If you make another copy from that copy instead of from the original, it will be worse still. But it isn’t like that with digital recording – if you copy a CD to another CD, or copy an MP3 file from one computer to another, there is no loss of quality – the copy sounds exactly like the original, and if you make another copy from the copy, it will also sound exactly like the original. (This isn’t just a case of the loss in quality being so small you can’t hear it – there genuinely is no loss whatsoever. The copies are absolutely identical!).

Notes on background noise

I mentioned above that digital recordings are more or less immune to background noise and that’s one of their big advantages. But first of all, what is background noise, where does it come from, and what does it do to our sound signals?

Background noise is any unwanted interference that gets into the sound signal at some point during the recording or playback process. It can come from several different sources – if the electrical signal is weak (like the signal from a microphone or from a record player’s pick-up), it can be affected by electromagnetic interference from power lines or other devices in the area. If there is dust or dirt on the surface of a record or tape, this will also distort the signal that’s read back from it.

There is no getting away from background noise; it will always appear from somewhere. If we have a vinyl record with a sound signal recorded onto it that looks like this:

digital1

by the time it gets played back through the speakers, noise from various sources will have been added to the original signal and it might look more like this:

digital7

Once the noise is there, it’s very difficult or impossible to get rid of it again, mainly because there’s no reliable way to tell it apart from the original signal. So ideally we want to minimise its chances of getting there in the first place. This is where digital recording comes in. Let’s say we have the same sound signal recorded onto a CD instead of a vinyl record. Because it’s in digital form, it will be all 0s and 1s instead of a continuously varying wave like on the vinyl. So the information on the CD will look something like this:

digital8

This time there are only two levels, one representing binary 0 and the other binary 1.

There will still be some noise added to the signal when it gets read back from the CD – maybe there is dust on the disc’s surface or electrical interference getting to the laser pick-up. So the signal we get back will look more like this:

digital9

But this time the noise doesn’t matter. As long as we can still tell what is meant to be a 0 and what is a 1, small variations don’t make any difference. In this case it’s very obvious that the original signal shape was meant to be this:

digital8

So, despite the noise, we recovered exactly the original set of samples. We can pass them through the digital to analogue converter (DAC) and get back this:

digital1

a much more accurate version of the original sound wave than we got from the analogue playback. Although the noise still got into the signal we read from the CD, it’s disappeared as if by magic and doesn’t affect what we hear like it did with the record.

(Of course, digital recording isn’t completely immune to noise. If the noise level was so high that we could no longer tell what was meant to be a 0 and what was a 1, the system would break down, but it’s normally easy enough to stop this from happening. Also, we can’t prevent noise from getting into the signal after it’s converted back to analogue form, but again this is a relatively small problem as the majority of the recording and playback system works in digital form).
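
The recovery step really is as simple as it sounds: each level read back just has to be closer to its intended value than to the other one. Here’s a toy sketch with made-up noise values:

    # Levels read back from the disc, with noise added to the original 0s and 1s.
    def recover_bits(noisy_levels, low=0.0, high=1.0):
        midpoint = (low + high) / 2
        return [1 if level > midpoint else 0 for level in noisy_levels]

    print(recover_bits([0.08, 0.93, 1.04, -0.11, 0.87]))   # -> [0, 1, 1, 0, 1]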

Does digital recording really sound better?

Not everyone thinks so. A lot of people say they prefer the sound of analogue recordings, often saying they have a “warmer” sound compared with the “colder” sound of digital. In my opinion, yes there is a difference, but digital is more faithful to the original sound – the “warmth” people talk about is actually distortion introduced by the less accurate recording method! It’s absolutely fine to prefer that sound, in the same way that it’s absolutely fine to prefer black and white photography or impressionist paintings even though they’re less realistic than a colour photograph or a painting with lots of fine detail.

“Ah”, you might say. “But surely a perfect analogue recording would have to be better than a digital recording? Because you’re recording everything rather than just samples of it”. Technically this is true… but in reality (a) there’s no such thing as a perfect analogue recording because there are so many ways for noise to get in, and (b) at CD quality or better, the loss of information from digitising the sound is minuscule, too small for anyone to be able to hear. Double-blind tests have been done where audio experts listened to sounds and had to determine whether the sound had been converted to digital and back or not. No-one was able to reliably tell.

Phew! That was longer than I meant it to be. That’s the background… next time I really will start on actual sound synthesis, I promise!