The halfway point… ?

It’s now six months since the UK went into lockdown due to Covid-19. One of the first things I did when I realised I would be in the house a lot more than planned this year was something I’d been meaning to do for a while… I got my old CD collection out, put it somewhere accessible and sorted the discs into alphabetical order.

Singles followed by albums, in alphabetical order
Classical CDs

A bit later on I also got hold of a nice new stand for my faithful hifi components (thanks Laura!).

Hifi on its new stand

The hifi has been a good investment. I think I spent about £400 on the main components soon after I started work, and nearly 20 years on they still work flawlessly and sound great. If I’d bought a cheap and nasty one instead I could easily have ended up spending more money in total replacing it every time it broke, plus it obviously wouldn’t have sounded as good.

As an aside, I feel a bit embarrassed to still be clinging to physical media and a bulky hifi component system in this day and age. Aren’t we all supposed to be listening to Spotify on our smart speakers by now? But to tell the truth, I’ve just never really got on with streaming services, though I did try them for a while. I think it’s the feeling that I no longer have control of my music collection… what if one of my favourite artists does something “bad” and Spotify decides to pull all their stuff as a result? There is already a precedent for this: the Michael Jackson episode of The Simpsons was removed from streaming services and future DVD releases as a result of allegations against him. As someone who gets very emotionally invested in my music and TV, I feel safer having a physical copy, or at least a locally stored DRM-free copy that can’t be taken away from me on a whim.

(I think it also didn’t help that, soon after I first got Spotify, I boarded a long flight only to discover that it had picked that moment to delete all the music I’d downloaded).

More to the point, I just like my CDs and my hifi, damn it. I always dreamed of having a big CD collection and a nice good quality hifi when I was a teenager, and now that I have those things I should be enjoying them, not feeling ashamed of them.

The collection is certainly big. It currently runs to 121 singles, 261 pop/rock albums, 166 classical CDs, and 39 compilation CDs. And when I’d finished sorting them, I decided, “You know what, I’m going to listen to ALL of them”. So I did.

Or rather, I started listening to all of them, because nearly six months on I’m still nowhere near finished. I average one or two discs a day while I’m working. I started off listening through both the singles and the classical CDs in alphabetical order, then when I finished the singles I started on the albums. So far I have listened to 126 classical CDs, all 121 singles and 28 albums, with 40 classical CDs, 233 albums and all 39 compilations still to go. In case you were wondering, I’m up to Tchaikovsky with the classical and The Bluetones with the rock/pop.

One of the reasons I wanted to start this was because I had a feeling we’d be out of lockdown again before I finished, and that was kind of a comforting thought. My brother was more pessimistic and thought I would finish all the CDs before that happened. I think I was at least half-right in the end… I wouldn’t class the current restrictions as a lockdown anymore, so the lockdown did end before I got to the end of my CDs. I’m now starting to wonder whether the pandemic will be over and we’ll be back to “full normal” before I’m done. Given what I’ve been reading lately about vaccines, and given that the UK authorities now seem to be hinting at about another 6 months of restrictions, it’s possible, though I don’t quite dare yet to believe it’s likely.

I decided at the start that I would make myself listen to every CD in full, the only exception being that I could skip a track if I’d already listened to exactly the same version of it on a previous disc. This made me a bit apprehensive, both because I knew some of them would bring back memories I’d rather not think about, and also because I suspected some of what I used to listen to in my teens wouldn’t have aged very well. So far, though, it’s been OK. Two songs did move me to tears (I won’t tell you what they were because you’ll think I’m mad), one of them because it must have been the first time I’d heard the original version in well over 20 years and I’d forgotten how nice it is compared to the blander album version I’ve heard many times since.

I discovered a handful of tracks that for some reason had never made it to being ripped and put on my phone, so of course I remedied that straight away. I think there were even a few whole CDs (presents and impulse buys) that I’ve never listened to at all until now. Mostly, though, the only new stuff I discovered was thumping, repetitive dance mixes on the B-sides of a lot of the singles that reminded me why I never bothered listening to them in the first place.

Anyway, it’s been something to do to mark the passage of these weird times, and a nice little trip down memory lane. I could have just listened to all the same music on my computer but it wouldn’t have been the same somehow. Getting the actual physical CDs out, seeing the covers and the booklets and the disc designs has been almost as much of a nostalgia hit as the music itself.

Sonic Triangle: Back from the dead!

Well… I have to admit, when I first opened this blog with a post about my band, Sonic Triangle, I didn’t expect it to be nearly five years before we released anything new. Five years!! How the hell did that happen? :O

But better late than never, as they say. (I sometimes think I should adopt that saying as my motto, as it applies to so many things in my life). We finally released a song! Two songs, in fact. One’s called Mercury, and it has a video as well! The other is called Homesick, and it doesn’t have a video. (Actually, it sort of does, but I doubt that that video will ever see the light of day, so we’ll just pretend it doesn’t). We’re pretty happy with both of them, and I hope you enjoy them too.

As to why it’s taken five years, I’m not actually sure. It’s not as if we haven’t been doing stuff… Alex sent round the first demo of Homesick way back in spring 2011, and we’ve actually been working on it (and about seven or eight other tracks, some of which we might finish and release at some point) on-and-off pretty much ever since then. We’ve just all been quite busy with other things, and haven’t got to the point of having anything we feel happy enough with to release until now.

A shot from the Mercury video, featuring our very talented singer.

Our process of recording hasn’t changed a great deal since I first wrote about it. Most of the instruments are still played on my Casio keyboard, though when I moved house a few years ago I brought the Technics electric piano that I inherited from my uncle out of storage, so the piano parts are now played on that, which is a big improvement. We did some recording with the glockenspiel, but it doesn’t feature on either of the new tracks. I think the way Alex creates the MIDI demos and edits the final versions has changed a bit, but I don’t know the details. I just play my keyboard and piano, then Alex goes away with the sound files and a few hours later a marvellous mix appears that leaves me thinking “Did I really play all that?”.

We’ve now dragged ourselves into the 21st century and created a Facebook page, supplementing our rather minimalist website. I think we’ve always had a Twitter account too; it just hasn’t been used much.

(In other music news, I’m looking forward to seeing Belle and Sebastian live next month… they’ve been on my list of bands to go and see for even longer than it’s taken us to finish Homesick 😉 ).

Next on my Chopin list…

… here’s the Fantaisie-Impromptu, which I’ve been learning for the past few months:

This was one of the pieces on the first piano compilation CD I ever got hold of, at age 15. It had 18 tracks and I think I can play 10 of them now (or at least could at one time; I’m out of practice on some of them)… I had thoughts of trying to learn all 18, but that would probably be a bit of a pointless exercise. Not to mention strictly speaking impossible, since one of the pieces needs two pianos and two pianists!

The Fantaisie-Impromptu is a good demonstration of something I’m very bad at remembering and applying – not a piano-specific lesson, in fact, but a general life one. The thing is, I avoided trying to learn the Fantaisie-Impromptu for years because it looked too hard – for most of it, the right hand plays quadruplets against triplets in the left hand, and I thought that getting them to synchronise properly was going to be a nightmare.

But when I finally decided to tackle it back in January this year, I found the key: it’s not hard as long as you don’t think about it too much. The last thing you want is to be trying to count your way through every bar, working out in detail where each note falls in relation to the other hand’s part. That way quickly leads to insanity because the piece is way too fast for that. All you need to do is learn each part on its own, then once you have a reasonable grasp of them separately, try playing them at the same time without really thinking about what’s going on… just let it flow and let the music take over. I am not good at this at all… usually I will over-analyse everything to death rather than just going with the flow and letting it happen. But I’m starting to see that in certain cases it’s the only approach that won’t drive you mad… if it applies to the Fantaisie-Impromptu, maybe it applies to other things in life too…

(I’m not very happy with the audio quality in this one – something weird is definitely happening. Will look into alternative recording methods for my next video).

I achieved one of my life goals…

… with hopefully a good 40 or 50 years to spare.

Namely, I learned to play a Chopin Etude:

(OK, I know it’s a fair bit slower than it’s meant to be and there are a few mistakes, but I’m very happy to have even got it to this point. Bars 79-82, the bit near the end with ascending triplets in both hands, especially had me almost tearing my hair out for a while. In the end I had to devise my own set of exercises just for those few bars and beaver away at them for slow and painstaking hours to stop the ending from falling apart completely. Probably a good indication that I should have picked an easier piece, but by that time I was determined to complete it).

Chopin wrote 27 etudes (studies) in three sets. This is the no. 5 etude from the first set, imaginatively nicknamed the “Black Keys” Etude because the right hand plays entirely on the black keys. That might sound like a bit of a limitation, but in fact it’s not that bad – using only the black keys you can play a major pentatonic scale starting on G flat. A lot of music of various genres uses pentatonic scales… the Skye Boat Song is another well-known example. It’s my favourite etude out of the handful of popular ones that always make it onto Chopin compilation CDs.

Obviously this is a very different style to my previous piano project, but I think it shares at least one similarity. Both Chopin and Bach imposed on themselves what seem like quite severe technical restrictions (using only the black keys in Chopin’s case, conforming exactly to all the rules of a fugue in Bach’s), yet within those constraints they both produced wonderful music that doesn’t sound restricted or stilted in the slightest.

Right. Now that’s out of the way, I’m off to go learn something that doesn’t take me literally years to finish!


Bachtime

One of the advantages of our new flat is that I’ve been able to bring in the nice electric piano that used to belong to my uncle (it was languishing in the garage at the old place because it didn’t fit up the stairs). I’ve been taking full advantage of it to learn some pieces I always wanted to play. Here’s a Bach four-part fugue:

I loved this one as soon as I heard the Well Tempered Clavier (played by Glenn Gould, who’s a lot better at it than I am). It just amazes me that anyone can even write a four-part fugue that complies with all the complicated rules for how fugues should be constructed at all (and this one does), never mind also produce something so musical and satisfying at the same time.

Fugues start off rather like rounds, with each voice entering in turn a few bars apart with the same melody (called the subject; when a voice states it transposed, usually to the dominant, that version is called the answer). After that, things get more complicated – new melodies (such as the countersubject) are woven in, and the subject is usually repeated in numerous ways (upside down, backwards, in different keys, etc.), the different voices layered on top of each other producing complex and ever-changing harmonies.

The Well Tempered Clavier is made up of two books, each of which is a collection of 24 preludes and fugues in every major and minor key. This is the A minor fugue (no. 20) from the first book. It’s one of the longest – but I think the length gives the music more weight than the shorter fugues and makes the climax it builds to near the end more dramatic. I hope you like it.

I’m not actually playing it from memory… I have the score scanned into a PDF file which is open on the laptop, and the USB cable in the foreground goes to a foot pedal configured to work as a “Page Down” button so that I can turn the pages with my feet. (They’re not doing anything else; keyboard instruments in Bach’s time mostly didn’t have pedals so I don’t use the piano pedals while I’m playing Bach either). I suspect the score is actually doing the same job as Dumbo’s feather at this point… I’ve played the piece so many times now that it must surely all be imprinted on my brain, but if I can’t see the music in front of me, I freak out and forget how to play it.

(Apologies for the audio quality in some bits – I was recording direct to my laptop from the piano’s headphone socket which started off sounding fine but went a bit weird for reasons I’m not sure of…).


Sound Synthesis IV: Next Generation Sound Synthesis

Last time we looked at (and listened to!) various methods of digital sound synthesis, beginning with the very primitive systems used by early computers, and ending with the sample-based methods in widespread use today. This time I’m going to talk about a new and very promising method currently in development.

What’s wrong with sample-based synthesis?

Our glockenspiel test sound already sounded pretty good using the sample-based method… do we really need anything more advanced? The answer is that although sample-based synthesis works very well for certain instruments under certain conditions, it doesn’t work well all the time.

Although it’s based on samples of real instruments, it’s still not fully realistic. Often the same sample will be used for different notes and different volumes, with the synth altering the frequency and amplitude of the sample as needed. But on a real piano (for example), the notes will all sound subtly different. A high C won’t sound exactly the same as a low C with its frequency increased, and pressing a key hard will result in a very different sound from pressing the same key softly – it won’t just be louder. Some of the better synths will use a larger number of samples in an attempt to capture these nuances, but the amount of data can become unmanageable. And that’s just considering one note at a time. In a real piano, when multiple notes are being played at the same time, the vibrations in all the different strings will influence each other in quite complex ways to create the overall sound.

It gets even worse for string and brass instruments. For example, changing from one note to another on a trumpet can sound totally different depending on how fast the player opens and closes the valves and it is unlikely a sample-based system will be able to reproduce all the possibilities properly without recording an unrealistically large number of samples. In some genres of music, the player may do things with the instrument that were never intended, such as playing it with a valve only part way open. A sample-based system would have no way of dealing with such unforeseen cases – if no-one recorded a sample for that behaviour, it can’t synthesise it.

The other problem with many of the synthesis methods is one of control. Even if it were possible to get them to generate the desired sound, it’s not always very obvious how to do it. FM synthesisers, for example, take a bewildering array of parameters, many of which can seem like “magic numbers” that don’t bear any obvious relation to the sound being generated. To play a note, sound envelopes and frequencies need to be set for every operator, the waveforms can be adjusted, and the overall configuration of the operators also needs to be set. Hardly intuitive stuff for people accustomed to thinking in terms of instruments and notes.

Physical Modelling Synthesis

A newer synthesis method has the potential to solve both the realism problem and the control problem, giving musicians virtual instruments that not only sound more realistic but are much easier to “play” and will correctly handle all situations, even ones that weren’t envisaged when the synth was designed. This is called Physical Modelling Synthesis, and it’s the basis for the project I’m working on just now.

The basic idea is that instead of doing something abstract that just happens to give the result you want (FM synthesis, for example), or “cheating” with recordings to give a better sounding result (like sample-based synthesis), you simulate exactly how a real instrument would behave. This means building a mathematical model of the entire instrument as well as anything else that’s relevant (the surrounding air, for example). Real instruments create sound because they vibrate in a certain audible way when they are played – whether that’s by hitting them, bowing them, plucking their strings, blowing into them, or whatever. Physical modelling synthesis works by calculating exactly how the materials that make up the instrument would vibrate given certain inputs.

How do we model an instrument mathematically? It can get very complex, especially for instruments that are made up of lots of different parts (for example, a piano has hundreds of strings, a sound board, and a box filled with air surrounding them all). But let’s start by looking at something simpler: a metal bar that could be, for example, one note of a glockenspiel.

glockdiagram1

To simulate the behaviour of the bar, we can divide it into pieces called elements. Then for each element we store a number, which will represent the movement of that part of the bar as it vibrates. To begin with, the bar will be still and not vibrating, so all these numbers will be zero:

glockdiagram2

We also need something else in this setup – we need a way to hear what’s going on, otherwise the whole exercise would be a bit pointless. So, we’ll take an output from towards the right hand end of the bar:

glockdiagram3

Think of this like a sort of “virtual microphone” that can be placed anywhere on our instrument model. All it does is take the number from the element it’s placed on – it doesn’t care about any of the other elements at all. At the moment the number (like all the others) is stuck at zero, which means the microphone will be picking up silence. As it should be, because a static, non-moving bar doesn’t make any sound.

Now we need to make the bar vibrate so that it does generate some sound. To do this, we will simulate hitting the bar with a beater near its left hand end:

glockdiagram4

What happens when the beater hits the bar? Essentially, it just makes the bar move slightly. So now, instead of all zeroes in our element numbers, we have a non-zero value in the element that’s just been hit by the beater, to represent this movement:

glockdiagram5

But the movement of the bar won’t stay confined to this little section nearest where the beater hit. Over time, it will propagate along the whole length of the bar, causing it to vibrate at its resonant frequency. After some short length of time, the bar might look like this:

glockdiagram6

and then like this:

glockdiagram7

then this:

glockdiagram8

As you can see, the value from the beater strike has “spread out” along the bar so now the majority of the bar is displaced in one direction or another. The details of how this is done depend on the material and exactly how the bar is modelled, but basically each time the computer updates the bar, the number in each box is calculated based on the previous numbers in all the surrounding boxes. (The values that were in those boxes immediately before the update are the most influential, but for some models numbers from longer ago come into play as well). Sometimes the boxes at the ends of the bar are treated differently from the other boxes – in fact, they are different, because unlike the boxes in the middle they only have a neighbouring box on one side of them, not both. There are various different ways of treating the edge boxes, and these are referred to as the model’s boundary conditions. They can get quite complex so I won’t say more about them here.

Above I said “some short length of time”, but that’s quite vague. We actually want to wait a very specific length of time, called the timestep, between updates to the bar. The timestep is generally chosen to match the sampling rate of the audio being output, so that the microphone can just pick up one value each time the bar is updated and output it. So, for a CD quality sample rate of 44100Hz, a timestep lasts 1/44100th of a second, or 0.0000226757 seconds.
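To make this concrete, here’s a minimal sketch in Python of this kind of update loop. To keep it short it uses the simple 1D wave equation (really a string rather than a stiff metal bar, which needs a more complicated update rule), and all the names and numbers are illustrative rather than taken from our actual code:

```python
SAMPLE_RATE = 44100             # one update per output sample
N = 30                          # number of elements along the bar
lam2 = 0.5                      # scheme parameter (c*k/h)^2, which must be
                                # <= 1 for the updates to stay stable

u_prev = [0.0] * N              # element values one timestep ago
u = [0.0] * N                   # current element values (all zero: silence)
u[3] = 1.0                      # the "strike": one element near the left
                                # end gets a non-zero value for one timestep

mic = N - 5                     # virtual microphone near the right-hand end
output = []

for step in range(SAMPLE_RATE):     # one second of audio
    u_next = [0.0] * N
    for i in range(1, N - 1):
        # each interior element's new value is calculated from its
        # neighbours and from its own current and previous values
        u_next[i] = (2.0 * u[i] - u_prev[i]
                     + lam2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]))
    # simplest possible boundary condition: the end elements stay at zero
    u_prev, u = u, u_next
    output.append(u[mic])           # the microphone reads a single element

# played back at 44100 values per second, `output` is the synthesised sound
```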

If the model is working properly, the result of all this will be that the bar vibrates at its resonant frequency – just like the bar of a real glockenspiel. Every timestep, the “microphone” will pick up a value, and when this sequence of values is played back through speakers, it should sound like a metal bar being hit by a beater.

Here are the first 20 values picked up by the microphone: 0, 0, 0.022, -0.174, -0.260, 0.111, 0.255, 0.123, 0.426, 0.705, 0.495, 0.342, 0.293, 0.116, 0.016, 0.009, 0.033, -0.033, -0.312, -0.321, -0.030

and here’s a graph showing the wave produced by them:

pmgraph

To simulate a whole glockenspiel, we can model several of these bars, each one a slightly different length so as to produce a different note, and take audio outputs from all of them. Then if we hit them with our virtual beater at the right times, we can hear our test sample, this time generated by physical modelling synthesis:

pmsynth

I used a very primitive version of physical modelling synthesis to generate this sample, so it doesn’t sound amazing. I also used a bit of trial and error tweaking to get the bar lengths I wanted, so the tuning isn’t perfect. Both the project, and my knowledge of this type of synthesis, are still in fairly early stages just now! In the next section I’ll talk about what we can do to improve the accuracy of the models, and therefore also the quality of the sound produced.

Accuracy and model complexity

In our project we are mainly going for quality rather than speed. We want to try and generate the best quality of sound that we can from these models; if it takes a minute (or even an hour) of computer time to generate a second of audio, we don’t see that as a huge problem. But obviously we’d like things to run as fast as possible, and if it’s taking days or weeks to generate short audio samples, that is a problem. So I’ll say a bit about how we’re trying to improve the quality of the models, as well as how we hope to keep the compute time from becoming unmanageable.

A long thin metal bar is one of the simplest things to model and we can get away with using a one-dimensional row of elements (as demonstrated above) for this. But for other instruments (or parts of instruments), more complex models may be required. To model a cymbal, for example, we will need a two-dimensional grid of elements spaced across the surface of the cymbal. And for something big and complicated like a whole piano, we would most likely need individual 1D models for each string, a 2D model for the sound board, and a 3D model for the air surrounding everything, all connected and interacting with each other in order to get an accurate synthesis. In fact, any instrument model can generally be improved by embedding it in a 3D space model, so that it is affected by the acoustics of the room it is in.

There are also different ways of updating the model’s elements each timestep. Simple linear models are very easy and fast to compute and are sufficient for many purposes (for example, modelling the vibration of air in a room). Non-linear models are much more complicated to update and need more compute time, but may be necessary in order to get accurate sound from gongs, brass instruments, and others.

Inputs (for example, striking, bowing, blowing the model instruments) and how they are modelled can have an effect as well. The simplest way to model a strike is to add a number to one of the elements of the model for just a single timestep as shown in the example above, but it’s more realistic to add a force that gradually increases and then gradually decreases again across several timesteps. Bowing and blowing are more complicated. With most of these there is some kind of trade-off between the accuracy of the input and the amount of computational resources needed to model it.
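As a sketch of what that gentler strike might look like (carrying on the illustrative Python from earlier), you can spread the force over a few dozen timesteps with a raised cosine instead of dumping it all into a single update:

```python
import math

def strike_force(n_samples, peak):
    # half a cosine cycle: the force rises smoothly from zero up to
    # `peak` and falls back to zero over n_samples timesteps
    return [peak * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_samples))
            for n in range(n_samples)]

# at 44100 updates per second, 40 timesteps is roughly a 1 ms contact;
# each value gets added to the struck element on successive updates
force = strike_force(40, 1.0)
```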

2D models and especially 3D models can consume a lot of memory and take a huge number of calculations to update. For CD quality audio, quite a finely spaced grid is required and even a moderately sized 3D room model can easily max out the memory available on most current computers. Accurately modelling the acoustics of a larger room, such as a concert hall, using this method is currently not realistic due to lack of memory, but should become feasible within a few years.
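To give a feel for the numbers, here’s a back-of-envelope calculation (my own illustrative figures, not measurements from the project):

```python
import math

c = 344.0                   # speed of sound in air, m/s
k = 1.0 / 44100.0           # timestep for CD quality audio, s

# a stable 3D finite-difference scheme needs its grid points no further
# apart than roughly h = c * k * sqrt(3)
h = c * k * math.sqrt(3.0)  # about 13.5 mm

def grid_gigabytes(x, y, z, arrays=3, bytes_per_value=8):
    """Memory for an x by y by z metre room, keeping `arrays` copies
    of the grid (e.g. previous, current, next) in double precision."""
    points = (x / h) * (y / h) * (z / h)
    return points * arrays * bytes_per_value / 1e9

print(grid_gigabytes(8, 6, 4))     # a living room: about 2 GB
print(grid_gigabytes(40, 30, 20))  # a concert hall: about 230 GB
```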

The number of calculations required to update large models is also a challenge, but not an insurmountable one. Especially for the 3D acoustic models, the largest ones, we usually want to do the same (or very similar) calculations again and again and again on a massive number of points. Fortunately, there is a type of computer hardware that is very good at doing exactly this: the GPU.

GPU stands for graphics processing unit, and these processors were indeed originally designed for generating graphics, where the same relatively simple calculations need to be applied to every polygon or every pixel on the screen many, many times. In the last few years there has been a lot of interest in using GPUs for other sorts of calculations, for example scientific simulations, and now many of the world’s most powerful supercomputers contain GPUs. They are ideal for much of the processing in our synthesis project where the simple calculations being applied to every point in a 3D room model closely parallel the calculations being applied to every pixel on the screen when rendering an image.

Advantages of Physical Modelling Synthesis

You might wonder, when sample-based synthesis is getting so good and is so much easier to perform, why bother with physical modelling synthesis? There are three main reasons:

  • Sound quality. With a good enough model, physical modelling synthesis can theoretically sound just as good as a real instrument. Even with simpler models, certain instrument types (e.g. brass) can sound a lot better than sample-based synthesis.
  • Flexibility. If you want to do something more unusual, for example hitting the strings of a violin with the wooden side of the bow instead of bowing them with the hair, or playing a wind instrument with the valves half-open, you are probably going to be out of luck with a sample-based synthesiser. Unless whoever designed the synthesiser foresaw exactly what you want and included samples of it, there will be no way to do it. But physical modelling synthesis can – you can use the same instrument model and just modify the inputs however you want.
  • Ease of control. I mentioned at the beginning that older types of synthesiser can be hard to control – although they may theoretically be able to generate the sound you want, it might not be at all obvious how to get them to do it, because the input parameters don’t bear much obvious relation to things in the “real world”. FM is particularly bad for this – to play a note you might have to do something like: “Set the frequency of operator 1 to 1000Hz, set its waveform type to full sine wave, set its attack rate to 32, its decay rate to 18, its sustain level to 5 and its release rate to 4. Now set operator 2’s frequency to 200Hz, its attack rate to 50, decay rate 2, sustain level 14, release rate 3. Now chain the operators together so that 2 is modulating 1”. (In reality the quoted text would be some kind of programming language rather than English, but you get the idea). Your only options for getting the sound you want are likely to be trial and error, or using a library of existing sounds that someone else came up with by trial and error.

Contrast this with how you might play a note on a physical modelling synthesiser: “Hit the left hand bar of my glockenspiel model with the virtual beater 10mm from its front end, with a force of 10N”. Much better, isn’t it? You might still use a bit of trial and error to find the optimum location and force for the hit, but the model’s input parameters are a lot closer to things we understand from the real world, so it will be a lot less like groping around in the dark. This is because we are trying to model the real world as accurately as possible, unlike FM and sample-based synthesisers which are abstract systems attempting to generate sound as simply as possible.
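To make the contrast concrete, here are the two styles of interface side by side as Python stubs. Both functions are purely hypothetical – neither is a real library, and the parameter names are invented for illustration:

```python
def fm_play_note(**params):
    """Stand-in for an FM synth: a pile of abstract magic numbers."""
    print("FM note configured with", len(params), "parameters")

def glockenspiel_strike(bar, position_mm, force_newtons):
    """Stand-in for a physical model input: real-world quantities."""
    print("hit", bar, "bar at", position_mm, "mm with", force_newtons, "N")

fm_play_note(op1_freq=1000, op1_wave="sine", op1_attack=32, op1_decay=18,
             op1_sustain=5, op1_release=4, op2_freq=200, op2_attack=50,
             op2_decay=2, op2_sustain=14, op2_release=3,
             algorithm="2 modulates 1")

glockenspiel_strike(bar="leftmost", position_mm=10, force_newtons=10)
```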

Here’s a link to the Next Generation Sound Synthesis project website. The project’s been running for a year and has four years still to go. We’re investigating several different areas, including how to make good quality mathematical models for various types of instruments, how to get them to run as fast as possible, and also how to make them effective and easy to use for musicians.

Of course, whatever happens I doubt we will be able to synthesise the bassoon ;).

Sound Synthesis III: Early Synthesis Methods

Digital Sound Synthesis

Before I delve into describing different types of synthesis, I should start with a disclaimer: I’m coming at this mainly from the angle of how old computers (and video game systems) used to synthesise sound rather than talking about music synthesisers, because that’s where most of my knowledge is. Although I have owned various keyboards, I don’t have a deep knowledge of exactly how they work as I’m more of a pianist than a keyboard player really. There is quite a bit of overlap between methods used in computers and methods used in musical instruments though, especially more recently.

To illustrate the different synthesis methods, I’m going to be using the same example sound over and over again, synthesised in different ways. It’s the glockenspiel part from the opening of Sonic Triangle‘s sort-of Christmas song “It Could Be Different”. For comparison to the synthesised versions, here it is played (not particularly well, but you should get the idea!) on a real glockenspiel:

glockenspiel

(In fact, in the original recording of the song, it isn’t a real glockenspiel. It’s the sample-based synthesis of my Casio keyboard… there’ll be more about that sort of synthesis later).

If you have trouble hearing the sounds in this post, try right clicking the links, saving them to your hard drive and opening them from there. Seriously, I can’t believe that in 2013 there still isn’t an easy way of putting sounds on web pages that works on all major browsers. Grrrr!

Primitive Methods

As we saw last time, digital sound recordings (which include CDs, DVDs, and any music files on a computer) are just very long lists of numbers that were created by feeding a sound wave into an analogue-to-digital converter. To play them back, we feed the numbers into a digital-to-analogue converter and then play back the resulting sound using a loudspeaker. But what if, instead of using a list of numbers that was previously recorded, we used a computer program to generate a list of numbers and then played them back in the same way? This is the basis of digital sound synthesis – creating entirely new sounds that never existed in reality.

Very old (1980s) home computers and games consoles tended to only be able to generate very primitive, “beepy” sounding music. This was because they were generating basic sound wave shapes that aren’t like anything you’d get from a real musical instrument. The simplest of all, used by a lot of early computers, is a square wave:

synth3_1

square wave sound

Another option is the triangle wave, with a slightly softer sound:

synth3_2

triangle wave sound

The sound could be improved by giving each note a “shape” (known as its envelope), so that a glockenspiel sound, for example, would start loud and then die away, like a real glockenspiel does:

synth3_3

triangle wave with envelope sound

None of these methods sound particularly nice, and it’s hard to imagine any musician using them now unless they were deliberately going for a retro electronic sort of effect. But they have the advantage of being very easy to synthesise, requiring only a simple electronic circuit or a few lines of program code. (I wrote a program to generate the sound samples in this section from scratch in about half an hour). The square wave, for example, only has two possible levels, so all the computer has to do is keep track of how long to go before switching to the other level. The length of time spent on each level determines the pitch of the sound produced, and the difference in height between the levels determines the volume.
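For the curious, a bare-bones square wave generator might look something like this in Python (a sketch of the general idea, not the actual program I wrote):

```python
import struct
import wave

SAMPLE_RATE = 44100

def square_wave(freq_hz, duration_s, volume=0.5):
    # flip between +volume and -volume every half period: the flip rate
    # sets the pitch, and the gap between the two levels sets the loudness
    samples = []
    half_period = SAMPLE_RATE / (2.0 * freq_hz)  # samples per level
    counter, level = 0.0, volume
    for _ in range(int(duration_s * SAMPLE_RATE)):
        samples.append(level)
        counter += 1.0
        if counter >= half_period:
            counter -= half_period
            level = -level
    return samples

# one second of 440 Hz square wave, written to a mono 16-bit WAV file
with wave.open("square.wav", "w") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SAMPLE_RATE)
    f.writeframes(b"".join(struct.pack("<h", int(s * 32767))
                           for s in square_wave(440.0, 1.0)))
```

(An envelope is just as easy to bolt on: multiply each sample by a factor that dies away over the course of the note.)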

FM Synthesis

I remember being very excited when we upgraded from our old ZX Spectrum +3, which could only do square wave synthesis, to a PC and a Sega Megadrive that were capable of FM (Frequency Modulation) Synthesis. They could actually produce the sounds of different instruments! Looking back now, they didn’t sound very much like the instruments they were supposed to, but it was still a big improvement on square waves.

FM synthesis involves combining two (or sometimes more) waves together to produce a single, more complex wave. The waves are generally sine waves and the combination process is called frequency modulation – it means the frequency of one wave (the “carrier”) is altered over time in a way that depends on the other wave (the “modulator”) to produce the final sound wave. So, at low points on the modulator wave, the carrier wave’s peaks will be spread out with a longer distance between them, while at the high points of the modulator they will be bunched up closer together, like this:

synth3_4

Some FM synthesisers can combine more than two waves together in various ways to give a richer range of possible sounds.
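Here’s a minimal two-operator sketch of the idea in Python. Strictly speaking it modulates the carrier’s phase, which is how the Yamaha “FM” chips actually worked; the ratio and envelope values are illustrative guesses at a bell-like tone, not settings from any real chip:

```python
import math

SAMPLE_RATE = 44100

def fm_note(carrier_hz, ratio, mod_index, duration_s):
    samples = []
    for n in range(int(duration_s * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        env = math.exp(-4.0 * t)     # simple decaying envelope
        # the modulator wobbles the carrier's phase as time goes on
        mod = math.sin(2.0 * math.pi * carrier_hz * ratio * t)
        samples.append(env * math.sin(2.0 * math.pi * carrier_hz * t
                                      + mod_index * env * mod))
    return samples

# non-integer frequency ratios push the partials out of tune with each
# other, which is what gives metallic, bell-like tones
note = fm_note(880.0, 3.5, 2.0, 1.0)
```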

Here’s our glockenspiel snippet synthesised in FM:

fm sound

(In case you’re curious, this was done using DOSBox, which emulates the Yamaha OPL-2 FM synthesiser chip used in the old Adlib and SoundBlaster sound cards common in DOS PCs, and the Allegro MIDI player example program. Describing how to get an ancient version of Allegro up and running on a modern computer would make a whole blog post in itself, but probably not a very interesting one).

It’s certainly a step up from the square wave and triangle wave versions. But it still sounds unnatural; you would be unlikely to mistake it for a real glockenspiel.

FM synthesis is a lot more complicated to perform than the older primitive methods, but by the 90s FM synthesiser chips were cheap enough to put in games consoles and add-in sound cards for PCs. Contrary to popular belief, they are not analogue (or hybrid analogue-digital) synths; they are fully digital devices apart from the final conversion to analogue at the end of the process.

In case you were wondering, this is pretty much the same “frequency modulation” process that is used in FM radio. The main difference between the two is that in FM radio, you have a modulator wave that is an audio signal, but the carrier wave is a very high frequency radio wave (up in the megahertz, millions-of-hertz range). In FM synthesis, both the carrier and modulator are audio frequency waves.

Sample-based Synthesis

Today, when you hear decent synthesised sound coming from a computer or a music keyboard, it’s very likely to be using sample-based methods. (This is often referred to as “wavetable synthesis”, but strictly speaking this term refers to only a quite specific subset of the sample-based methods). Sample-based synthesis is not really true synthesis in the same way that the other methods I’ve talked about are – it’s more a clever mixture of recording and synthesis.

Sample-based synthesis works by using short recordings of real instruments and manipulating and combining them to generate the final sound. For example, it might contain a recording of someone playing middle C on a grand piano. When it needs to play back a middle C, it can play back the recording unchanged. If it needs the note below, it will “stretch out” the sample slightly to increase its wavelength and lower its frequency. Similarly, for the note above it can “compress” the sample so that its frequency increases. It can also adjust the volume if the desired note is louder or quieter than the original recording. If a chord needs to be played, several instances of the sample can be played back simultaneously, adjusted to different pitches.
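The “stretching” and “compressing” is essentially resampling. Here’s a minimal Python sketch of the idea (illustrative only – a real synth would also loop the sample, reshape its envelope, and so on):

```python
import math

SAMPLE_RATE = 44100

def repitch(sample, semitones):
    # read through the stored sample faster or slower; each semitone is
    # a factor of 2^(1/12), so +12 semitones reads at double speed
    step = 2.0 ** (semitones / 12.0)
    out, pos = [], 0.0
    while pos < len(sample) - 1:
        i = int(pos)
        frac = pos - i   # interpolate between neighbouring sample points
        out.append(sample[i] * (1.0 - frac) + sample[i + 1] * frac)
        pos += step
    return out

# stand-in for a real recording: one second of a sine wave at middle C
middle_c = [math.sin(2.0 * math.pi * 261.6 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
d_above = repitch(middle_c, 2)   # a tone higher (and slightly shorter)
```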

This synthesis method is not too computationally intensive; sound cards capable of sample-based synthesis (such as the Gravis Ultrasound and the SoundBlaster AWE 32/64) became affordable in the mid 90s and today’s computers can easily do it in software. Windows, for example, has a built-in sample-based synthesiser that is used to play back MIDI sound if there isn’t a hardware synth connected. Sound quality can be very good for some instruments – it is typically very good for percussion instruments, reasonable for ensemble sounds (like a whole string section or a choir), and not so good for solo string and wind instruments. The quality also depends on how good the samples themselves are and how intelligent the synth is at combining them.

Here’s the glockenspiel phrase played on a sample-based synth (namely my Casio keyboard):

sample based

This is a big step up from the other synths – this time we have something that might even be mistaken for a real glockenspiel! But it’s not perfect… if you listen carefully, you’ll notice that all of the notes sound suspiciously similar to each other, unlike the real glockenspiel recording where they are noticeably different.

Next time I’ll talk about the limitations of the methods I’ve described in this post, and what can be done about them.


Sound Synthesis II: Digital Recording

Digital Recording

Things changed with the advent of compact discs, and later DVDs and MP3s as well. Instead of storing the continuously changing shape of the sound wave, these store the sound digitally.

What do we mean by digitally? It means the sound is stored as a collection of numbers. In fact, the numbers are binary, which means only two digits are allowed – 0 and 1. The music on a CD, or in an MP3 file, is nothing more than a very long string of 0s and 1s.

How do you get from the shape of the sound to a string of numbers? After all, the sound wave graphs we saw last time look very different from 1000110111011011010111011000100. First of all, you sample the sound signal. That means you look at where it is at certain individual points in time, and ignore it the rest of the time. Imagine drawing the shape of a sound wave on a piece of graph paper like this:

digital1

To sample this signal, we can look at where the signal is each time it crosses one of the vertical lines. We don’t care what it’s doing the rest of the time – only its intersections with the lines matter now. Here’s the same sound, but instead of showing the full wave, we just show the samples (as Xs):

digital2

To simplify things further so we can stick to dealing with whole numbers, we’re also going to move each sample to the nearest horizontal grid line. This means that all the samples will be exactly on an intersection where two of the grid lines cross:

digital3

So far, so good. We have a scattering of Xs across the graph paper. Hopefully you can see that they still form the shape of the original sound wave quite well. From here, it’s easy to turn our sound wave into a stream of numbers, one for each sample. We just look at each vertical grid line and note the number of the horizontal grid line where our sample is:

digital4

The wave we started with is now in digital form: 5, 9, 5, 6, 7, 1, 2, 6, 4, 6. It’s still in ordinary decimal numbers, but we could convert it to binary if we wanted to. (I won’t go into details of how to convert to binary here, but if you’re curious, there are plenty of explanations of binary online – here’s one). We can record this stream of numbers in a file on a computer disk, on a CD, etc. When we want to play it back, we can reverse the process we went through above to get back the original sound wave. First we plot the positions of the samples onto graph paper:

digital3

And now we draw the sound wave – all we have to do is join up our samples:

digital5

Voila! All ready to be played back again.

This might look very spiky and different from the original smooth sound wave. That’s because I’ve used a widely spaced grid with only a few points here so you can see what’s going on. In real digital audio applications, very fine grids and lots of samples are used so that the reconstructed wave is very, very close to the original – to show just one second of CD quality sound, you would need a grid with 65,536 horizontal and 44,100 vertical lines!
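In code, the whole graph-paper process boils down to something like this toy Python function (the sine wave and grid sizes below are just examples):

```python
import math

def digitise(wave_func, duration_s, sample_rate, levels):
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate    # look at the wave only at these instants
        value = wave_func(t)   # assumed to lie between -1.0 and +1.0
        # snap to the nearest of `levels` equally spaced horizontal lines
        samples.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return samples

# CD quality: 44100 samples per second, 65536 levels (16 bits)
cd = digitise(lambda t: math.sin(2.0 * math.pi * 440.0 * t),
              1.0, 44100, 65536)
```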

(In electronics, the device that turns an analogue sound wave into samples is called an analogue to digital converter, and its cousin that performs the inverse task is a digital to analogue converter. As you probably guessed, it’s not really done using graph paper).

But why?

At this point you may be wondering, why bother with digital recording? It seems like we just went through a complicated process and gained nothing – in fact, we actually lost some detail in the sound wave, which doesn’t look quite the same after what it’s been through! There are several advantages to digital recording:

  • Digital recordings can be easily manipulated and edited using a computer. Computers (at least all the ones in common use today) can only deal with digital information – anything analogue, such as sounds and pictures, has to be digitised before they will work with it. This opens up a huge range of possibilities, allowing much more sophisticated effects and editing techniques than could be accomplished in the analogue domain. It also allows us to do clever things like compressing the information so it takes up less space while still sounding much the same (this is what the famous MP3 files do).
  • I noted above that we lost a bit of detail in our sound wave when we converted it to digital and then converted it back. However, in real life situations digital recordings generally give much better sound quality than analogue recordings. This is because the small inaccuracies introduced in the digitisation process are usually much smaller and less noticeable than the background noise that inevitably gets into analogue recording and playback equipment no matter how careful you are. Digital is more or less immune to background noise for reasons I’ll explain shortly.
  • Digital recordings can be copied an unlimited number of times without losing any quality. This is closely related to the point above about sound quality. If you’re old enough to have copied records or cassettes onto blank tapes, or taped songs off the radio, you may have noticed this in action. The copy always sounds worse than the original, with more background noise. If you make another copy from that copy instead of from the original, it will be worse still. But it isn’t like that with digital recording – if you copy a CD to another CD, or copy an MP3 file from one computer to another, there is no loss of quality – the copy sounds exactly like the original, and if you make another copy from the copy, it will also sound exactly like the original. (This isn’t just a case of the loss in quality being so small you can’t hear it – there genuinely is no loss whatsoever. The copies are absolutely identical!).

Notes on background noise

I mentioned above that digital recordings are more or less immune to background noise and that’s one of their big advantages. But first of all, what is background noise, where does it come from, and what does it do to our sound signals?

Background noise is any unwanted interference that gets into the sound signal at some point during the recording or playback process. It can come from several different sources – if the electrical signal is weak (like the signal from a microphone or from a record player’s pick-up), it can be affected by electromagnetic interference from power lines or other devices in the area. If there is dust or dirt on the surface of a record or tape, this will also distort the signal that’s read back from it.

There is no getting away from background noise; it will always appear from somewhere. If we have a vinyl record with a sound signal recorded onto it that looks like this:

digital1

by the time it gets played back through the speakers, noise from various sources will have been added to the original signal and it might look more like this:

digital7

Once the noise is there, it’s very difficult or impossible to get rid of it again, mainly because there’s no reliable way to tell it apart from the original signal. So ideally we want to minimise its chances of getting there in the first place. This is where digital recording comes in. Let’s say we have the same sound signal recorded onto a CD instead of a vinyl record. Because it’s in digital form, it will be all 0s and 1s instead of a continuously varying wave like on the vinyl. So the information on the CD will look something like this:

digital8

This time there are only two levels, one representing binary 0 and the other binary 1.

There will still be some noise added to the signal when it gets read back from the CD – maybe there is dust on the disc’s surface or electrical interference getting to the laser pick-up. So the signal we get back will look more like this:

digital9

But this time the noise doesn’t matter. As long as we can still tell what is meant to be a 0 and what is a 1, small variations don’t make any difference. In this case it’s very obvious that the original signal shape was meant to be this:

digital8

So, despite the noise, we recovered exactly the original set of samples. We can pass them through the digital to analogue converter (DAC) and get back this:

digital1

a much more accurate version of the original sound wave than we got from the analogue playback. Although the noise still got into the signal we read from the CD, it’s disappeared as if by magic and doesn’t affect what we hear like it did with the record.
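Here’s a toy demonstration of that recovery step in Python (the levels and noise amounts are illustrative). As long as the noise can never push a value past the halfway point between the two levels, thresholding gets the original bits back exactly, every time:

```python
import random

bits = [1, 0, 1, 1, 0, 0, 1, 0]                  # the recorded signal
signal = [float(b) for b in bits]                # clean levels: 0.0 and 1.0
noisy = [s + random.uniform(-0.3, 0.3) for s in signal]  # playback noise
recovered = [1 if s > 0.5 else 0 for s in noisy]  # threshold at the midpoint

assert recovered == bits   # always identical: the noise has no effect
```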

(Of course, digital recording isn’t completely immune to noise. If the noise level was so high that we could no longer tell what was meant to be a 0 and what was a 1, the system would break down, but it’s normally easy enough to stop this from happening. Also, we can’t prevent noise from getting into the signal after it’s converted back to analogue form, but again this is a relatively small problem as the majority of the recording and playback system works in digital form).

Does digital recording really sound better?

Not everyone thinks so. A lot of people say they prefer the sound of analogue recordings, often saying they have a “warmer” sound compared with the “colder” sound of digital. In my opinion, yes there is a difference, but digital is more faithful to the original sound – the “warmth” people talk about is actually distortion introduced by the less accurate recording method! It’s absolutely fine to prefer that sound, in the same way that it’s absolutely fine to prefer black and white photography or impressionist paintings even though they’re less realistic than a colour photograph or a painting with lots of fine detail.

“Ah”, you might say. “But surely a perfect analogue recording would have to be better than a digital recording? Because you’re recording everything rather than just samples of it”. Technically this is true… but in reality (a) there’s no such thing as a perfect analogue recording because there are so many ways for noise to get in, and (b) at CD quality or better, the loss of information from digitising the sound is minuscule, too small for anyone to be able to hear. Double-blind tests have been done where audio experts listened to sounds and had to determine whether the sound had been converted to digital and back or not. No-one was able to reliably tell.

Phew! That was longer than I meant it to be. That’s the background… next time I really will start on actual sound synthesis, I promise!


Sound Synthesis I: How Sound Recording Works

Hello, and welcome to the first of several blog entries inspired by one of my projects, Project Noah. Actually that’s just my code name – its real name is Next Generation Sound Synthesis. I actually get paid for working on this one!

As the name suggests, the project is about new ways of synthesising sound, creating more realistic sounding digital instruments and better acoustic models. I think it’s a pretty interesting area, and the approach being taken (physical modelling synthesis) shows a lot of promise. But before I get onto that, I’d like to go back to basics a bit and talk about computer synthesis in general, giving some examples of the different ways it’s been done over the years, what they sound like, their strengths and weaknesses, etc. I’ll be talking mainly from a computer programmer’s perspective rather than a musician’s, so my examples will draw mainly from the sound chips and software used to make sounds in computers and games consoles rather than from music synthesisers. (Although I do play music keyboards, I don’t know a great deal about the technical side of them, especially not the earlier ones).

In fact, before I even start on that, I’m going to go even further back to basics and talk about how sound recording works in this first entry. (If it’s not immediately clear how that’s relevant to synthesis, I hope it will become clearer by the end).

Recording Sound

Sound is vibrations in the air that we can hear when they are picked up by our ear drums. To record sound and be able to play it back later, we need some means of capturing and storing the shape of those vibrations as they happen – and also a means of turning the stored shapes back into vibrations again so that they can be heard.

The earliest methods of sound recording didn’t even rely on any electronics – they were entirely mechanical. A diaphragm would pick up the vibrations from the air, then a needle connected to the diaphragm would etch out a groove in some soft medium – initially wax cylinders, later flat vinyl discs. The cylinder or disc would be turned by hand or by clockwork. The groove’s shape would correspond to the shape of the sound waves as they changed over time.

gramophone

This isn’t actually a mechanical gramophone, but it is the oldest one I could easily get hold of. It used to be my Granny’s.

To play back the sound, the process was reversed; the needle was made to run along the groove, transmitting its shape to the diaphragm, which would vibrate in the right way to recreate the original sound (more or less – the quality of these early systems left a lot to be desired).

It’s worth pausing for a moment to say something about how the shapes of sound waves relate to what they actually sound like to us. First of all, and maybe not surprisingly, a sound wave with bigger variations (larger peaks and troughs) sounds louder than one with smaller variations. So this:

loudwave

sounds exactly the same as this:

quietwave

except the first one is a lot louder.

So the height of the peaks in the wave (often called the amplitude) determines the loudness, more or less. The pitch (that is, whether the sound is high like a flute or low like a tuba) depends on how close together the peaks are. When there are a lot of peaks in quick succession like this:

highwave

the sound is high pitched. When there aren’t so many, like this:

lowwave

the sound will be deeper. This is called the frequency of the sound wave. Frequency is measured in Hertz (abbreviated to Hz), which is simply the number of peaks the wave has per second. Humans can hear frequencies in the range of about 20Hz up to 20,000Hz, but are much better at hearing sounds around 1,000Hz than sounds at either of those extremes. Also, ability to hear high frequencies tends to tail off quite dramatically with age, so it’s unlikely adults will be able to hear all the way up to 20,000Hz (20kHz).

Real sound waves (such as speech, music, everyday noises) are usually more complex than the ones I’ve shown above and are made up of a whole mixture of different frequencies and amplitudes, which also vary over time. This makes things more interesting from the perspective of synthesising sounds.

Electronic Recording

The simple mechanical recording system was improved with the advent of electronics. Electronic recording was more complex but resulted in much better sound quality. In the electronic system, a microphone is used to turn the sound vibrations into electrical signals whose voltage varies over time in the same shape as the sound waves. Having the sound in electronic form opens up lots more possibilities – for example, it can be boosted by electronic amplifiers, allowing a stronger signal to be stored, and the sound to be played back at much louder volumes. It can also be much more easily mixed with other sound signals, very useful for editing recordings.

analoguerecording

The first electronic systems still stored the sound as a groove cut in a vinyl disc, just as the original mechanical systems had. And as in the mechanical systems, the groove was the same shape as the original sound waves – there was no fancy encoding or conversion going on. Later, sound was also stored as a varying magnetic field on magnetic tape. The variations in magnetic field strength, like the shape of the grooves, corresponded exactly to the shape of the sound being recorded. This is known as analogue recording.

tapeplayer

Tune in next time for lots of information about the next big innovation in sound recording: digital recording!


The Lunatic Fringe (of Edinburgh)

When you’ve lived somewhere all your life (or at least for the vast majority of the portion of your life that you can actually remember, as in my case), you tend to take the things that are there for granted… even if they’re the very same things other people will travel thousands of miles to come and gawk at. Case in point: every August, when Edinburgh goes crazy with the largest arts festival in the world, I always used to spend more time getting irritated by the tourists than actually going to shows. You know the ones: getting on buses and gazing around in wide-eyed astonishment, as if they needed not just the fares and destinations but the entire concepts of fiat currency and motorised transport explained to them, while I just wanted to get home from work before midnight. But this year will be different, thanks mainly to living with a certified Fringe addict.

We’ve been to two shows already, which is quite good going considering there have been exactly two nights of August so far. Last night’s was a complete live performance of Mike Oldfield’s Tubular Bells by two men. (For those of you that don’t know me, I am a massive Mike Oldfield fan. For those that do, my apologies, I’m afraid I’m going to bang on about Tubular Bells yet again).

This is my collection of Mike Oldfield CDs. Please don’t be alarmed, I am seeking professional help.

I didn’t know what to expect at all. It always feels a bit dangerous going to see a new interpretation of something that’s so close to my heart (in this case the album that first made me realise how amazing music could be), but this time I wasn’t disappointed!

For a start, it was very faithful to the original version, more than I would have thought possible with only two musicians and no pre-recorded backings, and probably even more so than some of Mike Oldfield’s performances of it. They also had a pretty impressive range of different instruments; although there were synths and samplers, they didn’t rely on them too heavily. I counted six guitars, a mandolin, one and a half drum kits, a glockenspiel, two kazoos and of course the eponymous long thin metallic hanging things in addition to the four assorted keyboards and the bewildering tangle of wiring underneath. The two guys both switched from one instrument to the next at an incredible speed, sometimes playing two at once while also adjusting things with their feet. I also didn’t see any sheet music or notes anywhere on the stage, so the whole thing must have been quite a memory test. But despite all this, one of them still found the time to down a half-glass of red wine during one of the quieter moments.

I’d definitely recommend it if you’re a Tubular Bells fan. Even if you’re not, it has to be one of the more entertaining ways to spend an hour of your August. Where else do you get to see one man going mental at a drumkit and a kazoo simultaneously while another hammers out piano chords and makes caveman noises into a microphone? 🙂