I’d been meaning to have a proper play around with modern artificial intelligence techniques for a while, and lockdown, with more time on my hands, seemed like a good time to give it a go. So I trained a deep learning neural network to recognise characters from The Simpsons. (As you do.)
This was actually my third foray into neural networks: I used one (not very successfully) for my final year university project way back in the mists of time, and I also experimented with training one to generate text last year. (Among other things, I fed it megabytes of text from my diary and got it to generate its own diary entries based on them, which was pretty hilarious if not particularly useful). But this was the first time I’ve attempted to use one in what’s probably their biggest application area, namely computer vision and image recognition.
I thought recognising Simpsons characters would be a good way to get started with this, for several reasons. Firstly, I really like The Simpsons (or at least I did until it all went downhill in the late 90s or so). Secondly, it was relatively easy to get hold of large numbers of Simpsons images for training and testing the network (more on that in a moment). And thirdly, because cartoon characters look so distinctive, it would be easier to get a computer to tell them apart than it would be with (for example) real people.
Before I go any further I’d like to give a shout-out to the fantastic Practical Deep Learning For Coders course made by the developers of fast.ai. I watched all the course videos a few months ago and found them incredibly interesting and inspiring, with the rare combination of being instantly accessible while also going into the subject in great depth. As an illustration of what I mean, after the first half hour or so of the opening lecture you’re already up and running with training a classifier to tell different cat and dog breeds apart, while the second half of the videos delves right into the code, explaining it right down to a line-by-line analysis of the algorithms that make up a neural network. Highly, highly recommended for anyone who knows how to code and is at all interested in AI.
Preparing the data
The first step in building a deep learning model is getting together some data that you can use for training and testing the neural network. In my case, I needed as many images from The Simpsons as I could get hold of, and I also needed to “tag” them (or at least most of them) with the names of the characters that appeared in them.
I decided to write a Python script that would download random images from Frinkiac, which is basically a Simpsons screen grab search engine, often used for making memes and so on. I felt a bit bad as it probably wasn’t intended for this usage, but in my defence I was quite gentle with it – I left my script running over a period of days, grabbing a single image at a time and then sleeping for a while so as not to hammer the site’s bandwidth. By the end of this process I had a completely random selection of around 3,000 screen captures from the first 17 seasons of the show sitting on my hard drive.
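The "grab one image, then sleep" approach could be sketched roughly as below. This is not the original script: the fetch_random_image callable is a hypothetical stand-in for whatever request was actually made to Frinkiac (no real endpoint is shown), and the function name, file naming and 60 second delay are assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Callable, List


def polite_download(
    fetch_random_image: Callable[[], bytes],  # hypothetical: returns one image's raw bytes
    out_dir: Path,
    count: int,
    delay_seconds: float = 60.0,  # assumed pause; the real delay is not stated
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Save `count` images one at a time, sleeping between requests."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(count):
        path = out_dir / f"frame_{i:05d}.jpg"
        path.write_bytes(fetch_random_image())
        saved.append(path)
        if i < count - 1:
            sleep(delay_seconds)  # pause so the site is never hit in bursts
    return saved
```

Passing the fetch and sleep functions in as arguments keeps the loop easy to try out without touching the network at all.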
The next step was to “tag” these with the names of the characters that appeared in them. You might wonder why I had to do this… after all, my aim was to get the computer to identify the characters automatically, not to have to do it myself, right? Well yes, but in order to train a neural network to perform this sort of recognition task, you need to give it “labelled” data – that is, you show it an image along with a label describing what’s in it, much as you might teach a person to recognise characters they weren’t previously familiar with.
I wasn’t looking forward to this bit as I knew it would take quite a bit of time-consuming manual work – I was going to have to look at every image myself and identify the characters present, then enter that information into the computer somehow. To ease the pain, I built a little web app to make this process as fast as possible. It showed me the images in turn, allowing me to tag each one and move on to the next with the minimum of key presses, writing the image names and tags into a CSV file that I could use with the AI software later on. In all I think it took me maybe an hour to write the web app and about 2 hours to tag the images, which wasn’t as bad as I’d feared.
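The CSV-writing part of a tagging tool like this might look something like the sketch below. The column names ("image" and "tags") and the space-separated tag format are my assumptions, not necessarily what the web app actually wrote.

```python
import csv
from pathlib import Path
from typing import Iterable


def append_tag_row(csv_path: Path, image_name: str, characters: Iterable[str]) -> None:
    """Append one tagged image to the CSV, writing a header row if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["image", "tags"])  # assumed column names
        # Several characters in one frame become a space-separated list;
        # an image with none of the characters gets an empty tags field.
        writer.writerow([image_name, " ".join(sorted(characters))])
```

Appending a row per image means the file is always up to date, so a tagging session can be stopped and resumed without losing anything.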
Initially I had planned to train the network to recognise all the named characters in the show, but I later realised I probably didn’t have enough data for this – some of the more minor characters only showed up a handful of times in my training images, not really enough to make the recognition reliable. So instead I decided to focus on just the four main characters: Homer, Marge, Bart and Lisa.
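A quick way to check which characters have enough examples is to count how many images each one is tagged in. The sketch below assumes the tags for each image are held as a space-separated string; the function name and the threshold are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def characters_with_enough_examples(tag_fields: Iterable[str], min_images: int) -> List[str]:
    """Return the characters tagged in at least `min_images` images."""
    counts = Counter()
    for field in tag_fields:
        counts.update(set(field.split()))  # count each character once per image
    return sorted(name for name, n in counts.items() if n >= min_images)


# Toy data: only characters appearing in two or more images are kept.
tags = ["homer marge", "homer", "bart lisa", "homer moe", "bart"]
print(characters_with_enough_examples(tags, min_images=2))
```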
Training the model
Once I had the tagged training data ready, I turned my attention to actually training a neural network on it. I used the same software used in the fast.ai course I mentioned above, namely the fastai library (which is built on PyTorch), with the code written in the form of a Jupyter Notebook for easy experimentation. I used a ResNet34, a classic architecture for image recognition, though I also tried using a larger ResNet50 to see if it worked any better (it didn’t). Training (on my GeForce GTX 1050 Ti) only took about 5 minutes, then I was able to play with the resulting model, testing it on images it hadn’t seen before.
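For anyone curious what this looks like in code, here is a minimal sketch of training a multi-label ResNet34 classifier with fastai. It is not the actual notebook: it assumes a recent fastai v2 release (where the learner factory is called vision_learner; older releases call it cnn_learner), a CSV with "image" and "tags" columns where every row has at least one space-separated tag, and a 224 pixel resize, 20% validation split and 4 epochs picked purely for illustration.

```python
def train_simpsons_model(image_dir, csv_name="labels.csv", epochs=4):
    """Fine-tune a pretrained ResNet34 as a multi-label classifier (fastai v2 assumed)."""
    # Imported inside the function so this file can be loaded without fastai installed.
    from functools import partial
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy_multi, resnet34, vision_learner,
    )

    dls = ImageDataLoaders.from_csv(
        image_dir,
        csv_fname=csv_name,   # assumed layout: columns "image" and "tags"
        fn_col="image",
        label_col="tags",
        label_delim=" ",      # space-separated tags make this a multi-label problem
        valid_pct=0.2,        # hold back 20% of the images for validation (assumed)
        item_tfms=Resize(224),
    )
    learn = vision_learner(dls, resnet34, metrics=partial(accuracy_multi, thresh=0.5))
    learn.fine_tune(epochs)   # start from pretrained weights and adapt them
    return learn
```

Swapping resnet34 for resnet50 in the vision_learner call is all it would take to try the larger architecture.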
Overall, I was reasonably happy with it, for a first attempt. It worked very well indeed (almost perfectly) for images that included a reasonably close shot of one of the characters’ faces. For example:
(You may notice that the model doesn’t just give a straight yes or no prediction, but a percentage score indicating how confident it is that each character does appear in the image).
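The reason each character gets its own score, rather than the scores adding up to 100%, is that a multi-label model typically treats each character as a separate yes/no question. The sketch below assumes the usual setup of a sigmoid applied to each raw output independently; the logit values and the 50% cut-off are made up for illustration.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def character_scores(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-character outputs into independent percentage scores."""
    return {name: 100.0 * sigmoid(z) for name, z in logits.items()}


def predicted_characters(scores: Dict[str, float], threshold: float = 50.0) -> List[str]:
    """List the characters whose score reaches the (assumed) threshold."""
    return sorted(name for name, s in scores.items() if s >= threshold)


# Made-up raw outputs for one image; each score is computed on its own,
# so several characters can score highly at the same time.
scores = character_scores({"homer": 4.0, "marge": 0.0, "bart": -3.0, "lisa": 2.0})
print(scores)
print(predicted_characters(scores))
```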
The model doesn’t work so well for more complicated situations such as characters being partially hidden, characters viewed from an unusual angle, characters wearing unusual clothing (especially clothing that covers up some of their distinctive features), characters far away in the distance so that they appear very small in the image, and so on. Below are some examples where it doesn’t make such a confident prediction, and my speculation as to why that might be.
Prediction: 35.4% Marge. The model thinks it’s more likely that Marge is in the image than any of the other characters (who all scored likelihoods of less than 10%), but still isn’t very confident, probably because she’s in a slightly unusual position and has her head turned away.
Prediction: 54.23% Homer. The model thinks there’s a decent chance Homer is in this image, but isn’t very sure, probably because only the top of his face is visible in this one.
Prediction: 99.88% Lisa, 11.55% Bart. The model is very certain Lisa is here, but nowhere near as confident about Bart. I think this is probably because Bart is partially hidden behind Lisa and Maggie, while Lisa is fully visible.
Prediction: 97.23% Bart, 97.49% Homer, 88.81% Lisa, 68.79% Marge. This time the model correctly identifies that all four characters are in the image, but it’s significantly less certain about Marge than the others, probably because her face is obscured behind Bart.
Prediction: 87.75% Bart, 53.09% Homer, 94.77% Lisa, 39.25% Marge. In this shot, all the characters are present but not in their usual clothing. Bart and Lisa are recognised with a high degree of confidence, but the model is understandably not so confident about Homer, since only the top of his face and head is visible. Surprisingly, it’s even less confident about Marge, maybe because her trademark hair is mostly hidden from view.
Prediction: 98.25% Homer, 74.86% Marge. The model is a lot less confident about Marge than Homer, presumably because Homer’s glass is obscuring most of her face.
Prediction: 91.71% Homer, 93.39% Marge, 67.27% Lisa. Homer and Marge are recognised with more than 90% certainty as expected. Interestingly, the model also thinks that Lisa is probably here; I’m guessing that’s because Maggie looks very similar to Lisa in some ways, notably her hair and eyes.
So that’s my model. I have no doubt at all that it could be done much better by someone with more expertise (or, for that matter, a better training data set), but as someone who started programming back in the days when it would have been unimaginable for a computer to do this, it’s amazingly cool to see it working even as well as it is.
Can I play with it?
I’d like to find out how to make models like this available online for people to have a go with, but I’m not there yet. I’m new to all this and don’t want to end up overloading my web host, or running up a huge bill if I go down the cloud hosting route, so I’d definitely want to do some research or testing before attempting this.