If you Google ‘neural networks’ you’ll find millions of articles explaining how gradient descent drives backpropagation across hidden layers of artificial neurons, with cool pictures of lots of circles connected to lots of other circles. Unfortunately, very few of those articles explain neural networks in plain English. Hopefully this becomes one that does.
What is a Neural Network?
It’s software, let’s be clear on that: it’s just an application. Sorry to disappoint you, but it’s not a physical network of brains floating in slimy green liquid, or an army of cognitively connected robots – it’s just an application. In fact, it’s often just part of another application, such as Snapchat.
A neural network is just a particular type of application that is very good at taking a bunch of data and giving you an answer to a question. There are dozens of examples of where neural networks are used in applications, such as:
- Spotting fraudulent financial transactions
- Detecting faces and objects in pictures
- Predicting house prices
- Natural language and speech recognition
- Predicting failure rates of machinery
- Translating languages
One of the most commonly used examples is of course identifying photos with cats in them. As we know, the entire internet was invented to enable the sharing of funny cat videos and photos, so of course this would be one of the first targets for the use of artificial intelligence.
Artificial what? Weren’t we talking about neural networks, how did we get to AI?
Neural networks are a form of artificial intelligence – a subset of it. Kind of like tennis is a form of sport. Just as there are many different forms of sport, there are many different approaches to artificial intelligence, AI for short. AI is a big topic, so we’ll get to that in another article. Right now we’re focused on neural nets (tip for beginners: all the cool people say neural nets; only your dad says neural networks).
A neural net is a bit of software, one that is very good at taking a set of inputs (photos, videos, financial data, speech, etc.), running them through a complex mathematical model, and giving you the probability of a certain answer (e.g. that Visa transaction is 97.4% likely to be fraudulent). There’s nothing that special about this – software has been able to do that for years without the need for neural nets, so what makes them special?
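To make the “inputs in, probability out” idea concrete, here’s a toy Python sketch. The factor names and weights are entirely invented – a real neural net would learn its own numbers from data, and far more of them:

```python
import math

def fraud_probability(amount, hour, country_mismatch):
    # Hypothetical weights for illustration only – a trained network
    # would arrive at these itself during training.
    score = 0.002 * amount + 0.1 * (hour < 6) + 2.0 * country_mismatch - 3.0
    # Squash the raw score into a probability between 0 and 1.
    return 1 / (1 + math.exp(-score))

# A large 3am purchase from a mismatched country looks suspicious:
print(round(fraud_probability(1200, 3, True) * 100, 1))  # prints 81.8
```

The output isn’t a yes/no answer but a likelihood, which is exactly the shape of answer neural nets give.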
What makes Neural Networks special?
It’s all about the difference between programming and teaching. This is a fundamental difference. It’s revolutionary, and it’s what makes neural nets so powerful.
People are rather clever – we can look at several factors and determine an answer based on those factors. For example, we know that if it’s raining, it’s 8am on a weekday, and the buses are on strike, traffic will be bad. We take those three factors (weather, time and the state of public transport) and we can calculate the likelihood of our commute being pure hell. Easy. We can even program a computer to take those three inputs and calculate an output. Those who are good at math could even write an equation to calculate traffic congestion levels based on those three factors.
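Those three factors are simple enough to code by hand. A minimal Python sketch (the rule and its threshold are made up for illustration):

```python
def commute_is_hell(raining, rush_hour, buses_on_strike):
    # A hand-written rule – exactly the kind of thing a programmer,
    # rather than a neural network, would encode.
    bad_factors = raining + rush_hour + buses_on_strike
    return bad_factors >= 2  # two or more bad factors: expect pure hell

print(commute_is_hell(True, True, True))     # prints True
print(commute_is_hell(False, False, False))  # prints False
```

This works fine as long as three factors really are all that matters – which, as we’re about to see, they aren’t.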
The problem with programming
However, there are also plenty of days when traffic is hell and yet it’s sunny and the buses are running on time, so there must be other factors that have an impact on traffic outside of the three we know about. Traffic congestion on any given road is of course impacted by hundreds of factors, such as the congestion on the roads leading onto and off that road, special events in the area, the price of gas, the state of the economy, the price of public transportation, and so on. There are far too many factors for a human to work out, and even the world’s best mathematicians would struggle to define an equation that accounts for them all. In the traditional world of computer programming this would be an impossible problem to solve: you can’t write a program to do something that you don’t know how to do.
Program do not, teach you must
This is where neural networks come in. We don’t need to program a computer and tell it the connections between the hundreds of traffic-impacting factors, we just need to teach it. How do you teach a computer about traffic congestion? First you feed it all the data you can find about the hundreds of factors that might impact traffic (historical weather reports, accident data, economic data, gas price history, etc.). Then you let it guess, based on all that info, what the traffic was like on Tuesday the 2nd of July at 8:03am. It will get it wrong, as we haven’t told it how those factors impact traffic – it’s still very dumb at this stage. After it guesses wrong, you tell it the right answer; it takes that answer, learns from it and guesses again. Rinse and repeat hundreds of times, and it will get closer and closer until its answer is close enough that we are happy. At this point it has learnt how all the different factors interrelate to affect traffic congestion – at 8:03am on the 2nd of July. Now give it a different date and time and repeat the process. Keep going a few hundred thousand more times.
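That guess–check–adjust loop can be boiled down to a few lines. Here’s a deliberately tiny Python sketch with one invented factor, one weight and one known right answer:

```python
# A toy version of the guess–check–adjust loop described above. One
# invented input (a "weather score"), one weight, one known right answer.
weather_score = 0.8
true_congestion = 0.6
weight = 0.0  # the network starts out very dumb

for step in range(200):
    guess = weight * weather_score         # the network's guess
    error = guess - true_congestion        # how wrong was it?
    weight -= 0.1 * error * weather_score  # nudge the weight to do better

print(round(weight * weather_score, 3))  # prints 0.6 – it has learnt the answer
```

A real neural net does this with millions of weights instead of one, but the rhythm – guess, measure the error, nudge, repeat – is the same.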
By the end of the training the neural network will have made the connections between all the factors you gave it to work out the likelihood of traffic congestion. The amazing part is you will still have no idea how those factors interrelate – the neural network can appear to be a magic box that gives you the correct answer (correct most of the time, depending on how well you’ve trained it).
That’s magic – how does it work?
You don’t need to know how it works. Seriously, I have no idea how my microwave oven works, but I know it’s great at reheating food, crap at browning a roast, and that bad things happen if I put metal in it. That’s the level of knowledge we need for neural networks. We need to know what they are good at, what they are crap at, and what could go wrong if we do the wrong thing.
You definitely don’t need to know the standard explanation about weighted synapses, neural nodes, backpropagation, hidden layers and gradient descent. That’s all very fascinating stuff but completely irrelevant to what a neural net can and should be used for.
To give you a tiny bit of insight as to how it works, a neural network is a complex mathematical model that takes a bunch of input numbers and turns them into an output number. It’s made up of a bunch of separate equations that get adjusted during the training until they all align to produce the right answer. The training isn’t magic, it just tweaks the equations each time you train it so the answer gets closer each time. The cool part is that the neural network software does this by itself; you don’t need to understand the equations or what to tweak, it does that for you.
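Here’s what “a bunch of separate equations that get adjusted” can look like in miniature – two inputs, two adjustable weights, all numbers invented, written in Python:

```python
import math

w1, w2 = 0.1, -0.2  # two tiny "equations", i.e. two adjustable weights

def predict(x1, x2):
    # Weighted sum of the inputs, squashed into a 0-1 output.
    return 1 / (1 + math.exp(-(w1 * x1 + w2 * x2)))

x1, x2, target = 1.0, 0.5, 1.0
for _ in range(500):
    error = predict(x1, x2) - target
    # Tweak each weight a little in the direction that shrinks the error.
    # Real neural net software does exactly this for you, automatically.
    w1 -= 0.5 * error * x1
    w2 -= 0.5 * error * x2

print(predict(x1, x2) > 0.95)  # prints True – the equations now align
```

Nobody had to work out the final values of the weights in advance; the loop found them.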
Is it actually intelligent?
That depends on how we define intelligence. From the point of view that it learns and improves by itself, then yes, it is. Could it take over the world and create Skynet? No. It’s like comparing an earthworm to a human: you’d say an earthworm was dumb, unintelligent, but compare an earthworm to a rock and you might say it’s quite smart. Neural networks are a form of artificial intelligence; they are intelligent compared to traditional coded programs, but they are a far cry from the sci-fi image of a sentient computer.
What are Neural Networks great at?
You’ll find neural networks everywhere. On your iPhone there’s one that analyses your photos so you can search for cats in your photo library and your phone will show you all the photos with cats. On your TV you’ll find one in Netflix that recommends the next show for you to watch based on your viewing history. On Amazon you’ll find one doing the “customers also bought this…” recommendations. Spotify, Pandora, Tidal and all the music streaming services use one to make radio mixes for you. Your credit card provider runs one every time you swipe your card to check that the transaction isn’t fraudulent. Your self-driving car uses a series of them to detect objects and drive by itself.
Neural networks are great at predictions, at finding correlations between lots of variables that would be beyond human comprehension, and at solving problems where a set of rules just doesn’t work.
This brings us back to cats. If I asked you to list a set of rules to identify a cat in a photo you would probably come up with some obvious ones such as:
- They have fur, a tail, pointy ears, about the size of a small dog, and four legs.
Easy, isn’t it? Assuming you can program, you could turn that into code and presto, you have an application that can spot cats in a photo.
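Coded up, those rules might look like this Python sketch (the inputs and the size threshold are invented – real photos don’t arrive as neat labelled facts, which is part of the problem):

```python
def is_cat(has_fur, has_tail, pointy_ears, size_cm, four_legs):
    # The obvious rules from the list above, written as code.
    return has_fur and has_tail and pointy_ears and 20 <= size_cm <= 50 and four_legs

print(is_cat(True, True, True, 30, True))   # an ordinary cat: prints True
print(is_cat(True, False, True, 30, True))  # no tail: prints False
```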
But wait… what if the cat is a Manx cat? They don’t have tails. What if it’s missing a leg? What if you can only see half of the cat in the photo? Does your application still work? Nope – your program is looking for a small four-legged animal with a tail and fur, so all those photos end up in the no-cat pile.
The problem is that we don’t actually know how to describe or write an application for identifying a cat in a photo because it’s actually really, really complicated. Just when we think we’ve got it, a photo that we weren’t expecting comes along with a hairless Manx cat sitting upside down in a vase. Trying to codify the millions of possible cat photo combinations simply isn’t possible by writing rules in an application.
This is where neural networks come in. We don’t need to tell a neural net how to spot a cat, we just tell it when it has spotted one. Then every time it gets it right it learns and updates itself (changes its network connections) to make it more likely to get it right again, and before you know it, the neural net can spot a cat in a photo with almost perfect accuracy. And here’s the key thing, you never even had to tell it what your cat looked like.
No cats were harmed in the writing of this article, nor were they described.
If you liked this and want to think some more about neural nets have a read of this article: