Programming "Deep" Neural Nets


No computer is ever smarter than the person(s) who programmed it. When they say "Artificial Intelligence" they mean that they have artificially put some of their own intelligence into the computer, some of it coded in a language that does not look like a conventional programming language.

Neural Nets (NNs) in particular are not very smart. You can see that in this picture:

Each circle in this diagram represents a single number, the weighted sum of all the circles to its immediate left. The circles on the left edge are called inputs; each could be any numerical input, perhaps one pixel in an image or one letter of text. The circles on the right are the outputs, whatever it is this NN is supposed to determine. All of the "intelligence" of this NN is in those arrows, each one representing a weighting factor (a number) multiplied by the number in the circle to its left and added to the sum in the circle to its right. That's all it is. Really. One programmer boasted "20 lines of C code." Of course this simplified diagram shows only five nodes (circles) in each layer, but a 32x32-pixel image used to train number recognition means a thousand number circles in each layer and a million weighting factors between adjacent layers; recognizing a face from pixels would involve billions of numbers, say 200x200 = 40,000 pixels and 1,600,000,000 weights between each pair of layers.
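Here is roughly what that amounts to in code -- a minimal sketch of one layer's forward pass, in the spirit of the "20 lines of C" boast (the function name and array layout are my assumptions, not taken from any real system):

    /* Forward pass of one fully-connected layer: each output circle is
       the weighted sum of every circle to its left, exactly as the
       diagram shows.  Weights are stored row-major, one row per output. */
    void layer_forward(int n_in, int n_out,
                       const float in[], const float w[], float out[])
    {
        for (int j = 0; j < n_out; j++) {
            float sum = 0.0f;
            for (int i = 0; i < n_in; i++)
                sum += w[j * n_in + i] * in[i];   /* one arrow: weight times circle */
            out[j] = sum;
        }
    }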

The three layers of green circles in the middle are what makes it "deep," but adding more layers of these purely linear sums is still mathematically equivalent to a single (very large) layer, because two weight matrices compose into one: W2 (W1 x) = (W2 W1) x. (Real nets wedge a nonlinear step between the layers to escape that collapse, but the arithmetic is no less mindless.) It's all just a weighted sum of the input pixels. Nothing intelligent is going on here, just adding up very large quantities of numbers, each the product of one number in the previous layer times one weighting factor. This is not where the NN is programmed; it only represents the "machine language" of the program after it has been programmed.
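You can check that collapse with a few lines of arithmetic. This toy program (hypothetical 2->3->2 layer sizes, weights invented purely for illustration) runs two linear layers in sequence, then pre-multiplies the two weight matrices into one layer, and prints the same outputs both ways:

    /* Two linear layers collapse into one: W2*(W1*x) == (W2*W1)*x. */
    #include <stdio.h>

    int main(void)
    {
        float w1[3][2] = {{1, 2}, {3, 4}, {5, 6}};   /* layer 1: 2 inputs -> 3 nodes */
        float w2[2][3] = {{1, 0, 2}, {0, 1, 3}};     /* layer 2: 3 nodes -> 2 outputs */
        float x[2] = {0.5f, -1.0f};

        /* path A: run the two layers in sequence */
        float h[3], ya[2] = {0, 0};
        for (int j = 0; j < 3; j++)
            h[j] = w1[j][0] * x[0] + w1[j][1] * x[1];
        for (int k = 0; k < 2; k++)
            for (int j = 0; j < 3; j++)
                ya[k] += w2[k][j] * h[j];

        /* path B: pre-multiply the matrices into one layer W = W2*W1 */
        float w[2][2] = {{0}}, yb[2];
        for (int k = 0; k < 2; k++)
            for (int i = 0; i < 2; i++)
                for (int j = 0; j < 3; j++)
                    w[k][i] += w2[k][j] * w1[j][i];
        for (int k = 0; k < 2; k++)
            yb[k] = w[k][0] * x[0] + w[k][1] * x[1];

        printf("two layers: %g %g\none layer:  %g %g\n",
               ya[0], ya[1], yb[0], yb[1]);   /* the two pairs match */
        return 0;
    }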

The programming is called "training," and it consists of another set of arrows pointing in the opposite direction -- still just numbers, the products of numbers and weighting factors, all included in that "20 lines of C code."

The first part of the training is giving initial values to those 1.6 billion weights. They use random initial weights. That matters, because if you give the same initial value to all the weights, the net cannot converge; it won't recognize anything. Nor will just any random weights do -- all zeros is one assignment you might randomly draw (unlikely, but possible) -- you must give it numbers that happen to cover the solution space in such a way that the training process will migrate those weights toward factors that actually recognize what you want recognized. Random numbers are not intelligent, they are just noise. Whatever intelligence those initial weights contribute to what the NN does comes from the person who wrote and ran the program that assigned them, who must be smart enough to see that training didn't converge and give it a different set of random numbers. But that's a tiny part of the human intelligence involved in programming a NN.
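An initializer itself is only a few lines. This sketch assumes one common recipe -- small values, uniform and symmetric about zero, scaled down for wider layers -- but the text above says only "random," so treat the details as my guess:

    /* Fill a layer's weights with small random values in [-scale, +scale].
       The 1/sqrt(n_in) scale is one common heuristic, assumed here. */
    #include <stdlib.h>
    #include <math.h>

    void init_weights(int n_in, int n_out, float w[])
    {
        float scale = 1.0f / sqrtf((float)n_in);
        for (int i = 0; i < n_in * n_out; i++)
            w[i] = scale * (2.0f * rand() / (float)RAND_MAX - 1.0f);
    }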

The biggest part of the job is preparing the training data. If you want your NN to recognize the difference between Santa Claus and Uncle Sam, you must give it a huge number of pictures of Santa Claus, another huge number of Uncle Sam, and yet another huge number of pictures of everybody else, each picture carefully "tagged" as being SC or US or neither -- there's your human intelligence, your programming, which probably involves more thoughtful human effort-hours than just writing a program to look for a wide face with a wide (white) beard and a red coat and hat (SC) versus a narrow face and narrow (white) beard and a blue coat and stars in the hat-band (US). But programmers are more expensive than people hired to look at and tag pictures. If you get the data wrong, the program will behave in undesirable ways, as when they trained a text-writing engine on unfiltered internet data and were appalled when it learned hate speech. A human child you can train by telling them, "You don't like it when other kids call you that, so don't you use those words," and they figure it out. The NN has no feelings, no introspection, no reflection; it just does what it is programmed to do, no more and no less.
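In code, a "tagged" training example is nothing more exotic than the pixels plus the label a human attached; the names and the three-way tag here are purely illustrative:

    /* One tagged training example: the input circles plus the label
       some person attached to the picture.  Names and sizes invented. */
    enum tag { SANTA_CLAUS, UNCLE_SAM, NEITHER };

    struct example {
        float pixels[200 * 200];   /* the input circles */
        enum tag label;            /* the human intelligence */
    };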

During training, you give it an input (a picture or whatever), which filters through the weighting multiplications and additions and produces an output; that output is compared to the tag on the data, and the difference is filtered back through the net, adjusting each weighting factor up or down according to whether its contribution to the correct output was better or worse. It's all completely deterministic: these pixels, apply these weights, produce this output, then feed the difference back to adjust the weights, then repeat. After hundreds or thousands of input samples, repeated dozens or hundreds of times, through billions or trillions of multiplications and additions, maybe the weights arrive at values that recognize what the programmer wanted, maybe not. If not, they fiddle with the initial values, maybe the encoding, and try again. I suspect the only kind of computation that adds more carbon to the atmosphere is blockchain (said to be responsible for "20% of the new carbon in the atmosphere"). If you believe what they say and care.
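Boiled down to a single linear layer scored by squared error, one training step is the sketch below: forward pass, compare to the tag, then nudge each weight against its share of the difference. The learning-rate parameter and the size cap are my assumptions, and a real multi-layer net chains the same adjustment backward through every layer:

    #define MAX_OUT 1024   /* assumed cap on layer width */

    /* One training step: forward pass, compare to the tag, adjust. */
    void train_step(int n_in, int n_out, float rate,
                    const float in[], const float target[], float w[])
    {
        float out[MAX_OUT];                    /* n_out must be <= MAX_OUT */

        /* forward: each output is the weighted sum of the inputs */
        for (int j = 0; j < n_out; j++) {
            out[j] = 0.0f;
            for (int i = 0; i < n_in; i++)
                out[j] += w[j * n_in + i] * in[i];
        }

        /* backward: feed the difference from the tag back through the
           weights, moving each one up or down by its contribution */
        for (int j = 0; j < n_out; j++) {
            float err = out[j] - target[j];
            for (int i = 0; i < n_in; i++)
                w[j * n_in + i] -= rate * err * in[i];
        }
    }

Run that over the whole tagged set, thousands of times over, and you have the "programming" -- still nothing but multiplies and adds.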

Anyway, NNs used to write text -- like "Deep-Speare," which generated poems designed to resemble Shakespeare sonnets -- are much more complicated, involving far more explicit human programming (in a conventional programming language) to convert the data into a form where the accumulation of random averages will actually converge on something intentional. They had to tag the words with stress and rhyme so that the engine had numbers (more or less stress, different rhyme endings) to average over when those words occurred in different parts of a line or quatrain (a 4-line segment of a sonnet), and then they needed to feed their engine random words for it to choose which word combinations occurred together more often in the real Shakespeare. The results were still often ungrammatical, because even when words tend to occur together on average, there is no guarantee that they actually make sense together as a sentence, let alone as Shakespearean poetry. Modern poetry tries hard to make no sense at all, so it's no challenge to produce it; even the "Deep-Speare" output was (I suspect intentionally) evaluated by people more familiar with the modern nonsense, so they were less likely to see how bad the computer-generated output really was.

NNs are good at recognizing pictures that are just combinations of pixels: "this combination of pixels in the same picture is more likely to be a horse than a cat." A computer beat a human at Go because games are played against a clock, and if you throw enough compute power at the game, the program can evaluate positions faster than a human. Deep Blue beat world chess champion Garry Kasparov because the programmers cheated: they (multiple programmers, all knowing chess, together perhaps a better player than Kasparov alone) changed the program mid-match. IBM's Watson (medical diagnosis) was a catastrophic billion-dollar loss for IBM because (unlike trained physicians) it couldn't think, it only did averages. Amazon funds what they call the Alexa Prize, a million-dollar prize for the first AI robot that can engage a human in small-talk "chat" for 20 minutes. They repeat it every year, because nobody has won. Among the top three contenders in a recent iteration, the NN-only entry finished third, behind a "handcoded" entry and a combination of the two.

The bottom line is that NNs are deterministic programs that do exactly what they were programmed to do, first at the low "20 lines of C code" level, and then also in the training and running of the trained engine. You put these numbers in, turn the crank, and get those numbers out. The NN cannot do math, but it could conceivably be trained on a large number of sums and products, so if you threw math problems at it, it might (most of the time) give credible answers; but give it a problem it has not seen, and all bets are off. And that, ladies and gentlemen, is why self-driving cars kill pedestrians. They don't have any idea what they are doing, just "on the average, that does not look like a pedestrian, so it's OK to keep going." They are still programmed deterministically, but very badly. We can do better. The Alexa Prize almost-winner did better. You can do better.

Tom Pittman
2021 May 5a