Artificial Neural Networks: An Introduction
-- Anuj Ranjan
This article is the first in a three-part introduction to the world of artificial
neural networks. Don't be fooled by the word 'introduction' in the title. By the end of
these three parts, you will have the information you need to understand the concepts
behind neural nets and even to design your own ANNs to perform complex tasks. Part I,
which is below, introduces the concept of a neural network in computing terms, and also
describes how one could go about creating a simple ANN using grandmother cells and the
Grandmothering technique (don't ask why they call it that). Parts II and III will cover
more complex algorithms for neural net design, namely the Adaline and the Backpropagation
networks.
Part I: What is a Neural Network?
An excellent one-sentence description of a neural network is provided
by Robert Hecht-Nielsen: A neural network is a computing system which is made up of a
number of simple, highly interconnected processing elements, and which processes
information by its dynamical state response to external inputs.
Now, a lot of you may be thinking, "What the heck does that mean?" Well, it
means a number of things. It says that a neural network is not a serial computer (in that
it does not execute a sequential set of instructions), it is not deterministic, and it has
no separate memory array for the storage of data. In fact, knowledge within a neural
network is not stored in a particular location (of memory). Instead, knowledge is stored
in the way that the processing elements are connected (how the outputs of one processing
element are used as inputs into others), and in the weighting (or importance) of each
input to the processing elements. Knowledge is more a function of the architecture of
the network than of its contents.
Neural Network design was inspired by current studies of the cerebral cortex in the
human brain. The cerebral cortex is made up of billions of neurons (processing elements,
or neurodes, in our context) and interconnections between them. These neurons, when
presented with input (electrical signals, or binary 1 for a strong signal and 0 for no
signal in our terms), either fire or do not fire. The outputs of these neurons are then
sent to many other neurons (and possibly back to the same neuron) as input signals via
the interconnections.
The structure of a neural network is made up of the interconnection architecture
between the neurodes, the function that determines whether or not a neurode will
fire, and the rules that determine the changes in the importance (weighting) of the
neurodes' inputs (the training laws, which will be discussed at greater length later).
Thus, a neural network developer spends his/her time specifying the
interconnections, transfer functions, and training laws, which does not follow the
traditional methods of programming. This is because a neural network is not a
traditional computer system. Instead of executing programs like most systems, neural nets
react, self-organize, learn, and even forget according to their inputs.
Why should we learn about ANNs?
Neural networks appear to be able to solve "monster" problems of AI that
traditional systems have had difficulty with. These include, but are not limited to,
speech recognition and synthesis, vision, and pattern recognition.
Neural nets appear to be good at solving the kinds of problems that people can also
solve easily. However, they are also usually terrible at solving problems that traditional
computers are very good at. For example, a neural net would do poorly at a precise
numerical computation (which is the basis of traditional systems). On the other hand, a
neural net can be taught to recognize whether or not a visual image of a face is that of a
particular person, even with a different facial expression or hairdo. People are very good
at this, but try doing this with a conventionally programmed digital computer!
It is also important to point out that neural networks are not a replacement for
traditional systems, but are rather a partner to them. Most neural networks are used in
conjunction with other systems, which call the network as a procedure whenever a task
suited to it is encountered.
Basics of Neural Networks
A neurode is basically an extremely simple processing element that has a
number of input signals and only one output signal (see Figure 1). Each input
signal x_i has an associated weight w_i, so that the effective input
to the neurode is the weighted total input (the sum of the products of each
input and its assigned weight). The simplest kind of neurode simply compares this weighted
sum to an arbitrary threshold. If the weighted sum is greater than this threshold, the
neurode will fire, or generate an output signal. Otherwise, the processing element will
not fire and no output will be generated.
Figure 1: A neurode is a simple processing element that has input signals (each
with an assigned weight) and an output signal.
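The threshold neurode just described can be sketched in a few lines of Python. This is only a minimal illustration: the function name neurode_output, the sample weights, and the threshold of 0.7 are assumptions made up for the example, not values from any particular network.

```python
def neurode_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted total input: the sum of each input times its assigned weight.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The neurode fires only if the weighted sum is strictly greater than the threshold.
    return 1 if weighted_sum > threshold else 0


# Illustrative values only: three binary inputs with hand-picked weights.
fires = neurode_output([1, 0, 1], [0.5, 0.9, 0.3], threshold=0.7)   # weighted sum about 0.8
silent = neurode_output([0, 1, 0], [0.5, 0.2, 0.3], threshold=0.7)  # weighted sum about 0.2
```

Here the first call fires because its weighted sum is above the threshold, and the second does not.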
The output signal of a neurode then splits out to act as inputs to other
neurodes (see Figure 2). It can also act as an input to the neurode itself, depending
on the network architecture. These outputs (and subsequent inputs) can be either
excitatory or inhibitory (either the signal tends to cause the neurode to fire or it
tends to keep the neurode from firing).
Figure 2: A neurode's output signal splits out to act as inputs to other neurodes
(or even back as input to itself)
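One common way to picture excitatory and inhibitory signals is through the sign of the weight: a positive weight pushes the weighted sum toward the threshold, and a negative weight pulls it away. The sketch below assumes that convention, which this article does not spell out. The weights and the threshold are invented for illustration.

```python
def weighted_sum(inputs, weights):
    """Sum of each input times its assigned weight."""
    return sum(x * w for x, w in zip(inputs, weights))


THRESHOLD = 0.5  # arbitrary value, chosen only for this example

# Assumption: two excitatory inputs (positive weights) and one inhibitory
# input (negative weight).
weights = [0.4, 0.4, -0.6]

# Only the excitatory inputs are active: the sum is about 0.8.
fires_without_inhibition = weighted_sum([1, 1, 0], weights) > THRESHOLD

# The inhibitory input is also active: the sum drops to about 0.2.
fires_with_inhibition = weighted_sum([1, 1, 1], weights) > THRESHOLD
```

With the inhibitory input switched on, the same two excitatory signals are no longer enough to make the neurode fire.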
If you think of the inputs and their corresponding weights as vectors, then the total
input signal is just the dot (or inner) product of the weight and input vectors. It thus
follows that the projection of the weight vector on the input vector will be largest
when the two point in almost the same direction, and will be close to zero when the two
point in nearly perpendicular directions. For vectors of comparable length, the
projection is therefore a measure of the closeness of the two vectors to each other.
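To make the vector picture concrete, here is a small sketch that computes the dot product of a weight vector with two different input vectors. The helper names dot and cosine and the two-dimensional sample vectors are assumptions chosen for the example. The cosine helper divides out the vector lengths, so that only the direction is compared.

```python
import math


def dot(a, b):
    """Dot (inner) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between a and b (both must be non-zero vectors)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


weight = [1.0, 0.0]
nearly_aligned = [0.9, 0.1]   # points in almost the same direction as weight
perpendicular = [0.0, 1.0]    # points at a right angle to weight

strong_input = dot(weight, nearly_aligned)   # large total input
weak_input = dot(weight, perpendicular)      # zero total input
closeness = cosine(weight, nearly_aligned)   # close to 1 for nearly aligned vectors
```

The nearly aligned input produces a much larger total input than the perpendicular one, which is exactly the behavior the projection argument predicts.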
Now, imagine a network with a set of only four neurodes, each with the
same set of inputs but different weights on those inputs. Also, suppose only one of these
neurodes can fire, and that the firing neurode is the one with the largest input signal
(its weight and input vectors are closest together). We can imagine the four neurodes with
their weight vectors pointing in completely different directions. The one with the weight
vector pointing most closely to the direction of the input vector will be the one that
fires (see Figure 3). Using this visual image of weight and input vectors is extremely
helpful in gaining insight into how neurodes operate.
Figure 3: The neurode that will fire will be the one whose weight vector is pointing
closest to the direction of the input vector. Thus the neurode represented by weight
vector w_1 will fire strongest.
This system, although primitive, can already perform useful functions. Suppose we want
a neural net to recognize inputs and classify them into one of four distinct patterns. We
could set the weights of the four neurodes so that each weight vector points to one of the
four patterns we want the system to recognize. We then present an input vector from some
unknown sample, and the neurode with its weight vector closest to the input vector (i.e.
the best match) will fire with the greatest strength, which appropriately classifies the
given pattern.
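The four-neurode classifier can be sketched as follows. The four pattern labels and their two-dimensional weight vectors are hypothetical, picked so that they point in clearly different directions. The sketch also assumes that all weight vectors have the same length, so that comparing raw dot products is a fair comparison of direction.

```python
def dot(a, b):
    """Dot (inner) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical patterns: one neurode per pattern, each with a unit-length
# weight vector pointing at the pattern it should recognize.
PATTERNS = {
    "up": [0.0, 1.0],
    "down": [0.0, -1.0],
    "left": [-1.0, 0.0],
    "right": [1.0, 0.0],
}


def classify(input_vector, patterns=PATTERNS):
    """Return the label of the neurode with the largest input signal, plus all signals."""
    # Each neurode's input signal is the dot product of its weights with the input.
    activations = {label: dot(w, input_vector) for label, w in patterns.items()}
    # Only one neurode may fire: the one with the largest input signal.
    winner = max(activations, key=activations.get)
    return winner, activations


winner, activations = classify([0.8, 0.3])
```

For the sample input, the neurode labeled "right" receives the largest input signal, since the input vector points mostly in that direction.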
You might think that neurodes would have to be much more complicated in order to make
up interesting neural nets. Well, I hate to disappoint you, but as this example shows,
even very simple networks can be made to perform surprisingly complex tasks. This simple
system is, in fact, an example of the use of grandmother cells (I have no idea
where the term comes from), which are processing elements that respond to exactly one type
of pattern.
Grandmothering, which is the type of network we've just described, is not really
learning (as some of you might have noticed), but is instead memorization. It is a static
system in which the weights of a neurode are never changed during the system's operation.
To recognize a new pattern, you have to add a new grandmother cell, and as you can
imagine, for real-world tasks the number of grandmother cells can quickly get out
of hand. Sure, memorizing that 25 + 56 = 81 is useful, but it would be much more useful if
the system could use a general technique (such as training) to come up with the sum of any
two numbers.
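The static nature of grandmothering shows up clearly in code. In the sketch below (the class name GrandmotherNet and the three-element patterns are assumptions for illustration), nothing ever adjusts a weight. The only way to handle a new pattern is to add another cell by hand.

```python
class GrandmotherNet:
    """A static set of grandmother cells; weights never change once a cell is added."""

    def __init__(self):
        self.cells = {}  # label -> fixed weight vector

    def add_cell(self, label, pattern):
        # "Memorize" a pattern by storing it as the cell's weight vector.
        self.cells[label] = list(pattern)

    def recognize(self, input_vector):
        if not self.cells:
            raise ValueError("the network has no grandmother cells")

        def signal(label):
            # Input signal of one cell: dot product of its weights with the input.
            return sum(w * x for w, x in zip(self.cells[label], input_vector))

        # The cell with the largest input signal is the one that fires.
        return max(self.cells, key=signal)


net = GrandmotherNet()
net.add_cell("A", [1, 0, 0])
net.add_cell("B", [0, 1, 0])

unseen = [0, 0.1, 0.9]
before = net.recognize(unseen)   # forced onto the nearest memorized pattern
net.add_cell("C", [0, 0, 1])     # the only remedy: add another cell
after = net.recognize(unseen)
```

Before the third cell is added, the unseen input is forced onto the nearest memorized pattern. Afterwards it is matched by its own dedicated cell.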
Part II, "The Adaline", will cover one of the first effective learning laws,
the Widrow-Hoff LMS algorithm. This learning algorithm avoids the disadvantages
of the grandmothering system (it does not simply memorize). It is still widely used
in network architectures and offers excellent solutions for specific problems.