As a first step, a three-layer feedforward network must be
constructed, with full connectivity between the input and hidden layer
and between the hidden and output layer. Either the graphical editor
or the tool `BIGNET` (both built into SNNS) can be used for this purpose.

The output function of all neurons is set to `Out_Identity`. The
activation function of all hidden layer neurons is set to one of the
three special `Act_RBF_...` activation functions (preferably to
`Act_RBF_Gaussian`). For the activation of the output units, a
function is needed which takes the bias into consideration. These
functions are `Act_Logistic` and `Act_IdentityPlusBias`.
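
For orientation, the following is a minimal sketch (plain NumPy, not
SNNS code) of the forward pass such a network computes. It assumes the
Gaussian base function, with the bias *p* of the hidden units acting
as the width parameter, and identity-plus-bias output units:

```python
import numpy as np

def rbf_forward(x, centers, p, W, b):
    """Forward pass of a three-layer RBF network (a sketch, not
    the SNNS code). centers: link weights input->hidden; p: bias
    of the hidden units (width of the Gaussian); W, b: weights
    and biases of the output units (Act_IdentityPlusBias)."""
    d2 = ((centers - x) ** 2).sum(axis=1)  # squared distances to all centers
    h = np.exp(-p * d2)                    # Act_RBF_Gaussian
    return h @ W + b                       # Act_IdentityPlusBias
```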

The next step consists of the creation of teaching patterns. They can
be generated manually using the graphical editor, or automatically
from external data sets by using an appropriate conversion program. If
the initialization procedure `RBF_Weights_Kohonen` is going to
be used, the center vectors should be normalized to length 1, or to
equal length.
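
If the patterns are prepared outside of SNNS, the normalization could
look like the following sketch (`normalize_to_unit_length` is a
hypothetical helper, not an SNNS tool):

```python
import numpy as np

def normalize_to_unit_length(patterns):
    """Scale each teaching pattern to Euclidean length 1 before
    it is used as a center vector (hypothetical helper)."""
    norms = np.linalg.norm(patterns, axis=1, keepdims=True)
    return patterns / norms
```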

Before the initialization is continued, an appropriate bias for the
hidden units has to be selected. To this end, the link weights
between input and hidden layer are set first, using the procedure
`RBF_Weights_Kohonen`, so that the center vectors represented by
these link weights form a subset of the available teaching patterns.
The necessary initialization parameters are: *learn cycles* = 0,
*learning rate*, *shuffle*. With zero learn cycles, the teaching
patterns are used as center vectors without modification.
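
The effect of this parameter setting can be illustrated by the
following behavioral sketch (not the SNNS implementation): with zero
learn cycles, the centers are simply an optionally shuffled subset of
the patterns.

```python
import numpy as np

def centers_from_patterns(patterns, n_hidden, shuffle=True, seed=0):
    """RBF_Weights_Kohonen with learn cycles = 0: the center
    vectors are an (optionally shuffled) subset of the teaching
    patterns, taken without modification (behavioral sketch)."""
    idx = np.arange(len(patterns))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    return patterns[idx[:n_hidden]].copy()
```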

To set the bias, the activation of the hidden units is checked for
different teaching patterns by using the TEST button of the SNNS
control panel. The bias of the hidden neurons has to be adjusted so
that the activations of the hidden units are as diverse as possible.
With the Gaussian function as base function, all hidden units are
uniformly highly activated if the bias is chosen too small (a bias of
0 leads to an activation of 1 for all hidden neurons). If the bias is
chosen too large, only the unit whose link weights correspond to the
current teaching pattern is activated. A useful procedure to find the
right bias is to first set it to 1 and then to change it uniformly
depending on the behavior of the network. One must take care, however,
that the bias does not become negative, since some of the implemented
base functions require a positive bias. The optimal choice of the
bias depends on the dimension of the input layer and on the similarity
among the teaching patterns.
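
This behavior is easy to reproduce in a few lines; in the sketch
below, the patterns and the candidate bias values are made up for
illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))          # made-up teaching patterns
C = X[:8]                             # centers copied from patterns

for p in (0.0, 0.1, 1.0, 10.0):       # candidate bias values
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    act = np.exp(-p * d2)             # hidden activations, Gaussian
    # p = 0: every activation is 1; very large p: nearly one-hot rows.
    print(f"bias={p:5.1f}  spread of activations={act.std():.3f}")
```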

After a suitable bias for the hidden units has been determined, the
initialization procedure `RBF_Weights` can be started. Depending on
the selected activation function for the output layer, the two
*scale* parameters have to be set. When `Act_IdentityPlusBias` is
used, the two values 0 and 1 should be chosen; for the logistic
activation function `Act_Logistic`, the values -4 and 4 are
recommended. The parameters *smoothness* and *deviation* should be
set to 0 first. The *bias* is set to the previously determined value.
Depending on the number of teaching patterns and the number of hidden
neurons, the initialization procedure may take rather long to execute;
therefore, some progress messages are printed on the terminal during
initialization.
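
The role of the two *scale* parameters can be pictured as mapping the
desired outputs linearly into the net-input range of the output
activation before the weights are computed. The following sketch
solves the resulting system in the least-squares sense; it illustrates
the idea only and is not claimed to match the SNNS routine in detail:

```python
import numpy as np

def direct_output_weights(H, T, scale_lo, scale_hi):
    """Sketch of a direct weight computation: map the desired
    outputs (in [0,1]) linearly into the net-input range given by
    the two scale parameters (0 and 1 for Act_IdentityPlusBias,
    -4 and 4 for Act_Logistic), then solve for the output weights
    and biases in the least-squares sense.
    H: hidden activations (patterns x hidden units)."""
    T_net = scale_lo + (scale_hi - scale_lo) * T   # scaled targets
    Hb = np.hstack([H, np.ones((len(H), 1))])      # column for the bias
    Wb, *_ = np.linalg.lstsq(Hb, T_net, rcond=None)
    return Wb[:-1], Wb[-1]                         # weights, biases
```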

After the initialization has finished, the result may be checked by
using the TEST button. However, the exact network error can only be
determined by the teaching function. Therefore, the learning function
`RadialBasisLearning` has to be selected first. All learning
parameters are set to 0 and the number of learning cycles (CYCLES) is
set to 1. The learning function is then started; since the learning
parameters are set to 0, no changes inside the network will occur.
After the presentation of all available teaching patterns, the actual
error is printed to the terminal. As usual, the error is defined as
the sum of squared errors of all output units. Under certain
conditions the error can become very large, which is mostly due to
numerical problems. A poorly selected bias, for example, has proven to
be a difficult starting point for the initialization. A problem also
arises if the number of teaching patterns is less than or equal to the
number of hidden units: in this case the number of unknown weights
plus unknown bias values of the output units exceeds the number of
teaching patterns, i.e. there are more unknown parameters to be
calculated than equations available. Removing one or more neurons
from the hidden layer then reduces the error considerably.
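
Both points can be checked with a few lines (hypothetical helpers,
operating on NumPy arrays; the error measure is the plain sum of
squared errors described above):

```python
def sse(outputs, targets):
    """Sum of squared errors over all output units and patterns,
    the error value printed by the learning function."""
    return ((outputs - targets) ** 2).sum()

def direct_solve_is_wellposed(n_patterns, n_hidden):
    """Each output unit has n_hidden weights plus one bias, i.e.
    n_hidden + 1 unknowns, but only n_patterns equations; the
    direct computation needs more patterns than hidden units."""
    return n_patterns > n_hidden
```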

After the first initialization, it is recommended to save the current
network before experimenting with the learning function. It has
turned out that the learning function quickly becomes unstable if too
large learning rates are used. It is recommended to first set only one
of the three learning rates (*centers*, *bias (p)*, *weights*) to a
value larger than 0 and to check the sensitivity of the learning
function to this single learning rate. The parameter *bias (p)* is
especially critical, because it causes serious changes of the base
function. If the bias of any hidden neuron becomes negative during
learning, an appropriate message is printed to the terminal. In that
case, further meaningful training is impossible and the network should
be reinitialized.

Immediately after initialization it is often useful to train only the link weights between hidden and output layer. In this way, numerical inaccuracies that appeared during initialization are corrected. However, an optimal overall result can only be achieved if the center vectors are trained as well, since they might have been selected disadvantageously.
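
Restricted to the hidden-to-output weights, one training step amounts
to a simple delta rule on the sum of squared errors. The sketch below
assumes identity output activation and absorbs constant factors into
the learning rate; it is not the SNNS implementation:

```python
def train_output_weights_step(H, T, W, b, eta):
    """One gradient step on the SSE with respect to the
    hidden-to-output weights and output biases only, i.e. the
    case where just the 'weights' learning parameter is nonzero
    (sketch, assuming identity output activation)."""
    E = H @ W + b - T            # error per pattern and output unit
    W_new = W - eta * (H.T @ E)  # gradient of the SSE w.r.t. W
    b_new = b - eta * E.sum(axis=0)
    return W_new, b_new
```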

The initialization procedure used for direct link weight calculation
is unable to compute the weights of links between input and output
layer. If such shortcut links are present, the following procedure is
recommended: even before setting the center vectors with
`RBF_Weights_Kohonen`, and before searching for an appropriate bias,
all weights should be set to random values by using the initialization
procedure `Randomize_Weights`. Thereby all links between input and
output layer are preinitialized. Later on, after executing the
procedure `RBF_Weights`, the error of the network will still be
relatively large, because the above mentioned links have not been
considered. Now it is easy to train these weights by using only the
teaching parameter *weights* during learning.
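
With shortcut links present, the forward pass extends as in the sketch
below (names such as `W_short` are made up); `RBF_Weights` computes
only the hidden-to-output weights and biases, so `W_short` keeps the
values set by `Randomize_Weights` until it is trained:

```python
import numpy as np

def rbf_forward_with_shortcuts(x, centers, p, W_hidden, W_short, b):
    """Forward pass when direct input-to-output links exist.
    RBF_Weights only computes W_hidden and b; W_short keeps the
    values set by Randomize_Weights until it is trained (sketch)."""
    h = np.exp(-p * ((centers - x) ** 2).sum(axis=1))  # hidden layer
    return h @ W_hidden + x @ W_short + b              # shortcut added
```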
