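The principle of radial basis functions derives from the theory of functional approximation. Given N pairs $(\vec{x}_i, y_i)$ with $\vec{x}_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, we are looking for a function f of the form:

$$f(\vec{x}) = \sum_{i=1}^{K} c_i \, h\bigl(\lVert \vec{x} - \vec{t}_i \rVert\bigr)$$

h is the radial basis function and $\vec{t}_i$ are the K centers, which have to be selected. The coefficients $c_i$ are also unknown at this point and have to be computed. $\vec{x}_i$ and $\vec{t}_i$ are elements of an n-dimensional vector space.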
h is applied to the Euclidean distance between each center $\vec{t}_i$ and the given argument $\vec{x}$. Usually a function h that has its maximum at a distance of zero is used, most often the Gaussian function. In this case, values of $\vec{x}$ that are equal to a center $\vec{t}_i$ yield an output value of 1.0 for the function h, while the output becomes almost zero for larger distances.
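As an illustration, the following sketch evaluates $f(\vec{x})$ for given centers and coefficients. It assumes a Gaussian of unit width, $h(r) = e^{-r^2}$, so that $h(0) = 1$; the function names `gaussian`, `euclidean` and `rbf` are hypothetical and chosen only for this example.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def gaussian(r: float) -> float:
    """Assumed radial basis function h(r) = exp(-r^2); maximum h(0) = 1."""
    return math.exp(-r * r)

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance |a - b| between two n-dimensional vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def rbf(x: Vector, centers: Sequence[Vector], coeffs: Sequence[float],
        h: Callable[[float], float] = gaussian) -> float:
    """f(x) = sum_i c_i * h(|x - t_i|) over the K centers t_i."""
    return sum(c * h(euclidean(x, t)) for c, t in zip(coeffs, centers))

# Two centers in R^2: at x = t_1 the first term contributes c_1 * h(0) = c_1.
centers = [(0.0, 0.0), (1.0, 1.0)]
coeffs = [2.0, -1.0]
print(rbf((0.0, 0.0), centers, coeffs))
```

The second center lies at distance $\sqrt{2}$ from the argument, so its contribution is scaled by $e^{-2}$, which shows how quickly the Gaussian decays away from a center.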
The function f should approximate the N given pairs $(\vec{x}_i, y_i)$ and should therefore minimize the following error function H:

$$H[f] = \sum_{i=1}^{N} \bigl(y_i - f(\vec{x}_i)\bigr)^2 + \lambda \lVert P f \rVert^2$$

The first term of H (the sum) is the total squared error of the approximation; minimizing it constrains f to approximate the N given points. The second term ($\lambda \lVert P f \rVert^2$) is a stabilizer that forces f to become as smooth as possible, where P is an operator (typically a differential operator) that penalizes non-smooth functions. The factor $\lambda \ge 0$ determines the influence of the stabilizer.
Under certain conditions it can be shown that a set of coefficients $c_i$ can be computed so that H becomes minimal. This computation depends on the centers $\vec{t}_i$, which have to be chosen beforehand.
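Introducing the following vectors and matrices

$$\vec{c} = (c_1, \dots, c_K)^T, \qquad \vec{y} = (y_1, \dots, y_N)^T,$$

$$G = \begin{pmatrix} h(\lVert\vec{x}_1 - \vec{t}_1\rVert) & \cdots & h(\lVert\vec{x}_1 - \vec{t}_K\rVert) \\ \vdots & \ddots & \vdots \\ h(\lVert\vec{x}_N - \vec{t}_1\rVert) & \cdots & h(\lVert\vec{x}_N - \vec{t}_K\rVert) \end{pmatrix}, \qquad G_2 = \begin{pmatrix} h(\lVert\vec{t}_1 - \vec{t}_1\rVert) & \cdots & h(\lVert\vec{t}_1 - \vec{t}_K\rVert) \\ \vdots & \ddots & \vdots \\ h(\lVert\vec{t}_K - \vec{t}_1\rVert) & \cdots & h(\lVert\vec{t}_K - \vec{t}_K\rVert) \end{pmatrix}$$

the set of unknown parameters $c_i$ can be calculated by the formula:

$$\vec{c} = \bigl(G^T G + \lambda G_2\bigr)^{-1} G^T \vec{y}$$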
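By setting $\lambda$ to 0, this formula becomes identical to the computation of the Moore-Penrose pseudoinverse $(G^T G)^{-1} G^T$, which gives the best (least-squares) solution of an over-determined system of linear equations ($N \ge K$). In this case, the linear system is exactly the one that follows directly from the conditions of an exact interpolation of the given problem, i.e. $G\vec{c} = \vec{y}$:

$$f(\vec{x}_j) = \sum_{i=1}^{K} c_i \, h\bigl(\lVert \vec{x}_j - \vec{t}_i \rVert\bigr) = y_j, \qquad j = 1, \dots, N$$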
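The direct computation of $\vec{c}$ can be sketched in a few lines of plain Python. This is an illustrative sketch, not the simulator's own routine: it again assumes the unit-width Gaussian $h(r) = e^{-r^2}$, solves the $K \times K$ system $(G^T G + \lambda G_2)\,\vec{c} = G^T \vec{y}$ by Gaussian elimination instead of forming an explicit inverse, and uses hypothetical names (`rbf_coefficients`, `lam` for $\lambda$).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def h(r: float) -> float:
    # Assumption: Gaussian basis function h(r) = exp(-r^2).
    return math.exp(-r * r)

def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def rbf_coefficients(xs: Sequence[Vector], ys: Sequence[float],
                     centers: Sequence[Vector], lam: float = 0.0) -> List[float]:
    """c = (G^T G + lam * G2)^(-1) G^T y for the given centers t_i."""
    G = [[h(dist(x, t)) for t in centers] for x in xs]        # N x K
    G2 = [[h(dist(s, t)) for t in centers] for s in centers]  # K x K
    N, K = len(xs), len(centers)
    A = [[sum(G[q][i] * G[q][j] for q in range(N)) + lam * G2[i][j]
          for j in range(K)] for i in range(K)]
    rhs = [sum(G[q][i] * ys[q] for q in range(N)) for i in range(K)]
    return solve(A, rhs)

# Exact interpolation: centers = data points and lam = 0.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [1.0, 0.0, -1.0, 0.5]
c = rbf_coefficients(xs, ys, xs)
print(c)
```

With the centers placed on the data points and $\lambda = 0$, $G$ is square and the resulting f reproduces every $y_j$; a positive `lam` trades some of that accuracy for a smoother f.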
The method of radial basis functions can easily be represented by a three-layer feedforward neural network. The input layer consists of n units, which represent the elements of the vector $\vec{x}$. The K components of the sum in the definition of f are represented by the units of the hidden layer. The links between input and hidden layer contain the elements of the vectors $\vec{t}_i$. The hidden units compute the Euclidean distance between the input pattern and the vector that is represented by the links leading to this unit. The activation of a hidden unit is computed by applying the function h to this Euclidean distance. The following figure shows the architecture of this special form of hidden unit.
Figure: The special radial basis unit
The single output neuron receives its input from all hidden neurons. The links leading to the output neuron hold the coefficients $c_i$. The activation of the output neuron is the weighted sum of its inputs.
The architecture described above, which realizes an approximation using radial basis functions, can easily be extended with some useful features. More than one output neuron is possible, which allows the approximation of several functions f around the same set of centers $\vec{t}_i$. The activation of the output units can be calculated by using a nonlinear invertible function (e.g. the sigmoid). The bias of the output neurons and direct connections between input and output layer (shortcut connections) can be used to improve the approximation quality. The bias of the hidden units can be used to modify the characteristics of the function h. All in all, such a neural network is able to represent the following set of approximations:

$$f_j(\vec{x}) = \sigma\Bigl(\sum_{i=1}^{K} c_{j,i}\, h\bigl(\lVert\vec{x} - \vec{t}_i\rVert, p_i\bigr) + \sum_{k=1}^{n} d_{k,j}\, x_k + b_j\Bigr), \qquad j = 1, \dots, m$$
This formula describes the behavior of a fully connected feedforward net with n input, K hidden and m output neurons. $f_j(\vec{x})$ is the activation of output neuron j for the input $\vec{x} = (x_1, \dots, x_n)$ presented to the input units. The coefficients $c_{j,i}$ represent the links between hidden and output layer. The shortcut connections from input to output are realized by $d_{k,j}$. $b_j$ is the bias of the output units and $p_i$ is the bias of the hidden neurons, which determines the exact characteristics of the function h. The activation function of the output neurons is represented by $\sigma$.
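A forward pass through the extended network can be sketched as follows. The concrete form of the hidden unit, $h(r, p) = e^{-p r^2}$ with the hidden bias p acting as an inverse width, is an assumption made for this example, as are the function name `rbf_net` and the argument layout (`c[j][i]`, `d[k][j]`).

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def h(r: float, p: float) -> float:
    # Assumption: Gaussian whose width is controlled by the hidden bias p.
    return math.exp(-p * r * r)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def rbf_net(x: Vector, centers: Sequence[Vector], p: Sequence[float],
            c: Sequence[Sequence[float]], d: Sequence[Sequence[float]],
            b: Sequence[float],
            act: Callable[[float], float] = sigmoid) -> List[float]:
    """Activations f_j(x), j = 1..m, of an RBF net with shortcut links.

    centers: K vectors t_i (input-to-hidden links); p: K hidden biases;
    c[j][i]: hidden-to-output weights; d[k][j]: input-to-output shortcuts;
    b[j]: output biases; act: output activation function sigma.
    """
    hidden = [h(math.dist(x, t), pi) for t, pi in zip(centers, p)]
    outputs = []
    for j in range(len(b)):
        net = sum(c[j][i] * hidden[i] for i in range(len(hidden)))
        net += sum(d[k][j] * x[k] for k in range(len(x)))  # shortcut links
        net += b[j]
        outputs.append(act(net))
    return outputs

# One center at the origin, two outputs, no shortcuts, identity activation.
print(rbf_net((0.0, 0.0), [(0.0, 0.0)], [1.0], [[1.0], [-1.0]],
              [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], act=lambda z: z))
```

Passing an identity function as `act` recovers the purely linear output units of the basic architecture; the default sigmoid corresponds to the nonlinear invertible output function mentioned above.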
The main advantage of the method of radial basis functions is the possibility of directly computing the coefficients $c_{j,i}$ (i.e. the links between hidden and output layer) and the bias $b_j$. This computation requires a suitable choice of centers $\vec{t}_i$ (i.e. the links between input and hidden layer). Because the quality of the chosen $\vec{t}_i$ is not known in advance, it is recommended to append some cycles of network training after the direct computation of the weights. Since the weights of the shortcut links leading from the input to the output layer cannot be computed directly either, a special training procedure is needed for neural networks that use radial basis functions.
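The implemented training procedure tries to minimize the error E by gradient descent. It is recommended to use different learning rates for different groups of trainable parameters. The following set of formulas contains all information needed by the training procedure, where $y_{l,j}$ is the desired output of output neuron j for the training pattern $\vec{x}_l$:

$$E = \sum_{j=1}^{m} \sum_{l=1}^{N} \bigl(y_{l,j} - f_j(\vec{x}_l)\bigr)^2$$

$$\Delta \vec{t}_i = -\eta_1 \frac{\partial E}{\partial \vec{t}_i}, \qquad \Delta p_i = -\eta_2 \frac{\partial E}{\partial p_i}, \qquad \Delta c_{j,i} = -\eta_3 \frac{\partial E}{\partial c_{j,i}}, \qquad \Delta d_{k,j} = -\eta_3 \frac{\partial E}{\partial d_{k,j}}, \qquad \Delta b_j = -\eta_3 \frac{\partial E}{\partial b_j}$$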
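To make the gradient formulas concrete, here is a sketch of one online gradient-descent step for a simplified network: a single linear output neuron, no shortcut connections, and the assumed hidden unit $h(r, p) = e^{-p r^2}$. Centers, hidden biases and output weights get their own learning rates `eta1`, `eta2` and `eta3`. The function names are hypothetical, and the per-pattern update is a choice made here for brevity; summing the gradients over all patterns before updating gives the batch version of E.

```python
import math
from typing import List, Sequence

def predict(x: Sequence[float], centers: Sequence[Sequence[float]],
            p: Sequence[float], c: Sequence[float], b: float) -> float:
    """f(x) = sum_i c_i * exp(-p_i * |x - t_i|^2) + b (single linear output)."""
    return sum(ci * math.exp(-pi * sum((xk - tk) ** 2 for xk, tk in zip(x, t)))
               for ci, pi, t in zip(c, p, centers)) + b

def train_step(x: Sequence[float], y: float, centers: List[List[float]],
               p: List[float], c: List[float], b: float,
               eta1: float, eta2: float, eta3: float) -> float:
    """One gradient step on E = (y - f(x))^2; updates centers, p, c in place
    and returns the updated output bias b."""
    diffs = [[xk - tk for xk, tk in zip(x, t)] for t in centers]  # x - t_i
    r2 = [sum(v * v for v in diff) for diff in diffs]             # |x - t_i|^2
    hid = [math.exp(-pi * ri) for pi, ri in zip(p, r2)]           # h_i
    e = y - (sum(ci * hi for ci, hi in zip(c, hid)) + b)          # output error
    for i in range(len(centers)):
        # All gradients use the parameter values from before this step.
        grad_t = [-4.0 * e * c[i] * p[i] * hid[i] * v for v in diffs[i]]  # dE/dt_i
        grad_p = 2.0 * e * c[i] * r2[i] * hid[i]                           # dE/dp_i
        grad_c = -2.0 * e * hid[i]                                          # dE/dc_i
        centers[i] = [tk - eta1 * g for tk, g in zip(centers[i], grad_t)]
        p[i] -= eta2 * grad_p
        c[i] -= eta3 * grad_c
    return b - eta3 * (-2.0 * e)                                          # dE/db = -2e

centers, p, c, b = [[0.0], [1.0]], [1.0, 1.0], [0.1, 0.1], 0.0
for _ in range(20):
    b = train_step([0.5], 1.0, centers, p, c, b, 0.05, 0.05, 0.05)
print(predict([0.5], centers, p, c, b))
```

Repeated steps on the single pattern drive $f(0.5)$ toward the target 1.0; in a full implementation the shortcut weights $d_{k,j}$ would be updated with $\eta_3$ in the same way as $c_{j,i}$ and $b_j$.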
It is often helpful to use a momentum term. This term effectively increases the step size on smooth regions of the error surface and damps oscillations on rough ones. The next formula describes the effect of a momentum term on the training of a general parameter g, depending on the additional parameter $\mu$. $\Delta g_{t+1}$ is the change of g during time step t+1, while $\Delta g_t$ is the change during time step t:

$$\Delta g_{t+1} = -\eta \frac{\partial E}{\partial g} + \mu \, \Delta g_t$$
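The momentum rule can be sketched for a single scalar parameter g. The toy error function $E(g) = (g - 3)^2$ and the name `momentum_step` are assumptions chosen only to show the update; the values $\eta = 0.05$ and $\mu = 0.9$ are illustrative, not recommended settings.

```python
def momentum_step(g: float, grad: float, prev_delta: float,
                  eta: float, mu: float) -> tuple:
    """Return (g + delta, delta) with delta_{t+1} = -eta * dE/dg + mu * delta_t."""
    delta = -eta * grad + mu * prev_delta
    return g + delta, delta

# Toy error E(g) = (g - 3)^2 with gradient dE/dg = 2 (g - 3).
g, delta = 0.0, 0.0
for _ in range(500):
    g, delta = momentum_step(g, 2.0 * (g - 3.0), delta, eta=0.05, mu=0.9)
print(g)
```

With $\mu = 0$ the rule reduces to plain gradient descent; a positive $\mu$ carries over part of the previous change, so consecutive steps in the same direction accumulate.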
Another useful improvement of the training procedure is the definition of a maximum allowed error inside the output neurons. This prevents the network from becoming overtrained, since errors whose magnitude is smaller than the predefined value are treated as zero. This in turn prevents the corresponding links from being changed.
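A minimal sketch of the maximum allowed error, assuming it is applied to the raw output errors $y_j - f_j(\vec{x})$ before they enter the gradient computation. The name `clipped_errors`, the parameter `delta_max`, and the strict comparison at the threshold are assumptions for this example.

```python
from typing import List, Sequence

def clipped_errors(targets: Sequence[float], outputs: Sequence[float],
                   delta_max: float) -> List[float]:
    """Output errors y - o; errors with |y - o| < delta_max count as zero."""
    return [0.0 if abs(y - o) < delta_max else y - o
            for y, o in zip(targets, outputs)]

print(clipped_errors([1.0, 0.0, 0.5], [0.95, 0.3, 0.5], delta_max=0.1))
```

Because a zero error yields zero gradients for every weight feeding that output, patterns that are already approximated within the tolerance leave the corresponding links unchanged.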