CHAPTER 9. NEURAL NETWORK MODELS AND FUNCTIONS
9.1.5 Backpropagation with Weight Decay
Weight decay was introduced by P. Werbos [Wer88]. It decreases the link weights while they are being trained with backpropagation: in addition to each weight update computed by backpropagation, the weight is decreased by a fraction d of its old value. The resulting formula is
\Delta w_{ij}(t+1) = \eta \, \delta_j \, o_i - d \, w_{ij}(t)
The effect is similar to the pruning algorithms (see chapter 10). Weights are driven to
zero unless reinforced by backpropagation. For further information, see [Sch94].
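A minimal C sketch of this update for a single weight is shown below; the function name and arguments are illustrative and not taken from the simulator's kernel. The backpropagation term \eta \delta_j o_i is assumed to be computed beforehand and passed in as bp_step.

    /* Weight update with weight decay: the regular backpropagation step
     * minus a fraction d of the old weight value (sketch, not simulator code). */
    double weight_decay_update(double w_old, double bp_step, double d)
    {
        /* w(t+1) = w(t) + eta*delta_j*o_i - d*w(t) */
        return w_old + bp_step - d * w_old;
    }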
9.2 Quickprop
One method to speed up the learning is to use information about the curvature of the
error surface. This requires the computation of the second order derivatives of the error
function. Quickprop assumes the error surface to be locally quadratic and attempts to
jump in one step from the current position directly into the minimum of the parabola.
Quickprop [Fah88] computes the derivatives in the direction of each weight. After computing the first gradient with regular backpropagation, a direct step to the error minimum
is attempted by
\Delta w_{ij}(t+1) = \frac{S(t+1)}{S(t) - S(t+1)} \, \Delta w_{ij}(t)
where:
    w_{ij}                weight between units i and j
    \Delta w_{ij}(t+1)    the actual weight change
    S(t+1)                the partial derivative of the error function with respect to w_{ij}
    S(t)                  the last partial derivative
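A short C sketch of this step for a single weight follows; the names are illustrative. The full Quickprop algorithm additionally limits the step by a maximum growth factor and starts with an ordinary gradient step, both of which are omitted here for brevity.

    /* Quadratic Quickprop step for one weight (sketch).
     * s_new  : current slope S(t+1) = dE/dw_ij
     * s_old  : previous slope S(t)
     * dw_old : previous weight change Delta w_ij(t) */
    double quickprop_step(double s_new, double s_old, double dw_old)
    {
        double denom = s_old - s_new;
        if (denom == 0.0)      /* flat curvature estimate: no quadratic jump */
            return 0.0;
        return (s_new / denom) * dw_old;
    }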
9.3 RPROP
9.3.1 Changes in Release 3.3
The implementation of Rprop has been changed in two ways. First, the implementation
now follows a slightly modified adaptation scheme: essentially, the backtracking step is
no longer performed when a jump over a minimum has occurred. Second, a weight-decay term has been
introduced. The weight-decay parameter (the third learning parameter) determines the
relative importance of two goals, namely reducing the output error (the standard goal) and
reducing the size of the weights (to improve generalization). The composite error function
is:
E = \sum_i (t_i - o_i)^2 + 10^{-\alpha} \sum_{i,j} w_{ij}^2

where \alpha is the weight-decay parameter.
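A small C sketch of this composite error is given below, assuming the target values, actual outputs, and link weights are available as flat arrays; the function and parameter names are illustrative.

    #include <math.h>

    /* Composite error: sum-of-squares output error plus a weight-decay
     * term scaled by 10^(-alpha) (sketch, array layout is illustrative). */
    double composite_error(const double *targets, const double *outputs, int n_out,
                           const double *weights, int n_weights, double alpha)
    {
        double sse = 0.0, wsum = 0.0;
        for (int i = 0; i < n_out; ++i) {
            double diff = targets[i] - outputs[i];
            sse += diff * diff;                  /* (t_i - o_i)^2 */
        }
        for (int j = 0; j < n_weights; ++j)
            wsum += weights[j] * weights[j];     /* w_ij^2 */
        return sse + pow(10.0, -alpha) * wsum;
    }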