148 CHAPTER 9. NEURAL NETWORK MODELS AND FUNCTIONS

9.1.5 Backpropagation with Weight Decay

Weight decay was introduced by P. Werbos ([Wer88]). It decreases the weights of the links while training them with backpropagation: in addition to each update of a weight by backpropagation, the weight is decreased by a fraction d of its old value. The resulting formula is

\Delta w_{ij}(t+1) = \eta \delta_j o_i - d \, w_{ij}(t)

The effect is similar to that of the pruning algorithms (see chapter 10): weights are driven to zero unless reinforced by backpropagation. For further information, see [Sch94].

9.2 Quickprop

One method to speed up learning is to use information about the curvature of the error surface. This requires the computation of the second-order derivatives of the error function. Quickprop assumes the error surface to be locally quadratic and attempts to jump in one step from the current position directly into the minimum of the parabola. Quickprop [Fah88] computes the derivatives in the direction of each weight. After computing the first gradient with regular backpropagation, a direct step to the error minimum is attempted by

\Delta w_{ij}(t+1) = \frac{S(t+1)}{S(t) - S(t+1)} \, \Delta w_{ij}(t)

where:

w_{ij} : weight between units i and j
\Delta w_{ij}(t+1) : actual weight change
S(t+1) : partial derivative of the error function with respect to w_{ij}
S(t) : the last partial derivative

9.3 RPROP

9.3.1 Changes in Release 3.3

The implementation of Rprop has been changed in two ways. First, the implementation now follows a slightly modified adaptation scheme: essentially, the backtracking step is no longer performed if a jump over a minimum occurred. Second, a weight-decay term is introduced. The weight-decay parameter \alpha (the third learning parameter) determines the relationship between two goals, namely to reduce the output error (the standard goal) and to reduce the size of the weights (to improve generalization). The composite error function is:

E = \sum_i (t_i - o_i)^2 + 10^{-\alpha} \sum_{i,j} w_{ij}^2
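The weight-decay update from section 9.1.5 can be sketched in Python. This is a minimal illustration of a single-weight update, assuming the standard backpropagation term with learning rate eta (the function name, parameter names, and default values are illustrative, not part of the manual):

```python
def backprop_weight_decay_update(w, delta_j, o_i, eta=0.2, d=0.001):
    """One weight update: a backpropagation step plus a decay term.

    w       : current weight w_ij(t)
    delta_j : backpropagated error signal of unit j (assumed symbol)
    o_i     : output of unit i
    eta     : learning rate (assumed symbol and value)
    d       : fraction of the old weight subtracted at every step
    """
    # backprop contribution minus a fraction d of the old weight value
    return w + eta * delta_j * o_i - d * w
```

With a zero error signal the update reduces the weight to (1 - d) of its old value, which is exactly the pruning-like behavior the text describes: weights are driven to zero unless reinforced by backpropagation.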
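The Quickprop step of section 9.2 reduces to a single expression per weight. A minimal sketch (the function name and the test scenario below are illustrative assumptions):

```python
def quickprop_step(S_new, S_old, dw_old):
    """Quickprop weight change for one weight w_ij.

    S_new  : current partial derivative S(t+1) = dE/dw_ij
    S_old  : previous partial derivative S(t)
    dw_old : previous weight change Delta w_ij(t)
    """
    # jump to the minimum of the parabola fitted through the
    # last two gradient measurements along this weight's axis
    return S_new / (S_old - S_new) * dw_old
```

On an error surface that really is quadratic in w_ij, this step lands exactly in the minimum: for E(w) = (w - 3)^2 with gradient 2(w - 3), moving from w = 0 to w = 1 gives S(t) = -6, S(t+1) = -4, and the computed change of +2 reaches the minimum at w = 3 in one step.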
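The composite error of section 9.3.1 can be sketched as follows, assuming the weight-decay parameter alpha is the third learning parameter and that it scales the weight term as 10^(-alpha); the weights are taken as a flat list for simplicity:

```python
def composite_error(targets, outputs, weights, alpha):
    """E = sum_i (t_i - o_i)^2 + 10**(-alpha) * sum w_ij^2

    targets, outputs : per-unit teaching values t_i and actual outputs o_i
    weights          : all weights w_ij, flattened into one sequence
    alpha            : weight-decay parameter (assumed to enter as 10**-alpha)
    """
    # standard sum-of-squares output error
    sse = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    # weight-decay penalty, down-weighted by 10**(-alpha)
    penalty = 10.0 ** (-alpha) * sum(w * w for w in weights)
    return sse + penalty
```

A larger alpha shrinks the penalty term, shifting the balance toward pure error reduction; a smaller alpha favors small weights and thus better generalization, matching the trade-off described in the text.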