Quantified Maximum Entropy MemSys5 Users' Manual
30 CHAPTER 2. CLASSIC MAXIMUM ENTROPY

$$F_k = \sum_{j=1}^{M} R_{kj} \sum_{i=1}^{L} C_{ji} h_i$$

and hence

$$F = RCh \qquad \text{and} \qquad \frac{\partial F}{\partial h} = RC.$$

Note again the dimensions of the three spaces:

L = dimension of hidden space
M = dimension of visible space
N = dimension of data space.

2.6 Inferences about the noise level and other variables

The classic maximum entropy analysis carries with it the overall probability of the data from (2.5) on page 19:

$$\Pr(D) = \int d\alpha \, \Pr(\alpha \mid D) = \Pr(\hat\alpha \mid D) = (2\pi)^{-N/2} \det[\sigma^{-1}] \exp\big(\hat\alpha S(\hat h) - L(\hat h)\big) (\det \hat B)^{-1/2}.$$

This expression becomes useful when it is realised that, like any other probabilistic expression, it is conditional on the underlying assumptions. In fact, all the probabilities derived in the classic maximum entropy analysis have been conditional upon a choice of model m, experimental variables defining the response functions, noise amplitudes σ, and so on, so that Pr(D) is really a shorthand for $\Pr(D \mid m, R, \sigma, \ldots)$. If such variables are imperfectly known, these conditional probability values can be used to refine our knowledge of them, by using Bayes' theorem in the form

$$\Pr(\text{variables} \mid D) = \text{constant} \times \Pr(\text{variables}) \, \Pr(D \mid \text{variables}).$$

Ideally, one would set up a full prior for unknown variables and integrate them out in order to determine $\Pr(h \mid D)$. In practice, though, with large datasets it usually suffices to select the single "best" values of the variables, just as was the case for the regularisation constant α.

A common special case of this concerns experimental data in which the overall noise level is uncertain, so that all the standard deviations σ in Pr(D) above should be scaled by some coefficient c. Rescaling α to $\alpha/c^2$ for convenience gives

$$\Pr(D \mid \alpha, c) = (2\pi c^2)^{-N/2} \det[\sigma^{-1}] \exp\big( (\alpha S(\hat h) - L(\hat h)) / c^2 \big) (\det \hat B)^{-1/2}.$$

For this case, the maximum entropy trajectory itself is unaltered and parameterised by the same values of α, though the probability clouds for h are of different overall size.
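Because the trajectory is unaltered, αS(ĥ), L(ĥ), det[σ⁻¹] and det B̂ can be held fixed while c varies, so the c-dependence of the rescaled Pr(D | α, c) can be explored directly. The Python sketch below evaluates the logarithm of that expression and locates its maximum over c by a brute-force grid scan. The function names (`log_evidence`, `best_c_by_scan`) and the sample numbers are hypothetical illustrations, not part of MemSys5, and the sketch assumes the four fixed quantities have already been computed at the unscaled noise level.

```python
import math


def log_evidence(c, alpha_S, L_hat, log_det_sigma_inv, log_det_B, N):
    """Logarithm of Pr(D | alpha, c) with all noise standard deviations scaled by c.

    alpha_S           -- alpha * S(h_hat), alpha being the unrescaled value
    L_hat             -- misfit term L(h_hat) (not the hidden-space dimension L)
    log_det_sigma_inv -- log det[sigma^{-1}] for the unscaled sigma
    log_det_B         -- log det B_hat, which the rescaling leaves unchanged
    N                 -- dimension of data space
    """
    c2 = c * c
    return (-0.5 * N * math.log(2.0 * math.pi * c2)
            + log_det_sigma_inv
            + (alpha_S - L_hat) / c2
            - 0.5 * log_det_B)


def best_c_by_scan(alpha_S, L_hat, N, lo=0.1, hi=10.0, steps=100_000):
    """Grid search over c in [lo, hi] for the maximum of log_evidence.

    The determinant terms do not depend on c, so they are set to zero:
    they shift the curve up or down without moving its maximum.
    """
    best_c, best_val = lo, -math.inf
    for k in range(steps + 1):
        c = lo + (hi - lo) * k / steps
        val = log_evidence(c, alpha_S, L_hat, 0.0, 0.0, N)
        if val > best_val:
            best_c, best_val = c, val
    return best_c


# Hypothetical example values (alpha * S is non-positive for an entropy
# measured relative to the default model).
N = 100
L_hat = 60.0
alpha_S = -10.0
print(best_c_by_scan(alpha_S, L_hat, N))
```

The grid scan is chosen only for transparency; since the log-evidence depends on c solely through c², a one-dimensional optimiser or an analytic stationary point would serve equally well.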
The Evidence is maximised over c when $c^2 = 2(L - \alpha S)/N$. At this scaling, we note that (for linear data) the $\chi^2$ misfit statistic

$$\chi^2 = (D - F)^{T} [c^{-2} \sigma^{-2}] (D - F)$$