Transcript
The weather dataset (14 training instances plus one unknown instance):

outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no
sunny     cool   high      true   ?

Decision trees

Entropy: the information required to specify the class of an instance, given class probabilities p1, ..., pn with ∑ pi = 1:

e(p1, p2, ..., pn) = −p1 log p1 − p2 log p2 − ... − pn log pn

● when a branch is empty: zero
● when the branches are equal: a maximum
● f(a, b, c) = f(a, b + c) + g(b, c)

ID3: A recursive algorithm
● Select the attribute with the biggest information gain to place at the root node.
● Make one branch for each possible value.
● Build the subtrees.

example: ID3
Class counts (yes:no) for each attribute value:
outlook:  sunny 2:3, overcast 4:0, rainy 3:2
temp.:    hot 2:2, mild 4:2, cool 3:1
humidity: high 3:4, normal 6:1
windy:    false 6:2, true 3:3
Outlook yields the biggest information gain, so it is placed at the root. The resulting tree:
outlook = sunny    → humidity: high → no, normal → yes
outlook = overcast → yes
outlook = rainy    → windy: false → yes, true → no

example: Naïve Bayes
For the unknown instance (sunny, cool, high, true):

play  sunny  cool  high  true  overall  product  normalized
yes   2/9    3/9   3/9   3/9   9/14     0.0053   20.5 %
no    3/5    1/5   4/5   3/5   5/14     0.0206   79.5 %

Petr Olmer: Data Mining. iCSC 2005, 23-25 February 2005, CERN. Data Management and Database Technologies (Data Bases Theme), Lecture 5.
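The two worked examples above can be reproduced in a few lines. The following is a minimal sketch (function names and structure are my own, not from the lecture): it computes the information gain of each attribute on the weather dataset, confirming that outlook has the biggest gain and becomes the ID3 root, and then applies Naïve Bayes to the unknown instance (sunny, cool, high, true).

```python
from math import log2

# The 14 training instances: (outlook, temp., humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
ATTRS = ["outlook", "temp.", "humidity", "windy"]

def entropy(counts):
    """e(p1,...,pn) = -sum pi log2 pi, with pi = count / total."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def class_counts(rows):
    """Per-class counts of the last column (the play label)."""
    labels = [r[-1] for r in rows]
    return [labels.count(lbl) for lbl in set(labels)]

def info_gain(data, attr_idx):
    """Entropy of the class column minus the weighted entropy after splitting."""
    before = entropy(class_counts(data))
    after = 0.0
    for value in set(r[attr_idx] for r in data):
        subset = [r for r in data if r[attr_idx] == value]
        after += len(subset) / len(data) * entropy(class_counts(subset))
    return before - after

gains = {a: info_gain(DATA, i) for i, a in enumerate(ATTRS)}
# outlook has the biggest gain (about 0.247 bits), so ID3 places it at the root.

def naive_bayes(instance):
    """P(play) * prod over attributes of P(attr = value | play), normalized."""
    scores = {}
    for label in ("yes", "no"):
        rows = [r for r in DATA if r[-1] == label]
        score = len(rows) / len(DATA)          # prior, e.g. 9/14 for yes
        for i, v in enumerate(instance):
            score *= sum(1 for r in rows if r[i] == v) / len(rows)
        scores[label] = score                  # 0.0053 for yes, 0.0206 for no
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

posterior = naive_bayes(("sunny", "cool", "high", "true"))
# posterior is roughly {"yes": 0.205, "no": 0.795}, matching the 20.5 % / 79.5 % above.
```

The gain computation uses the per-value class counts listed above (e.g. humidity: high 3:4, normal 6:1) internally, so the printed tree choices agree with the slide's figures.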