
Transcript
outlook    temp.   humidity   windy   play
sunny      hot     high       false   no
sunny      hot     high       true    no
overcast   hot     high       false   yes
rainy      mild    high       false   yes
rainy      cool    normal     false   yes
rainy      cool    normal     true    no
overcast   cool    normal     true    yes
sunny      mild    high       false   no
sunny      cool    normal     false   yes
rainy      mild    normal     false   yes
sunny      mild    normal     true    yes
overcast   mild    high       true    yes
overcast   hot     normal     false   yes
rainy      mild    high       true    no
sunny      cool    high       true    ?
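To make the later counting arithmetic easy to check, the 14 training rows above can be encoded directly. This is a sketch in Python (an assumption; the lecture itself shows no code):

```python
from collections import Counter

# The 14 training rows from the table above, as
# (outlook, temp., humidity, windy, play) tuples.
rows = [
    ("sunny",    "hot",  "high",   "false", "no"),
    ("sunny",    "hot",  "high",   "true",  "no"),
    ("overcast", "hot",  "high",   "false", "yes"),
    ("rainy",    "mild", "high",   "false", "yes"),
    ("rainy",    "cool", "normal", "false", "yes"),
    ("rainy",    "cool", "normal", "true",  "no"),
    ("overcast", "cool", "normal", "true",  "yes"),
    ("sunny",    "mild", "high",   "false", "no"),
    ("sunny",    "cool", "normal", "false", "yes"),
    ("rainy",    "mild", "normal", "false", "yes"),
    ("sunny",    "mild", "normal", "true",  "yes"),
    ("overcast", "mild", "high",   "true",  "yes"),
    ("overcast", "hot",  "normal", "false", "yes"),
    ("rainy",    "mild", "high",   "true",  "no"),
]

# Class distribution of "play": 9 yes, 5 no.
print(Counter(r[-1] for r in rows))
```

Counting over these tuples reproduces every fraction used later, e.g. 2 of the 9 "yes" rows have outlook = sunny.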
[Figure: ID3 recursing into the outlook = sunny branch. Candidate splits on the five sunny rows, with class counts (yes:no): temp. hot 0:2, mild 1:1, cool 1:0; humidity high 0:3, normal 2:0; windy false 1:2, true 1:1. Humidity separates the classes perfectly, so it is chosen for this subtree.]
example: Naïve Bayes

New instance: outlook = sunny, temp. = cool, humidity = high, windy = true, play = ?

                  yes     no
outlook = sunny   2/9     3/5
temp. = cool      3/9     1/5
humidity = high   3/9     4/5
windy = true      3/9     3/5
overall           9/14    5/14

likelihood of yes = 2/9 · 3/9 · 3/9 · 3/9 · 9/14 = 0.0053 → 20.5 %
likelihood of no  = 3/5 · 1/5 · 4/5 · 3/5 · 5/14 = 0.0206 → 79.5 %

Data Management and Database Technologies

Decision trees

ID3: A recursive algorithm
● Select the attribute with the biggest information gain to place at the root node.
● Make one branch for each possible value.
● Build the subtrees.
● Information required to specify the class:
  – when a branch is empty: zero
  – when the branches are equal: a maximum
  – f(a, b, c) = f(a, b + c) + g(b, c)
● Entropy (over class probabilities, ∑ pi = 1):
  e(p1, p2, …, pn) = − p1 log p1 − p2 log p2 − … − pn log pn
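The Naïve Bayes arithmetic above can be reproduced by multiplying the per-class fractions for the new instance. A minimal sketch in Python (an assumption; the lecture shows no code):

```python
from math import prod

# Fractions read off the Naive Bayes table for the new instance
# (sunny, cool, high, true): four conditional probabilities P(attr | class)
# followed by the class prior ("overall").
p_yes = [2/9, 3/9, 3/9, 3/9, 9/14]
p_no  = [3/5, 1/5, 4/5, 3/5, 5/14]

like_yes = prod(p_yes)   # ≈ 0.0053
like_no = prod(p_no)     # ≈ 0.0206
total = like_yes + like_no

# Normalizing gives 20.5 % for yes and 79.5 % for no.
print(f"yes: {100 * like_yes / total:.1f} %")
print(f"no:  {100 * like_no / total:.1f} %")
```

The unnormalized products match the slide's 0.0053 and 0.0206; dividing by their sum gives the quoted percentages.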
Petr Olmer: Data Mining
Class counts (yes : no) for each candidate split of the full data set:

outlook:   sunny 2:3    overcast 4:0   rainy 3:2
temp.:     hot 2:2      mild 4:2       cool 3:1
humidity:  high 3:4     normal 6:1
windy:     false 6:2    true 3:3

The resulting decision tree:

outlook = sunny    → humidity = high   → no
                     humidity = normal → yes
outlook = overcast → yes
outlook = rainy    → windy = false → yes
                     windy = true  → no
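ID3's choice of outlook at the root can be verified from the per-attribute class counts above. A sketch in Python (an assumption; base-2 logarithms are used here, and the ranking of the gains does not depend on the base):

```python
from math import log2

def entropy(counts):
    """e(p1, ..., pn) = -sum pi log pi over a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(branches):
    """Information gain of a split, given (yes, no) counts per branch."""
    n = sum(sum(b) for b in branches)
    before = entropy([sum(y for y, _ in branches), sum(no for _, no in branches)])
    after = sum(sum(b) / n * entropy(b) for b in branches)
    return before - after

# (yes, no) counts per branch, read off the candidate splits above.
splits = {
    "outlook":  [(2, 3), (4, 0), (3, 2)],
    "temp.":    [(2, 2), (4, 2), (3, 1)],
    "humidity": [(3, 4), (6, 1)],
    "windy":    [(6, 2), (3, 3)],
}
for attr, branches in splits.items():
    print(f"{attr:9s} gain = {gain(branches):.3f}")
# outlook has the biggest gain (0.247), so ID3 places it at the root;
# the empty-branch and equal-branches bullets show up as entropy 0 for
# overcast 4:0 and entropy 1 for windy true 3:3.
```

The gains come out as outlook 0.247, humidity 0.152, windy 0.048, temp. 0.029, matching the tree built above.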
example: ID3
iCSC 2005
23-25 February 2005, CERN
Data Bases Theme
Lecture 5