TIBCO Spotfire Analytics Server

Working with the Data
$M_i$ = number of rows in D where the A value equals the A value in group i
$N_i$ = number of rows in D where the B value equals the B value in group i
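The counts above can be sketched in Python. This is a minimal illustration that assumes, as the definitions of $M_i$ and $N_i$ suggest, that a group i is one distinct combination of an A value and a B value. It also assumes that $K_i$ is the number of rows in group i and that R is the total number of rows in D. The function `group_counts` and the toy data are hypothetical.

```python
from collections import Counter


def group_counts(rows):
    """For each distinct (a, b) combination, return (K_i, M_i, N_i, R).

    Assumption (not from the source): a group is one distinct
    (A value, B value) pair, K_i is the number of rows in that group
    and R is the total number of rows in D.
    """
    rows = list(rows)
    r = len(rows)
    a_counts = Counter(a for a, _ in rows)  # rows sharing each A value -> M_i
    b_counts = Counter(b for _, b in rows)  # rows sharing each B value -> N_i
    pair_counts = Counter(rows)             # rows in each (A, B) group -> K_i
    return {
        (a, b): (k, a_counts[a], b_counts[b], r)
        for (a, b), k in pair_counts.items()
    }


# Hypothetical toy data: one (A value, B value) tuple per row of D.
data = [("x", 1), ("x", 1), ("x", 2), ("y", 2), ("y", 2)]
for group, (k, m, n, r) in group_counts(data).items():
    print(group, "K =", k, "M =", m, "N =", n, "R =", r)
```

Each group is counted once, and its $M_i$ and $N_i$ are looked up from per-column totals, so the whole table is processed in a single pass per counter.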
The p-value for the group with index i can then be calculated as follows:

$$P_i = P(X \ge K_i \mid R, N_i, M_i) = \sum_{x = K_i}^{\min(N_i, M_i)} P(X = x \mid R, N_i, M_i)$$

where X is a random variable with a hypergeometric distribution. In probability theory, this
distribution describes the number of successes in a fixed number of draws, made without
replacement, from a finite population.
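This means that the probability formula can be written as follows:

$$P(X = x \mid R, N_i, M_i) = \frac{\binom{M_i}{x}\binom{R - M_i}{N_i - x}}{\binom{R}{N_i}}$$

where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient of n and k.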
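The p-value definition and the hypergeometric probability can be evaluated directly. The following sketch uses only the Python standard library (`math.comb` requires Python 3.8 or later). It computes the standard hypergeometric probability mass function from binomial coefficients and sums it from $x = K_i$ to $\min(N_i, M_i)$ to obtain $P_i$. The function names and the parameter values are illustrative assumptions, not part of Spotfire.

```python
from math import comb


def hypergeom_pmf(x, r, n, m):
    """P(X = x | R, N, M) = C(M, x) * C(R - M, N - x) / C(R, N).

    Probability of exactly x successes when n items are drawn without
    replacement from a population of r items, m of which are successes.
    """
    return comb(m, x) * comb(r - m, n - x) / comb(r, n)


def group_p_value(k, r, n, m):
    """P(X >= K | R, N, M): sum of the pmf for x = K, ..., min(N, M)."""
    return sum(hypergeom_pmf(x, r, n, m) for x in range(k, min(n, m) + 1))


# Illustrative values: R = 10 rows, N_i = 3, M_i = 4, K_i = 2.
p = group_p_value(k=2, r=10, n=3, m=4)
print(round(p, 4))  # 0.3333, i.e. (36 + 4) / 120
```

Because the hypergeometric probability is symmetric in N and M, swapping `n` and `m` gives the same result. Terms outside the support of X evaluate to zero, since `math.comb` returns 0 when its second argument exceeds the first.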
Example:
Let us consider a data set D which contains information about the country of origin and the
number of cylinders for 18 different cars:
| Model | Origin | Cylinders |
|---|---|---|
| VW 1131 | EU | 4 |
| Saab 99 | EU | 4 |
| Chevrolet Impala | USA | 8 |
| Pontiac Catalina | USA | 8 |
| Plymouth Fury | USA | 8 |
| Mercury Monarch | USA | 6 |
| Buick Century | USA | 6 |
| Audi 100 | EU | 4 |
| Renault 12 | EU | 4 |
| Mercedes 280 | EU | 6 |
| Chevrolet Caprice | USA | 8 |
| Oldsmobile Cutlass | USA | 8 |
| Peugeot 604 | EU | 6 |
| Pontiac Lemans | USA | 6 |