Slide 18
The NPFS Algorithm [1]
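Map: the dataset D is mapped to D_1, D_2, ..., D_n, and A(D_i, k) is run on each D_i. Run i returns a binary column X_{:,i}; the n columns are collected into the matrix X (rows: # features, columns: # of runs):

\[
X =
\begin{bmatrix}
1 & 1 & 0 & \cdots & 1 & 1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 1 & \cdots & 1 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}
\]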
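The Map stage can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes each D_i is a bootstrap resample of D (suggested by the title of the cited paper), and `select_top_k` is a hypothetical stand-in for the base selector A(D_i, k). The names `npfs_map`, `rows`, and `labels` are also illustrative.

```python
import random


def select_top_k(rows, labels, k):
    """Hypothetical base selector A(D, k): score each feature by the absolute
    difference of its class means and return the indices of the top k."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            # A resample may contain a single class; score is then undefined.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def npfs_map(rows, labels, k, n_runs, seed=0):
    """Build the binary matrix X (features x runs): X[j][i] = 1 if feature j
    was selected in run i. Assumes bootstrap resampling for each run."""
    rng = random.Random(seed)
    n_samples = len(rows)
    n_features = len(rows[0])
    X = [[0] * n_runs for _ in range(n_features)]
    for i in range(n_runs):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        boot_rows = [rows[t] for t in idx]
        boot_labels = [labels[t] for t in idx]
        for j in select_top_k(boot_rows, boot_labels, k):
            X[j][i] = 1
    return X
```

Because every run selects exactly k features, each column of X sums to k. In a MapReduce setting the body of the loop over runs would be the mapper, since the runs are independent of one another.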
Reduce & Inference: feature j is relevant if

\[
\sum_{i} X_{j,i} > \zeta_{\mathrm{crit}}
\]
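A minimal sketch of the Reduce step, assuming X is stored as a list of rows (one row per feature, one column per run). The function name `npfs_reduce` is illustrative, and the strict inequality is an assumption chosen to match the definition of α on this slide.

```python
def npfs_reduce(X, zeta_crit):
    """Sum each row of the binary matrix X and flag feature j as relevant
    when its selection count exceeds zeta_crit (strict inequality assumed)."""
    counts = [sum(row) for row in X]
    relevant = [j for j, z in enumerate(counts) if z > zeta_crit]
    return counts, relevant


# Toy matrix: 4 features, 5 runs.
X = [
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
counts, relevant = npfs_reduce(X, zeta_crit=3)
print(counts)    # [4, 1, 4, 5]
print(relevant)  # [0, 2, 3]
```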
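\[
\Lambda(Z) = \frac{P(T(Z) \mid H_1)}{P(T(Z) \mid H_0)}
\underset{H_0}{\overset{H_1}{\gtrless}} \zeta_{\mathrm{crit}}
\;\rightarrow\;
\frac{\binom{n}{z} p_1^{z} (1 - p_1)^{n-z}}{\binom{n}{z} p_0^{z} (1 - p_0)^{n-z}}
\underset{H_0}{\overset{H_1}{\gtrless}} \zeta_{\mathrm{crit}}
\]

\[
\alpha = P(T(Z) > \zeta_{\mathrm{crit}} \mid H_0)
\]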
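The binomial coefficients in the likelihood ratio cancel, and for p_1 > p_0 the ratio increases with z, so the test reduces to thresholding the count z itself. The sketch below picks the smallest integer threshold whose false-alarm probability under H_0 does not exceed α. The function names are illustrative, and the value of p_0 in the usage line is an arbitrary assumption for the example; the slide does not give one.

```python
from math import comb


def binom_pmf(z, n, p):
    """P(Z = z) for Z ~ Binomial(n, p)."""
    return comb(n, z) * p**z * (1 - p) ** (n - z)


def critical_threshold(n, p0, alpha):
    """Smallest integer zeta with P(Z > zeta | H0) <= alpha,
    where Z ~ Binomial(n, p0) under the null hypothesis."""
    tail = 0.0  # P(Z > n) = 0
    zeta = n
    # Lower zeta while the upper tail P(Z > zeta - 1) stays within alpha.
    while zeta > 0 and tail + binom_pmf(zeta, n, p0) <= alpha:
        tail += binom_pmf(zeta, n, p0)
        zeta -= 1
    return zeta


# n = 10 runs, assumed p0 = 0.5, alpha = 0.05
print(critical_threshold(10, 0.5, 0.05))  # 8
```

Here P(Z > 8) = 11/1024 ≈ 0.011 satisfies the bound, while P(Z > 7) = 56/1024 ≈ 0.055 does not, so a feature would need to be selected in more than 8 of the 10 runs.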
[1] G. Ditzler, R. Polikar, and G. Rosen, “A bootstrap based Neyman-Pearson test for identifying variable importance,” IEEE Transactions on Neural Networks and Learning Systems, 2014.
EESI Group Meeting (February 2015) An Introduction to MapReduce