Richard Schneeman
May 08, 2017

# Bayes is BAE

Before programming, before formal probability, there was Bayes. He introduced the notion that multiple related, uncertain estimates can be combined to form a more certain estimate. It turns out that this extremely simple idea has a profound impact on how we write programs and how we can think about life. The applications range from machine learning and robotics to determining cancer treatments. In this talk we'll take an in-depth look at Bayes' rule and how it can be applied to solve problems in programming and beyond.


## Transcript

1. WELCOME

2. Bayes
is BAE

3. Introducing
our
Protagonist

4. Divine Benevolence, or an
Attempt to Prove That the
Principal End of the Divine
Providence and
Government is the
Happiness of His Creatures

5. &

6. An Introduction to the
Doctrine of Fluxions,
and a Defence of the
Mathematicians Against
the Objections of the
Author of The Analyst

7. Harry
Potter & the
Sorcerer’s Stone

8. Why do we care?

9. 1720

10. 1720s

11. 1720

12. No

13. Machine
learning

14. Artificial
Intelligence

15. They
Call me
@Schneems

16. Maintain
Sprockets

17. Georgia
Tech
Online
Masters

18. Georgia
Tech
Online
Masters

19. Automatic
Certificate
Management

20. SSL

21. Heroku
CI

22. Review
Apps

23. Self Promotion

24. But wait
Schneems,
what can we
do?

25. representatives

26. But wait
Schneems,
what can we
do more?

27. degerrymandertexas.org

28. Un-Patriotic
Un-Texan

29. Back to Bayes

30. Artificial
Intelligence

31. Low
Information
state

32. Predict

33. Measure

34. Measure +
Predict

35. Convolution

36. Kalman
Filter

37. Do you like
money?

38. P(A ∣ B) = P(B ∣ A) P(A) / P(B)

39. Probability: P(A ∣ B) = P(B ∣ A) P(A) / P(B)

40. Probability: P(A ∣ B) = P(B ∣ A) P(A) / P(B)

41. Probability: P(A ∣ B) = P(B ∣ A) P(A) / P(B)

42. probability of P(B): P(A ∣ B) = P(B ∣ A) P(A) / P(B)

43. probability of P(B): the two coins show faces H H and H T

44. P(B) = 0.5 * 0.5 + 0.5 * 1 = 0.75

45. P(A ∣ B) = P(B ∣ A) P(A) / P(B), with P(B) = 0.75

46. probability of \$3.7 million, P(A): P(A ∣ B) = P(B ∣ A) P(A) / P(B), with P(B) = 0.75

47. probability of \$3.7 million: P(A) = ? (\$\$\$ or Nope)

48. P(A) = 0.5 (probability of \$3.7 million: \$\$\$ or Nope)

49. P(A ∣ B) = P(B ∣ A) P(A) / P(B), with P(A) = 0.50 and P(B) = 0.75

50. probability of \$3.7 given heads, P(B ∣ A) = ?: P(A ∣ B) = P(B ∣ A) P(A) / P(B), with P(A) = 0.50 and P(B) = 0.75

51. probability of \$3.7, P(B ∣ A): heads from the \$3.7 million coin (faces H and T)

52. P(B ∣ A) = 0.5; P(A ∣ B) = P(B ∣ A) P(A) / P(B) = 0.5 * 0.5 / 0.75

53. P(A ∣ B) = 0.5 * 0.5 / 0.75 = 1/3 ≈ 0.3333
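The slide's arithmetic checks out in a few lines of Python. This is my own sketch of the setup as I read it (a fair coin worth \$3.7 million versus a two-headed coin worth \$0, and one flip comes up heads); the variable names are mine:

```python
# Bayes' rule for the coin game: A = "we hold the $3.7 million (fair)
# coin", B = "the flip came up heads".
p_A = 0.5                      # prior: either coin was equally likely
p_B_given_A = 0.5              # the fair coin shows heads half the time
p_B = 0.5 * 0.5 + 0.5 * 1.0    # total probability of heads: 0.75

p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.25 / 0.75 = 1/3 ≈ 0.3333
```

Seeing a single head drops the chance we hold the winning coin from 1/2 to 1/3, because the two-headed coin produces heads more readily.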

54. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)

55. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)

56. Art of the Problem

57. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)

58. Rule

59. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)

60. P(Aᵢ ∣ B) = P(B ∣ Aᵢ) P(Aᵢ) / Σⱼ P(B ∣ Aⱼ) P(Aⱼ)

61. P(A ∣ B) = P(B ∣ A) P(A) / P(B); P(Aᵢ ∣ B) = P(B ∣ Aᵢ) P(Aᵢ) / Σⱼ P(B ∣ Aⱼ) P(Aⱼ)

62. Total Probability

63. \$3.7
mil
\$0

64. \$3.7
mil
\$0

65. \$3.7
mil
\$0

66. \$3.7
mil
\$0
Tails

67. \$3.7
mil
\$0
Tails

68. \$3.7
mil
\$0
Tails

69. P(Heads) = P(Heads ∣ \$\$\$)P(\$\$\$) + P(Heads ∣ \$0)P(\$0) (coins: \$3.7 mil, \$0; Tails)

70. P(Heads) = P(Heads ∣ \$\$\$)P(\$\$\$) + P(Heads ∣ \$0)P(\$0) (coins: \$3.7 mil, \$0; Tails)

71. Total Probability: P(B) = Σⱼ P(B ∣ Aⱼ) P(Aⱼ)

72. P(B) = 0.5 * 0.5 + 0.5 * 1 = 0.75

73. Total Probability: P(B) = Σⱼ P(B ∣ Aⱼ) P(Aⱼ); P(Heads) = P(Heads ∣ \$\$\$)P(\$\$\$) + P(Heads ∣ \$0)P(\$0)
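The total-probability sum Σⱼ P(B ∣ Aⱼ) P(Aⱼ) is just a weighted sum over hypotheses, which a short helper makes concrete (the function name and the (likelihood, prior) pairing are my own):

```python
def total_probability(pairs):
    """Sum of P(B | A_j) * P(A_j) over all hypotheses A_j."""
    return sum(likelihood * prior for likelihood, prior in pairs)

# The two coins from the slides: fair ($3.7 mil) and two-headed ($0).
p_heads = total_probability([(0.5, 0.5),    # P(Heads | $$$) * P($$$)
                             (1.0, 0.5)])   # P(Heads | $0)  * P($0)
print(p_heads)  # 0.75
```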

74. Let’s make it
tougher

75. P(Aᵢ ∣ B) = P(B ∣ Aᵢ) P(Aᵢ) / Σⱼ P(B ∣ Aⱼ) P(Aⱼ)

76. P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ)

77. P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ), with P(HH ∣ Coinᵢ) = 0.5 * 0.5

78. P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ), with P(Coinᵢ) = 0.5

79. P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ); Σⱼ P(B ∣ Aⱼ) P(Aⱼ) = P(HH ∣ \$\$\$)P(\$\$\$) + P(HH ∣ \$0)P(\$0) = 0.25(0.5) + 1.0(0.5)

80. P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ); Σⱼ P(B ∣ Aⱼ) P(Aⱼ) = P(HH ∣ \$\$\$)P(\$\$\$) + P(HH ∣ \$0)P(\$0) = 0.25(0.5) + 1.0(0.5)

81. P(Coin\$\$\$ ∣ HH) = 0.25(0.5) / 0.625 = 1/5 = 0.2, from P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ)

82. P(Coin\$\$\$ ∣ HH) = 0.25(0.5) / 0.625 = 1/5 = 0.2, from P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ)

83. P(Coinᵢ ∣ HH) = 0.25(0.5) / 0.625 = 1/5 = 0.2, from P(Coinᵢ ∣ HH) = P(HH ∣ Coinᵢ) P(Coinᵢ) / Σⱼ P(HH ∣ Coinⱼ) P(Coinⱼ)

84. Who is ready for a
break?

85. Let's take a break
from math

86. With more math

87. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)

88. P(A ∣ B) =
P(B ∣ A) P(A)
P(B)
P(Ai
∣ B) =
P(B ∣ Ai
)
P(B)
P(Ai
)

89. P(Ai
∣ B) =
P(B ∣ Ai
)
P(B)
P(Ai
)
Prior

90. P(Ai
∣ B) =
P(B ∣ Ai
)
P(B)
P(Ai
)
Posterior

91. The Kalman filter is a recursive Bayes estimation

92. Prediction/
Prior

93. Measure/
Posterior

94. Simon D. Levy

95. altitude(current time) = 0.75 × altitude(previous time)

96. altitude(current time) = 0.75 × altitude(previous time)

97. a = rate_of_descent = 0.75
x = initial_position = 1000
r = measure_error = x * 0.20

98. x_guess = measure_array[0]
p = estimate_error = 1
x_guess_array = []

99. for k in range(10):
measure = measure_array[k]

100. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess

101. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a

102. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + gain * (measure - x_guess)

103. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + gain * (measure - x_guess)
Low Predict Error, low gain

104. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + 0 * (measure - x_guess)
Low Predict Error, low gain

105. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + 0 * (measure - x_guess)
Low Predict Error, low gain

106. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + 1 * (measure - x_guess)
High Predict Error, High gain

107. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + 1 * (measure - x_guess)
High Predict Error, High gain

108. Prediction less
certain

109. Prediction more
certain

110. for k in range(10):
measure = measure_array[k]
# Predict
x_guess = a * x_guess
p = a * p * a
# Update
gain = p / (p + r)
x_guess = x_guess + gain * (measure - x_guess)
p = (1 - gain) * p
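Stitching slides 97 through 110 together gives a runnable one-dimensional filter. The deck never shows measure_array, so the noisy altitude readings below are made up for illustration (they roughly track a 0.75-per-step descent from 1000):

```python
a = 0.75              # rate of descent: altitude shrinks to 75% per step
x = 1000              # initial position
r = x * 0.20          # measurement error

# Hypothetical noisy readings of a craft descending from 1000 (assumed data)
measure_array = [1000, 760, 550, 430, 310, 240, 175, 140, 100, 80]

x_guess = measure_array[0]
p = 1                 # estimate error
x_guess_array = []

for k in range(10):
    measure = measure_array[k]
    # Predict: push the previous estimate through the descent model
    x_guess = a * x_guess
    p = a * p * a
    # Update: blend prediction and measurement, weighted by the gain
    gain = p / (p + r)
    x_guess = x_guess + gain * (measure - x_guess)
    p = (1 - gain) * p
    x_guess_array.append(x_guess)

print(x_guess_array[-1])  # final altitude estimate, roughly 56
```

With p starting at 1 and r at 200 the gain stays tiny, so this run leans almost entirely on the prediction; a larger initial p would make the filter trust the measurements more.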

111. That’s it for
Kalman Filters

112. Bayes Rule

113. Two most
important
parts

114. Algorithms
to
Live
By

115. The Signal
and
the
Noise

116. Audio: Mozart
Requiem in D
minor

117. http://bit.ly/kalman-tutorial

118. http://bit.ly/kalman-notebook

119. Udacity &
Georgia Tech

120. BAE

121. BAE

122. BAE

123. BAE

124. Questions?

125. Questions?

126. Test Audio

127. Test Audio 2

128. Simon D. Levy

129. What is g?

130. Prediction

131. Measurement

132. Convolution

133. Prediction less
certain

134. Prediction more
certain

135. Prediction error
is not constant

136. What is g?

137. What is g?

138. Introducing r

139. Prediction +
Measurement

140. i.e.
Prediction +
Update

141. Prediction
Update

142. Prediction
Update

143. Prediction

144. Prediction

145. Prediction
Update

146. \$3.7
mil
\$0

147. \$3.7
mil
\$0