
# Typical Sets: What They Are and How to (Hopefully) Find Them

Although typical sets are important for understanding how and why sampling algorithms work (or fail to), they are rarely taught when astronomers are first introduced to sampling methods such as Markov Chain Monte Carlo (MCMC). I introduce the idea of typical sets using some basic examples and show why they make sampling difficult in higher dimensions. I then outline how their behavior shapes various MCMC algorithms such as (Adaptive) Metropolis-Hastings, ensemble sampling, and Hamiltonian Monte Carlo. See https://github.com/joshspeagle/typical_sets for additional resources.

Josh Speagle

September 20, 2017

## Transcript

1. Typical Sets:
What They Are and How
to (Hopefully) Find Them
Josh Speagle
[email protected]
Based on this talk by Michael Betancourt at StanCon.

2–4. Intended Audience
• Some experience with the basics of Bayesian statistics.
• Some experience using MCMC for research.
• Have heard of ensemble sampling methods such as emcee.

5. Bayesian Inference

6–14. Bayesian Inference
Bayes' Theorem:
Pr(Θ | D, M) = Pr(D | Θ, M) Pr(Θ | M) / Pr(D | M)
where Θ are the parameters, D is the data, and M is the model. Pr(Θ | M) is the prior, Pr(D | Θ, M) is the likelihood, Pr(Θ | D, M) is the posterior, and Pr(D | M) is the evidence.

15–16. Bayesian Inference
Bayes' Theorem: Posterior = Likelihood × Prior / Evidence, where the evidence is defined as the integral over the parameter space Ω:
Pr(D | M) ≡ ∫_Ω Pr(D | Θ, M) Pr(Θ | M) dΘ
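As a concrete illustration, Bayes' theorem can be evaluated numerically on a one-dimensional grid, with the evidence computed as the normalizing integral; the prior and likelihood below are hypothetical choices, not from the talk:

```python
import numpy as np

# Numerical Bayes' theorem on a 1-D grid (illustrative values only):
# posterior = likelihood * prior / evidence, with the evidence computed
# as the normalizing integral over the parameter space.
theta = np.linspace(-10.0, 10.0, 2001)
dtheta = theta[1] - theta[0]
prior = np.exp(-0.5 * (theta / 5.0) ** 2)               # broad Gaussian prior
prior /= prior.sum() * dtheta                           # normalize the prior
likelihood = np.exp(-0.5 * ((theta - 1.0) / 0.5) ** 2)  # data prefer theta ~ 1
evidence = (likelihood * prior).sum() * dtheta          # Pr(D | M)
posterior = likelihood * prior / evidence
print(posterior.sum() * dtheta)  # ~1: the posterior is properly normalized
```

Dividing by the evidence is exactly what makes the posterior integrate to one over Ω.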

17–22. Where is the posterior?
The differential posterior mass is the product of the posterior density and the differential volume:
dP(Θ) = Pr(Θ | D, M) dΘ = “Amplitude” × “Volume”
Near the mode the amplitude is large but the volume is small; far from the mode the volume is large but the amplitude is small. The posterior mass concentrates where the product peaks: the “Typical Set”.

23–26. Typical Sets: Gaussian Example
For a D-dimensional Gaussian centered at 0 with standard deviation σ, the density falls as
Pr(Θ) ∝ exp(−|Θ|² / 2σ²)
while the volume of a shell at radius r grows as r^(D−1), so the posterior mass as a function of radius is
Pr(r) ∝ exp(−r² / 2σ²) r^(D−1)
which peaks at the “Typical Distance” r ≈ σ√D.
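The typical distance is easy to verify numerically; a quick sketch (sample size and dimension chosen for illustration):

```python
import numpy as np

# Draw samples from a D-dimensional standard Gaussian (sigma = 1) and
# check that their radii concentrate in a shell near r ~ sqrt(D),
# far from the mode at the origin.
rng = np.random.default_rng(42)
D = 100
samples = rng.standard_normal((10_000, D))
radii = np.linalg.norm(samples, axis=1)
print(radii.mean())        # close to sqrt(D) = 10
print((radii < 5).mean())  # essentially no samples near the mode
```

Even though the density is highest at the origin, essentially none of the samples land there: the mass lives in the shell.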

27–30. Where is the posterior?
The posterior mass concentrates in a thin shell around the mode: the “Typical Set”. MCMC wants to draw samples from this “shell”.

31–34. Tension in the Metropolis Update
α(Θ → Θ′) = min(1, Pr(Θ′ | D, M) / Pr(Θ | D, M))
The proposal faces two competing effects: most of the available “Volume” lies at larger radii, but the “Amplitude” of the posterior falls off there.

35–43. Metropolis-Hastings
Proposal: Θ′ ∼ Normal(μ = Θ, σ = s)
[figures: the proposal scale s compared to the Typical Distance, and the distribution of proposed separations for the Ideal case versus M-H; the M-H step size must stay small relative to the Typical Separation between independent samples]
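A minimal sketch of this Metropolis update with a Normal proposal (the Gaussian test target, step size, and helper name are illustrative assumptions, not from the talk):

```python
import numpy as np

def metropolis_step(theta, log_post, s, rng):
    """One Metropolis update: propose theta' ~ Normal(mu=theta, sigma=s),
    accept with probability min(1, Pr(theta') / Pr(theta))."""
    proposal = theta + s * rng.standard_normal(theta.shape)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        return proposal, True
    return theta, False

# Illustrative target: a D-dimensional standard Gaussian.
D = 50
log_post = lambda th: -0.5 * th @ th
rng = np.random.default_rng(0)
theta, n_accept = rng.standard_normal(D), 0
for _ in range(2000):
    theta, accepted = metropolis_step(theta, log_post, s=0.2, rng=rng)
    n_accept += accepted
print(n_accept / 2000)  # step size must stay well below the typical
                        # distance to keep the acceptance rate reasonable
```

Raising `s` toward the typical distance drives the acceptance rate toward zero, which is exactly the tension described above.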

44–49. Ensemble Sampling
[figure slides]

50. emcee
Stretch move: pick another walker Θ_j from the ensemble and propose
Θ′ = Θ_j + z (Θ_i − Θ_j)
where the “stretch” factor z is drawn from
g(z) ∝ 1/√z for z ∈ [1/a, a], and 0 otherwise.
The move is accepted with probability
α = min(1, z^(D−1) Pr(Θ′ | D, M) / Pr(Θ_i | D, M))
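A sketch of one serial pass of this stretch move (the function name and Gaussian test target are illustrative; emcee itself updates the ensemble in parallelized halves):

```python
import numpy as np

def stretch_move_pass(walkers, log_post, a, rng):
    """One serial pass of the Goodman & Weare stretch move (a sketch).
    Each walker i picks another walker j and proposes
    theta' = theta_j + z * (theta_i - theta_j),
    with z ~ g(z) proportional to 1/sqrt(z) on [1/a, a]."""
    n, D = walkers.shape
    for i in range(n):
        j = (i + rng.integers(1, n)) % n  # any walker except i
        z = (1.0 + (a - 1.0) * rng.uniform()) ** 2 / a  # inverse-CDF draw of g(z)
        proposal = walkers[j] + z * (walkers[i] - walkers[j])
        log_alpha = (D - 1) * np.log(z) + log_post(proposal) - log_post(walkers[i])
        if np.log(rng.uniform()) < log_alpha:
            walkers[i] = proposal
    return walkers

# Illustrative run on a 5-D standard Gaussian target.
rng = np.random.default_rng(3)
walkers = rng.standard_normal((32, 5))
log_post = lambda th: -0.5 * th @ th
for _ in range(100):
    walkers = stretch_move_pass(walkers, log_post, a=2.0, rng=rng)
print(walkers.std())  # stays of order 1: the ensemble tracks the target
```

The z^(D−1) factor in the acceptance probability is the volume correction that keeps the move detailed-balanced.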

51–54. [figure: distributions of proposed separations for Ideal, M-H, and emcee, relative to the Typical Separation; emcee proposals reach farther than M-H, but shrink after weighting by acceptance probability]

55–56. emcee
(The stretch-move proposal, stretch-factor distribution g(z), and acceptance probability, repeated from slide 50.)

57. Summary
• Volume scales as r^(D−1).
• The posterior density depends on both volume and amplitude.
• Most of the posterior is concentrated in a “shell” around the best solution called the typical set.
• MCMC draws samples from the typical set.

58–61. But what about corner plots?
[figures: a corner plot shows a 2-dimensional projection of the D-dimensional shell; the projection collapses the shell so the marginals look centrally peaked even though little posterior mass lies near the mode]
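This projection effect can be checked numerically; a sketch with a standard Gaussian (dimension and sample size chosen for illustration):

```python
import numpy as np

# In D = 100 dimensions, any 2-D projection of the samples (what a corner
# plot panel shows) is peaked at the origin, even though essentially no
# samples lie near the D-dimensional mode itself.
rng = np.random.default_rng(7)
D = 100
samples = rng.standard_normal((10_000, D))
proj = samples[:, :2]                 # one corner-plot panel
radii = np.linalg.norm(samples, axis=1)
print(np.abs(proj).mean())            # small: the 2-D marginals hug the mode
print((radii < 5).mean())             # ~0: no samples near the mode
```

So a corner plot that looks centrally concentrated is entirely consistent with all the samples living in a distant shell.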

62–66. Hamiltonian Monte Carlo
Treat the particle at position q as a point mass with mass matrix M and momentum p.
Hamiltonian:
Pr(q, p) ∝ exp[−H(q, p)], H(q, p) = −ln Pr(q) + pᵀM⁻¹p / 2
where Pr(q) is the posterior density. Hamilton's Equations:
dq/dt = ∂H/∂p = M⁻¹p
dp/dt = −∂H/∂q = ∂ ln Pr(q) / ∂q

67. Hamiltonian Monte Carlo
α((q, p) → (q′, −p′)) = min(1, Pr(q′, −p′) / Pr(q, p))
with the momentum refreshed between trajectories:
p ∼ Normal(μ = 0, Σ = M)
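Hamilton's equations are typically integrated with the leapfrog scheme; a minimal sketch (the unit mass matrix M = I and the Gaussian test target are illustrative assumptions):

```python
import numpy as np

def leapfrog(q, p, grad_log_post, eps, n_steps):
    """Leapfrog integration of Hamilton's equations with M = I (a sketch).
    Returns the trajectory endpoint with momentum flipped for reversibility."""
    p = p + 0.5 * eps * grad_log_post(q)  # half momentum step
    for _ in range(n_steps - 1):
        q = q + eps * p                   # full position step (dq/dt = p)
        p = p + eps * grad_log_post(q)    # full momentum step (dp/dt = grad ln Pr)
    q = q + eps * p
    p = p + 0.5 * eps * grad_log_post(q)  # final half momentum step
    return q, -p

# Illustrative check on a 3-D standard Gaussian: H is nearly conserved,
# so the Metropolis acceptance probability stays near 1.
H = lambda q, p: 0.5 * q @ q + 0.5 * p @ p  # -ln Pr(q) + p.p / 2
rng = np.random.default_rng(5)
q, p = rng.standard_normal(3), rng.standard_normal(3)  # p ~ Normal(0, M)
q2, p2 = leapfrog(q, p, lambda x: -x, eps=0.05, n_steps=40)
print(abs(H(q2, p2) - H(q, p)))  # small energy error
```

Because H is (nearly) conserved along the trajectory, the endpoint can be far from the start yet still be accepted with high probability.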

68–69. Hamiltonian Monte Carlo
[figures: trajectories with p ∼ Normal(μ = 0, Σ = M) gliding along the typical set, covering separations comparable to the Typical Distance]

70–71. Hamiltonian Monte Carlo
[figure: distributions of proposed separations for Ideal, M-H, emcee, and HMC, relative to the Typical Separation; HMC proposals come closest to the ideal]