Slide 1

fast and flexible probabilistic modelling in Python
Jacob Schreiber
Paul G. Allen School of Computer Science, University of Washington
jmschreiber91 | @jmschrei | @jmschreiber91

Slide 2

Acknowledgements 2

Slide 3

Overview

pomegranate is more flexible than other packages, faster, more intuitive to use, and can do it all in parallel.

Slide 4

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 5

Overview: supported models

Six Main Models:
1. Probability Distributions
2. General Mixture Models
3. Markov Chains
4. Hidden Markov Models
5. Bayes Classifiers / Naive Bayes
6. Bayesian Networks

Two Helper Models:
1. k-means++ / k-means||
2. Factor Graphs

Slide 6

Overview: model stacking in pomegranate

[Diagram: Distributions (D), Bayes Classifiers (BC), Markov Chains (MC), General Mixture Models (GMM), Hidden Markov Models (HMM), and Bayesian Networks (BN) shown as building blocks that can be stacked inside one another]

Slide 7

Overview: model stacking in pomegranate

[Same stacking diagram as the previous slide]

Slide 8

The API is common to all models

All models have these methods:
  model.log_probability(X) / model.probability(X)
  model.sample()
  model.fit(X, weights, inertia)
  model.summarize(X, weights)
  model.from_summaries(inertia)
  Model.from_samples(X, weights)

All models composed of distributions (like GMM, HMM, ...) have these methods too:
  model.predict(X)
  model.predict_proba(X)
  model.predict_log_proba(X)
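To make the shared API concrete, here is a minimal sketch (not from the slides; the toy data and variable names are made up) showing the same calls on a single distribution and on a composite model:

import numpy
from pomegranate import NormalDistribution, GeneralMixtureModel

X = numpy.random.normal(0, 1, 1000)         # toy univariate data

d = NormalDistribution.from_samples(X)       # learn a single distribution
print(d.log_probability(0.5))                # log density at a point
print(d.sample())                            # draw one sample

gmm = GeneralMixtureModel.from_samples(NormalDistribution, 2, X.reshape(-1, 1))
print(gmm.predict_proba(X[:5].reshape(-1, 1)))   # composite models also expose predict_proba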

Slide 9

pomegranate supports many models

Univariate Distributions:
1. UniformDistribution
2. BernoulliDistribution
3. NormalDistribution
4. LogNormalDistribution
5. ExponentialDistribution
6. BetaDistribution
7. GammaDistribution
8. DiscreteDistribution
9. PoissonDistribution

Kernel Densities:
1. GaussianKernelDensity
2. UniformKernelDensity
3. TriangleKernelDensity

Multivariate Distributions:
1. IndependentComponentsDistribution
2. MultivariateGaussianDistribution
3. DirichletDistribution
4. ConditionalProbabilityTable
5. JointProbabilityTable

Slide 10

Models can be created from known values

mu, sig = 0, 2
a = NormalDistribution(mu, sig)

X = [0, 1, 1, 2, 1.5, 6, 7, 8, 7]
a = GaussianKernelDensity(X)

Slide 11

Models can be learned from data

X = numpy.random.normal(0, 1, 100)
a = NormalDistribution.from_samples(X)

Slide 12

pomegranate can be faster than numpy
(benchmark: fitting a Normal Distribution to 1,000 samples)

Slide 13

pomegranate can be faster than numpy
(benchmark: fitting a Multivariate Gaussian to 10,000,000 samples of 10 dimensions)

Slide 14

14 pomegranate uses BLAS internally

Slide 15

15 pomegranate will soon have GPU support

Slide 16

pomegranate uses additive summarization

pomegranate reduces data to sufficient statistics for updates, and so only has to go through a dataset once (for all models). For a Normal Distribution, the sufficient statistics are the weighted count, sum, and sum of squares of the samples.
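A hedged sketch of what the summarize / from_summaries pair looks like in practice (the values are arbitrary; this just illustrates that the statistics accumulate across calls):

import numpy
from pomegranate import NormalDistribution

X1 = numpy.random.normal(5, 2, 1000)
X2 = numpy.random.normal(5, 2, 1000)

d = NormalDistribution(0, 1)    # starting parameters
d.summarize(X1)                 # accumulate weighted count, sum, and sum of squares
d.summarize(X2)                 # summaries simply add; no raw data is retained
d.from_summaries()              # one exact parameter update from both chunks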

Slide 17

pomegranate supports out-of-core learning

Batches from a dataset can be reduced to additive summary statistics, enabling exact updates from data that can't fit in memory.
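For example, a dataset stored on disk can be streamed through in batches; the file name and batch size below are hypothetical:

import numpy
from pomegranate import NormalDistribution

X = numpy.load('big_dataset.npy', mmap_mode='r')   # hypothetical file, never fully loaded
d = NormalDistribution(0, 1)

for start in range(0, X.shape[0], 100000):         # stream 100k-sample batches
    d.summarize(X[start:start + 100000])

d.from_summaries()                                  # exact update, as if fit on all data at once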

Slide 18

Parallelization exploits additive summaries

[Diagram: each worker extracts summaries from its chunk of the data; the summaries are added together to produce new parameters]
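In code this shows up as an n_jobs argument (a sketch; the exact keyword and the version it appeared in are assumptions on my part):

import numpy
from pomegranate import GeneralMixtureModel, MultivariateGaussianDistribution

X = numpy.random.normal(0, 1, (1000000, 10))

# each worker summarizes its own slice of X; the additive summaries are then
# combined into a single parameter update
model = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 3, X, n_jobs=4)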

Slide 19

pomegranate supports semisupervised learning

Summary statistics from supervised models can be added to summary statistics from unsupervised models to train a single model on a mixture of labeled and unlabeled data.

Slide 20

pomegranate supports semisupervised learning

Supervised accuracy: 0.93
Semisupervised accuracy: 0.96
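A hedged sketch of what this looks like in code, assuming the convention that unlabeled samples carry a label of -1 (the data here is synthetic so the snippet stands alone):

import numpy
from pomegranate import NaiveBayes, NormalDistribution

X = numpy.concatenate([numpy.random.normal(0, 1, (500, 2)),
                       numpy.random.normal(3, 1, (500, 2))])
y = numpy.array([0] * 500 + [1] * 500)

y_partial = y.copy()
y_partial[numpy.random.rand(1000) < 0.9] = -1    # hide 90% of the labels

model = NaiveBayes.from_samples(NormalDistribution, X, y_partial)
print((model.predict(X) == y).mean())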

Slide 21

21 pomegranate can be faster than scipy

Slide 22

22 pomegranate uses aggressive caching

Slide 23


Slide 24

Example 'blast' from Gossip Girl

Spotted: Lonely Boy. Can't believe the love of his life has returned. If only she knew who he was. But everyone knows Serena. And everyone is talking. Wonder what Blair Waldorf thinks. Sure, they're BFF's, but we always thought Blair's boyfriend Nate had a thing for Serena.

Slide 25

Example 'blast' from Gossip Girl

Why'd she leave? Why'd she return? Send me all the deets. And who am I? That's the secret I'll never tell. The only one. —XOXO. Gossip Girl.

Slide 26

How do we encode these 'blasts'?

"Better lock it down with Nate, B. Clock's ticking."
+1 Nate, -1 Blair

Slide 27

How do we encode these 'blasts'?

"This just in: S and B committing a crime of fashion. Who doesn't love a five-finger discount. Especially if it's the middle one."
-1 Blair, -1 Serena

Slide 28

28 Simple summations don’t work well

Slide 29

29 Beta distributions can model uncertainty

Slide 30

30 Beta distributions can model uncertainty

Slide 31

31 Beta distributions can model uncertainty
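The slides only show plots here, so as a rough illustration of the idea (using scipy rather than pomegranate, and with invented mention counts): a character's score can be modeled as a Beta distribution over the fraction of positive mentions, so a character with few mentions keeps a wide, uncertain posterior even when the raw +1/-1 sum looks similar.

from scipy.stats import beta

# invented counts: (positive mentions, negative mentions)
serena = beta(1 + 20, 1 + 5)    # many mentions -> narrow, confident posterior
dan = beta(1 + 4, 1 + 1)        # few mentions  -> similar mean, much wider posterior

print(serena.mean(), serena.interval(0.95))
print(dan.mean(), dan.interval(0.95))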

Slide 32

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 33

GMMs can model complex distributions 33

Slide 34

GMMs can model complex distributions

model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)
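Once fit, the mixture exposes the same prediction API as any other composite model; a small self-contained sketch (the two-cluster toy data is made up):

import numpy
from pomegranate import GeneralMixtureModel, NormalDistribution

X = numpy.concatenate([numpy.random.normal(0, 1, (500, 1)),
                       numpy.random.normal(5, 1, (500, 1))])

model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)
print(model.predict(X[:5]))          # hard component assignments
print(model.predict_proba(X[:5]))    # per-component posterior probabilities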

Slide 35

GMMs can model complex distributions 35

Slide 36

An exponential distribution is not right

model = ExponentialDistribution.from_samples(X)

Slide 37

A mixture of exponentials is better

model = GeneralMixtureModel.from_samples(ExponentialDistribution, 2, X)

Slide 38

Heterogeneous mixtures natively supported

model = GeneralMixtureModel.from_samples([ExponentialDistribution, UniformDistribution], 2, X)

Slide 39

GMMs faster than sklearn 39

Slide 40

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 41

CG enrichment detection HMM

GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA

Slide 42

CG enrichment detection HMM

GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA
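The slide itself only shows the sequence and a drawing of the model, so here is a hedged sketch of how a two-state CG-enrichment HMM is typically assembled with pomegranate's explicit-state API; the emission and transition probabilities are illustrative guesses, not the slide's values:

from pomegranate import HiddenMarkovModel, DiscreteDistribution, State

background = State(DiscreteDistribution({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}),
                   name='background')
cg_island = State(DiscreteDistribution({'A': 0.10, 'C': 0.40, 'G': 0.40, 'T': 0.10}),
                  name='CG island')

model = HiddenMarkovModel()
model.add_states(background, cg_island)
model.add_transition(model.start, background, 1.0)
model.add_transition(background, background, 0.9)
model.add_transition(background, cg_island, 0.1)
model.add_transition(cg_island, cg_island, 0.8)
model.add_transition(cg_island, background, 0.2)
model.bake()

seq = list('GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA')
print(model.predict(seq))    # most likely hidden state for each nucleotide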

Slide 43

pomegranate HMMs are feature rich 43

Slide 44

GMM-HMM easy to define 44
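A hedged sketch of what "easy to define" can look like: each hidden state emits from its own mixture of Gaussians (all parameter values below are invented):

from pomegranate import (HiddenMarkovModel, GeneralMixtureModel,
                         NormalDistribution, State)

s1 = State(GeneralMixtureModel([NormalDistribution(0, 1), NormalDistribution(2, 1)]), name='s1')
s2 = State(GeneralMixtureModel([NormalDistribution(5, 1), NormalDistribution(8, 2)]), name='s2')

model = HiddenMarkovModel()
model.add_states(s1, s2)
model.add_transition(model.start, s1, 0.5)
model.add_transition(model.start, s2, 0.5)
model.add_transition(s1, s1, 0.9)
model.add_transition(s1, s2, 0.1)
model.add_transition(s2, s2, 0.9)
model.add_transition(s2, s1, 0.1)
model.bake()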

Slide 45

HMMs are faster than hmmlearn 45

Slide 46

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 47

Bayesian networks

Bayesian networks are powerful inference tools which define a dependency structure between variables.
[Diagram: example network over Rain, Sprinkler, and Wet Grass]

Slide 48

Bayesian networks

[Diagram: the Rain / Sprinkler / Wet Grass network again]

Two main difficult tasks:
(1) Inference given incomplete information (see the sketch below)
(2) Learning the dependency structure from data
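The slide shows the network as a picture; below is a hedged sketch of task (1), building the small Rain / Sprinkler / Wet Grass network by hand and querying it with only partial evidence. The probability values (and the exact edge set) are invented for illustration:

from pomegranate import (BayesianNetwork, DiscreteDistribution,
                         ConditionalProbabilityTable, Node)

rain = DiscreteDistribution({'T': 0.2, 'F': 0.8})
sprinkler = ConditionalProbabilityTable(
    [['T', 'T', 0.01], ['T', 'F', 0.99],
     ['F', 'T', 0.40], ['F', 'F', 0.60]], [rain])
wet_grass = ConditionalProbabilityTable(
    [['T', 'T', 'T', 0.99], ['T', 'T', 'F', 0.01],
     ['T', 'F', 'T', 0.80], ['T', 'F', 'F', 0.20],
     ['F', 'T', 'T', 0.90], ['F', 'T', 'F', 0.10],
     ['F', 'F', 'T', 0.01], ['F', 'F', 'F', 0.99]], [rain, sprinkler])

s_rain = Node(rain, name='rain')
s_sprinkler = Node(sprinkler, name='sprinkler')
s_wet = Node(wet_grass, name='wet grass')

model = BayesianNetwork('sprinkler example')
model.add_states(s_rain, s_sprinkler, s_wet)
model.add_edge(s_rain, s_sprinkler)
model.add_edge(s_rain, s_wet)
model.add_edge(s_sprinkler, s_wet)
model.bake()

# inference given incomplete information: we only observe that the grass is wet
print(model.predict_proba({'wet grass': 'T'}))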

Slide 49

Bayesian network structure learning

Three primary ways:
● "Search and score" / Exact
● "Constraint Learning" / PC
● Heuristics

Slide 50

Bayesian network structure learning

pomegranate supports:
● "Search and score" / Exact
● "Constraint Learning" / PC
● Heuristics
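A hedged sketch of structure learning from data; the dataset is synthetic, and the algorithm strings follow pomegranate's from_samples keyword ('chow-liu', 'greedy', 'exact'):

import numpy
from pomegranate import BayesianNetwork

X = numpy.random.randint(2, size=(1000, 5))    # hypothetical binary dataset

model = BayesianNetwork.from_samples(X, algorithm='exact')     # "search and score"
# model = BayesianNetwork.from_samples(X, algorithm='greedy')  # heuristic alternative
# model = BayesianNetwork.from_samples(X, algorithm='chow-liu')
print(model.structure)    # learned parent set for each variable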

Slide 51

Exact structure learning is intractable

Slide 52

pomegranate supports four algorithms 52

Slide 53

Constraint graphs merge data + knowledge

[Diagram: example variables (BRCA2, BRCA1, LCT, BLOAT, LE, LOA, VOM, AC, PREG, LI, OC) grouped into genetic conditions, diseases, and symptoms]

Slide 54

Constraint graphs merge data + knowledge

[Diagram: the same variables collapsed into three groups: genetic conditions, diseases, symptoms]
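As a rough sketch of how a layered constraint graph is supplied (treat the details as assumptions based on the PeerJ paper: the constraint graph is a networkx DiGraph whose nodes are tuples of column indices, and an edge means "variables in this group may be parents of variables in that group"):

import numpy
import networkx
from pomegranate import BayesianNetwork

X = numpy.random.randint(2, size=(1000, 9))           # hypothetical binary dataset

genetic, diseases, symptoms = (0, 1), (2, 3, 4), (5, 6, 7, 8)

cg = networkx.DiGraph()
cg.add_edge(genetic, diseases)      # genetic conditions may be parents of diseases
cg.add_edge(diseases, symptoms)     # diseases may be parents of symptoms

model = BayesianNetwork.from_samples(X, algorithm='exact', constraint_graph=cg)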

Slide 55

Modeling the global stock market 55

Slide 56

Constraint graph published in PeerJ CS 56

Slide 57

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 58

Bayes classifiers rely on Bayes' rule: P(y|D) = P(D|y) P(y) / P(D)

Slide 59

Naive Bayes assumes independent features: P(D|y) = P(d_1|y) P(d_2|y) ... P(d_n|y)

Slide 60

Naive Bayes produces ellipsoid boundaries

model = NaiveBayes.from_samples(NormalDistribution, X, y)

Slide 61

Naive Bayes can be heterogeneous

Slide 62

Data can fall under different distributions 62

Slide 63

Using appropriate distributions is better

model = NaiveBayes.from_samples(NormalDistribution, X_train, y_train)
print("Gaussian Naive Bayes: ", (model.predict(X_test) == y_test).mean())

clf = GaussianNB().fit(X_train, y_train)
print("sklearn Gaussian Naive Bayes: ", (clf.predict(X_test) == y_test).mean())

model = NaiveBayes.from_samples([NormalDistribution, LogNormalDistribution,
                                 ExponentialDistribution], X_train, y_train)
print("Heterogeneous Naive Bayes: ", (model.predict(X_test) == y_test).mean())

Gaussian Naive Bayes: 0.798
sklearn Gaussian Naive Bayes: 0.798
Heterogeneous Naive Bayes: 0.844

Slide 64

This additional flexibility is just as fast 64

Slide 65

Bayes classifiers don't require independence

Naive accuracy: 0.929
Bayes classifier accuracy: 0.966
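A hedged sketch of the non-naive version: one full-covariance Gaussian per class, so correlated features are modeled jointly (the data is synthetic so the snippet runs on its own; the accuracies above come from the slide, not from this code):

import numpy
from pomegranate import BayesClassifier, MultivariateGaussianDistribution

cov = [[1.0, 0.7], [0.7, 1.0]]                         # deliberately correlated features
X = numpy.concatenate([numpy.random.multivariate_normal([0, 0], cov, 500),
                       numpy.random.multivariate_normal([2, 2], cov, 500)])
y = numpy.array([0] * 500 + [1] * 500)

model = BayesClassifier.from_samples(MultivariateGaussianDistribution, X, y)
print((model.predict(X) == y).mean())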

Slide 66

Gaussian mixture model Bayes classifier 66

Slide 67

Creating complex Bayes classifiers is easy

gmm_a = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 0])
gmm_b = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 1])
model_b = BayesClassifier([gmm_a, gmm_b], weights=numpy.array([1-y.mean(), y.mean()]))

Slide 68

Creating complex Bayes classifiers is easy

mc_a = MarkovChain.from_samples(X[y == 0])
mc_b = MarkovChain.from_samples(X[y == 1])
model_b = BayesClassifier([mc_a, mc_b], weights=numpy.array([1-y.mean(), y.mean()]))

hmm_a = HiddenMarkovModel.from_samples(X[y == 0])
hmm_b = HiddenMarkovModel.from_samples(X[y == 1])
model_b = BayesClassifier([hmm_a, hmm_b], weights=numpy.array([1-y.mean(), y.mean()]))

bn_a = BayesianNetwork.from_samples(X[y == 0])
bn_b = BayesianNetwork.from_samples(X[y == 1])
model_b = BayesClassifier([bn_a, bn_b], weights=numpy.array([1-y.mean(), y.mean()]))

Slide 69

Overview: this talk

Major Models/Model Stacks:
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers

Finale: Train a mixture of HMMs in parallel

Slide 70

Training a mixture of HMMs in parallel

Creating a mixture of HMMs is just as simple as passing the HMMs into a GeneralMixtureModel as if they were any other distribution.

Slide 71

Training a mixture of HMMs in parallel

fit(model, X, n_jobs=n)
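Putting it together, a hedged sketch of the finale. The HMM topologies and the data are invented, and the parallel entry point is assumed: the slide's fit(model, X, n_jobs=n) suggests a helper function, while the same idea is expressed below as an n_jobs argument to model.fit:

import numpy
from pomegranate import (HiddenMarkovModel, GeneralMixtureModel,
                         NormalDistribution, State)

def make_hmm(mu1, mu2):
    # a tiny two-state Gaussian HMM used as one mixture component
    a = State(NormalDistribution(mu1, 1), name='a')
    b = State(NormalDistribution(mu2, 1), name='b')
    hmm = HiddenMarkovModel()
    hmm.add_states(a, b)
    hmm.add_transition(hmm.start, a, 0.5)
    hmm.add_transition(hmm.start, b, 0.5)
    hmm.add_transition(a, a, 0.9)
    hmm.add_transition(a, b, 0.1)
    hmm.add_transition(b, b, 0.9)
    hmm.add_transition(b, a, 0.1)
    hmm.bake()
    return hmm

# pass the HMMs into a GMM as if they were any other distribution
model = GeneralMixtureModel([make_hmm(0, 2), make_hmm(5, 8)])

X = [numpy.random.normal(0, 1, 50) for _ in range(100)]   # 100 toy sequences
model.fit(X, n_jobs=4)                                    # assumed keyword, see note above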

Slide 72

Overview

pomegranate is more flexible than other packages, faster, more intuitive to use, and can do it all in parallel.

Slide 73

Documentation available at Readthedocs 73

Slide 74

Tutorials available on GitHub: https://github.com/jmschrei/pomegranate/tree/master/tutorials

Slide 75

Thank you for your time. 75