2017 - Rachel Thomas - Using Randomness to make code much faster

PyBay
August 12, 2017

Description
An introduction to randomized linear algebra (a recently developed field with huge implications for scientific computing) in Python, with a detailed case study of randomized Singular Value Decomposition (SVD). We will look at two applications of randomized SVD: finding the topics of a group of documents and identifying the background in a surveillance video.

Abstract
Linear algebra lies at the heart of much of data science, including deep learning, regression, and recommendation systems, and is also widely used in engineering, finance, and much more. The basic linear algebra techniques of matrix products and decompositions have super-linear runtimes, and therefore speeding them up is of vital importance. Counter-intuitively, recent advances have shown that the key to doing this is to take advantage of random matrices. We will see how to use random matrices to dramatically speed up the widely used singular value decomposition, a method that is used in least squares regression, PCA, general matrix inverses, and more.

Attendees will learn how language and video data can be represented as matrices with numpy. I will explain what Singular Value Decomposition (SVD) is conceptually and how randomized SVD gives us a huge improvement in speed. We will see applications in Python of using randomized SVD to find the topics of a group of documents and identify the background in a surveillance video. I will introduce all math concepts needed so there are no prerequisites, although familiarity with data processing will be helpful.
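
As a taste of the "data as matrices" idea, here is a minimal sketch (not from the talk; the toy documents are invented for illustration) of turning text into a term-document matrix with scikit-learn, the kind of matrix the topic-modeling example decomposes:

    # A minimal sketch of representing text as a matrix: rows are documents,
    # columns are vocabulary terms, entries are word counts. The documents
    # below are toy examples, not data from the talk.
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
    ]
    vectorizer = CountVectorizer()
    term_doc = vectorizer.fit_transform(docs)  # sparse (n_docs, n_terms) matrix
    print(sorted(vectorizer.vocabulary_))      # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
    print(term_doc.toarray())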

Bio
Rachel Thomas is co-founder of fast.ai, which is making deep learning more accessible, and a researcher-in-residence at the University of San Francisco Data Institute. Rachel has a mathematics PhD from Duke and has previously worked as a quant, a data scientist and backend engineer at Uber, and a full-stack software instructor at Hackbright.

Rachel was selected by Forbes as one of 20 "Incredible Women Advancing A.I. Research." She co-created the course "Practical Deep Learning for Coders," which is available for free at course.fast.ai and which more than 50,000 students have started. Her writing has made the front page of Hacker News four times and the top-5 list on Medium, and has been translated into Chinese, Spanish, and Portuguese. She is on Twitter as @math_rachel.

Video: https://www.youtube.com/watch?v=7i6kBz1kZ-A


Transcript

  1. My Background
     • Swarthmore: Math + CS
     • Duke: Math PhD
     • Quant
     • Uber Data Scientist & Engineer
     • Hackbright Instructor
     • fast.ai co-founder & USF researcher
     @math_rachel • data science blog: fast.ai • https://medium.com/@racheltho
  2. Computational Linear Algebra
     • This talk is based on a course I created for USF MS in Analytics students & released for FREE: github.com/fastai/numerical-linear-algebra
     • How do we get computers to do matrix computations with acceptable speed and acceptable accuracy?
  3. That’s random…
     • Random Data Structures
       - Bloom Filters (a minimal sketch follows this list)
       - HyperLogLog
       - Locality-Sensitive Hashing
       - Skip Lists
       - Count-min sketch
       - Min-hash
     • Random Algorithms
       - Karger's algorithm (minimum cut of a graph)
       - Randomized regression
       - Monte Carlo simulation
       - Randomized LU decomposition (Gaussian Elimination)
       - Randomized SVD
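
As one concrete entry from this list, a minimal Bloom filter (my sketch, not code from the talk) is a probabilistic set that trades a small false-positive rate for very little memory; it can say "maybe present" falsely, but never "absent" falsely:

    # A minimal Bloom filter sketch: a probabilistic set that may report
    # false positives but never false negatives.
    import hashlib

    class BloomFilter:
        def __init__(self, num_bits=1024, num_hashes=3):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item):
            # Derive several hash positions by salting md5 with a seed.
            for seed in range(self.num_hashes):
                digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("numpy")
    print(bf.might_contain("numpy"))   # True
    print(bf.might_contain("scipy"))   # almost certainly False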
  4. Matrix Decompositions
     • We use matrix multiplication and addition to put matrices together.
     • We use matrix decompositions to take them apart! (See the numpy sketch below.)
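
A small numpy illustration (mine, not the speaker's code) of both directions: SVD takes a matrix apart into three factors, and multiplication puts it back together:

    # SVD takes a matrix apart into three factors; multiplication
    # reassembles it, up to floating-point error.
    import numpy as np

    A = np.random.randn(5, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    A_rebuilt = U @ np.diag(s) @ Vt
    print(np.allclose(A, A_rebuilt))  # True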
  5. Factorization
     • Multiplication: 2 * 2 * 3 * 3 * 2 * 2 → 144
     • Factorization is the “opposite” of multiplication: 144 → 2 * 2 * 3 * 3 * 2 * 2 (see the sketch below)
     • Factorization is much harder than multiplication… (which is good, because that hardness is the heart of encryption).
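
For the integer case, a tiny trial-division sketch (illustrative only) recovers the factors, listing the same primes as the slide in sorted order:

    # Trial division: recover the prime factors of n.
    def prime_factors(n):
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(prime_factors(144))  # [2, 2, 2, 2, 3, 3]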
  6. SVD on a big matrix is slow
     • For our video matrix (19,200 x 6,000): 57 seconds
     • Let’s do SVD on a smaller matrix! Full-SVD timings in seconds, by matrix shape (a timing sketch follows the table):

       rows \ cols       100    1,000   10,000
       100             0.006    0.009    0.043
       1,000           0.004    0.259    0.992
       10,000          0.019    0.984  218.726
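
A sketch of how such timings can be measured (the matrix size is illustrative, and results vary by machine and BLAS build), comparing numpy's full SVD with scikit-learn's truncated randomized SVD:

    # Compare full SVD against randomized SVD on the same matrix.
    # Timings will differ from the slide's numbers on other hardware.
    import time
    import numpy as np
    from sklearn.utils.extmath import randomized_svd

    A = np.random.randn(10_000, 1_000)   # illustrative size

    t0 = time.time()
    np.linalg.svd(A, full_matrices=False)
    print(f"full SVD:       {time.time() - t0:.2f} s")

    t0 = time.time()
    randomized_svd(A, n_components=10, random_state=0)
    print(f"randomized SVD: {time.time() - t0:.2f} s")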
  7. Applications of SVD
     • Principal Component Analysis
     • Data compression
     • Pseudo-inverse
     • Collaborative Filtering
     • Topic Modeling
     • Background Removal (see the sketch below)
     • Removing Corrupted Data
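
To make the background-removal application concrete, here is a hedged sketch (the frames matrix is a random stand-in; the talk's real matrix was 19,200 pixels x 6,000 frames): a low-rank approximation captures what stays constant across frames, and the residual is the moving foreground.

    import numpy as np
    from sklearn.utils.extmath import randomized_svd

    # Stand-in for real video: one flattened grayscale frame per column.
    n_pixels, n_frames = 1_000, 300
    frames = np.random.rand(n_pixels, n_frames)

    # The low-rank part is the (mostly static) background; the rank 2 here
    # is an assumed choice for illustration.
    U, s, Vt = randomized_svd(frames, n_components=2, random_state=0)
    background = U @ np.diag(s) @ Vt   # rank-2 approximation of the video
    foreground = frames - background   # what moves, plus noise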
  8. SVD is just 1 of many randomized algorithms
     • Random Data Structures
       - Bloom Filters
       - HyperLogLog
       - Locality-Sensitive Hashing
       - Skip Lists
       - Count-min sketch
       - Min-hash
     • Random Algorithms
       - Karger's algorithm (minimum cut of a graph)
       - Randomized regression
       - Monte Carlo simulation
       - Randomized LU decomposition (Gaussian Elimination)
       - Randomized SVD
  9. You often don’t need to be super accurate
     • When your measurements aren’t that accurate
     • When you have a ton of data
     • In machine learning, less accuracy can help reduce overfitting
     • Run approximate algorithms multiple times to increase the probability of a correct answer (see the sketch below)
     • Depending on the application
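
One concrete pattern for the "run it multiple times" bullet (my illustration, not from the talk) is Freivalds' algorithm: it checks a matrix product with random vectors in O(n^2) per trial, and each independent trial at least halves the chance of missing an error.

    import numpy as np

    def freivalds(A, B, C, trials=10, seed=0):
        # Test whether A @ B == C. A wrong C slips through a single trial
        # with probability <= 1/2, so `trials` runs miss with <= 2**-trials.
        rng = np.random.default_rng(seed)
        n = C.shape[1]
        for _ in range(trials):
            x = rng.integers(0, 2, size=(n, 1)).astype(float)  # random 0/1 vector
            if not np.allclose(A @ (B @ x), C @ x):
                return False          # definitely A @ B != C
        return True                   # equal, with high probability

    rng = np.random.default_rng(1)
    A, B = rng.standard_normal((200, 200)), rng.standard_normal((200, 200))
    C = A @ B
    print(freivalds(A, B, C))         # True
    C[0, 0] += 1.0
    print(freivalds(A, B, C))         # almost certainly False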