Getting Started with Vowpal Wabbit

Ike Okonkwo (@ikeondata), Data Scientist @MerchantAtlas
1.19.15, SF Vowpal Wabbit Meetup

Agenda
• Installation
• Background
• Demo
• Other features
• References

About Me
• Data Scientist, Merchant Atlas (enterprise digital sales automation using machine learning)
• Organizer, SF Vowpal Wabbit Meetup
• Background: Physics / Electrical Engineering, Industrial & Systems Engineering

Installation
• Local install: https://www.github.com/JohnLangford/vowpal_wabbit
• Docker image: bradleypallen/ml-dev
• On OSX: http://yet-another-data-blog.blogspot.com/2014/08/getting-started-with-vowpal-wabbit-part.html
• On Windows: http://mlwave.com/install-vowpal-wabbit-on-windows-and-cygwin/

Background
• John Langford - Yahoo Research / MS Research
• Fast, out-of-core, scalable ML
• Can learn on terafeature datasets (10^12 features)
• Supports online learning and feature hashing
• Learning reductions
• Cloud deployment via Azure ML
• Progressive validation, linear learning, fixed memory footprint
• Terafeature learning: http://arxiv.org/pdf/1110.4198v3.pdf
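
The ideas of feature hashing, online learning, progressive validation, and a fixed memory footprint fit together in a few lines. The sketch below is an illustration only, not VW's implementation: `zlib.crc32` stands in for VW's own hash function, the 18-bit table size and the plain squared-loss SGD update are assumptions, and all function names are hypothetical.

```python
import zlib

NUM_BITS = 18                    # assumed table size: 2**18 weights
NUM_WEIGHTS = 1 << NUM_BITS      # memory stays fixed no matter how many features appear


def hashed_index(namespace, feature):
    """Map a (namespace, feature) pair to a slot in the fixed-size weight table."""
    key = f"{namespace}^{feature}".encode("utf-8")
    return zlib.crc32(key) & (NUM_WEIGHTS - 1)


def predict(weights, example):
    """Linear prediction; example is a list of (namespace, feature, value)."""
    return sum(weights[hashed_index(ns, f)] * v for ns, f, v in example)


def learn_online(stream, learning_rate=0.1):
    """One pass of squared-loss SGD over (label, example) pairs.

    Each example is scored BEFORE the weights are updated on it, so the
    running average loss is a progressive validation estimate.
    """
    weights = [0.0] * NUM_WEIGHTS
    total_loss = 0.0
    count = 0
    for label, example in stream:
        error = predict(weights, example) - label
        total_loss += error ** 2
        count += 1
        for ns, f, v in example:
            weights[hashed_index(ns, f)] -= learning_rate * error * v
    average_loss = total_loss / count if count else 0.0
    return weights, average_loss


if __name__ == "__main__":
    stream = [(1.0, [("f", "a", 1.0)])] * 50
    weights, avg_loss = learn_online(stream)
    print(avg_loss, predict(weights, [("f", "a", 1.0)]))
```

Because features are addressed by hash rather than by a dictionary, two features can collide in the same slot; a larger table trades memory for fewer collisions.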

Input Format
• Label: -1 or 1 for binary, 1..n for multi-class
• Weight: a positive number indicating the importance of this example relative to others. Default: 1
• Namespace: a string used for grouping features; it follows the | with no space in between
• Features: string[:float]
• Format: [Label] [Weight] |Namespace Feature ... Feature |Namespace Feature ... Feature
• https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format
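
A small helper makes the layout concrete. This is a minimal sketch, assuming features arrive as a dict of namespaces; `to_vw_line` and the namespace and feature names in the example are hypothetical, and the helper does no escaping, so names must not contain spaces, `:` or `|`.

```python
def to_vw_line(label, namespaces, weight=None):
    """Build one VW input line.

    namespaces maps a namespace name to a dict of feature -> value;
    a value of None emits the bare feature name (no ":value" suffix).
    """
    head = str(label) if weight is None else f"{label} {weight}"
    chunks = []
    for ns, feats in namespaces.items():
        tokens = [
            name if value is None else f"{name}:{value}"
            for name, value in feats.items()
        ]
        # The namespace must touch the bar: "|user", not "| user".
        chunks.append("|" + ns + " " + " ".join(tokens))
    return head + " " + " ".join(chunks)


if __name__ == "__main__":
    line = to_vw_line(
        1,
        {"user": {"age": 25, "premium": None}, "item": {"price": 9.5}},
        weight=2,
    )
    print(line)  # 1 2 |user age:25 premium |item price:9.5
```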

Wrappers / Input
• Ingest: text, binary, compressed data
• IO: file, pipe, tcp
• Python: pyvw, rosetta, wabbit_wappa, vowpal_porpoise
• R: rvowpalwabbit

Useful Command Line Arguments
• -f <file_name> : save model
• -t : test mode (predict only, no learning)
• -i <file_name> : load a saved model (predictor)
• -p <file_name> : save predictions
• --passes <n> : iterate over the data n times
• --loss_function : loss function, default: squared loss
• --l1, --l2 : lasso and ridge regularization
• --oaa, --ect, --csoaa : multiclass classification

Demo: IRIS
• 3 classes
• 150 examples
• 4 features
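
For a multiclass run, each species needs an integer label in 1..3. The following is a sketch of one possible conversion, not the code used in the demo; the namespace `f`, the feature names, and the label order are my own choices.

```python
# Assumed label order; any consistent mapping to 1..3 would do.
IRIS_CLASSES = {"setosa": 1, "versicolor": 2, "virginica": 3}
FEATURE_NAMES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def iris_row_to_vw(row):
    """Convert (sepal_length, sepal_width, petal_length, petal_width, species)."""
    *measurements, species = row
    feats = " ".join(f"{n}:{v}" for n, v in zip(FEATURE_NAMES, measurements))
    return f"{IRIS_CLASSES[species]} |f {feats}"


if __name__ == "__main__":
    print(iris_row_to_vw((5.1, 3.5, 1.4, 0.2, "setosa")))
    # 1 |f sepal_length:5.1 sepal_width:3.5 petal_length:1.4 petal_width:0.2
```

A file of such lines could then be trained with `--oaa 3`.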

Demo: MNIST
• 10 classes (0-9)
• 60000 examples
• 784 features (28 x 28)
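
Most MNIST pixels are zero, so a sparse encoding keeps the input small. This sketch assumes pixels arrive as a flat list of 784 integers in 0..255; the namespace `p`, the scaling to [0, 1], and the function name are illustrative choices rather than the demo's actual preprocessing.

```python
def image_to_vw(digit, pixels):
    """Encode one 28 x 28 image as a sparse VW multiclass example."""
    if len(pixels) != 784:
        raise ValueError("expected 784 pixel values")
    # Multiclass labels run 1..n, so digit 0 becomes class 1 and digit 9 class 10.
    label = digit + 1
    # Emit only nonzero pixels, using the pixel index as the feature name.
    feats = " ".join(f"{i}:{p / 255:.3f}" for i, p in enumerate(pixels) if p)
    return f"{label} |p {feats}"


if __name__ == "__main__":
    pixels = [0] * 784
    pixels[0] = 255
    pixels[783] = 51
    print(image_to_vw(0, pixels))  # 1 |p 0:1.000 783:0.200
```

Predictions from a model trained this way come back as classes 1..10, so subtract 1 to recover the digit.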

Demo: RCV1 (Reuters Corpus Volume 1)
• 2 classes (CCAT or not)
• 781k examples (train), 23k (test)

Other Features
• Allreduce - distributed linear learning
• Contextual bandits
• Matrix factorization
• Sequence predictions
• Topic modeling / LDA
• Variety of loss functions and optimizers
• Utilities: perf, vw-varinfo, vw-hypersearch, vw-top-errors

References
• FastML: http://fastml.com/
• MLWave: http://mlwave.com/
• Kaggle competition boards
• John Langford: github.com/JohnLangford/vowpal_wabbit
• Terafeature Linear Learning: http://arxiv.org/pdf/1110.4198v3.pdf
• Docker image: https://registry.hub.docker.com/u/bradleypallen/ml-dev/
• NYU Large Scale Learning: http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start
