Getting Started with Vowpal Wabbit
Ike Okonkwo (@ikeondata)
Data Scientist @MerchantAtlas
1.19.15
SF Vowpal Wabbit Meetup
Agenda
• Installation
• Background
• Demo
• Other features
• References
About Me
• Data Scientist
• Merchant Atlas (enterprise digital sales automation using machine learning)
• Organizer - SF Vowpal Wabbit Meetup
• Background
• Physics / Electrical Engineering
• Industrial & Systems Engineering
Installation
• Local Install : https://www.github.com/JohnLangford/vowpal_wabbit
• Docker Image : bradleypallen/ml-dev
• On OSX : http://yet-another-data-blog.blogspot.com/2014/08/getting-started-with-vowpal-wabbit-part.html
• On Windows : http://mlwave.com/install-vowpal-wabbit-on-windows-and-cygwin/
Background
• John Langford - Yahoo Research / MS Research
• Fast out-of-core Scalable ML
• Can learn on Terafeature datasets (10^12)
• Supports Online Learning / Feature Hashing
• Learning Reductions
• Cloud Deployment via Azure ML
• Progressive validation, linear learning, fixed memory footprint
Terafeature learning: http://arxiv.org/pdf/1110.4198v3.pdf
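To make the ideas above concrete (feature hashing, online learning, progressive validation, fixed memory), here is a minimal Python sketch. It is not Vowpal Wabbit's implementation: the function names are hypothetical, zlib.crc32 stands in for VW's murmurhash, and the update is plain SGD on squared loss. The 18-bit table size mirrors VW's default for -b.

```python
import zlib


def hashed_index(namespace, feature, bits=18):
    """Map a (namespace, feature) pair to one of 2**bits weight slots.

    Assumption: crc32 is only a stand-in hash for illustration.
    """
    key = f"{namespace}^{feature}".encode("utf-8")
    return zlib.crc32(key) & ((1 << bits) - 1)


def progressive_sgd(examples, bits=18, learning_rate=0.5):
    """Run one online pass over (label, features) pairs.

    features is a list of (namespace, name, value) tuples.
    Returns the weight table and the average progressive squared loss.
    """
    # Fixed memory footprint: the table size depends on bits, not on the data.
    weights = [0.0] * (1 << bits)
    total_loss = 0.0
    count = 0
    for label, features in examples:
        indexed = [(hashed_index(ns, name, bits), value)
                   for ns, name, value in features]
        prediction = sum(weights[i] * v for i, v in indexed)
        error = prediction - label
        # Progressive validation: score the example before learning from it.
        total_loss += error ** 2
        count += 1
        # Online update: one gradient step per example, then move on.
        for i, v in indexed:
            weights[i] -= learning_rate * error * v
    return weights, total_loss / max(count, 1)
```

Because each example is scored before the weights are updated, the running average loss behaves like a held-out estimate even though the data is read only once.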
Input Format
• Label: -1 or 1 for binary classification, 1..n for multiclass
• Weight: a positive number giving the importance of this example relative to others. Default: 1
• Namespace: a string used to group features; it must follow the | with no space in between
• Features: string[:float], a feature name with an optional value (default 1)
[Label] [Weight] |Namespace Feature ... Feature |Namespace Feature ... Feature
https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format
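As an illustration of the format above, the following Python sketch turns a label, an optional weight, and a dict of namespaces into one input line. The helper name to_vw_line and the example feature names are hypothetical; only the fields described on this slide are covered.

```python
def to_vw_line(label, namespaces, weight=None):
    """Format one example as: label [weight] |ns feat[:value] ... |ns ...

    namespaces maps a namespace name to a dict of {feature_name: value};
    a value of None emits the bare feature name (implicit value 1).
    """
    parts = [str(label)]
    if weight is not None:
        parts.append(str(weight))
    for namespace, features in namespaces.items():
        tokens = []
        for name, value in features.items():
            # Space, '|' and ':' are delimiters in the format, so they
            # cannot appear inside a feature name.
            if any(ch in name for ch in " |:"):
                raise ValueError(f"invalid feature name: {name!r}")
            tokens.append(name if value is None else f"{name}:{value}")
        # The namespace is attached directly to the bar, with no space.
        parts.append("|" + namespace + " " + " ".join(tokens))
    return " ".join(parts)


line = to_vw_line(
    1,
    {"user": {"age": 25, "premium": None}, "item": {"price": 9.99}},
    weight=2,
)
print(line)
```

Writing one such line per example to a text file gives a dataset that can be passed to vw.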
Useful Command Line Arguments
• -f <file_name> : save the model
• -t : test mode (no learning)
• -i <file_name> : load a saved model
• -p <file_name> : save predictions
• --passes <n> : iterate over the data n times (requires a cache, e.g. -c)
• --loss_function <name> : loss function; default: squared
• --l1, --l2 : lasso and ridge regularization
• --oaa, --ect, --csoaa : multiclass classification
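The flags above typically combine into a train step and a predict step. The Python sketch below only builds the argument lists and does not run anything. The function names and file names are hypothetical; -d (data file) and -c (cache) are standard vw options that are not listed on this slide.

```python
def train_command(data_file, model_file, passes=1, loss_function="squared",
                  l1=None, l2=None, num_classes=None):
    """Build an argument list for training and saving a model with -f."""
    cmd = ["vw", "-d", data_file, "-f", model_file,
           "--loss_function", loss_function]
    if passes > 1:
        # Multiple passes need a cache file, hence -c.
        cmd += ["-c", "--passes", str(passes)]
    if l1 is not None:
        cmd += ["--l1", str(l1)]
    if l2 is not None:
        cmd += ["--l2", str(l2)]
    if num_classes is not None:
        # One-against-all multiclass with labels 1..num_classes.
        cmd += ["--oaa", str(num_classes)]
    return cmd


def predict_command(data_file, model_file, predictions_file):
    """Build an argument list that loads a model (-i) in test mode (-t)."""
    return ["vw", "-t", "-d", data_file, "-i", model_file,
            "-p", predictions_file]


print(" ".join(train_command("train.vw", "model.vw", passes=3, l2=0.001)))
print(" ".join(predict_command("test.vw", "model.vw", "preds.txt")))
```

If vw is installed, either list can be passed to subprocess.run to execute the step.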