on Google Cloud Platform; I spend my time designing and documenting architectural patterns and solutions. Before that I worked for MongoDB. Before that... well, a bunch of other places. I’ve been in Austin for about 12 years, so I get to complain about everything.
@crcsmnky
an incredibly detailed and thorough Machine Learning Service Comparison [2]. She covers:
• Data sourcing
• Data preprocessing
• Model building
• Model evaluation

[1] https://blog.onliquid.com/author/isbalmeida/
[2] https://blog.onliquid.com/machine-learning-service-benchmark/
predicted rating (and probabilities of other labels)
Upload dataset and run batch predictions
One time, but need to generate dataset
Repeat as needed (daily? weekly?)
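Batch prediction means you first have to generate the dataset to upload. Here is a minimal sketch, assuming ratings arrive as `(userId, movieId, rating)` tuples and that you want a prediction for every movie a user has not rated yet. The function names are hypothetical, and the shape of the probabilities dict (label → probability) is an assumption; each service returns its per-label probabilities in its own format.

```python
import csv
import io
from typing import Iterable


def build_batch_input(ratings: Iterable[tuple[int, int, float]],
                      user_ids: Iterable[int],
                      movie_ids: Iterable[int]) -> list[tuple[int, int]]:
    """Return every (userId, movieId) pair the user has NOT rated yet.

    These are the rows we would upload for batch prediction.
    """
    rated = {(u, m) for u, m, _ in ratings}
    return [(u, m)
            for u in sorted(set(user_ids))
            for m in sorted(set(movie_ids))
            if (u, m) not in rated]


def to_csv(rows: list[tuple[int, int]]) -> str:
    """Serialize the batch input as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["userId", "movieId"])
    writer.writerows(rows)
    return buf.getvalue()


def top_label(probabilities: dict[str, float]) -> str:
    """Pick the rating label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)
```

For example, with two users and three movies, `build_batch_input([(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0)], [1, 2], [10, 20, 30])` yields `[(1, 30), (2, 20), (2, 30)]`. Rerunning the same script is what "repeat as needed" looks like, since the unrated pairs change as new ratings come in.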
regression or multiclass classification and easy model evaluation.
Azure has a crazy-robust Studio interface with lots of algorithms and power, including the ability to bring your own (BYO) Python or R code.
Google has a limited interface and everything is opaque, but it supports standard PMML for metadata and transformations. It lacks a quality UI, and scoring/evaluation all happen behind the scenes. Marketing would tout this as “simple”.
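When scoring and evaluation happen behind the scenes, one hedge is to hold back some of your own ratings and score the service's predictions yourself. A minimal sketch, assuming you have the held-out actual ratings and the returned predictions as two parallel lists; `evaluate` is a hypothetical helper, not any service's API. It reports both a multiclass view (exact-match accuracy, confusion counts) and a regression view (RMSE).

```python
import math
from collections import Counter


def evaluate(actual: list[float], predicted: list[float]) -> dict:
    """Compute simple holdout metrics for rating predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(actual)
    # Multiclass view: each rating value is a class label.
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / n
    # Regression view: ratings are numbers, so penalize distance.
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    # Confusion counts keyed by (actual, predicted).
    confusion = Counter(zip(actual, predicted))
    return {"accuracy": accuracy, "rmse": rmse, "confusion": confusion}
```

Reporting both views matters here: a model that predicts 4 when the truth is 5 scores zero on accuracy but is close on RMSE.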
would be useful for this application.
Azure’s algorithm and code support is top-notch: it generates code for C#, Python, and R for batch and streaming endpoints.
Google requires you to batch requests manually, but you can update your trained model as more ratings come in, reducing re-training time.
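To illustrate why updating a trained model can beat re-training from scratch, here is a toy sketch: a hypothetical per-movie running-mean baseline that absorbs each new rating in constant time. It is not Google's model or API; it only shows the incremental-update idea.

```python
from collections import defaultdict


class MovieMeanModel:
    """Toy baseline: predict a movie's rating as its running mean.

    update() folds in each new rating in O(1), so there is no need to
    re-train on the full history when more ratings arrive.
    """

    def __init__(self, default: float = 3.0) -> None:
        self.default = default  # used for movies with no ratings yet
        self.count: dict[int, int] = defaultdict(int)
        self.mean: dict[int, float] = defaultdict(float)

    def update(self, movie_id: int, rating: float) -> None:
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count[movie_id] += 1
        n = self.count[movie_id]
        self.mean[movie_id] += (rating - self.mean[movie_id]) / n

    def predict(self, movie_id: int) -> float:
        # .get() avoids inserting unseen movies into the defaultdict.
        if self.count.get(movie_id, 0) == 0:
            return self.default
        return self.mean[movie_id]
```

After `update(10, 4.0)` and `update(10, 2.0)`, `predict(10)` returns `3.0`; a later `update(10, 5.0)` moves it to about 3.67 without revisiting the earlier ratings.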
training dataset had only userId and movieId; those aren’t nearly enough features to predict ratings using multiclass classification.
The training data should have included userId plus movie metadata (like year, genres, and title) to generate a better model.
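Joining ratings with movie metadata is a small preprocessing step. A minimal sketch, assuming MovieLens-style CSVs: `movies.csv` with `movieId,title,genres` (title ending in "(YYYY)", genres pipe-separated) and `ratings.csv` with `userId,movieId,rating`. The file layout and function names are assumptions for illustration.

```python
import csv
import io
import re

# Assumed title format: "Toy Story (1995)" -> year 1995
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def load_movies(movies_csv: str) -> dict[int, dict]:
    """Parse movie metadata keyed by movieId."""
    movies = {}
    for row in csv.DictReader(io.StringIO(movies_csv)):
        match = YEAR_RE.search(row["title"])
        movies[int(row["movieId"])] = {
            "title": row["title"],
            "year": int(match.group(1)) if match else None,
            "genres": row["genres"].split("|"),
        }
    return movies


def build_features(ratings_csv: str, movies: dict[int, dict]) -> list[dict]:
    """Join each rating with the metadata of the movie it refers to."""
    rows = []
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        movie = movies.get(int(row["movieId"]))
        if movie is None:
            continue  # skip ratings for movies with no metadata
        rows.append({
            "userId": int(row["userId"]),
            "movieId": int(row["movieId"]),
            "year": movie["year"],
            "genres": movie["genres"],
            "title": movie["title"],
            "rating": float(row["rating"]),  # the label to predict
        })
    return rows
```

Each training row now carries year, genres, and title alongside the IDs, giving a classifier something to generalize from beyond two opaque integers.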
same thing, I might like other things you like.” ...Or something like that.
Other tools can do this pretty easily with just userId and movieId.
See http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
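Spark MLlib's collaborative filtering uses alternating least squares (ALS). As a smaller, stdlib-only illustration of the same "people who like what you like" idea, here is a user-based neighborhood approach with cosine similarity. This is a swapped-in technique, not what Spark does, and the function names are hypothetical; it needs nothing but userId, movieId, and rating.

```python
import math
from typing import Optional

# userId -> {movieId: rating}
Ratings = dict[int, dict[int, float]]


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity computed over the movies both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: int, movie: int) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `movie`.

    Returns None when no similar user has rated the movie.
    """
    mine = ratings.get(user, {})
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no positive overlap
        num += sim * theirs[movie]
        den += sim
    return num / den if den else None
```

With `{1: {10: 5.0, 20: 4.0}, 2: {10: 5.0, 20: 4.0, 30: 5.0}, 3: {10: 1.0, 20: 5.0, 30: 1.0}}`, user 2 rates exactly like user 1, so user 2's 5.0 for movie 30 outweighs user 3's 1.0 and `predict(ratings, 1, 30)` lands above 3. Real systems usually mean-center ratings or use a factorization method such as ALS to scale beyond toy data.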
== Garbage Out too
Can’t just throw algorithms against the wall and see what sticks.
That’s not data science; your output and user experience will suffer.
they performed.
Didn’t even talk about cost or pricing, so YMMV.
Consider other approaches:
It’s not just about finding well-performing algorithms.
You must also consider what makes sense for your use case and/or application.
a panacea.
Expertise and an understanding of the underlying analyses are critical to making this useful.
Be careful going down this road: make sure you understand the data science problem before leveraging these services.