AzureML - Zero to Hero

Presentation for the SQLUG meetup.

Govind Kanshi

August 02, 2014

  1. AzureML: where experiments are built and deployed as web services
    AzureML Studio has a toolbar with modules for data ingestion and transformation,
    statistics, and machine learning. Some modules have properties that can be set.
    AzureML has datasets, which can be brought in at runtime or persisted inside AzureML.
    It has public datasets too.

  2. Classification algorithms can be measured by the metrics below
    Regression is measured mainly by RMSE, whose suitability many people currently question:
    RMSE = sqrt( sum over all n instances of (actual value - predicted value)^2 / n )
    Clustering is evaluated differently and requires tests/re-runs to check that the
    grouped/clustered points have some kind of cohesion
    Types of classification errors often incur different costs.
    Total error = (FP+FN)/(TP+FP+TN+FN)
    Lift charts
    Sort instances by their predicted probability of being positive.
    X axis is sample size and Y axis is the number of true positives (TP).
    ROC curves (ROC means receiver operating characteristic, a term from signal detection theory)
    X axis shows the false positive rate, FP / (FP+TN)
    Y axis shows the true positive rate, TP / (TP+FN)
    Precision and recall (terms also used in information retrieval and search):
    Precision (relevant retrieved / total retrieved) = TP / (TP+FP)
    Recall (relevant retrieved / total relevant) = TP / (TP+FN)
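The metrics above can be sketched in plain Python (standard library only). This is an illustrative sketch, not how AzureML computes them internally: the function names are hypothetical, labels are assumed to be 0/1 with 1 as the positive class, the 0.5 decision threshold is an assumption, and cluster cohesion is measured here as the within-cluster sum of squared distances to the centroid, one common choice that these notes do not prescribe.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper names; labels are 0/1 with 1 = positive class (assumption).


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def within_cluster_ss(points: Sequence[Tuple[float, ...]], assignments: Sequence[int]) -> float:
    """Cohesion as the sum of squared distances to each cluster's centroid (lower = tighter)."""
    clusters: Dict[int, List[Tuple[float, ...]]] = {}
    for point, cluster in zip(points, assignments):
        clusters.setdefault(cluster, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when score >= threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def total_error(tp: int, fp: int, tn: int, fn: int) -> float:
    return (fp + fn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def lift_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[int, int]]:
    """(sample size, cumulative TP) after sorting instances by score, highest first."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points, tp = [(0, 0)], 0
    for size, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((size, tp))
    return points


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FP rate, TP rate) pairs as the threshold sweeps from the highest score down."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (s, y) in enumerate(ranked):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance sharing this score, so ties move together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != s:
            points.append((fp / neg, tp / pos))
    return points


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    tp, fp, tn, fn = confusion_counts(labels, scores)
    print(total_error(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn))  # 0.25 0.75 0.75
    print(lift_points(labels, scores)[-1], roc_points(labels, scores)[-1])  # (8, 4) (1.0, 1.0)
```

A single threshold gives one confusion matrix; sweeping the threshold over the sorted scores is what produces the lift and ROC curves, and the "different costs" of FP versus FN come down to choosing an operating point on those curves.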

  3. Desirables
    IPython-like “executable” documentation for data science (DS), and a simple way to achieve it
    Model interpretation
    More visualization
    Native time series support
    Text analysis with IR (information retrieval) integration

