AzureML - where experiments are done and deployed as web services
AzureML studio has “toolbar” which has modules for data ingestion/transformation,
statistics, machine learning. Some of them have properties which can be set.
AzureML has Datasets which can be bought in at runtime or persisted inside. It has
public datasets too.
Classification algorithms can be measured by these metrics
Regression have just RMSE which many people are questioning in present
circumstances (Sum through all instances (actual class value - predicted one))
Clustering has different mechanism and requires tests/re-runs to ensure
grouped/clustered points have cohesion of somekind
Types of classification errors often incur different costs.
Total error = (FP+FN)/(TP+FP+TN+FN)
Sort instances by their predicted probability of being a true positive (TP).
X axis is sample size and Y axis is number of true positives (TP).
ROC curves (ROC means receiver operating characteristic, a term from signal
X axis shows %of false positives (FP)
Y axis shows %of true positives (TP).
Recall - precision (IR world- search world has these terms too ):
Precision (retrieved relevant / total retrieved) = TP / (TP+FP)
Recall (retrieved relevant / total relevant) = TP / (TP + FN)
Ipython like “executable” documented – DS – how to achieve in simple way
Native Time series
Text analysis – IR integration