2. Fitting a model to the data
3. Computing the performance of the model on unseen data

Drawbacks:
1. Requires a lot of memory if the dataset is huge
2. Can't elegantly learn from new data
3. Not easy to respond to changes in the available features

Some solutions: learn the data in chunks or mini-batches (e.g. Dask and Spark's MLlib); see the sketch below.
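A minimal sketch of the chunked workaround, using scikit-learn's partial_fit together with pandas chunked reading; the file path and column name are hypothetical:

```python
import pandas as pd
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
classes = [0, 1]  # partial_fit must see the full set of classes up front

# Stream the file in chunks so the whole dataset never sits in memory.
for chunk in pd.read_csv("big_dataset.csv", chunksize=10_000):
    X = chunk.drop(columns="target")
    y = chunk["target"]
    clf.partial_fit(X, y, classes=classes)
```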
each observation.
• Feature scaling (running statistics).
• SGD (Stochastic Gradient Descent) to update weights (see the sketch below).

Pros
• The model can be updated seamlessly.
• Concept drift can be detected.
• Quick response to recent actions.

Cons
• Performance might not be as good as batch learning.
• Systems become more complex.
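A minimal sketch of these two ingredients on a stream of (features, target) pairs: running mean/variance via Welford's update, weights via a plain SGD step. All names here are illustrative:

```python
import numpy as np

class OnlineLinearRegression:
    """Scale features with running statistics, then update weights via SGD."""

    def __init__(self, n_features, lr=0.01):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)  # sum of squared deviations (Welford)
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def _scale(self, x):
        # Update the running mean/variance, then standardize the observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        std = np.sqrt(self.m2 / self.n)
        return (x - self.mean) / np.where(std > 0, std, 1.0)

    def learn_one(self, x, y):
        z = self._scale(np.asarray(x, dtype=float))
        error = (self.w @ z + self.b) - y
        # One SGD step on the squared error.
        self.w -= self.lr * error * z
        self.b -= self.lr * error
```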
Learning:
• The pipeline needs to be different: preprocess and train for each observation.
[Diagram: batch = a single Preprocess → Train → Predict pass; online = Preprocess → Train repeated for every observation]
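For instance, with the River library such a per-observation pipeline can be sketched as follows (dataset and model choices are illustrative):

```python
from river import datasets, linear_model, preprocessing

# Scaler and model are chained; both are updated one observation at a time.
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

for x, y in datasets.Phishing():
    y_pred = model.predict_one(x)  # predict with the pipeline's current state
    model.learn_one(x, y)          # then preprocess and train on this observation
```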
prediction.
3. Update a running average of the error.
4. Update the model.

All samples can be used as a validation set (see the sketch below). In some situations, leakage might happen.
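This loop is available, for example, as evaluate.progressive_val_score in River; a minimal sketch (model and metric choices are illustrative):

```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

# For each observation: predict, update the metric, then learn.
evaluate.progressive_val_score(datasets.Phishing(), model, metric)
print(metric)
```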
of the trip, is only known once the taxi arrives at the desired destination.
→ Instead of updating the model immediately after making a prediction, update it once the ground truth is available.

Delayed progressive validation
• https://www.kaggle.com/c/nyc-taxi-trip-duration
• https://maxhalford.github.io/blog/online-learning-evaluation/
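A hand-rolled sketch of that idea for the taxi example, assuming a River-style model (predict_one/learn_one) and a stream sorted by pickup time; every name below is illustrative:

```python
import heapq
import itertools

def delayed_progressive_val(stream, model, metric):
    """Predict at pickup time; score and learn only once the trip has ended."""
    pending = []                  # min-heap keyed on when the label becomes known
    tiebreak = itertools.count()  # keeps heap comparisons away from the dicts
    for pickup, dropoff, x, duration in stream:
        # Release every trip that finished before this pickup.
        while pending and pending[0][0] <= pickup:
            _, _, y_true, y_pred, x_old = heapq.heappop(pending)
            metric.update(y_true, y_pred)   # score the earlier prediction
            model.learn_one(x_old, y_true)  # the ground truth is now available
        y_pred = model.predict_one(x)       # predict without knowing the duration
        heapq.heappush(pending, (dropoff, next(tiebreak), duration, y_pred, x))
    return metric
```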
timestamp for the observation.
2. Specify delay as one of str, int, timedelta, or callable.

Example: impression (moment) → click or non-click (delayed)
→ The model will be updated only once the delay has passed.
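These two knobs map onto the moment and delay arguments of River's evaluate.progressive_val_score. A sketch for the impression → click example; the synthetic stream, the feature names, and the one-hour delay are all assumptions:

```python
import datetime as dt
from river import compose, evaluate, linear_model, metrics, preprocessing

# Synthetic impression stream: x carries the impression timestamp,
# y is whether the impression eventually led to a click.
t0 = dt.datetime(2024, 1, 1)
stream = [
    ({"ts": t0 + dt.timedelta(minutes=i), "bid": float(i % 3)}, i % 2 == 0)
    for i in range(100)
]

model = (
    compose.Select("bid")              # drop the timestamp before scaling
    | preprocessing.StandardScaler()
    | linear_model.LogisticRegression()
)

metric = evaluate.progressive_val_score(
    dataset=stream,
    model=model,
    metric=metrics.Accuracy(),
    moment="ts",                  # 1. the timestamp for the observation
    delay=dt.timedelta(hours=1),  # 2. one of str, int, timedelta, or callable
)
print(metric)
```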