Amazon Forecast and Personalize

A peek into the future with Amazon Forecast and Personalize. AWS DevDay DACH 2019.

Companies today use everything from simple spreadsheets to complex financial planning software to attempt to accurately forecast future business outcomes such as product demand, resource needs, or financial performance. These tools build forecasts by looking at a historical series of data, which is called time series data. For example, such tools may try to predict the future sales of a raincoat by looking only at its previous sales data with the underlying assumption that the future is determined by the past. This approach can struggle to produce accurate forecasts for large sets of data that have irregular trends. Also, it fails to easily combine data series that change over time (such as price, discounts, web traffic, and number of employees) with relevant independent variables like product features and store locations.
Based on the same technology used at Amazon.com, Amazon Forecast uses machine learning to combine time series data with additional variables to build forecasts. Amazon Forecast requires no machine learning experience to get started. You only need to provide historical data, plus any additional data that you believe may impact your forecasts. For example, the demand for a particular color of a shirt may change with the seasons and store location. This complex relationship is hard to determine on its own, but machine learning is ideally suited to recognize it. Once you provide your data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.


Frank Munz

October 24, 2019


  1. Solve complex business problems with Amazon Personalize and Amazon Forecast. Frank Munz, Sr Technical Evangelist, AWS DACH 2019, @frankmunz. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. About me
      • Software Architect / DevOps Engineer
      • Technical Evangelist @ AWS
      • Published an AWS book
      • Containers, serverless and a sprinkle of ML & big / fast data
      • @frankmunz
  3. Customers often ask, “How can we tap into Amazon’s experience in machine learning?”
  4. The Amazon ML stack: Broadest & deepest set of capabilities
      • AI services (Vision, Speech, Language, Chatbots, Forecasting, Recommendations): Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract, Amazon Polly, Amazon Transcribe, Amazon Translate, Amazon Comprehend & Comprehend Medical, Amazon Lex, Amazon Forecast, Amazon Personalize
      • ML services: Amazon SageMaker (Build, Train, Deploy): pre-built algorithms; data labeling (Ground Truth); notebook hosting; algorithms & models (AWS Marketplace for Machine Learning); one-click model training & tuning; optimization (Neo); reinforcement learning; hyperparameter optimization; one-click deployment & hosting; automatic scaling; virtual private cloud; AWS PrivateLink; Amazon Elastic Inference integration
      • ML frameworks & infrastructure (Frameworks, Interfaces, Infrastructure): Amazon EC2 P3 & P3dn, Amazon EC2 C5, FPGAs, AWS IoT Greengrass, Amazon Elastic Inference, AWS Inferentia
  5. Accurate time-series forecasting service, based on the same technology used at Amazon.com. No ML experience required.
  6. Predicting future points in a time series
      • Product demand
      • Workforce demand
      • Financial metrics
      • Inventory planning
  7. (Understanding) Accuracy is key in forecasting
      • Under-forecasting leads to lost opportunity
      • Over-forecasting leads to wasted resources
  8. Traditional time-series models: Trend + Seasonality
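
The "trend + seasonality" idea behind traditional models can be illustrated with a small additive fit. This is only a sketch under simplifying assumptions (a linear trend, a fixed seasonal period, plain least squares); the function names and toy series are hypothetical and not taken from the talk.

```python
from statistics import mean


def fit_trend_seasonality(y, period):
    """Fit y[t] ~ intercept + slope * t + seasonal[t % period]."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    # Ordinary least-squares slope and intercept of the linear trend.
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    # Seasonal effect: average detrended residual for each position in the cycle.
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
    seasonal = [mean(residuals[k::period]) for k in range(period)]
    return intercept, slope, seasonal


def extrapolate(model, start, horizon):
    """Project the fitted trend and seasonal pattern forward."""
    intercept, slope, seasonal = model
    period = len(seasonal)
    return [
        intercept + slope * t + seasonal[t % period]
        for t in range(start, start + horizon)
    ]


# Toy series (illustrative): linear trend plus a repeating 4-step pattern.
pattern = [1, -1, -1, 1]
history = [10 + 2 * t + pattern[t % 4] for t in range(12)]
model = fit_trend_seasonality(history, period=4)
next_values = extrapolate(model, start=12, horizon=4)
```

Each series is fitted on its own here, which is why such models cannot share patterns across items.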

  9. Never trust statistics alone; visualize your data.

  10. Traditional time-series models
      • Independent forecasts
      • Strong structural assumptions
      • De-facto industry standard
      • Well-understood, > 50 yrs. research
      • High data efficiency
      • Data must match the structural assumptions
      • Cannot identify patterns across time series
  11. Traditional methods struggle with real-world forecasting
      • Don’t consider metadata
      • Don’t consider external factors such as holidays and promotions
      • Can’t handle time series with no history
  12. A real-world example: The Visual Miscellaneum by David McCandless

  13. Using deep learning increases forecast accuracy

  14. Discovering shared patterns with deep learning

  15. Deep learning time-series models
      • Global models: identify patterns using all available time series
      • Group-dependent seasonality and lifecycle
      • Behavior in response to covariate inputs
      • Weak structural assumptions
      • Can be significantly more accurate than traditional methods
      • Can easily incorporate and learn from rich metadata
      • Support cold-start forecasts for new items
  16. Probabilistic forecasts
      • Quantification of uncertainty
      • Support optimal decision making
      • Make “wrong” forecasts useful
      • All Amazon Forecast algorithms support generating probabilistic forecasts
      • Forecasts can be obtained for different quantiles of the predictive distribution
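
To make "quantiles of the predictive distribution" concrete, the sketch below reads P10, P50 and P90 off a set of sampled demand values for a single time step. It assumes the predictive distribution is available as samples; the helper name and the sample data are hypothetical, and this is not a description of how Amazon Forecast computes its quantiles internally.

```python
from statistics import quantiles


def p_forecasts(samples, levels=(10, 50, 90)):
    """Return quantile forecasts such as {'p10': ..., 'p50': ..., 'p90': ...}."""
    # n=100 yields 99 cut points; index q - 1 is the q-th percentile.
    cuts = quantiles(samples, n=100, method="inclusive")
    return {f"p{q}": cuts[q - 1] for q in levels}


# Illustrative samples of possible demand for one item on one day.
samples = list(range(0, 101))
forecast_at_t = p_forecasts(samples)
```

A planner who wants to avoid stock-outs might order against the P90 value, while one who wants to avoid excess inventory might use P10.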
  17. Learning with covariates
      • Additional inputs can
          • Explain historical data
          • Drive forecast behavior
      • Examples from retail demand forecasting
          • Price information
          • Information about promotions
          • Out-of-stock information
          • Web page views
      • Categorical inputs can be used to identify group-level patterns
          • Fashion: Women’s (Clothing, Shoes, Watches), Men’s (Clothing, Shoes, Watches), Girls' (Clothing, Shoes, Watches), Boys' (Clothing, Shoes, Watches)
  18. Amazon Forecast web traffic,

  19. Amazon Forecast workflow
      1. Create related datasets and a dataset group
      2. Get training data: import historical data to the dataset group
      3. Train a predictor (trained model with HPO) using an algorithm or AutoML
      4. Evaluate the predictor version using metrics
      5. Create a forecast (for every item in the dataset group)
      6. Retrieve forecasts for users
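
The workflow steps can be sketched as a sequence of boto3-style calls. This is a hedged outline, not a complete implementation: it assumes the dataset has already been created and imported, uses hypothetical resource names, and omits the polling that a real run needs (each resource must reach ACTIVE status, for example via `describe_predictor`, before the next step). A small recording stub stands in for the AWS clients so the sketch runs without credentials.

```python
def run_forecast_workflow(forecast, forecastquery, dataset_arn, item_id):
    """Walk through the workflow with clients supplied by the caller."""
    # Steps 1-2: group an already imported TARGET_TIME_SERIES dataset.
    group_arn = forecast.create_dataset_group(
        DatasetGroupName="demo_group",  # hypothetical name
        Domain="RETAIL",
        DatasetArns=[dataset_arn],
    )["DatasetGroupArn"]
    # Step 3: train a predictor, letting AutoML choose the algorithm.
    predictor_arn = forecast.create_predictor(
        PredictorName="demo_predictor",  # hypothetical name
        ForecastHorizon=30,
        PerformAutoML=True,
        InputDataConfig={"DatasetGroupArn": group_arn},
        FeaturizationConfig={"ForecastFrequency": "D"},
    )["PredictorArn"]
    # Step 4: evaluate the predictor using its accuracy metrics.
    metrics = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)
    # Step 5: create a forecast for every item in the dataset group.
    forecast_arn = forecast.create_forecast(
        ForecastName="demo_forecast",  # hypothetical name
        PredictorArn=predictor_arn,
    )["ForecastArn"]
    # Step 6: retrieve the forecast for a single item.
    result = forecastquery.query_forecast(
        ForecastArn=forecast_arn, Filters={"item_id": item_id}
    )
    return metrics, result


class RecordingClient:
    """Hypothetical stand-in for an AWS client: records calls, returns fake ARNs."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def call(**kwargs):
            self.calls.append(name)
            return {
                "DatasetGroupArn": "arn:fake:dataset-group",
                "PredictorArn": "arn:fake:predictor",
                "ForecastArn": "arn:fake:forecast",
            }

        return call


stub = RecordingClient()
run_forecast_workflow(stub, stub, "arn:fake:dataset", "socks")
```

With real AWS access, the two clients would typically be `boto3.client("forecast")` and `boto3.client("forecastquery")`.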
  20. Comparison: Amazon Rekognition with built-in model

  21. Algorithms
      • ARIMA: Autoregressive integrated moving average (ARIMA) is a commonly used local statistical algorithm for time-series forecasting
      • DeepAR+: A supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs); supports hyperparameter optimization (HPO)
      • ETS: Exponential smoothing (ETS) is a commonly used local statistical algorithm for time-series forecasting
      • NPTS: Non-parametric time series (NPTS) is a scalable, probabilistic baseline forecaster algorithm; NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty
      • Prophet: A popular local Bayesian structural time series model
  22. TARGET_TIME_SERIES dataset

      timestamp   item_id  store  demand
      2019-01-01  socks    NYC    25
      2019-01-05  socks    SFO    45
      2019-02-01  shoes    ORD    10
      …
      2019-06-01  socks    NYC    100
      2019-06-05  socks    SFO    5
      2019-07-01  shoes    ORD    50
  23. Data alignment: Data is automatically aggregated by forecast frequency, for example, hourly, daily, or weekly.
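
As an illustration of aggregating by forecast frequency, the sketch below rolls rows shaped like the TARGET_TIME_SERIES example up to monthly totals. It assumes summation as the aggregation and a monthly frequency; the function name is hypothetical, and the third row is an invented extra record added only to show two values being combined.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(rows):
    """Sum demand per (first day of month, item_id, store)."""
    totals = defaultdict(int)
    for timestamp, item_id, store, demand in rows:
        day = date.fromisoformat(timestamp)
        bucket = day.replace(day=1).isoformat()  # align to the start of the month
        totals[(bucket, item_id, store)] += demand
    return dict(totals)


rows = [
    ("2019-01-01", "socks", "NYC", 25),
    ("2019-01-05", "socks", "SFO", 45),
    ("2019-01-20", "socks", "NYC", 5),  # hypothetical extra row
    ("2019-02-01", "shoes", "ORD", 10),
]
monthly = aggregate_monthly(rows)
```

The two NYC sock records from January end up in a single monthly bucket, while the other rows keep their own buckets.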
  24. Training & testing

  25. Applicable across multiple different domains

  26. Predictor metrics: Quantiles
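
Quantile-based predictor metrics are commonly built on the pinball (quantile) loss. The sketch below implements a weighted quantile loss in that spirit; the exact normalization (twice the summed pinball loss divided by the summed absolute actuals) is an assumption based on the usual definition, and the function name and numbers are illustrative.

```python
def weighted_quantile_loss(actuals, predictions, tau):
    """Weighted quantile loss for quantile level tau (0 < tau < 1)."""
    pinball = sum(
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        tau * max(y - q, 0.0) + (1.0 - tau) * max(q - y, 0.0)
        for y, q in zip(actuals, predictions)
    )
    return 2.0 * pinball / sum(abs(y) for y in actuals)


actuals = [10, 20]
predictions = [12, 16]  # one over-forecast, one under-forecast
loss_p90 = weighted_quantile_loss(actuals, predictions, tau=0.9)
loss_p10 = weighted_quantile_loss(actuals, predictions, tau=0.1)
```

At the 0.9 level the under-forecast dominates the loss, while at the 0.1 level the over-forecast does, which is how the metric encodes the different costs of the two kinds of error.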

  28. Getting a forecast: Interpreting P-numbers

  29. Amazon Forecast examples & notebooks

  30. Real-time personalization and recommendation service, based on the same technology used at Amazon.com. No ML experience required.
  31. Personalizing user experience is proven to increase discoverability, engagement, user satisfaction, and revenue. 30% of page views on Amazon are from recommendations … However, most customers find personalization hard to get right
  32. Effective personalization requires solving multiple hard problems
      • Reacting to user interactions in real time
      • Avoiding mostly showing popular items
      • Handling cold start (insufficient data about new users/items)
      • Scale
  33. Traditional recommender systems aren’t adequate
      • Rule-based systems perform poorly, don’t scale and are hard to maintain
      • Collaborative filtering and matrix factorization methods are good for v1, but deep neural networks, esp. recurrent neural networks, that take into account the sequence of a user’s activity (clicks) out-perform other methods
  34. State of the Art Performance
      • Ratings RMSE on Netflix (98 MM interactions, 500k users, 18k items): Rolling Average 0.954, T-SVD [2009] 0.928, PMF [2008] 0.925, RRN [2017] 0.922, DeepRec [2017] 0.91, HRNN 0.856
      • Ratings RMSE on MovieLens (20 MM interactions, 173k users, 131k items): Rolling Average 0.933, FM [2012] 0.916, I-AutoRec [2015] 0.871, RNN 0.857, HRNN 0.846
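
For readers unfamiliar with the metric in these charts, RMSE (root mean squared error) on ratings can be computed as in the short sketch below; lower is better. The function name and the toy ratings are illustrative and unrelated to the benchmark data.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(sum(squared_errors) / len(actual))


# Toy example: three true ratings versus a constant prediction of 3 stars.
error = rmse([4, 3, 5], [3, 3, 3])
```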
  35. Common applications & use cases
      • Personalized recommendations
      • Search reranking
      • Notifications and emails
      • Related items
  36. Real-time data can be consumed by Amazon Personalize
      • Offline data (Amazon S3 bucket): historical user activity, user attributes, item catalog
      • Real-time data: Mobile SDKs (coming soon), JavaScript SDK, Server-Side SDKs
  37. Recurrent Neural Networks (RNN) RNN

  38. Modeling for personalization

  39. HRNN - Modeling sessions: learned user representation, hierarchical recurrent network, user representation
  40. Hierarchical recurrent neural networks (HRNNs)

    Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. Massimo Quadrana (Politecnico di Milano, Milan, Italy), Alexandros Karatzoglou (Telefonica Research, Barcelona, Spain), Balázs Hidasi (Gravity R&D, Budapest, Hungary), Paolo Cremonesi (Politecnico di Milano, Milan, Italy). RecSys’17, August 27–31, 2017, Como, Italy. © 2017 ACM. arXiv:1706.04148v5 [cs.LG] 23 Aug 2017.

    ABSTRACT. Session-based recommendations are highly relevant in many modern on-line services (e.g. e-commerce, video streaming) and recommendation settings. Recently, Recurrent Neural Networks have been shown to perform very well in session-based settings. While in many session-based recommendation domains user identifiers are hard to come by, there are also domains in which user profiles are readily available. We propose a seamless way to personalize RNN models with cross-session information transfer and devise a Hierarchical RNN model that relays and evolves latent hidden states of the RNNs across user sessions. Results on two industry datasets show large improvements over the session-only RNNs.

    CCS CONCEPTS. Information systems → Recommender systems; Computing methodologies → Neural networks.

    KEYWORDS. recurrent neural networks; personalization; session-based recommendation; session-aware recommendation.

    1 INTRODUCTION. In many online systems where recommendations are applied, interactions between a user and the system are organized into sessions. A session is a group of interactions that take place within a given time frame. Sessions from a user can occur on the same day, or over several days, weeks, or months. A session usually has a goal, such as finding a good restaurant in a city, or listening to music of a certain style or mood.

    Providing recommendations in these domains poses unique challenges that until recently have been mainly tackled by applying conventional recommender algorithms [10] on either the last interaction or the last session (session-based recommenders). Recurrent Neural Networks (RNNs) have been recently used for the purpose of session-based recommendations [7] outperforming item-based methods by 15% to 30% in terms of ranking metrics. In session-based recommenders, recommendations are provided based solely on the interactions in the current user session, as users are assumed to be anonymous. But in many of these systems there are cases where a user might be logged-in (e.g. music streaming services) or some form of user identifier might be present (cookie or other identifier). In these cases it is reasonable to assume that the user behavior in past sessions might provide valuable information for providing recommendations in the next session.

    A simple way of incorporating past user session information in session-based algorithm would be to simply concatenate past and current user sessions. While this seems like a reasonable approach, we will see in the experimental section that this does not yield the best results.

    In this work we describe a novel algorithm based on RNNs that can deal with both cases: (i) session-aware recommenders, when user identifiers are present and propagate information from the previous user session to the next, thus improving the recommendation accuracy, and (ii) session-based recommenders, when there are no past sessions (i.e., no user identifiers). The algorithm is based on a Hierarchical RNN where the hidden state of a lower-level RNN at the end of one user session is passed as an input to a higher-level RNN which aims at predicting a good initialization (i.e., a good context vector) for the hidden state of the lower RNN for the next session of the user.

    We evaluate the Hierarchical RNNs on two datasets from industry comparing them to the plain session-based RNN and to item-based collaborative filtering. Hierarchical RNNs outperform both alternatives by a healthy margin.

    2 RELATED WORK. Session-based recommendations. Classical CF methods (e.g. matrix factorization) break down in the session-based setting when no user profile can be constructed from past user behavior. A natural solution to this problem is the item-to-item recommendation approach [11, 16]. In this setting an item-to-item similarity matrix is precomputed from the available session data, items that are often clicked together in sessions are deemed to be similar. These similarities are then used to create recommendations. While simple, this method has been proven to be effective and is widely employed. Though, these methods only take into account the last click of the user, in effect ignoring the information of the previous clicks.

    Figure 1: Graphical representation of the proposed Hierarchical RNN model for personalized session-based recommendation. The model is composed of an hierarchy of two GRUs, the session-level GRU ($GRU_{ses}$) and the user-level GRU ($GRU_{usr}$). The session-level GRU models the user activity within sessions and generates recommendations. The user-level GRU models the evolution of the user across sessions and provides personalization capabilities to the session-level GRU by initializing its hidden state and, optionally, by propagating the user representation in input.

    […] way, the user-level GRU can track the evolution of the user across sessions and, in turn, model the dynamics of user interests seamlessly. Notice that the user-level representation is kept fixed throughout the session and it is updated only when the session ends. The user-level representation is then used to initialize the hidden state of the session-level GRU. Given $c_m$, the initial hidden state $s_{m+1,0}$ of the session-level GRU for the following session is set to

    $$s_{m+1,0} = \tanh(W_{init} c_m + b_{init}) \qquad (4)$$

    where $W_{init}$ and $b_{init}$ are the initialization weights and biases respectively. In this way, the information relative to the preferences expressed by the user in the previous sessions is transferred to the session-level. Session-level representations are then updated as follows […]

    […] training) how user sessions evolve during time. We will see in the experimental section that this is crucial in achieving increased performance. In effect $GRU_{usr}$ computes and evolves a user profile that is based on the previous user sessions, thus in effect personalizing the $GRU_{ses}$. In the original RNN, users who had clicked/interacted with the same sequence of items in a session would get the same recommendations; in HRNN this is not anymore the case, recommendations will be influenced by the user's past sessions as well.

    In summary, we considered the following two different HRNN settings, depending on whether the user representation $c_m$ is considered in Equation 5: • HRNN Init, in which $c_m$ is used only to initialize the representation of the next session. […]
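
The session initialization in Equation 4 of the excerpt, where the next session's initial hidden state is tanh(W_init c_m + b_init), can be written out in a few lines. The sketch uses plain Python lists in place of a tensor library, and the weights and user representation are made-up toy values chosen only to show the shape of the computation.

```python
from math import tanh


def init_session_state(w_init, b_init, c_m):
    """Compute tanh(W_init @ c_m + b_init) for the next session's first hidden state."""
    return [
        tanh(sum(w * c for w, c in zip(row, c_m)) + b)
        for row, b in zip(w_init, b_init)
    ]


# Toy values (illustrative): identity weights, zero bias, 2-dimensional user state.
w_init = [[1.0, 0.0], [0.0, 1.0]]
b_init = [0.0, 0.0]
c_m = [0.0, 1.0]
s_next = init_session_state(w_init, b_init, c_m)
```

In a trained model, `w_init` and `b_init` would be learned parameters and `c_m` would come from the user-level GRU at the end of the previous session.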
  41. Large Parameter Space ‒ HPO and AutoML
      • Automatic within-algorithm parameter tuning (HPO): SIMS (time decay), DeepFM (depth, size), HRNN (depth, size, height)
      • Automatic algorithm selection (AutoML)
  42. Recommendation: get_personalized_ranking()
      • userId and inputList from data you used to train the solution
      • Ranked results, first item matters most
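
A call to get_personalized_ranking() might look like the sketch below. It assumes a boto3-style runtime client passed in by the caller and an existing campaign; the campaign ARN, user ID and item IDs are placeholders, and a small fake client (which simply reverses the input list) stands in for AWS so the example runs offline.

```python
def rerank_for_user(personalize_runtime, campaign_arn, user_id, item_ids):
    """Return item_ids reordered for the user, most relevant first."""
    response = personalize_runtime.get_personalized_ranking(
        campaignArn=campaign_arn,
        userId=user_id,
        inputList=item_ids,
    )
    return [entry["itemId"] for entry in response["personalizedRanking"]]


class FakeRuntime:
    """Hypothetical stand-in for the runtime client; reverses the input order."""

    def get_personalized_ranking(self, campaignArn, userId, inputList):
        return {"personalizedRanking": [{"itemId": i} for i in reversed(inputList)]}


ranked = rerank_for_user(
    FakeRuntime(), "arn:fake:campaign", "user-1", ["a", "b", "c"]
)
```

With real AWS access, the client would typically be `boto3.client("personalize-runtime")`, and the user ID and items should come from the data used to train the solution.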
  43. Amazon Personalize & Customer Success Story

  45. Amazon Personalize examples & notebooks

  47. Takeaways
      • Forecasting and personalization can help improve your business efficiency
      • Amazon Forecast provides accurate time-series forecasting
      • Amazon Personalize provides real-time personalization and recommendations
      • They are both based on the same technology used at Amazon.com and don’t require machine learning expertise to be used
  48. References

  49. Links
      • Blogs: recommendation-for-everyone/
      • Examples & Notebooks
  50. Links
      • Training algorithms: DeepAR, HRNN
      • Evaluating performance of a trained model: (MAPE)
  51. Thank you! Frank Munz, Sr Technical Evangelist. Twitter: @frankmunz. Blog: