Slide 1

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Simplifying time-series forecasting and real-time personalization
Danilo Poccia
Principal Evangelist, Serverless, Amazon Web Services
@danilop

Slide 2

Agenda
• Time-series forecasting
• Amazon Forecast – introduction & demo
• Real-time personalization & recommendation
• Amazon Personalize – introduction & demo
• Takeaways

Slide 3

AI SERVICES
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize

ML SERVICES
• Amazon SageMaker – Build: notebook hosting, pre-built algorithms, data labeling (Ground Truth), algorithms & models (AWS Marketplace for Machine Learning); Train: one-click model training & tuning, hyperparameter optimization, reinforcement learning; Deploy: one-click deployment & hosting, auto-scaling, Virtual Private Cloud, PrivateLink, Elastic Inference integration, optimization (Neo)

ML FRAMEWORKS & INFRASTRUCTURE
• Frameworks & interfaces
• Infrastructure: EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia

Slide 4


Slide 5

Forecasting
• Product demand planning
• Financial planning
• Resource planning

Slide 6

Amazon Forecast

Slide 7

Amazon Forecast workflow
1. Create related datasets and a dataset group
2. Get training data
   • Import historical data to the dataset group
3. Train a predictor (trained model) using an algorithm or AutoML
4. Evaluate the predictor version using metrics
5. Create a forecast (for every item in the dataset group)
6. Retrieve forecasts for users

Slide 8

How Amazon Forecast works
• Dataset Groups
• Datasets
  • TARGET_TIME_SERIES – (item_id, timestamp, demand) – demand is required
  • RELATED_TIME_SERIES – (item_id, timestamp, price) – no demand
  • ITEM_METADATA – (item_id, color, location, genre, category, …)
• Predictors
• Forecasts

Slide 9

Dataset domains

Domain               For
RETAIL               retail demand forecasting
INVENTORY_PLANNING   supply chain and inventory planning
EC2_CAPACITY         forecasting Amazon EC2 capacity
WORK_FORCE           work force planning
WEB_TRAFFIC          estimating future web traffic
METRICS              forecasting metrics, such as revenue and cash flow
CUSTOM               all other types of time-series forecasting

Slide 10

TARGET_TIME_SERIES dataset

timestamp   item_id  store  demand
2019-01-01  socks    NYC    25
2019-01-05  socks    SFO    45
2019-02-01  shoes    ORD    10
...
2019-06-01  socks    NYC    100
2019-06-05  socks    SFO    5
2019-07-01  shoes    ORD    50
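A dataset like the one above is imported as a CSV file with columns in the same order as the schema. A minimal sketch of preparing such a file with Python's standard csv module (the no-header layout and column order are assumptions based on the slide, not service documentation):

```python
import csv
import io

# Rows matching the TARGET_TIME_SERIES layout: timestamp, item_id, store, demand
rows = [
    ("2019-01-01", "socks", "NYC", 25),
    ("2019-01-05", "socks", "SFO", 45),
    ("2019-02-01", "shoes", "ORD", 10),
]

# Write plain CSV, one record per line, no header row
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting file would then be uploaded (for example to Amazon S3) and referenced by a dataset import job.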

Slide 11

Dataset schema

{
  "attributes": [
    { "attributeName": "timestamp", "attributeType": "timestamp" },
    { "attributeName": "item_id", "attributeType": "string" },
    { "attributeName": "store", "attributeType": "string" },
    { "attributeName": "demand", "attributeType": "float" }
  ]
}

Timestamps use the format "YYYY-MM-DD hh:mm:ss".

Slide 12

Data alignment

Data is automatically aggregated by forecast frequency, for example, hourly, daily, or weekly.
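To make the alignment step concrete, here is a local sketch of aggregating raw records into daily buckets, not the service's actual implementation; summing demand per item and day is an assumption for illustration:

```python
from collections import defaultdict
from datetime import datetime

# Raw records at irregular times: (timestamp, item_id, demand)
records = [
    ("2019-01-01 09:00:00", "socks", 3),
    ("2019-01-01 17:30:00", "socks", 2),
    ("2019-01-02 10:15:00", "socks", 4),
]

# Sum demand per (item, day) to align the series to a daily frequency
daily = defaultdict(int)
for ts, item, demand in records:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
    daily[(item, day)] += demand

print(dict(daily))
```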

Slide 13

RELATED_TIME_SERIES dataset

timestamp   item_id  store  price
2019-01-01  socks    NYC    10
2019-01-02  socks    NYC    10
2019-01-03  socks    NYC    15
...
2019-01-05  socks    SFO    45
2019-06-05  socks    SFO    10
2019-07-11  socks    SFO    30
...
2019-02-01  shoes    ORD    50
2019-07-01  shoes    ORD    75
2019-07-11  shoes    ORD    60

Slide 14

Algorithms

• ARIMA – Autoregressive Integrated Moving Average, a commonly used local statistical algorithm for time-series forecasting.
• DeepAR+ – a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). Supports hyperparameter optimization (HPO).
• ETS – Exponential Smoothing, a commonly used local statistical algorithm for time-series forecasting.
• NPTS – Non-Parametric Time Series, a scalable, probabilistic baseline forecaster. NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty.
• Prophet – a popular local Bayesian structural time series model.

Slide 15

DeepAR algorithm

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
David Salinas, Valentin Flunkert, Jan Gasthaus – Amazon Research Germany
https://arxiv.org/abs/1704.04110

DeepAR produces accurate probabilistic forecasts by training an autoregressive recurrent network model on a large number of related time series, rather than fitting a separate model per series. At each time step the network output is used to compute the parameters of a likelihood, which is used for training; for prediction, samples are drawn and fed back to generate sample traces that together represent the joint predicted distribution. The paper reports accuracy improvements of around 15% over state-of-the-art methods on several real-world forecasting datasets.

Slide 16

Training using a BackTestWindow

Slide 17

Training & Testing

Slide 18

Predictor metrics
• wQuantileLoss[0.5]
• Mean Absolute Percentage Error (MAPE)
• Root Mean Square Error (RMSE)
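These metrics can be sketched in a few lines of Python on a toy held-out window. The weighted quantile loss below follows the standard formulation (penalizing under-prediction by τ and over-prediction by 1−τ, normalized by total absolute demand); treat the exact normalization constant as an assumption rather than the service's documented formula:

```python
import math

actual = [25.0, 45.0, 10.0, 100.0]   # observed demand
p50 = [30.0, 40.0, 12.0, 90.0]       # median (P50) forecasts

def w_quantile_loss(y, q, tau):
    # Weighted quantile loss at quantile tau, normalized by sum(|y|)
    num = sum(tau * max(yi - qi, 0) + (1 - tau) * max(qi - yi, 0)
              for yi, qi in zip(y, q))
    return 2 * num / sum(abs(yi) for yi in y)

def mape(y, q):
    # Mean Absolute Percentage Error
    return sum(abs((yi - qi) / yi) for yi, qi in zip(y, q)) / len(y)

def rmse(y, q):
    # Root Mean Square Error
    return math.sqrt(sum((yi - qi) ** 2 for yi, qi in zip(y, q)) / len(y))

print(w_quantile_loss(actual, p50, 0.5), mape(actual, p50), rmse(actual, p50))
```

At τ = 0.5 the weighted quantile loss reduces to the total absolute error divided by the total absolute demand.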

Slide 19

Predictor metrics – Quantiles

Slide 20

Getting a forecast – Interpreting P-numbers

Slide 21


Slide 22

Amazon Forecast examples & notebooks
https://github.com/aws-samples/amazon-forecast-samples

Slide 23


Slide 24

Personalization & Recommendation
• Personalized recommendations
• Personalized search
• Personalized notifications

Slide 25

Amazon Personalize

Slide 26

Amazon Personalize workflow
1. Create related datasets and a dataset group
2. Get training data
   • Import historical data to the dataset group
   • Record live events to the dataset group
3. Create a solution version (trained model) using a recipe or AutoML
4. Evaluate the solution version using metrics
5. Create a campaign (deploy the solution version)
6. Provide recommendations for users

Slide 27

How Amazon Personalize works
• Dataset Groups
• Datasets
  • Users – age, gender, or loyalty membership
  • Items – price, type, or availability
  • Interactions – between users and items
• User Events
• Recipes and Solutions
• Metrics
• Campaigns
• Recommendations

Slide 28

Dataset schemas

Dataset Type   Required Fields                                        Reserved Keywords
Users          USER_ID (string), 1 metadata field
Items          ITEM_ID (string), 1 metadata field
Interactions   USER_ID (string), ITEM_ID (string), TIMESTAMP (long)   EVENT_TYPE (string), EVENT_VALUE (string)

Slide 29

Training data

userId  movieId  timestamp
1       1        964982703
1       3        964981247
1       6        964982224
2       47       964983815
2       50       964982931
2       70       964982400
...

Slide 30

Training data schema – Users

{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "AGE", "type": "int" },
    { "name": "GENDER", "type": "string", "categorical": true }
  ],
  "version": "1.0"
}

"categorical": true is for categories, like genre.

Slide 31

Training data schema – Interactions

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}

An interaction between a user and an item at a specific point in time.

Slide 32

Using EVENT_TYPE and EVENT_VALUE fields

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "EVENT_VALUE", "type": "float" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}

Slide 33

Using categorical data

You can include more than one category in the training data using the "vertical bar" character, also known as "pipe":

ITEM_ID,GENRE
item_123,horror|comedy
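Parsing such pipe-separated fields on the client side is straightforward; a small sketch (the item_456 row is a made-up addition for illustration):

```python
import csv
import io

csv_text = "ITEM_ID,GENRE\nitem_123,horror|comedy\nitem_456,drama\n"

# Split the GENRE column on "|" to recover the list of categories per item
reader = csv.DictReader(io.StringIO(csv_text))
genres = {row["ITEM_ID"]: row["GENRE"].split("|") for row in reader}
print(genres)
```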

Slide 34

Solution metrics

{
  "solutionVersionArn": "arn:aws:personalize:…",
  "metrics": {
    "arn:aws:personalize:…": {
      "coverage": 0.27,
      "mean_reciprocal_rank_at_25": 0.0379,
      "normalized_discounted_cumulative_gain_at_5": 0.0405,
      "normalized_discounted_cumulative_gain_at_10": 0.0513,
      "normalized_discounted_cumulative_gain_at_25": 0.0828,
      "precision_at_5": 0.0136,
      "precision_at_10": 0.0102,
      "precision_at_25": 0.0091
    }
  }
}

With the exception of coverage, higher is better.

Slide 35

Evaluating a solution version

• coverage – The proportion of unique recommended items from all queries out of the total number of unique items in the training data.
• mean_reciprocal_rank_at_25 – The mean of the reciprocal ranks of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest-ranked recommendation.
• normalized_discounted_cumulative_gain_at_K – Discounted gain assumes that recommendations lower on a list are less relevant than higher ones, so each recommendation is weighted by a factor dependent on its position. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
• precision_at_K – The number of relevant recommendations out of the top K recommendations, divided by K. This metric rewards precise recommendation of relevant items.
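The rank-based definitions above can be made concrete with a few lines of Python. A sketch with made-up recommendation and relevance lists (the service computes these metrics internally during evaluation):

```python
def precision_at_k(recommended, relevant, k):
    # Relevant hits in the top-k recommendations, divided by k
    return sum(1 for item in recommended[:k] if item in relevant) / k

def reciprocal_rank(recommended, relevant, k=25):
    # 1 / rank of the first relevant item in the top-k; 0 if none found
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1 / rank
    return 0.0

recommended = ["m7", "m3", "m9", "m1", "m4"]  # hypothetical ranked list
relevant = {"m3", "m4"}                        # hypothetical held-out items

print(precision_at_k(recommended, relevant, 5))
print(reciprocal_rank(recommended, relevant))
```

mean_reciprocal_rank_at_25 is then the average of reciprocal_rank over all evaluation queries.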

Slide 36

Recording live events – Getting a tracking ID

import boto3

personalize = boto3.client('personalize')

response = personalize.create_event_tracker(
    name='MovieClickTracker',
    datasetGroupArn='arn:aws:personalize:…'
)
print(response['eventTrackerArn'])
print(response['trackingId'])

Slide 37

Recording live events – Event-interactions dataset

{
  "datasets": [
    {
      "name": "ratings-dsgroup/EVENT_INTERACTIONS",
      "datasetArn": "arn:aws:personalize:…",
      "datasetType": "EVENT_INTERACTIONS",
      "status": "ACTIVE",
      "creationDateTime": 1554304597.806,
      "lastUpdatedDateTime": 1554304597.806
    },
    {
      "name": "ratings-dataset",
      "datasetArn": "arn:aws:personalize:…",
      "datasetType": "INTERACTIONS",
      "status": "ACTIVE",
      "creationDateTime": 1554299406.53,
      "lastUpdatedDateTime": 1554299406.53
    }
  ],
  "nextToken": "..."
}

A new dataset is created automatically for the tracking events.

Slide 38

Recording live events – PutEvents operation

import boto3

personalize_events = boto3.client(service_name='personalize-events')

personalize_events.put_events(
    trackingId='tracking_id',
    userId='USER_ID',
    sessionId='session_id',
    eventList=[{
        'sentAt': TIMESTAMP,
        'eventType': 'EVENT_TYPE',
        'properties': "{\"itemId\": \"ITEM_ID\"}"
    }]
)
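The properties field is a JSON-encoded string; instead of hand-escaping it as above, it can be built with json.dumps. A local sketch constructing an equivalent eventList payload (the put_events call itself requires live AWS credentials, so only the payload is shown; the 'click' event type is an assumption for illustration):

```python
import json
import time

item_id = "ITEM_ID"  # placeholder, as on the slide

event_list = [{
    'sentAt': int(time.time()),
    'eventType': 'click',  # assumed event type for this example
    # json.dumps produces the same string as the hand-escaped literal
    'properties': json.dumps({'itemId': item_id}),
}]

print(event_list[0]['properties'])
```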

Slide 39

More advanced PutEvents operation – Multiple events with more data

personalize_events.put_events(
    trackingId='tracking_id',
    userId='user555',
    sessionId='session1',
    eventList=[{
        'eventId': 'event1',
        'sentAt': '1553631760',
        'eventType': 'like',
        'properties': json.dumps({
            'itemId': 'choc-panama',
            'eventValue': 'true'
        })
    }, {
        'eventId': 'event2',
        'sentAt': '1553631782',
        'eventType': 'rating',
        'properties': json.dumps({
            'itemId': 'movie_ten',
            'eventValue': '4',
            'numRatings': '13'
        })
    }]
)

Slide 40

Recording live events with AWS Amplify – Using Amazon Personalize

import { Analytics, AmazonPersonalizeProvider } from 'aws-amplify';

Analytics.addPluggable(new AmazonPersonalizeProvider());

// Configure the plugin after adding it to the Analytics module
Analytics.configure({
  AmazonPersonalize: {
    // REQUIRED - The trackingId to track the events
    trackingId: '',
    // OPTIONAL - Amazon Personalize service region
    region: 'XX-XXXX-X',
    // OPTIONAL - The number of events to be deleted from the buffer when flushed
    flushSize: 10,
    // OPTIONAL - The interval in ms to perform a buffer check and flush if necessary
    flushInterval: 5000, // 5s
  }
});

Slide 41

Recording live events with AWS Amplify – Send events from the browser

Analytics.record({
  eventType: "Identify",
  properties: {
    "userId": ""
  }
}, 'AmazonPersonalize');

Analytics.record({
  eventType: "",
  userId: "", // optional
  properties: {
    "itemId": "",
    "eventValue": ""
  }
}, 'AmazonPersonalize');

https://aws-amplify.github.io/docs/js/analytics

Slide 42

Using predefined recipes

Recipe type           API                     userId    itemId    inputList
USER_PERSONALIZATION  GetRecommendations      required  optional  N/A
PERSONALIZED_RANKING  GetPersonalizedRanking  required  N/A       list of itemId's
RELATED_ITEMS         GetRecommendations      not used  required  N/A

Slide 43

Predefined USER_PERSONALIZATION recipes

• HRNN – A hierarchical recurrent neural network, which can model the temporal order of user-item interactions. Recommended when user behavior is changing with time (the evolving intent problem). AutoML: ✔
• HRNN-Metadata – HRNN with additional features derived from contextual metadata (Interactions dataset), along with user and item metadata (Users and Items datasets). Performs better than non-metadata models when high-quality metadata is available. Can involve longer training times. AutoML: ✔ Metadata: ✔
• HRNN-Coldstart – Similar to HRNN-Metadata, with personalized exploration of new items. Recommended when frequently adding new items to a catalog and you want the items to immediately show up in recommendations. AutoML: ✔ Metadata: ✔
• Popularity-Count – Calculates popularity of items based on a count of events against that item in the user-item interactions dataset. Use as a baseline to compare other user-personalization recipes.

Slide 44

Hierarchical Recurrent Neural Networks

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Massimo Quadrana (Politecnico di Milano), Alexandros Karatzoglou (Telefonica Research), Balázs Hidasi (Gravity R&D), Paolo Cremonesi (Politecnico di Milano) – RecSys '17
https://arxiv.org/abs/1706.04148

The paper personalizes session-based RNN recommenders with cross-session information transfer: a hierarchy of two GRUs, where the session-level GRU models user activity within sessions and generates recommendations, and the user-level GRU models the evolution of the user across sessions, initializing the session-level GRU's hidden state (and optionally propagating the user representation as input). Unlike a plain session-based RNN, users who interacted with the same sequence of items no longer get identical recommendations, since past sessions influence the output. On two industry datasets, Hierarchical RNNs outperform both session-only RNNs and item-based collaborative filtering by a healthy margin.

Slide 45

Predefined PERSONALIZED_RANKING recipes

• Personalized-Ranking – Use this recipe when you're personalizing the results for your users, such as personalized reranking of search results or curated lists.

Slide 46

Predefined RELATED_ITEMS recipes

• SIMS – Item-to-item similarities (SIMS) is based on the concept of collaborative filtering. It generates items similar to a given item based on co-occurrence of the item in user history in the user-item interactions dataset. In the absence of sufficient user behavior data for an item, or if the specified item ID is not found, the algorithm returns popular items as recommendations. Use for improving item discoverability and in detail pages. Provides fast performance.

Slide 47


Slide 48

Amazon Personalize examples & notebooks
https://github.com/aws-samples/amazon-personalize-samples

Slide 49


Slide 50

Takeaways
• Forecasting and personalization can help improve your business efficiency
• Amazon Forecast provides accurate time-series forecasting
• Amazon Personalize provides real-time personalization and recommendation
• They are both based on the same technology used at Amazon.com and don't require machine learning expertise to use

Slide 51

Links

Blogs
• https://aws.amazon.com/blogs/aws/amazon-forecast-time-series-forecasting-made-easy/
• https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/
• https://aws.amazon.com/blogs/aws/amazon-personalize-real-time-personalization-and-recommendation-for-everyone/
• https://aws.amazon.com/blogs/aws/amazon-personalize-is-now-generally-available/

Examples & Notebooks
• https://github.com/aws-samples/amazon-forecast-samples
• https://github.com/aws-samples/amazon-personalize-samples

Slide 52

Links

Training algorithms
• DeepAR – https://arxiv.org/abs/1704.04110
• HRNN – https://arxiv.org/abs/1706.04148

Evaluating performance of a trained model
• https://en.wikipedia.org/wiki/Mean_absolute_percentage_error (MAPE)
• https://en.wikipedia.org/wiki/Quantile_regression
• https://en.wikipedia.org/wiki/Mean_reciprocal_rank
• https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Slide 53

Thank you!

Danilo Poccia
@danilop