Slide 1

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Simplifying time-series forecasting and real-time personalization
Danilo Poccia
Principal Evangelist, Serverless, Amazon Web Services
@danilop

Slide 2

Agenda
• Time-series forecasting
• Amazon Forecast – introduction & demo
• Real-time personalization & recommendation
• Amazon Personalize – introduction & demo
• Takeaways

Slide 3

AI SERVICES
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize

ML SERVICES
• Amazon SageMaker – Build: notebook hosting, pre-built algorithms, data labeling (Ground Truth), algorithms & models (AWS Marketplace for Machine Learning); Train: one-click model training & tuning, hyperparameter optimization, reinforcement learning; Deploy: one-click deployment & hosting, auto-scaling, Virtual Private Cloud, PrivateLink, Elastic Inference integration, optimization (Neo)

ML FRAMEWORKS & INFRASTRUCTURE
• Frameworks & interfaces
• Infrastructure: EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia

Slide 4


Slide 5

Forecasting
• Product demand planning
• Financial planning
• Resource planning

Slide 6

Amazon Forecast

Slide 7

Amazon Forecast workflow
1. Create related datasets and a dataset group
2. Get training data
   • Import historical data to the dataset group
3. Train a predictor (trained model) using an algorithm or AutoML
4. Evaluate the predictor version using metrics
5. Create a forecast (for every item in the dataset group)
6. Retrieve forecasts for users

Slide 8

How Amazon Forecast works
• Dataset Groups
• Datasets
  • TARGET_TIME_SERIES – (item_id, timestamp, demand) – demand is required
  • RELATED_TIME_SERIES – (item_id, timestamp, price) – no demand
  • ITEM_METADATA – (item_id, color, location, genre, category, …)
• Predictors
• Forecasts

Slide 9

Dataset domains

Domain               For
RETAIL               retail demand forecasting
INVENTORY_PLANNING   supply chain and inventory planning
EC2_CAPACITY         forecasting Amazon EC2 capacity
WORK_FORCE           work force planning
WEB_TRAFFIC          estimating future web traffic
METRICS              forecasting metrics, such as revenue and cash flow
CUSTOM               all other types of time-series forecasting

Slide 10

TARGET_TIME_SERIES dataset

timestamp   item_id  store  demand
2019-01-01  socks    NYC    25
2019-01-05  socks    SFO    45
2019-02-01  shoes    ORD    10
...
2019-06-01  socks    NYC    100
2019-06-05  socks    SFO    5
2019-07-01  shoes    ORD    50
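A dataset like the one above is imported as a CSV file with columns in the same order as the schema. A minimal sketch of preparing such a file with Python's standard csv module (the no-header layout and column order are assumptions based on the slide, not service documentation):

```python
import csv
import io

# Rows matching the TARGET_TIME_SERIES layout: timestamp, item_id, store, demand
rows = [
    ("2019-01-01", "socks", "NYC", 25),
    ("2019-01-05", "socks", "SFO", 45),
    ("2019-02-01", "shoes", "ORD", 10),
]

# Write plain CSV, one record per line, no header row
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting file would then be uploaded (for example to Amazon S3) and referenced by a dataset import job.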

Slide 11

Dataset schema

{
  "attributes": [
    { "attributeName": "timestamp", "attributeType": "timestamp" },
    { "attributeName": "item_id", "attributeType": "string" },
    { "attributeName": "store", "attributeType": "string" },
    { "attributeName": "demand", "attributeType": "float" }
  ]
}

Timestamps use the format "YYYY-MM-DD hh:mm:ss".

Slide 12

Data alignment

Data is automatically aggregated by forecast frequency, for example, hourly, daily, or weekly.
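To make the alignment step concrete, here is a local sketch of aggregating raw records into daily buckets, not the service's actual implementation; summing demand per item and day is an assumption for illustration:

```python
from collections import defaultdict
from datetime import datetime

# Raw records at irregular times: (timestamp, item_id, demand)
records = [
    ("2019-01-01 09:00:00", "socks", 3),
    ("2019-01-01 17:30:00", "socks", 2),
    ("2019-01-02 10:15:00", "socks", 4),
]

# Sum demand per (item, day) to align the series to a daily frequency
daily = defaultdict(int)
for ts, item, demand in records:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
    daily[(item, day)] += demand

print(dict(daily))
```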

Slide 13

RELATED_TIME_SERIES dataset

timestamp   item_id  store  price
2019-01-01  socks    NYC    10
2019-01-02  socks    NYC    10
2019-01-03  socks    NYC    15
...
2019-01-05  socks    SFO    45
2019-06-05  socks    SFO    10
2019-07-11  socks    SFO    30
...
2019-02-01  shoes    ORD    50
2019-07-01  shoes    ORD    75
2019-07-11  shoes    ORD    60

Slide 14

Algorithms

• ARIMA – Autoregressive Integrated Moving Average, a commonly used local statistical algorithm for time-series forecasting.
• DeepAR+ – a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). Supports hyperparameter optimization (HPO).
• ETS – Exponential Smoothing, a commonly used local statistical algorithm for time-series forecasting.
• NPTS – Non-Parametric Time Series, a scalable, probabilistic baseline forecaster. NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty.
• Prophet – a popular local Bayesian structural time series model.

Slide 15

DeepAR algorithm

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
David Salinas, Valentin Flunkert, Jan Gasthaus – Amazon Research Germany
https://arxiv.org/abs/1704.04110

DeepAR produces accurate probabilistic forecasts by training an autoregressive recurrent network model on a large number of related time series, rather than fitting a separate model per series. At each time step the network output is used to compute the parameters of a likelihood, which is used for training; for prediction, samples are drawn and fed back to generate sample traces that together represent the joint predicted distribution. The paper reports accuracy improvements of around 15% over state-of-the-art methods on several real-world forecasting datasets.

Slide 16

Training using a BackTestWindow

Slide 17

Training & Testing

Slide 18

Predictor metrics
• wQuantileLoss[0.5]
• Mean Absolute Percentage Error (MAPE)
• Root Mean Square Error (RMSE)
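These metrics can be sketched in a few lines of Python on a toy held-out window. The weighted quantile loss below follows the standard formulation (penalizing under-prediction by τ and over-prediction by 1−τ, normalized by total absolute demand); treat the exact normalization constant as an assumption rather than the service's documented formula:

```python
import math

actual = [25.0, 45.0, 10.0, 100.0]   # observed demand
p50 = [30.0, 40.0, 12.0, 90.0]       # median (P50) forecasts

def w_quantile_loss(y, q, tau):
    # Weighted quantile loss at quantile tau, normalized by sum(|y|)
    num = sum(tau * max(yi - qi, 0) + (1 - tau) * max(qi - yi, 0)
              for yi, qi in zip(y, q))
    return 2 * num / sum(abs(yi) for yi in y)

def mape(y, q):
    # Mean Absolute Percentage Error
    return sum(abs((yi - qi) / yi) for yi, qi in zip(y, q)) / len(y)

def rmse(y, q):
    # Root Mean Square Error
    return math.sqrt(sum((yi - qi) ** 2 for yi, qi in zip(y, q)) / len(y))

print(w_quantile_loss(actual, p50, 0.5), mape(actual, p50), rmse(actual, p50))
```

At τ = 0.5 the weighted quantile loss reduces to the total absolute error divided by the total absolute demand.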

Slide 19

Predictor metrics – Quantiles

Slide 20

Getting a forecast – Interpreting P-numbers

Slide 21


Slide 22

Amazon Forecast examples & notebooks
https://github.com/aws-samples/amazon-forecast-samples

Slide 23


Slide 24

Personalization & Recommendation
• Personalized recommendations
• Personalized search
• Personalized notifications

Slide 25

Amazon Personalize

Slide 26

Amazon Personalize workflow
1. Create related datasets and a dataset group
2. Get training data
   • Import historical data to the dataset group
   • Record live events to the dataset group
3. Create a solution version (trained model) using a recipe or AutoML
4. Evaluate the solution version using metrics
5. Create a campaign (deploy the solution version)
6. Provide recommendations for users

Slide 27

How Amazon Personalize works
• Dataset Groups
• Datasets
  • Users – age, gender, or loyalty membership
  • Items – price, type, or availability
  • Interactions – between users and items
• User Events
• Recipes and Solutions
• Metrics
• Campaigns
• Recommendations

Slide 28

Dataset schemas

Dataset Type   Required Fields                                        Reserved Keywords
Users          USER_ID (string), 1 metadata field
Items          ITEM_ID (string), 1 metadata field
Interactions   USER_ID (string), ITEM_ID (string), TIMESTAMP (long)   EVENT_TYPE (string), EVENT_VALUE (string)

Slide 29

Training data

userId  movieId  timestamp
1       1        964982703
1       3        964981247
1       6        964982224
2       47       964983815
2       50       964982931
2       70       964982400
...

Slide 30

Training data schema – Users

{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "AGE", "type": "int" },
    { "name": "GENDER", "type": "string", "categorical": true }
  ],
  "version": "1.0"
}

"categorical": true is for categories, like genre.

Slide 31

Training data schema – Interactions

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}

An interaction between a user and an item at a specific point in time.

Slide 32

Using EVENT_TYPE and EVENT_VALUE fields

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "EVENT_VALUE", "type": "float" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}

Slide 33

Using categorical data

You can include more than one category in the training data using the "vertical bar" character, also known as "pipe":

ITEM_ID,GENRE
item_123,horror|comedy
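Parsing such pipe-separated fields on the client side is straightforward; a small sketch (the item_456 row is a made-up addition for illustration):

```python
import csv
import io

csv_text = "ITEM_ID,GENRE\nitem_123,horror|comedy\nitem_456,drama\n"

# Split the GENRE column on "|" to recover the list of categories per item
reader = csv.DictReader(io.StringIO(csv_text))
genres = {row["ITEM_ID"]: row["GENRE"].split("|") for row in reader}
print(genres)
```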

Slide 34

Solution metrics

{
  "solutionVersionArn": "arn:aws:personalize:…",
  "metrics": {
    "arn:aws:personalize:…": {
      "coverage": 0.27,
      "mean_reciprocal_rank_at_25": 0.0379,
      "normalized_discounted_cumulative_gain_at_5": 0.0405,
      "normalized_discounted_cumulative_gain_at_10": 0.0513,
      "normalized_discounted_cumulative_gain_at_25": 0.0828,
      "precision_at_5": 0.0136,
      "precision_at_10": 0.0102,
      "precision_at_25": 0.0091
    }
  }
}

With the exception of coverage, higher is better.

Slide 35

Evaluating a solution version

• coverage – The proportion of unique recommended items from all queries out of the total number of unique items in the training data.
• mean_reciprocal_rank_at_25 – The mean of the reciprocal ranks of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest-ranked recommendation.
• normalized_discounted_cumulative_gain_at_K – Discounted gain assumes that recommendations lower on a list are less relevant than higher ones, so each recommendation is weighted by a factor dependent on its position. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
• precision_at_K – The number of relevant recommendations out of the top K recommendations, divided by K. This metric rewards precise recommendation of relevant items.
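The rank-based definitions above can be made concrete with a few lines of Python. A sketch with made-up recommendation and relevance lists (the service computes these metrics internally during evaluation):

```python
def precision_at_k(recommended, relevant, k):
    # Relevant hits in the top-k recommendations, divided by k
    return sum(1 for item in recommended[:k] if item in relevant) / k

def reciprocal_rank(recommended, relevant, k=25):
    # 1 / rank of the first relevant item in the top-k; 0 if none found
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1 / rank
    return 0.0

recommended = ["m7", "m3", "m9", "m1", "m4"]  # hypothetical ranked list
relevant = {"m3", "m4"}                        # hypothetical held-out items

print(precision_at_k(recommended, relevant, 5))
print(reciprocal_rank(recommended, relevant))
```

mean_reciprocal_rank_at_25 is then the average of reciprocal_rank over all evaluation queries.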

Slide 36

Recording live events – Getting a tracking ID

import boto3

personalize = boto3.client('personalize')

response = personalize.create_event_tracker(
    name='MovieClickTracker',
    datasetGroupArn='arn:aws:personalize:…'
)
print(response['eventTrackerArn'])
print(response['trackingId'])

Slide 37

Recording live events – Event-interactions dataset

{
  "datasets": [
    {
      "name": "ratings-dsgroup/EVENT_INTERACTIONS",
      "datasetArn": "arn:aws:personalize:…",
      "datasetType": "EVENT_INTERACTIONS",
      "status": "ACTIVE",
      "creationDateTime": 1554304597.806,
      "lastUpdatedDateTime": 1554304597.806
    },
    {
      "name": "ratings-dataset",
      "datasetArn": "arn:aws:personalize:…",
      "datasetType": "INTERACTIONS",
      "status": "ACTIVE",
      "creationDateTime": 1554299406.53,
      "lastUpdatedDateTime": 1554299406.53
    }
  ],
  "nextToken": "..."
}

A new dataset is created automatically for the tracking events.

Slide 38

Recording live events – PutEvents operation

import boto3

personalize_events = boto3.client(service_name='personalize-events')

personalize_events.put_events(
    trackingId='tracking_id',
    userId='USER_ID',
    sessionId='session_id',
    eventList=[{
        'sentAt': TIMESTAMP,
        'eventType': 'EVENT_TYPE',
        'properties': "{\"itemId\": \"ITEM_ID\"}"
    }]
)
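The properties field is a JSON-encoded string; instead of hand-escaping it as above, it can be built with json.dumps. A local sketch constructing an equivalent eventList payload (the put_events call itself requires live AWS credentials, so only the payload is shown; the 'click' event type is an assumption for illustration):

```python
import json
import time

item_id = "ITEM_ID"  # placeholder, as on the slide

event_list = [{
    'sentAt': int(time.time()),
    'eventType': 'click',  # assumed event type for this example
    # json.dumps produces the same string as the hand-escaped literal
    'properties': json.dumps({'itemId': item_id}),
}]

print(event_list[0]['properties'])
```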

Slide 39

More advanced PutEvents operation – Multiple events with more data

personalize_events.put_events(
    trackingId='tracking_id',
    userId='user555',
    sessionId='session1',
    eventList=[{
        'eventId': 'event1',
        'sentAt': '1553631760',
        'eventType': 'like',
        'properties': json.dumps({
            'itemId': 'choc-panama',
            'eventValue': 'true'
        })
    }, {
        'eventId': 'event2',
        'sentAt': '1553631782',
        'eventType': 'rating',
        'properties': json.dumps({
            'itemId': 'movie_ten',
            'eventValue': '4',
            'numRatings': '13'
        })
    }]
)

Slide 40

Recording live events with AWS Amplify – Using Amazon Personalize

import { Analytics, AmazonPersonalizeProvider } from 'aws-amplify';

Analytics.addPluggable(new AmazonPersonalizeProvider());

// Configure the plugin after adding it to the Analytics module
Analytics.configure({
  AmazonPersonalize: {
    // REQUIRED - The trackingId to track the events
    trackingId: '',
    // OPTIONAL - Amazon Personalize service region
    region: 'XX-XXXX-X',
    // OPTIONAL - The number of events to be deleted from the buffer when flushed
    flushSize: 10,
    // OPTIONAL - The interval in ms to perform a buffer check and flush if necessary
    flushInterval: 5000, // 5s
  }
});

Slide 41

Recording live events with AWS Amplify – Send events from the browser

Analytics.record({
  eventType: "Identify",
  properties: {
    "userId": ""
  }
}, 'AmazonPersonalize');

Analytics.record({
  eventType: "",
  userId: "", // optional
  properties: {
    "itemId": "",
    "eventValue": ""
  }
}, 'AmazonPersonalize');

https://aws-amplify.github.io/docs/js/analytics

Slide 42

Using predefined recipes

Recipe type           API                     userId    itemId    inputList
USER_PERSONALIZATION  GetRecommendations      required  optional  N/A
PERSONALIZED_RANKING  GetPersonalizedRanking  required  N/A       list of itemId's
RELATED_ITEMS         GetRecommendations      not used  required  N/A

Slide 43

Predefined USER_PERSONALIZATION recipes

• HRNN – A hierarchical recurrent neural network, which can model the temporal order of user-item interactions. Recommended when user behavior is changing with time (the evolving intent problem). AutoML: ✔
• HRNN-Metadata – HRNN with additional features derived from contextual metadata (Interactions dataset), along with user and item metadata (Users and Items datasets). Performs better than non-metadata models when high-quality metadata is available. Can involve longer training times. AutoML: ✔ Metadata: ✔
• HRNN-Coldstart – Similar to HRNN-Metadata, with personalized exploration of new items. Recommended when frequently adding new items to a catalog and you want the items to immediately show up in recommendations. AutoML: ✔ Metadata: ✔
• Popularity-Count – Calculates popularity of items based on a count of events against that item in the user-item interactions dataset. Use as a baseline to compare other user-personalization recipes.

Slide 44

Hierarchical Recurrent Neural Networks

Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Massimo Quadrana (Politecnico di Milano), Alexandros Karatzoglou (Telefonica Research), Balázs Hidasi (Gravity R&D), Paolo Cremonesi (Politecnico di Milano) – RecSys '17
https://arxiv.org/abs/1706.04148

The paper personalizes session-based RNN recommenders with cross-session information transfer: a hierarchy of two GRUs, where the session-level GRU models user activity within sessions and generates recommendations, and the user-level GRU models the evolution of the user across sessions, initializing the session-level GRU's hidden state (and optionally propagating the user representation as input). Unlike a plain session-based RNN, users who interacted with the same sequence of items no longer get identical recommendations, since past sessions influence the output. On two industry datasets, Hierarchical RNNs outperform both session-only RNNs and item-based collaborative filtering by a healthy margin.

Slide 45

Predefined PERSONALIZED_RANKING recipes

• Personalized-Ranking – Use this recipe when you're personalizing the results for your users, such as personalized reranking of search results or curated lists.

Slide 46

Predefined RELATED_ITEMS recipes

• SIMS – Item-to-item similarities (SIMS) is based on the concept of collaborative filtering. It generates items similar to a given item based on co-occurrence of the item in user history in the user-item interactions dataset. In the absence of sufficient user behavior data for an item, or if the specified item ID is not found, the algorithm returns popular items as recommendations. Use for improving item discoverability and in detail pages. Provides fast performance.

Slide 47


Slide 48

Amazon Personalize examples & notebooks
https://github.com/aws-samples/amazon-personalize-samples

Slide 49


Slide 50

Takeaways
• Forecasting and personalization can help improve your business efficiency
• Amazon Forecast provides accurate time-series forecasting
• Amazon Personalize provides real-time personalization and recommendation
• They are both based on the same technology used at Amazon.com and don't require machine learning expertise to use

Slide 51

Links

Blogs
• https://aws.amazon.com/blogs/aws/amazon-forecast-time-series-forecasting-made-easy/
• https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/
• https://aws.amazon.com/blogs/aws/amazon-personalize-real-time-personalization-and-recommendation-for-everyone/
• https://aws.amazon.com/blogs/aws/amazon-personalize-is-now-generally-available/

Examples & Notebooks
• https://github.com/aws-samples/amazon-forecast-samples
• https://github.com/aws-samples/amazon-personalize-samples

Slide 52

Links

Training algorithms
• DeepAR – https://arxiv.org/abs/1704.04110
• HRNN – https://arxiv.org/abs/1706.04148

Evaluating performance of a trained model
• https://en.wikipedia.org/wiki/Mean_absolute_percentage_error (MAPE)
• https://en.wikipedia.org/wiki/Quantile_regression
• https://en.wikipedia.org/wiki/Mean_reciprocal_rank
• https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Slide 53

Thank you!

Danilo Poccia
@danilop