
Simplifying time-series forecasting and real-time personalization


AWS Innovate – ML & AI Edition, October 17th, 2019

https://aws.amazon.com/events/aws-innovate-2019/emea-machine-learning-edition/

Personalization and forecasting have long been very complex problems for organizations to solve. In this session, we show you how to use Amazon Personalize and Amazon Forecast, two new services that enable you to create individualized recommendations for customers and deliver highly accurate forecasts. Both run on fully managed infrastructure and provide easy-to-use recipes that deliver high-quality models even if you have little machine learning experience.

Danilo Poccia

October 17, 2019


Transcript

  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Simplifying time-series forecasting
    and real-time personalization
    Danilo Poccia
    Principal Evangelist, Serverless
    Amazon Web Services
    @danilop


  2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Agenda
    Time-series forecasting
    Amazon Forecast – introduction & demo
    Real-time personalization & recommendation
    Amazon Personalize – introduction & demo
    Takeaways


  3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    The AWS machine learning stack:
    AI SERVICES (Vision, Speech, Language, Chatbots, Forecasting, Recommendations):
    Rekognition Image, Rekognition Video, Polly, Transcribe, Translate,
    Comprehend & Comprehend Medical, Lex, Textract, Forecast, Personalize
    ML SERVICES – Amazon SageMaker (Build, Train, Deploy): data labeling (Ground Truth),
    pre-built algorithms, notebook hosting, algorithms & models (AWS Marketplace for
    Machine Learning), one-click model training & tuning, hyperparameter optimization,
    reinforcement learning, optimization (Neo), one-click deployment & hosting,
    auto-scaling, Virtual Private Cloud, PrivateLink, Elastic Inference integration
    ML FRAMEWORKS & INFRASTRUCTURE (frameworks, interfaces, infrastructure):
    EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia


  4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.


  5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Forecasting
    Product Demand Planning
    Financial planning
    Resource planning


  6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Forecast


  7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Forecast workflow
    1. Create related datasets and a dataset group
    2. Get training data
    • Import historical data to the dataset group
    3. Train a predictor (trained model) using an algorithm or AutoML
    4. Evaluate the predictor version using metrics
    5. Create a forecast (for every item in the dataset group)
    6. Retrieve forecasts for users
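    A minimal sketch of these steps with boto3 (resource names, the S3 path, and the
    IAM role ARN below are placeholders and an assumption, not values from the deck;
    in practice you wait for each resource to become ACTIVE before the next call):

    import boto3

    forecast = boto3.client('forecast')

    # 1. Dataset and dataset group (RETAIL domain as an example)
    ds = forecast.create_dataset(
        DatasetName='demand_target',
        Domain='RETAIL',
        DatasetType='TARGET_TIME_SERIES',
        DataFrequency='D',
        Schema={'Attributes': [
            {'AttributeName': 'timestamp', 'AttributeType': 'timestamp'},
            {'AttributeName': 'item_id', 'AttributeType': 'string'},
            {'AttributeName': 'demand', 'AttributeType': 'float'}]})
    dsg = forecast.create_dataset_group(
        DatasetGroupName='retail_demand', Domain='RETAIL',
        DatasetArns=[ds['DatasetArn']])

    # 2. Import historical data from S3
    forecast.create_dataset_import_job(
        DatasetImportJobName='demand_import',
        DatasetArn=ds['DatasetArn'],
        DataSource={'S3Config': {
            'Path': 's3://my-bucket/demand.csv',
            'RoleArn': 'arn:aws:iam::123456789012:role/ForecastRole'}},
        TimestampFormat='yyyy-MM-dd')

    # 3. Train a predictor with AutoML (or pass AlgorithmArn=... instead)
    predictor = forecast.create_predictor(
        PredictorName='demand_predictor',
        ForecastHorizon=30,
        PerformAutoML=True,
        InputDataConfig={'DatasetGroupArn': dsg['DatasetGroupArn']},
        FeaturizationConfig={'ForecastFrequency': 'D'})

    # 4. Evaluate with get_accuracy_metrics, 5. create_forecast,
    # 6. query with the forecastquery client (see the later slides)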


  8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    How Amazon Forecast works
    • Dataset Groups
    • Datasets
    • TARGET_TIME_SERIES – (item_id, timestamp, demand) – demand is required
    • RELATED_TIME_SERIES – (item_id, timestamp, price) – no demand
    • ITEM_METADATA – (item_id, color, location, genre, category, …)
    • Predictors
    • Forecasts


  9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Dataset domains
    Domain – For
    RETAIL – retail demand forecasting
    INVENTORY_PLANNING – supply chain and inventory planning
    EC2_CAPACITY – forecasting Amazon EC2 capacity
    WORK_FORCE – work force planning
    WEB_TRAFFIC – estimating future web traffic
    METRICS – forecasting metrics, such as revenue and cash flow
    CUSTOM – all other types of time-series forecasting


  10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    TARGET_TIME_SERIES dataset
    timestamp item_id store demand
    2019-01-01 socks NYC 25
    2019-01-05 socks SFO 45
    2019-02-01 shoes ORD 10
    . . .
    2019-06-01 socks NYC 100
    2019-06-05 socks SFO 5
    2019-07-01 shoes ORD 50


  11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Dataset schema
    {
        "attributes": [
            { "attributeName": "timestamp", "attributeType": "timestamp" },
            { "attributeName": "item_id", "attributeType": "string" },
            { "attributeName": "store", "attributeType": "string" },
            { "attributeName": "demand", "attributeType": "float" }
        ]
    }
    Timestamp format: "YYYY-MM-DD hh:mm:ss"


  12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Data alignment
    Data is automatically aggregated by forecast frequency,
    for example, hourly, daily, or weekly.
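    Conceptually this is the same as a pandas resample; a toy illustration of daily
    aggregation (plain pandas, not the service itself, and summing the demand values
    is an assumption here):

    import pandas as pd

    # Hypothetical raw observations at irregular times for one item
    df = pd.DataFrame(
        {"timestamp": pd.to_datetime(
            ["2019-01-01 09:00", "2019-01-01 17:30", "2019-01-02 11:15"]),
         "demand": [3, 2, 5]})

    # Align to the forecast frequency (daily): values in the same day are aggregated
    daily = df.set_index("timestamp").resample("D").sum()
    print(daily)   # 2019-01-01 -> 5, 2019-01-02 -> 5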


  13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    RELATED_TIME_SERIES dataset
    timestamp item_id store price
    2019-01-01 socks NYC 10
    2019-01-02 socks NYC 10
    2019-01-03 socks NYC 15
    . . .
    2019-01-05 socks SFO 45
    2019-06-05 socks SFO 10
    2019-07-11 socks SFO 30
    . . .
    2019-02-01 shoes ORD 50
    2019-07-01 shoes ORD 75
    2019-07-11 shoes ORD 60


  14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Algorithms
    ARIMA – Autoregressive Integrated Moving Average, a commonly used local
    statistical algorithm for time-series forecasting.
    DeepAR+ – a supervised learning algorithm for forecasting scalar
    (one-dimensional) time series using recurrent neural networks (RNNs).
    Supports hyperparameter optimization (HPO).
    ETS – Exponential Smoothing, a commonly used local statistical algorithm for
    time-series forecasting.
    NPTS – Non-Parametric Time Series, a scalable, probabilistic baseline
    forecaster algorithm. NPTS is especially useful when the time series is
    intermittent (or sparse, containing many 0s) and bursty.
    Prophet – a popular local Bayesian structural time series model.
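    To use one of these algorithms explicitly instead of AutoML, you pass its ARN
    when creating the predictor; a hedged sketch (the DeepAR+ ARN follows the
    arn:aws:forecast:::algorithm/… naming used by the service, and the dataset
    group ARN is a placeholder):

    import boto3

    forecast = boto3.client('forecast')

    predictor = forecast.create_predictor(
        PredictorName='demand_deepar',
        AlgorithmArn='arn:aws:forecast:::algorithm/Deep_AR_Plus',  # instead of PerformAutoML=True
        ForecastHorizon=30,
        PerformHPO=True,   # DeepAR+ supports hyperparameter optimization
        InputDataConfig={'DatasetGroupArn': 'arn:aws:forecast:eu-west-1:123456789012:dataset-group/retail_demand'},
        FeaturizationConfig={'ForecastFrequency': 'D'})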


  15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    DeepAR algorithm
    DeepAR: Probabilistic Forecasting with
    Autoregressive Recurrent Networks
    David Salinas, Valentin Flunkert, Jan Gasthaus
    Amazon Research
    Germany

    Abstract: Probabilistic forecasting, i.e. estimating the probability
    distribution of a time series' future given its past, is a key enabler for
    optimizing business processes. In retail businesses, for example, forecasting
    demand is crucial for having the right inventory available at the right time
    at the right place. In this paper we propose DeepAR, a methodology for
    producing accurate probabilistic forecasts, based on training an
    auto-regressive recurrent network model on a large number of related time
    series. We demonstrate how by applying deep learning techniques to
    forecasting, one can overcome many of the challenges faced by widely-used
    classical approaches to the problem. We show through extensive empirical
    evaluation on several real-world forecasting data sets accuracy improvements
    of around 15% compared to state-of-the-art methods.
    [The slide shows the first pages of the paper, including Figure 2: at each
    time step t the network takes the covariates x_{i,t}, the previous target
    z_{i,t-1}, and the previous state h_{i,t-1}; the output h_{i,t} parameterizes
    the likelihood l(z | theta_{i,t}), and during prediction samples drawn from
    the likelihood are fed back to generate traces of the joint predicted
    distribution.]
    https://arxiv.org/abs/1704.04110


  16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Training using a BackTestWindow


  17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Training & Testing


  18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Predictor metrics
    wQuantileLoss[0.5]
    Mean Absolute
    Percentage Error
    Root Mean
    Square Error
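    These backtest metrics can be retrieved for a trained predictor; a minimal
    sketch, assuming the GetAccuracyMetrics response layout of
    PredictorEvaluationResults and TestWindows (the predictor ARN is a placeholder):

    import boto3

    forecast = boto3.client('forecast')

    metrics = forecast.get_accuracy_metrics(
        PredictorArn='arn:aws:forecast:eu-west-1:123456789012:predictor/demand_predictor')

    # Each backtest window reports RMSE and the weighted quantile losses,
    # e.g. wQuantileLoss[0.5]
    for result in metrics['PredictorEvaluationResults']:
        for window in result['TestWindows']:
            print(window['Metrics']['RMSE'],
                  window['Metrics']['WeightedQuantileLosses'])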


  19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Predictor metrics – Quantiles


  20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Getting a forecast – Interpreting P-numbers
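    The P-numbers come back when querying a forecast for an item; a minimal sketch
    with the forecastquery client (the forecast ARN and item_id are placeholders):

    import boto3

    forecastquery = boto3.client('forecastquery')

    response = forecastquery.query_forecast(
        ForecastArn='arn:aws:forecast:eu-west-1:123456789012:forecast/demand_forecast',
        Filters={'item_id': 'socks'})

    # P10/P50/P90: the observed value is expected to fall below each prediction
    # 10%, 50%, and 90% of the time respectively.
    for quantile, points in response['Forecast']['Predictions'].items():
        print(quantile, points[0])   # e.g. p50 {'Timestamp': ..., 'Value': ...}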


  21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.


  22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Forecast examples & notebooks
    https://github.com/aws-samples/amazon-forecast-samples


  23. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.


  24. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Personalization & Recommendation
    Personalized recommendations
    Personalized search
    Personalized notifications


  25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Personalize


  26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Personalize workflow
    1. Create related datasets and a dataset group
    2. Get training data
    • Import historical data to the dataset group
    • Record live events to the dataset group
    3. Create a solution version (trained model) using a recipe or AutoML
    4. Evaluate the solution version using metrics
    5. Create a campaign (deploy the solution version)
    6. Provide recommendations for users
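    A minimal sketch of the same steps with boto3 (names, ARNs, the S3 path, and
    the IAM role are placeholders; each resource must become ACTIVE before the
    next step):

    import boto3

    personalize = boto3.client('personalize')

    # 1. Dataset group and an Interactions dataset
    #    (the schema ARN is a placeholder; creating the schema is shown later)
    dsg = personalize.create_dataset_group(name='movies')
    ds = personalize.create_dataset(
        name='ratings-dataset',
        datasetType='INTERACTIONS',
        datasetGroupArn=dsg['datasetGroupArn'],
        schemaArn='arn:aws:personalize:eu-west-1:123456789012:schema/movies-interactions')

    # 2. Import historical data from S3
    personalize.create_dataset_import_job(
        jobName='ratings-import',
        datasetArn=ds['datasetArn'],
        dataSource={'dataLocation': 's3://my-bucket/ratings.csv'},
        roleArn='arn:aws:iam::123456789012:role/PersonalizeRole')

    # 3. Train a solution version with AutoML (or pass recipeArn=... instead)
    solution = personalize.create_solution(
        name='movies-solution',
        datasetGroupArn=dsg['datasetGroupArn'],
        performAutoML=True)
    version = personalize.create_solution_version(solutionArn=solution['solutionArn'])

    # 4. Evaluate with get_solution_metrics, 5. deploy with create_campaign,
    # 6. get recommendations with the personalize-runtime client (see the later slides)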


  27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    How Amazon Personalize works
    • Dataset Groups
    • Datasets
    • Users – age, gender, or loyalty membership
    • Items – price, type, or availability
    • Interactions – between users and items
    • User Events
    • Recipes and Solutions
    • Metrics
    • Campaigns
    • Recommendations


  28. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Dataset schemas
    Users – required fields: USER_ID (string), plus at least 1 metadata field
    Items – required fields: ITEM_ID (string), plus at least 1 metadata field
    Interactions – required fields: USER_ID (string), ITEM_ID (string),
    TIMESTAMP (long); reserved keywords: EVENT_TYPE (string), EVENT_VALUE (string)


  29. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Training data
    userId movieId timestamp
    1 1 964982703
    1 3 964981247
    1 6 964982224
    2 47 964983815
    2 50 964982931
    2 70 964982400
    . . .


  30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Training data schema – Users
    {
        "type": "record",
        "name": "Users",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            { "name": "USER_ID", "type": "string" },
            { "name": "AGE", "type": "int" },
            { "name": "GENDER", "type": "string", "categorical": true }
        ],
        "version": "1.0"
    }
    The "categorical": true flag is for categories, like genre


  31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Training data schema – Interactions
    {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            { "name": "USER_ID", "type": "string" },
            { "name": "ITEM_ID", "type": "string" },
            { "name": "TIMESTAMP", "type": "long" }
        ],
        "version": "1.0"
    }
    An interaction between a user and an item at a specific point in time
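    This Avro-style schema is registered with Personalize before creating the
    dataset; a minimal sketch (the schema name is a placeholder):

    import boto3, json

    personalize = boto3.client('personalize')

    interactions_schema = {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            {"name": "USER_ID", "type": "string"},
            {"name": "ITEM_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"}],
        "version": "1.0"}

    response = personalize.create_schema(
        name='movies-interactions',             # placeholder name
        schema=json.dumps(interactions_schema))
    print(response['schemaArn'])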


  32. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Using EVENT_TYPE and EVENT_VALUE fields
    {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            { "name": "USER_ID", "type": "string" },
            { "name": "ITEM_ID", "type": "string" },
            { "name": "EVENT_TYPE", "type": "string" },
            { "name": "EVENT_VALUE", "type": "float" },
            { "name": "TIMESTAMP", "type": "long" }
        ],
        "version": "1.0"
    }


  33. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Using categorical data
    You can include more than one category in the training data using the
    “vertical bar” character, also known as “pipe”:
    ITEM_ID,GENRE
    item_123,horror|comedy
    Multiple categories


  34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Solution metrics
    {
        "solutionVersionArn": "arn:aws:personalize:…",
        "metrics": {
            "arn:aws:personalize:…": {
                "coverage": 0.27,
                "mean_reciprocal_rank_at_25": 0.0379,
                "normalized_discounted_cumulative_gain_at_5": 0.0405,
                "normalized_discounted_cumulative_gain_at_10": 0.0513,
                "normalized_discounted_cumulative_gain_at_25": 0.0828,
                "precision_at_5": 0.0136,
                "precision_at_10": 0.0102,
                "precision_at_25": 0.0091
            }
        }
    }
    With the exception of coverage, higher is better
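    A minimal sketch of retrieving these metrics for a trained solution version
    (the solution version ARN is a placeholder; the API returns the metrics as a
    flat name/value map, which may differ slightly from the console JSON above):

    import boto3

    personalize = boto3.client('personalize')

    response = personalize.get_solution_metrics(
        solutionVersionArn='arn:aws:personalize:eu-west-1:123456789012:solution/movies-solution/1')
    for name, value in response['metrics'].items():
        print(f'{name}: {value}')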


  35. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Evaluating a solution version
    coverage – the proportion of unique recommended items from all queries out of
    the total number of unique items in the training data.
    mean_reciprocal_rank_at_25 – the mean of the reciprocal ranks of the first
    relevant recommendation out of the top 25 recommendations over all queries.
    This metric is appropriate if you're interested in the single highest-ranked
    recommendation.
    normalized_discounted_cumulative_gain_at_K – discounted gain assumes that
    recommendations lower on a list of recommendations are less relevant than
    higher recommendations, so each recommendation is given a lower weight by a
    factor dependent on its position. This metric rewards relevant items that
    appear near the top of the list, because the top of a list usually draws more
    attention.
    precision_at_K – the number of relevant recommendations out of the top K
    recommendations divided by K. This metric rewards precise recommendations of
    the relevant items.
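    To make these definitions concrete, a toy computation of precision@K and
    reciprocal rank for a single query (plain Python, not part of the service):

    def precision_at_k(recommended, relevant, k):
        """Relevant items among the top k, divided by k."""
        top_k = recommended[:k]
        return sum(1 for item in top_k if item in relevant) / k

    def reciprocal_rank(recommended, relevant, k=25):
        """1 / position of the first relevant item in the top k (0 if none)."""
        for position, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                return 1 / position
        return 0.0

    recommended = ['item_7', 'item_2', 'item_9', 'item_4', 'item_1']
    relevant = {'item_2', 'item_4'}

    print(precision_at_k(recommended, relevant, k=5))  # 2 relevant in top 5 -> 0.4
    print(reciprocal_rank(recommended, relevant))      # first hit at position 2 -> 0.5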


  36. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Recording live events – Getting a tracking ID
    import boto3
    personalize = boto3.client('personalize')
    response = personalize.create_event_tracker(
        name='MovieClickTracker',
        datasetGroupArn='arn:aws:personalize:…'
    )
    print(response['eventTrackerArn'])
    print(response['trackingId'])


  37. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Recording live events – Event-interactions dataset
    {
        "datasets": [
            {
                "name": "ratings-dsgroup/EVENT_INTERACTIONS",
                "datasetArn": "arn:aws:personalize:…",
                "datasetType": "EVENT_INTERACTIONS",
                "status": "ACTIVE",
                "creationDateTime": 1554304597.806,
                "lastUpdatedDateTime": 1554304597.806
            },
            {
                "name": "ratings-dataset",
                "datasetArn": "arn:aws:personalize:…",
                "datasetType": "INTERACTIONS",
                "status": "ACTIVE",
                "creationDateTime": 1554299406.53,
                "lastUpdatedDateTime": 1554299406.53
            }
        ],
        "nextToken": "..."
    }
    New dataset created automatically for the tracking events


  38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Recording live events – PutEvents operation
    import boto3
    personalize_events = boto3.client(service_name='personalize-events')
    personalize_events.put_events(
        trackingId='tracking_id',
        userId='USER_ID',
        sessionId='session_id',
        eventList=[{
            'sentAt': TIMESTAMP,
            'eventType': 'EVENT_TYPE',
            'properties': "{\"itemId\": \"ITEM_ID\"}"
        }]
    )


  39. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    More advanced PutEvents operation – multiple events with more data
    import json
    # Uses the personalize_events client from the previous slide
    personalize_events.put_events(
        trackingId='tracking_id',
        userId='user555',
        sessionId='session1',
        eventList=[{
            'eventId': 'event1',
            'sentAt': '1553631760',
            'eventType': 'like',
            'properties': json.dumps({
                'itemId': 'choc-panama',
                'eventValue': 'true'
            })
        }, {
            'eventId': 'event2',
            'sentAt': '1553631782',
            'eventType': 'rating',
            'properties': json.dumps({
                'itemId': 'movie_ten',
                'eventValue': '4',
                'numRatings': '13'
            })
        }]
    )


  40. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Recording live events with AWS Amplify – using Amazon Personalize
    import { Analytics, AmazonPersonalizeProvider } from 'aws-amplify';
    Analytics.addPluggable(new AmazonPersonalizeProvider());

    // Configure the plugin after adding it to the Analytics module
    Analytics.configure({
        AmazonPersonalize: {
            // REQUIRED - The trackingId to track the events
            trackingId: '',
            // OPTIONAL - Amazon Personalize service region
            region: 'XX-XXXX-X',
            // OPTIONAL - The number of events to be deleted from the buffer when flushed
            flushSize: 10,
            // OPTIONAL - The interval in ms to perform a buffer check and flush if necessary
            flushInterval: 5000, // 5s
        }
    });


  41. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Recording live events with AWS Amplify – send events from the browser
    Analytics.record({
        eventType: "Identify",
        properties: {
            "userId": ""
        }
    }, 'AmazonPersonalize');

    Analytics.record({
        eventType: "",
        userId: "",      // optional
        properties: {
            "itemId": "",
            "eventValue": ""
        }
    }, 'AmazonPersonalize');
    https://aws-amplify.github.io/docs/js/analytics


  42. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Using predefined recipes
    USER_PERSONALIZATION – GetRecommendations – userId required, itemId optional
    PERSONALIZED_RANKING – GetPersonalizedRanking – userId required, inputList of itemIds
    RELATED_ITEMS – GetRecommendations – itemId required, userId not used
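    A minimal sketch of calling these operations with the personalize-runtime
    client (campaign ARNs, user IDs, and item IDs are placeholders):

    import boto3

    runtime = boto3.client('personalize-runtime')

    # USER_PERSONALIZATION and RELATED_ITEMS use GetRecommendations
    user_recs = runtime.get_recommendations(
        campaignArn='arn:aws:personalize:eu-west-1:123456789012:campaign/user-personalization',
        userId='user555')
    similar = runtime.get_recommendations(
        campaignArn='arn:aws:personalize:eu-west-1:123456789012:campaign/related-items',
        itemId='movie_ten')

    # PERSONALIZED_RANKING re-orders a provided list of items for the user
    ranking = runtime.get_personalized_ranking(
        campaignArn='arn:aws:personalize:eu-west-1:123456789012:campaign/ranking',
        userId='user555',
        inputList=['movie_one', 'movie_two', 'movie_ten'])

    print([item['itemId'] for item in user_recs['itemList']])
    print([item['itemId'] for item in ranking['personalizedRanking']])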


  43. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Predefined USER_PERSONALIZATION Recipes
    HRNN – a hierarchical recurrent neural network, which can model the temporal
    order of user-item interactions. Recommended when user behavior is changing
    with time (the evolving intent problem).
    HRNN-Metadata – HRNN with additional features derived from contextual metadata
    (Interactions dataset), along with user and item metadata (Users and Items
    datasets). Performs better than non-metadata models when high quality metadata
    is available. Can involve longer training times. (AutoML ✔, Metadata ✔)
    HRNN-Coldstart – similar to HRNN-Metadata with personalized exploration of new
    items. Recommended when frequently adding new items to a catalog and you want
    the items to immediately show up in recommendations. (AutoML ✔, Metadata ✔)
    Popularity-Count – calculates popularity of items based on a count of events
    against that item in the user-item interactions dataset. Use as a baseline to
    compare other user-personalization recipes.

  44. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Hierarchical Recurrent Neural Networks
    Personalizing Session-based Recommendations with
    Hierarchical Recurrent Neural Networks
    Massimo Quadrana (Politecnico di Milano), Alexandros Karatzoglou (Telefonica
    Research), Balázs Hidasi (Gravity R&D), Paolo Cremonesi (Politecnico di Milano)
    RecSys '17, Como, Italy
    Abstract: Session-based recommendations are highly relevant in many modern
    on-line services (e.g. e-commerce, video streaming) and recommendation
    settings. Recently, Recurrent Neural Networks have been shown to perform very
    well in session-based settings. While in many session-based recommendation
    domains user identifiers are hard to come by, there are also domains in which
    user profiles are readily available. We propose a seamless way to personalize
    RNN models with cross-session information transfer and devise a Hierarchical
    RNN model that relays and evolves latent hidden states of the RNNs across user
    sessions. Results on two industry datasets show large improvements over the
    session-only RNNs.
    [The slide shows the first page of the paper, including Figure 1: the model is
    composed of a hierarchy of two GRUs, where the session-level GRU models the
    user activity within sessions and generates recommendations, and the user-level
    GRU models the evolution of the user across sessions, providing personalization
    to the session-level GRU by initializing its hidden state and, optionally, by
    propagating the user representation in input.]
    https://arxiv.org/abs/1706.04148


  45. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Predefined PERSONALIZED_RANKING recipes
    Personalized-Ranking – use this recipe when you're personalizing the results
    for your users, such as personalized reranking of search results or curated
    lists.


  46. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Predefined RELATED_ITEMS recipes
    SIMS – item-to-item similarities, based on the concept of collaborative
    filtering. It generates items similar to a given item based on co-occurrence
    of the item in user history in the user-item interaction dataset. In the
    absence of sufficient user behavior data for an item, or if the specified item
    ID is not found, the algorithm returns popular items as recommendations.
    Use for improving item discoverability and in detail pages. Provides fast
    performance.


  47. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.


  48. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Amazon Personalize examples & notebooks
    https://github.com/aws-samples/amazon-personalize-samples


  49. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.


  50. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Takeaways
    • Forecasting and personalization can help improve your business efficiency
    • Amazon Forecast provides accurate time-series forecasting
    • Amazon Personalize provides real-time personalization and recommendations
    • They are both based on the same technology used at Amazon.com and
    don't require machine learning expertise to use


  51. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Links
    • Blogs
    • https://aws.amazon.com/blogs/aws/amazon-forecast-time-series-forecasting-made-easy/
    • https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/
    • https://aws.amazon.com/blogs/aws/amazon-personalize-real-time-personalization-and-recommendation-for-everyone/
    • https://aws.amazon.com/blogs/aws/amazon-personalize-is-now-generally-available/
    • Examples & Notebooks
    • https://github.com/aws-samples/amazon-forecast-samples
    • https://github.com/aws-samples/amazon-personalize-samples


  52. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Links
    • Training algorithms
    • DeepAR – https://arxiv.org/abs/1704.04110
    • HRNN – https://arxiv.org/abs/1706.04148
    • Evaluating performance of a trained model
    • https://en.wikipedia.org/wiki/Mean_absolute_percentage_error (MAPE)
    • https://en.wikipedia.org/wiki/Quantile_regression
    • https://en.wikipedia.org/wiki/Mean_reciprocal_rank
    • https://en.wikipedia.org/wiki/Discounted_cumulative_gain


  53. Thank you!
    © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Danilo Poccia
    @danilop
