
Amazon Forecast and Personalize

Frank Munz
October 24, 2019


A peek into the future with Amazon Forecast and Personalize. AWS DevDay DACH 2019.

Companies today use everything from simple spreadsheets to complex financial planning software to attempt to accurately forecast future business outcomes such as product demand, resource needs, or financial performance. These tools build forecasts by looking at a historical series of data, which is called time series data. For example, such tools may try to predict the future sales of a raincoat by looking only at its previous sales data with the underlying assumption that the future is determined by the past. This approach can struggle to produce accurate forecasts for large sets of data that have irregular trends. Also, it fails to easily combine data series that change over time (such as price, discounts, web traffic, and number of employees) with relevant independent variables like product features and store locations.
Based on the same technology used at Amazon.com, Amazon Forecast uses machine learning to combine time series data with additional variables to build forecasts. Amazon Forecast requires no machine learning experience to get started. You only need to provide historical data, plus any additional data that you believe may impact your forecasts. For example, the demand for a particular color of a shirt may change with the seasons and store location. This complex relationship is hard to determine on its own, but machine learning is ideally suited to recognize it. Once you provide your data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.

Transcript

  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Solve complex business problems with
    Amazon Personalize and Amazon Forecast
    Frank Munz
    Sr Technical Evangelist AWS
    DACH 2019
    @frankmunz


  2. About me
    • Software Architect / DevOps Engineer
    • Technical Evangelist @ AWS
    • Published an AWS book
    • Containers, serverless and a sprinkle
    of ML & big / fast data
    @frankmunz


  3. Customers often ask,
    “How can we tap into Amazon’s
    experience in machine learning?”


  4. The Amazon ML stack: Broadest & deepest set of capabilities
    AI SERVICES (Vision, Speech, Language, Chatbots, Forecasting, Recommendations)
    • Amazon Rekognition Image
    • Amazon Rekognition Video
    • Amazon Textract
    • Amazon Polly
    • Amazon Transcribe
    • Amazon Translate
    • Amazon Comprehend & Comprehend Medical
    • Amazon Lex
    • Amazon Forecast (Forecasting)
    • Amazon Personalize (Recommendations)
    ML SERVICES
    • Amazon SageMaker (BUILD, TRAIN, DEPLOY)
    • Pre-built algorithms
    • Data labeling (Ground Truth)
    • One-click model training & tuning
    • Optimization (Neo)
    • Reinforcement learning
    • Algorithms & models (AWS Marketplace for Machine Learning)
    • Notebook Hosting
    • One-click deployment & hosting
    • Automatic scaling
    • Virtual private cloud
    • AWS PrivateLink
    • Amazon Elastic Inference integration
    • Hyper parameter optimization
    ML FRAMEWORKS & INFRASTRUCTURE (Frameworks, Interfaces, Infrastructure)
    • Amazon EC2 P3 & P3dn
    • Amazon EC2 C5
    • FPGAs
    • AWS IoT Greengrass
    • Amazon Elastic Inference
    • AWS Inferentia

  5. Accurate time-series forecasting service, based
    on the same technology used at Amazon.com.
    No ML experience required.


  6. Predicting future points in a time-series
    • Product demand
    • Workforce demand
    • Financial metrics
    • Inventory planning

  7. (Understanding) Accuracy is key in Forecasting
    Under-forecasting leads to lost opportunity
    Over-forecasting leads to wasted resources

  8. Traditional time-series models
    Trend + Seasonality
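To make the "Trend + Seasonality" idea concrete, here is a minimal sketch of an additive model, y[t] ≈ a + b·t + s[t mod period]. It is a naive illustration under stated assumptions (a linear trend and a fixed seasonal period), not the method used by Amazon Forecast or by any particular library; the function names are hypothetical.

```python
def fit_trend_seasonality(series, period):
    """Fit y[t] ~ a + b*t + s[t % period].

    The linear trend is fitted by ordinary least squares; the seasonal
    component is the mean residual for each position in the period.
    """
    n = len(series)
    t_mean = sum(range(n)) / n
    y_mean = sum(series) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    b = covariance / variance
    a = y_mean - b * t_mean
    residuals = [y - (a + b * t) for t, y in enumerate(series)]
    seasonal = []
    for k in range(period):
        phase = residuals[k::period]
        seasonal.append(sum(phase) / len(phase))
    return a, b, seasonal


def forecast(a, b, seasonal, start, horizon):
    """Extrapolate the fitted trend and repeat the seasonal pattern."""
    period = len(seasonal)
    return [a + b * t + seasonal[t % period] for t in range(start, start + horizon)]


# Synthetic history: linear trend plus a repeating four-step pattern.
pattern = [1, -1, -1, 1]
history = [10 + 2 * t + pattern[t % 4] for t in range(16)]
a, b, seasonal = fit_trend_seasonality(history, period=4)
next_four = forecast(a, b, seasonal, start=len(history), horizon=4)
```

Each series is fitted on its own here, which is the "independent forecasts" property the deck attributes to traditional models.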


  9. Never trust statistics alone;
    visualize your data.


  10. Traditional time-series models
    • Independent forecasts
    • Strong structural assumptions
    • De-facto industry standard
    • Well-understood, > 50 yrs. research
    • High data efficiency
    • Data must match the structural
    assumptions
    • Cannot identify patterns
    across time series


  11. Traditional methods struggle with real-world forecasting
    • Don’t consider metadata
    • Don’t consider external factors such as holidays and promotions
    • Can’t handle time-series with no history

  12. A Real World Example
    The Visual Miscellaneum by David McCandless


  13. Using deep learning increases forecast accuracy


  14. Discovering shared patterns with deep learning


  15. Deep learning time-series models
    • Global models: identify patterns
    using all available time series
    • Group-dependent seasonality and lifecycle
    • Behavior in response to covariate inputs
    • Weak structural assumptions
    • Can be significantly more accurate
    than traditional methods
    • Can easily incorporate and learn
    from rich metadata
    • Support cold-start forecasts for new
    items


  16. Probabilistic forecasts
    • Quantification of uncertainty
    • Support optimal decision making
    • Make “wrong” forecasts useful
    • All Amazon Forecast algorithms
    support generating probabilistic
    forecasts
    • Forecasts can be obtained for
    different quantiles of the predictive
    distribution
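A short sketch of what "forecasts for different quantiles" means in practice. It assumes the predictive distribution is available as a list of sampled values (an assumption for illustration) and uses the standard quantile (pinball) loss; the helper names are hypothetical. Amazon Forecast labels such quantile forecasts P10, P50, P90 and so on.

```python
def empirical_quantile(samples, q):
    """Return the value below which roughly a fraction q of the samples fall."""
    ordered = sorted(samples)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def quantile_loss(actual, predicted, q):
    """Pinball loss for one point.

    Under-forecasts (actual > predicted) are weighted by q and
    over-forecasts by (1 - q), so a high quantile such as 0.9
    punishes under-forecasting more heavily.
    """
    diff = actual - predicted
    return max(q * diff, (q - 1) * diff)


# Hypothetical sampled demand values for one item and one time step.
samples = list(range(1, 101))
p10 = empirical_quantile(samples, 0.1)
p50 = empirical_quantile(samples, 0.5)
p90 = empirical_quantile(samples, 0.9)

# At q = 0.9, missing demand by 20 units costs more than overshooting by 20.
under = quantile_loss(actual=100, predicted=80, q=0.9)
over = quantile_loss(actual=100, predicted=120, q=0.9)
```

Which quantile to act on depends on the cost trade-off: a higher quantile suits cases where lost opportunity is expensive, a lower one where wasted resources are.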


  17. Learning with covariates
    • Additional inputs can
      • Explain historical data
      • Drive forecast behavior
    • Examples from retail demand forecasting
      • Price information
      • Information about promotions
      • Out-of-stock information
      • Web page views
    • Categorical inputs can be used to identify group-level patterns
    Example category hierarchy:
    • Fashion
      • Women’s: Clothing, Shoes, Watches
      • Men’s: Clothing, Shoes, Watches
      • Girls': Clothing, Shoes, Watches
      • Boys': Clothing, Shoes, Watches

  18. Amazon Forecast


  19. Amazon Forecast workflow
    1. Create related datasets and a dataset group
    2. Get training data
    • Import historical data to the dataset group
    3. Train a predictor (trained model with HPO) using an algorithm or
    AutoML
    4. Evaluate the predictor version using metrics
    5. Create a forecast (for every item in the dataset group)
    6. Retrieve forecasts for users
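The six steps above can be sketched with the AWS SDK for Python. This is an outline under stated assumptions, not a complete program: `forecast` is expected to behave like `boto3.client("forecast")` and `query` like `boto3.client("forecastquery")`, every resource name is a placeholder, the schema omits the `store` column for brevity, and the required waiting between steps is left out.

```python
def run_forecast_workflow(forecast, query, s3_path, role_arn, item_id):
    """Walk through the workflow steps with boto3-style clients.

    Real jobs are asynchronous: each resource has to reach ACTIVE status
    before the next call succeeds. Polling is omitted in this sketch.
    """
    # Column order must match the CSV file (placeholder schema).
    schema = {"Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]}

    # 1. Create a dataset and a dataset group.
    dataset_arn = forecast.create_dataset(
        DatasetName="demo_target", Domain="RETAIL",
        DatasetType="TARGET_TIME_SERIES", DataFrequency="D",
        Schema=schema)["DatasetArn"]
    group_arn = forecast.create_dataset_group(
        DatasetGroupName="demo_group", Domain="RETAIL",
        DatasetArns=[dataset_arn])["DatasetGroupArn"]

    # 2. Import historical data from Amazon S3.
    forecast.create_dataset_import_job(
        DatasetImportJobName="demo_import", DatasetArn=dataset_arn,
        DataSource={"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        TimestampFormat="yyyy-MM-dd")

    # 3. Train a predictor, letting AutoML pick the algorithm.
    predictor_arn = forecast.create_predictor(
        PredictorName="demo_predictor", ForecastHorizon=14,
        PerformAutoML=True,
        InputDataConfig={"DatasetGroupArn": group_arn},
        FeaturizationConfig={"ForecastFrequency": "D"})["PredictorArn"]

    # 4. Evaluate the predictor.
    metrics = forecast.get_accuracy_metrics(PredictorArn=predictor_arn)

    # 5. Create a forecast for every item in the dataset group.
    forecast_arn = forecast.create_forecast(
        ForecastName="demo_forecast",
        PredictorArn=predictor_arn)["ForecastArn"]

    # 6. Retrieve the forecast for one item.
    result = query.query_forecast(
        ForecastArn=forecast_arn, Filters={"item_id": item_id})
    return metrics, result
```

Passing the clients in as arguments keeps the sketch free of credentials and makes it easy to exercise with stand-in objects.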


  20. Comparison: Amazon Rekognition with built-in model

  21. Algorithms
    Algorithm | What
    ARIMA | Autoregressive integrated moving average (ARIMA) is a commonly used local statistical algorithm for time-series forecasting
    DeepAR+ | A supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs); supports hyperparameter optimization (HPO)
    ETS | Exponential smoothing (ETS) is a commonly used local statistical algorithm for time-series forecasting
    NPTS | Non-parametric time series (NPTS) is a scalable, probabilistic baseline forecaster algorithm; NPTS is especially useful when the time series is intermittent (or sparse, containing many 0s) and bursty
    Prophet | A popular local Bayesian structural time series model

  22. TARGET_TIME_SERIES dataset
    timestamp item_id store demand
    2019-01-01 socks NYC 25
    2019-01-05 socks SFO 45
    2019-02-01 shoes ORD 10

    2019-06-01 socks NYC 100
    2019-06-05 socks SFO 5
    2019-07-01 shoes ORD 50
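As a small illustration of this layout, the sketch below parses the rows from the slide into typed records using only the standard library. The comma-separated, header-less file format and the function name are assumptions made for the example.

```python
import csv
import io
from datetime import date

# Column order as shown on the slide.
COLUMNS = ("timestamp", "item_id", "store", "demand")

# Rows copied from the slide, written as header-less CSV (assumed format).
SAMPLE = """\
2019-01-01,socks,NYC,25
2019-01-05,socks,SFO,45
2019-02-01,shoes,ORD,10
2019-06-01,socks,NYC,100
2019-06-05,socks,SFO,5
2019-07-01,shoes,ORD,50
"""


def parse_target_time_series(text):
    """Turn CSV text into a list of dicts with typed timestamp and demand."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        record = dict(zip(COLUMNS, row))
        record["timestamp"] = date.fromisoformat(record["timestamp"])
        record["demand"] = float(record["demand"])
        records.append(record)
    return records


records = parse_target_time_series(SAMPLE)
```

Here `timestamp`, `item_id` and `demand` carry the series itself, while `store` acts as an extra dimension that distinguishes series for the same item.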


  23. Data alignment
    Data is automatically aggregated by forecast frequency,
    for example, hourly, daily, or weekly.
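The effect of aggregating to a coarser frequency can be sketched as follows. The example rolls daily observations up into monthly buckets by summing; both the monthly frequency and the use of a sum are assumptions for illustration, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_month(observations):
    """Sum demand per (first day of month, item_id)."""
    totals = defaultdict(float)
    for day, item_id, demand in observations:
        month_start = day.replace(day=1)
        totals[(month_start, item_id)] += demand
    return dict(totals)


daily = [
    (date(2019, 1, 1), "socks", 25),
    (date(2019, 1, 5), "socks", 45),
    (date(2019, 2, 1), "shoes", 10),
]
monthly = aggregate_by_month(daily)
```

The two January observations for socks end up in a single monthly data point, which is why irregularly spaced input rows still yield an evenly spaced series.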


  24. Training & testing


  25. Applicable across multiple different domains


  26. Predictor metrics: Quantiles



  28. Getting a forecast: Interpreting P-numbers


  29. Amazon Forecast examples & notebooks
    https://github.com/aws-samples/amazon-forecast-samples


  30. Real-time personalization and recommendation service,
    based on the same technology used at Amazon.com.
    No ML experience required.


  31. Personalizing user experience is proven to increase
    discoverability, engagement, user satisfaction, and revenue
    30% of page views on Amazon are from recommendations
    … However, most customers find personalization hard to get right

  32. Effective personalization requires solving multiple hard problems
    • Reacting to user interactions in real time
    • Avoiding mostly showing popular items
    • Handling cold start (insufficient data about new users/items)
    • Scale

  33. Traditional recommender systems aren’t adequate
    • Rule-based systems perform poorly,
    don’t scale and are hard to maintain
    • Collaborative filtering and matrix
    factorization methods are good for
    v1, but deep neural networks, esp.
    recurrent neural networks, that take
    into account the sequence of a user’s
    activity (clicks) outperform other
    methods

  34. State of the Art Performance
    Ratings RMSE on Netflix (98 MM interactions, 500k users, 18k items)
    Model | RMSE
    Rolling Average | 0.954
    T-SVD [2009] | 0.928
    PMF [2008] | 0.925
    RRN [2017] | 0.922
    DeepRec [2017] | 0.91
    HRNN | 0.856
    Ratings RMSE on MovieLens (20 MM interactions, 173k users, 131k items)
    Model | RMSE
    Rolling Average | 0.933
    FM [2012] | 0.916
    I-AutoRec [2015] | 0.871
    RNN | 0.857
    HRNN | 0.846

  35. Common applications & use cases
    • Personalized recommendations
    • Search reranking
    • Notifications and emails
    • Related items

  36. Real-time data can be consumed by Amazon Personalize
    Offline data (Amazon S3 bucket)
    • Historical user activity
    • User attributes
    • Item catalog
    Real-time data
    • Mobile SDKs (coming soon)
    • JavaScript SDK
    • Server-Side SDKs

  37. Recurrent Neural Networks (RNN)
    RNN


  38. Modeling for personalization


  39. HRNN - Modeling sessions
    Learned user representation
    Hierarchical recurrent network
    User
    representation


  40. Hierarchical recurrent neural networks (HRNNs)
    Personalizing Session-based Recommendations with
    Hierarchical Recurrent Neural Networks
    Massimo Quadrana
    Politecnico di Milano, Milan, Italy
    [email protected]
    Alexandros Karatzoglou
    Telefonica Research, Barcelona, Spain
    [email protected]
    Balázs Hidasi
    Gravity R&D, Budapest, Hungary
    [email protected]
    Paolo Cremonesi
    Politecnico di Milano, Milan, Italy
    [email protected]
    ABSTRACT
    Session-based recommendations are highly relevant in many modern
    on-line services (e.g. e-commerce, video streaming) and recommendation
    settings. Recently, Recurrent Neural Networks have been shown to perform
    very well in session-based settings. While in many session-based
    recommendation domains user identifiers are hard to come by, there are
    also domains in which user profiles are readily available. We propose a
    seamless way to personalize RNN models with cross-session information
    transfer and devise a Hierarchical RNN model that relays and evolves
    latent hidden states of the RNNs across user sessions. Results on two
    industry datasets show large improvements over the session-only RNNs.
    CCS CONCEPTS
    • Information systems → Recommender systems; • Computing
    methodologies → Neural networks;
    KEYWORDS
    recurrent neural networks; personalization; session-based
    recommendation; session-aware recommendation
    1 INTRODUCTION
    In many online systems where recommendations are applied, interactions
    between a user and the system are organized into sessions. A session is a
    group of interactions that take place within a given time frame. Sessions
    from a user can occur on the same day, or over several days, weeks, or
    months. A session usually has a goal, such as finding a good restaurant
    in a city, or listening to music of a certain style or mood.
    Providing recommendations in these domains poses unique challenges that
    until recently have been mainly tackled by applying conventional
    recommender algorithms [10] on either the last interaction or the last
    session (session-based recommenders). Recurrent Neural Networks (RNNs)
    have been recently used for the purpose of session-based recommendations
    [7] outperforming item-based methods by 15% to 30% in terms of ranking
    metrics. In session-based recommenders, recommendations are provided
    based solely on the interactions in the current user session, as users
    are assumed to be anonymous. But in many of these systems there are cases
    where a user might be logged-in (e.g. music streaming services) or some
    form of user identifier might be present (cookie or other identifier).
    In these cases it is reasonable to assume that the user behavior in past
    sessions might provide valuable information for providing
    recommendations in the next session.
    A simple way of incorporating past user session information in a
    session-based algorithm would be to simply concatenate past and current
    user sessions. While this seems like a reasonable approach, we will see
    in the experimental section that this does not yield the best results.
    In this work we describe a novel algorithm based on RNNs that can deal
    with both cases: (i) session-aware recommenders, when user identifiers
    are present and propagate information from the previous user session to
    the next, thus improving the recommendation accuracy, and (ii)
    session-based recommenders, when there are no past sessions (i.e., no
    user identifiers). The algorithm is based on a Hierarchical RNN where the
    hidden state of a lower-level RNN at the end of one user session is
    passed as an input to a higher-level RNN which aims at predicting a good
    initialization (i.e., a good context vector) for the hidden state of the
    lower RNN for the next session of the user.
    We evaluate the Hierarchical RNNs on two datasets from industry
    comparing them to the plain session-based RNN and to item-based
    collaborative filtering. Hierarchical RNNs outperform both alternatives
    by a healthy margin.
    2 RELATED WORK
    Session-based recommendations. Classical CF methods (e.g. matrix
    factorization) break down in the session-based setting when no user
    profile can be constructed from past user behavior. A natural solution to
    this problem is the item-to-item recommendation approach [11, 16]. In
    this setting an item-to-item similarity matrix is precomputed from the
    available session data, items that are often clicked together in
    sessions are deemed to be similar. These similarities are then used to
    create recommendations. While simple, this method has been proven to be
    effective and is widely employed. Though, these methods only take into
    account the last click of the user, in effect ignoring the information
    of the previous clicks.
    Permission to make digital or hard copies of all or part of this work
    for personal or classroom use is granted without fee provided that
    copies are not made or distributed for profit or commercial advantage
    and that copies bear this notice and the full citation on the first
    page. Copyrights for components of this work owned by others than ACM
    must be honored. Abstracting with credit is permitted. To copy
    otherwise, or republish, to post on servers or to redistribute to lists,
    requires prior specific permission and/or a fee. Request permissions
    from [email protected].
    RecSys’17, August 27–31, 2017, Como, Italy.
    © 2017 ACM. 978-1-4503-4652-8/17/08...$15.00
    DOI: http://dx.doi.org/10.1145/3109859.3109896
    arXiv:1706.04148v5 [cs.LG] 23 Aug 2017
    [Figure 1 diagram; labels: user-level representation, session-level representation, session initialization, user representation propagation, input item id, prediction]
    Figure 1: Graphical representation of the proposed Hierarchical RNN model for personalized session-based recommendation.
    The model is composed of an hierarchy of two GRUs, the session-level GRU ($GRU_{ses}$) and the user-level GRU ($GRU_{usr}$). The
    session-level GRU models the user activity within sessions and generates recommendations. The user-level GRU models the
    evolution of the user across sessions and provides personalization capabilities to the session-level GRU by initializing its
    hidden state and, optionally, by propagating the user representation in input.
    way, the user-level GRU can track the evolution of the user across
    sessions and, in turn, model the dynamics of user interests seamlessly.
    Notice that the user-level representation is kept fixed throughout
    the session and it is updated only when the session ends.
    The user-level representation is then used to initialize the hidden
    state of the session-level GRU. Given $c_m$, the initial hidden state
    $s_{m+1,0}$ of the session-level GRU for the following session is set to
    $s_{m+1,0} = \tanh(W_{init} c_m + b_{init})$ (4)
    where $W_{init}$ and $b_{init}$ are the initialization weights and biases
    respectively. In this way, the information relative to the preferences
    expressed by the user in the previous sessions is transferred to
    the session-level. Session-level representations are then updated as
    follows

    training) how user sessions evolve during time. We will see in the
    experimental section that this is crucial in achieving increased
    performance. In effect $GRU_{usr}$ computes and evolves a user profile
    that is based on the previous user sessions, thus in effect personalizing
    the $GRU_{ses}$. In the original RNN, users who had clicked/interacted
    with the same sequence of items in a session would get the same
    recommendations; in HRNN this is not anymore the case, recommendations
    will be influenced by the user's past sessions as well.
    In summary, we considered the following two different HRNN
    settings, depending on whether the user representation $c_m$ is
    considered in Equation 5:
    • HRNN Init, in which $c_m$ is used only to initialize the
    representation of the next session.
    https://arxiv.org/abs/1706.04148
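Equation 4 in the excerpt above, which initializes the next session's hidden state from the user-level representation, can be written out in a few lines. This is a plain-Python sketch of that single formula, with the weight matrix given as a list of rows; the function name and the toy numbers are hypothetical, and nothing here reflects how Amazon Personalize implements HRNN internally.

```python
import math


def init_session_state(w_init, b_init, c_m):
    """Compute s_{m+1,0} = tanh(W_init c_m + b_init).

    w_init: list of rows, one row per hidden unit of the session-level GRU
    b_init: one bias per hidden unit
    c_m:    user-level representation after session m
    """
    return [
        math.tanh(sum(w * c for w, c in zip(row, c_m)) + b)
        for row, b in zip(w_init, b_init)
    ]


# Toy values: identity weights and zero biases, so the result is tanh(c_m).
w_init = [[1.0, 0.0], [0.0, 1.0]]
b_init = [0.0, 0.0]
c_m = [0.0, 1.0]
s_next = init_session_state(w_init, b_init, c_m)
```

Because of the tanh, every component of the initial state lies strictly between -1 and 1, whatever the scale of the user representation.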


  41. Large Parameter Space ‒ HPO and AutoML
    Automatic within-algorithm parameter tuning (HPO)
    • SIMS (time decay)
    • DeepFM (depth, size)
    • HRNN (depth, size, height)
    Automatic algorithm selection (AutoML)

  42. Recommendation: get_personalized_ranking()
    userId and inputList from data you used to train the solution
    Ranked results, first item matters most
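A minimal sketch of the call named in the slide title. It assumes `personalize_runtime` behaves like `boto3.client("personalize-runtime")`; the campaign ARN, user ID, and item IDs in the usage comment are placeholders, and the wrapper function name is hypothetical.

```python
def rank_items(personalize_runtime, campaign_arn, user_id, item_ids):
    """Re-rank item_ids for one user and return the item IDs, best first."""
    response = personalize_runtime.get_personalized_ranking(
        campaignArn=campaign_arn,
        userId=user_id,
        inputList=item_ids,
    )
    # Each entry of the ranked list carries the item ID under "itemId".
    return [entry["itemId"] for entry in response["personalizedRanking"]]


# Usage with a real client (placeholder identifiers, needs AWS credentials):
#   import boto3
#   runtime = boto3.client("personalize-runtime")
#   ranked = rank_items(runtime, "arn:aws:personalize:...", "user-1", ["a", "b"])
```

The returned list keeps the service's order, so the first element is the item considered most relevant for that user.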


  43. Amazon Personalize & Customer Success Story
    https://www.youtube.com/watch?v=9sexYAHHjxE



  45. Amazon Personalize examples & notebooks
    https://github.com/aws-samples/amazon-personalize-samples



  47. Takeaways
    • Forecasting and personalization can help improve your business
    efficiency
    • Amazon Forecast provides accurate time-series forecasting
    • Amazon Personalize provides real-time personalization and
    recommendations
    • They are both based on the same technology used at Amazon.com and
    don’t require machine learning expertise to be used


  48. References


  49. Links
    • Blogs
    • https://aws.amazon.com/blogs/aws/amazon-forecast-time-series-forecasting-made-easy/
    • https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/
    • https://aws.amazon.com/blogs/aws/amazon-personalize-real-time-personalization-and-recommendation-for-everyone/
    • https://aws.amazon.com/blogs/aws/amazon-personalize-is-now-generally-available/
    • Examples & Notebooks
    • https://github.com/aws-samples/amazon-forecast-samples
    • https://github.com/aws-samples/amazon-personalize-samples


  50. Links
    • Training algorithms
    • DeepAR – https://arxiv.org/abs/1704.04110
    • HRNN – https://arxiv.org/abs/1706.04148
    • Evaluating performance of a trained model
    • https://en.wikipedia.org/wiki/Mean_absolute_percentage_error (MAPE)
    • https://en.wikipedia.org/wiki/Quantile_regression
    • https://en.wikipedia.org/wiki/Mean_reciprocal_rank
    • https://en.wikipedia.org/wiki/Discounted_cumulative_gain


  51. Thank you!
    Frank Munz
    Sr Technical Evangelist
    Twitter: @frankmunz
    Blog: https://medium.com/@frank.munz
