TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

TensorFlow Extended (TFX), a TensorFlow-based
general-purpose machine learning platform implemented
at Google.
We present the TFX paper (KDD '17) at an internal seminar at Mercari, Inc.

Shunya Ueta

March 28, 2018
Transcript

  1. TFX: A TensorFlow-Based Production-Scale
    Machine Learning Platform
    KDD 2017
    Shunya Ueta (@hurutoriya) 2018-03-28 @mercari

  2. Abstract
    Issue: “This becomes particularly challenging when data changes over time and
    fresh models need to be produced continuously. Unfortunately, such orchestration is
    often done ad hoc using glue code and custom scripts developed by individual
    teams for specific use cases, leading to duplicated effort and fragile systems with
    high technical debt.”
    1. Reduce the time to production from the order of months to weeks.
    2. Deploying TFX in the Google Play app store reduced custom code, sped up
    experiment cycles, and increased app installs by 2%.
    KDD Link here, KDD Video here, Paper Link here, Author Demo Video here.

  3. Machine Learning has High Technical Debt
    The conceptual workflow of applying machine learning is simple, but the actual
    workflow becomes much more complex.
    1. Building one machine learning platform for many different
    learning tasks
    2. Continuous training and serving
    3. Human-in-the-loop
    4. Production-level reliability and scalability

  4. Hidden Technical Debt in Machine Learning Systems
    Ref: Hidden Technical Debt in Machine Learning Systems, NIPS '15

  5. What’s the Contribution?
    ● We present a case study of deploying the platform in Google Play, a
    commercial mobile app store with over one billion active users and over one
    million apps.
    ● We show best practices for machine learning platforms that apply in a diverse
    set of contexts and are thus of general interest to researchers and practitioners
    in the field.

  6. Machine Learning Platform Design
    1. One machine learning platform for many learning tasks.
    a. Linear, Deep, Linear and Deep combined, Tree-based, Sequential, Multi-tower, Multi-head, etc.
    2. Continuous training.
    3. Easy-to-use configuration and tools.
    4. Production-level reliability and scalability.

  7. High-level component overview of a machine learning platform.

  8. High-level component overview of a machine learning platform.
    Focus

  9. DATA ANALYSIS, TRANSFORMATION, AND VALIDATION
    ● Small bugs in the data can significantly degrade model quality over a period of
    time in a way that is hard to detect and diagnose.
    ● The component needs to support a wide range of data-analysis and validation
    cases that correspond to machine learning applications, e.g. a NaN trap
    (several teams report hitting this too); see the sketch below.
    ● Data Analysis → Data Transformation → Data Validation
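    A minimal sketch of a NaN-trap-style check (an illustrative example, not TFX's
    actual data-validation code; the feature names and helper are assumptions):

    import math

    # Illustrative NaN trap (assumed example): flag training examples whose
    # numeric features contain NaN, so small data bugs are caught before
    # they silently degrade model quality.
    def nan_trap(examples):
        """examples: list of dicts mapping feature name -> value."""
        bad = []
        for i, example in enumerate(examples):
            for name, value in example.items():
                if isinstance(value, float) and math.isnan(value):
                    bad.append((i, name))
        return bad

    # One corrupted example is flagged instead of being trained on.
    print(nan_trap([{"price": 9.99}, {"price": float("nan")}]))  # [(1, 'price')]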

  10. Sample validation of an example | DATA ANALYSIS
    Figure: training data is checked against the data schema; data validation
    flags anomalies such as the appearance of a new value, which needs to be fixed.
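    A hedged sketch of the check the figure illustrates (the feature name, schema
    domain, and helper are assumptions for illustration):

    # Assumed schema: the expected value domain per categorical feature.
    SCHEMA = {"country": {"US", "JP", "DE"}}

    def find_new_values(examples, schema=SCHEMA):
        """Flag values that fall outside the schema's known domain."""
        anomalies = {}
        for example in examples:
            for feature, value in example.items():
                if feature in schema and value not in schema[feature]:
                    anomalies.setdefault(feature, set()).add(value)
        return anomalies

    # "FR" appears for the first time -> reported so the schema or data gets fixed.
    print(find_new_values([{"country": "US"}, {"country": "FR"}]))  # {'country': {'FR'}}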

  11. Model Training
    One of the core design philosophies of TFX is to streamline (and automate as
    much as possible) the process of training production-quality models that can
    support all training use cases.
    Example Code [22]
    TensorFlow Estimators: Managing Simplicity vs.
    Flexibility in High-Level Machine Learning Frameworks
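    Since the slide points to the Estimators paper [22], here is a minimal sketch of
    the Estimator-style training flow TFX builds on (TF 1.x-era API; the model shape,
    feature names, and paths are assumptions, not the paper's actual trainer):

    import tensorflow as tf

    # Feature columns declare how raw features map to model inputs.
    feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

    # A canned Estimator provides training, evaluation, and export without
    # hand-written training loops; model_dir checkpoints enable warm-starting.
    estimator = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[64, 32],
        n_classes=2,
        model_dir="/tmp/tfx_demo_model",  # hypothetical path
    )

    def input_fn():
        # Toy in-memory dataset; in TFX this would read transformed examples.
        dataset = tf.data.Dataset.from_tensor_slices(
            ({"x": [[0.0], [1.0], [2.0], [3.0]]}, [0, 0, 1, 1])
        )
        return dataset.shuffle(4).repeat().batch(2)

    estimator.train(input_fn=input_fn, steps=100)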

  12. MODEL EVALUATION AND VALIDATION
    ● Defining a “good” model
    ○ The model is safe to serve and has the desired prediction quality.
    ● Evaluation: human-facing metrics of model quality
    ○ A/B experiments on live traffic on relevant business metrics.
    ● Validation: machine-facing judgment of model goodness
    ○ We evaluate prediction quality by comparing the model against a fixed threshold
    as well as against a baseline model (see the sketch after this list).
    ● Slicing: subset of the data containing certain features
    ● User Attitudes towards Validation
    ○ No product team actively requested the validation function when the component
    was first built.
    ○ However, encountering a real issue in production that could have been prevented
    by validation made its value apparent to the teams.
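    A minimal sketch of such a validation gate (the metric, threshold, and function
    name are assumptions; the paper does not spell out the exact logic):

    # Hypothetical validation gate: a candidate model must beat both a fixed
    # quality threshold and the currently serving baseline before it is pushed.
    def is_safe_to_push(candidate_auc, baseline_auc, threshold=0.75):
        return candidate_auc >= threshold and candidate_auc >= baseline_auc

    # A candidate at 0.81 AUC vs. a 0.79 baseline clears both checks.
    print(is_safe_to_push(candidate_auc=0.81, baseline_auc=0.79))  # True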

  13. MODEL SERVING
    ● TensorFlow Serving (2016/02~) Link here
    ○ Serving a model requires that it is safe to serve and has the desired
    prediction quality.
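    A hedged sketch of querying a model hosted on TensorFlow Serving over gRPC
    (the address, model name, and input signature are assumptions for illustration):

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Connect to a (hypothetical) local TensorFlow Serving instance.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Build a Predict request for an assumed model named "tfx_demo_model".
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "tfx_demo_model"
    request.inputs["x"].CopyFrom(
        tf.make_tensor_proto([[1.0]], dtype=tf.float32)
    )

    # Send the request and read the output tensors.
    response = stub.Predict(request, timeout=5.0)
    print(response.outputs)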

  14. Multitenancy with Isolation
    ● TensorFlow Serving (2017/02~)
    ○ Latest Innovations in TensorFlow Serving : here
    ○ Multi-model serving
    ○ “We recently launched a 1TB+ model in production with good results, and
    hope to open-source this capability soon.”
    ● Inference request latency: from ∼500–1500 msec down to ∼75–150 msec.

  15. CASE STUDY : GOOGLE PLAY
    ● The recommender system for Google Play
    ○ The corpus contains over a million apps
    ■ The first step in this system is retrieval, which returns a short list of apps
    based on various signals.
    ■ It serves thousands of queries per second with a strict latency
    requirement of tens of milliseconds.

  17. CASE STUDY : GOOGLE PLAY
    ● The data validation and analysis component helped in discovering a harmful
    training-serving feature skew (a simple detection sketch follows this list).
    ○ The results of an online A/B experiment showed that removing this skew improved the app
    install rate on the main landing page of the app store by 2%.
    ● Warm-starting helped improve model quality and freshness while reducing the
    time and resources spent on training over hundreds of billions of examples.
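    A hedged illustration of training/serving skew detection (the statistics
    compared, feature name, and tolerance are assumptions; the paper does not
    describe the exact check):

    # Hypothetical skew check: compare per-feature means computed over training
    # data vs. serving logs and flag large relative divergences.
    def detect_skew(train_means, serving_means, tolerance=0.10):
        skewed = []
        for feature, train_mean in train_means.items():
            serving_mean = serving_means.get(feature, 0.0)
            denom = max(abs(train_mean), 1e-9)
            if abs(train_mean - serving_mean) / denom > tolerance:
                skewed.append(feature)
        return skewed

    # A feature whose distribution differs at serving time is flagged.
    print(detect_skew({"install_rate": 0.50}, {"install_rate": 0.20}))  # ['install_rate']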
