
ML Integrations on Application Development

Hacarus Inc.
September 19, 2019

Presentation slides from 如何開發可解釋的AI (How to Develop Explainable AI)




  1. ML Integrations on Application Development

  2. Call me Ninz!
    - Software Engineer @ Hacarus
    - Studying MSCS @ Ateneo de Manila University
    - Music enthusiast
    - Physics and astronomy are <3
    - I love data!
    - Curious, always.
  3. Let’s begin!
    • Introduction
    • Integration Challenges
    • Architectural and Development Approach
  4. VS

  5. • Product delivery
    • Code quality
    • Maintainability
    • Architecture/Boilerplates
    • User oriented
    • App development
    • Product delivery
    • Model accuracy
    • Visualization of data
    • Notebooks
    • Data-driven
    • Mathematical computations
  6. The software engineer
    • Code from data scientists is SOMETIMES not efficient and clean.
    • Models are based on limited training and testing data.
    • Data formats, input and output.
    • DOCUMENTATION!
  7. The Challenges
    - Making sure code is efficient and maintainable
    - Resource limitations
    - Code quality
    - Error handling
    - Data integrity for both input and results
    - Proper feedback loop for data scientists and developers
  8. Efficiency and Maintainability
    This is crucial for both the machine learning side and the application development side. The focus is usually on the following:
    - Resource Limitations
    - Code Quality
    - Error Handling
  9. Resource Limitations
    - Data scientists usually work in environments with limited resources.
    - Good for creating and verifying models.
  10. Resource Limitations
    - Real-world applications should scale depending on user demand
    - Scale with the right amount of resources.
    - Some applications have specific memory and resource constraints
    - Software developers should cater to both
  11. Code Quality
    Remember: Data scientists are NOT software developers.

  12. Code Quality
    - Don’t expect 100% code quality
    - The quality of the codebase is the responsibility of software developers
    - Type hints are very useful
    - Tools that improve code readability are highly encouraged.
  13. Error Handling
    model.fit() model.predict()
    Code coming from data scientists is usually abstracted and high level. Data scientists and application developers must agree on how to handle errors.
  14. Data Integrity
    - Pre-processing of data inputs
    - Consistency between expected inputs and outputs
    - Making sure the right results are displayed on the application side
    - Making sure the right data is passed to the machine learning side
  15. Feedback loop
    - DOCUMENT as many things as you can
    - Agree on implementation key points such as
      - Release versions and deployment
      - Data pipelines
      - Validation, etc.
    - Regular meetings with the data science team are a must!
  16. Solutions and Approach (what we’ve learned in our team)

  17. Proper Resource Handling
    - Memory vs CPU
      - CPU needed for training
      - Memory needed for model storage
    - Consider the kind of algorithm the model uses
    - Sparse modeling usually performs well in setups with fewer resources
  18. Proper Resource Handling
    - Type of deployment
      - Mobile?
      - Cloud?
      - Local?
    - Multithreading vs Multiprocessing
    - Usually have a thin layer of Python interface in between.
  19. Code Quality
    - The data science and application teams should use linters and automatic code formatting tools.
    - Agree on conventions for function definitions and interfaces.
    - Code reviews
    - Use type hints and other tools that IDEs utilize
  20. Sample Interface

    from abc import ABC, abstractmethod
    from typing import Iterable

    # FeatureData and LabelData are the project's own data types (not shown on the slide).
    class ModelInterface(ABC):
        @abstractmethod
        def fit(self, X: Iterable[FeatureData], y: Iterable[LabelData]) -> None:
            # Throw error when not implemented
            raise NotImplementedError()

        @abstractmethod
        def predict(self, X: Iterable[FeatureData]) -> Iterable[LabelData]:
            # Throw error when not implemented
            raise NotImplementedError()
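A minimal sketch of how an interface like this might be implemented on the model side. The `FeatureData` and `LabelData` aliases and the `MajorityLabelModel` class are hypothetical stand-ins, since the slide does not define them; the second abstract method is assumed to be `predict`.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List

# Assumption: simple aliases stand in for the undefined FeatureData / LabelData.
FeatureData = List[float]
LabelData = int


class ModelInterface(ABC):
    """Contract agreed on by data scientists and application developers."""

    @abstractmethod
    def fit(self, X: Iterable[FeatureData], y: Iterable[LabelData]) -> None:
        raise NotImplementedError()

    @abstractmethod
    def predict(self, X: Iterable[FeatureData]) -> Iterable[LabelData]:
        raise NotImplementedError()


class MajorityLabelModel(ModelInterface):
    """Hypothetical toy model: always predicts the most frequent training label."""

    def __init__(self) -> None:
        self._label = None

    def fit(self, X: Iterable[FeatureData], y: Iterable[LabelData]) -> None:
        labels = list(y)
        # Pick the label that occurs most often in the training targets.
        self._label = max(set(labels), key=labels.count)

    def predict(self, X: Iterable[FeatureData]) -> List[LabelData]:
        # One prediction per input row.
        return [self._label for _ in X]


model = MajorityLabelModel()
model.fit([[0.1], [0.4], [0.9]], [1, 1, 0])
predictions = model.predict([[0.5], [0.7]])
```

Because the base class is abstract, the application can depend on `ModelInterface` alone, and a model that forgets to implement `fit` or `predict` fails at instantiation rather than in production.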
  21. Error Handling
    - Standardize Errors
    - Meaningful errors
    - Warnings vs Errors vs Fatal Errors
    - Continuous integration and automated tests
  22. Specific Errors

    # List errors thrown by models
    class NotLatestVersionException(Exception):
        """Version mismatch for algorithm"""

    class NotFittedException(Exception):
        """Model is not fitted"""

    class DataSizeException(Exception):
        """Invalid data size for training"""

    class NoTrainDataException(Exception):
        """No training data"""

    - Errors are clear and descriptive
    - Defined on a case-by-case basis
  23. Error Handling
    Continuous integration and automated tests are important in making sure errors are handled right away.
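The error-handling slides above could be sketched roughly as follows: the model raises specific exceptions, uses a warning for a recoverable situation, and the application wraps the call. `MeanModel`, `safe_predict`, and both sample thresholds are hypothetical and chosen only for illustration.

```python
import warnings


class NotFittedException(Exception):
    """Model is not fitted"""


class DataSizeException(Exception):
    """Invalid data size for training"""


class NoTrainDataException(Exception):
    """No training data"""


# Assumption: illustrative thresholds, not taken from the slides.
MIN_SAMPLES = 2
RECOMMENDED_SAMPLES = 10


class MeanModel:
    """Hypothetical model that predicts the mean of its training targets."""

    def __init__(self):
        self._mean = None

    def fit(self, y):
        values = list(y)
        if not values:
            raise NoTrainDataException("fit() received no training data")
        if len(values) < MIN_SAMPLES:
            raise DataSizeException(
                f"need at least {MIN_SAMPLES} samples, got {len(values)}"
            )
        if len(values) < RECOMMENDED_SAMPLES:
            # Recoverable situation: warn, but still train.
            warnings.warn("small training set; predictions may be unreliable")
        self._mean = sum(values) / len(values)

    def predict(self, n):
        if self._mean is None:
            raise NotFittedException("call fit() before predict()")
        return [self._mean] * n


def safe_predict(model, n):
    """Application-side wrapper: turn a model error into a structured result."""
    try:
        return {"ok": True, "result": model.predict(n)}
    except NotFittedException as exc:
        return {"ok": False, "error": str(exc)}
```

Checks like `safe_predict(MeanModel(), 2)` returning an error result are the kind of case an automated test in continuous integration can pin down.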
  24. Data Integrity
    - Create data classes for strict type implementation
    - Pre-processing should be atomic in nature.
      - Single operation per data only
    - Data output and results must be stored as granularly as possible
  25. Data Class

    # Label is the project's own label type (not shown on the slide).
    class TrainingData:
        """TrainingData represents a labeled dataset"""

        def __init__(self, data: Iterable, label: Label = None, metadata=None) -> None:
            """
            Parameters
            ----------
            data : Iterable (matrix-shaped)
            label : Label.GOOD | Label.BAD
            metadata : other info
            """
            self.data = data
            self.label = label
            self.metadata = metadata
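As one possible variation on the data class from the slide, the standard `dataclasses` module can enforce some of the strictness at construction time. This is only a sketch: the `Label` enum values, the `TrainingSample` name, and the validation rules are assumptions, not part of the original deck.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional, Sequence


class Label(Enum):
    # Assumption: the slide only names Label.GOOD and Label.BAD.
    GOOD = "good"
    BAD = "bad"


@dataclass(frozen=True)
class TrainingSample:
    """Hypothetical stricter variant of a labeled-data class."""

    data: Sequence[Sequence[float]]
    label: Optional[Label] = None
    metadata: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject empty or ragged matrices before they reach the model.
        if len(self.data) == 0:
            raise ValueError("data must not be empty")
        if len({len(row) for row in self.data}) != 1:
            raise ValueError("all rows in data must have the same length")
        if self.label is not None and not isinstance(self.label, Label):
            raise TypeError("label must be a Label or None")


sample = TrainingSample(data=[[0.0, 1.0], [1.0, 0.0]], label=Label.GOOD)
```

Marking the class `frozen=True` prevents fields from being reassigned after validation, which helps keep inputs consistent between the application and the model.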
  26. Data Integrity: Atomic Operations
    - Data Cleaning
    - Feature Reduction
    - Principal Component Analysis
    - Training/Prediction
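A rough sketch of what atomic pre-processing steps might look like: each function performs a single operation and returns new data, and the runner keeps every intermediate result. The step functions and `run_pipeline` are hypothetical, and simple stand-ins are used for cleaning and feature reduction; PCA and training are left out to keep the example dependency-free.

```python
from typing import Callable, List, Optional, Tuple

Row = List[Optional[float]]
Step = Callable[[List[Row]], List[Row]]


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Data cleaning: keep only rows without missing values."""
    return [row for row in rows if all(value is not None for value in row)]


def drop_constant_columns(rows: List[Row]) -> List[Row]:
    """Feature reduction: remove columns that carry no information."""
    if not rows:
        return []
    keep = [i for i in range(len(rows[0])) if len({row[i] for row in rows}) > 1]
    return [[row[i] for i in keep] for row in rows]


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Tuple[str, List[Row]]]:
    """Apply each step in order and record every intermediate result."""
    history = [("input", rows)]
    for step in steps:
        rows = step(rows)
        history.append((step.__name__, rows))
    return history


raw = [[1.0, 5.0, 0.2], [2.0, 5.0, None], [3.0, 5.0, 0.8]]
history = run_pipeline(raw, [drop_incomplete_rows, drop_constant_columns])
```

Keeping the per-step history makes it possible to inspect or store each stage separately, and no step mutates its input.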
  27. Data Integrity: Granularity of Data Results
    - Raw Image
    - Annotations
    - Black and White Image
    - Processed Image
    - Raw Image with result overlay
  28. Feedback Loop
    Without a proper feedback loop and communication, it is very difficult to work with machine learning developers and data scientists.
  29. Feedback Loop: Proactive Documentation
    - When app developers notice something missing, we inform the data science team right away
    - Documentation is written in advance, even if the feature is still being developed
  30. Feedback Loop: Version Handling
    - ML libraries and applications use different versioning
    - One application might use a different version of the ML
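One way to surface a version mismatch early is a check at application startup that raises the `NotLatestVersionException` listed under Specific Errors. The compatibility rule below (same major version, installed version at least the required one) and the function names are assumptions for illustration only.

```python
from typing import Tuple


class NotLatestVersionException(Exception):
    """Version mismatch for algorithm"""


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_compatible(installed: str, required: str) -> None:
    """Raise if the installed ML library cannot serve this application.

    Assumption: compatible means same major version and installed >= required.
    """
    have, need = parse_version(installed), parse_version(required)
    if have[0] != need[0] or have < need:
        raise NotLatestVersionException(
            f"application requires ML library {required}, found {installed}"
        )


check_compatible("1.4.2", "1.3.0")  # passes silently
```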
  31. Feedback Loop: Deployment
    - Software developers must give the ML team the capability to deploy new versions of algorithms
    - Deployment must be reversible and backwards compatible
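Reversible deployment could be sketched as a small registry that remembers which model versions were activated and can step back to the previous one. `ModelRegistry` is a hypothetical in-memory illustration; a real setup would persist versions and deployment history somewhere durable.

```python
class ModelRegistry:
    """Hypothetical in-memory registry of deployable model versions."""

    def __init__(self):
        self._versions = {}  # version string -> model object
        self._history = []   # activation order, newest last

    def register(self, version, model):
        """Make a model version available for deployment."""
        self._versions[version] = model

    def deploy(self, version):
        """Activate a registered version."""
        if version not in self._versions:
            raise KeyError(f"unknown model version: {version}")
        self._history.append(version)

    def rollback(self):
        """Reactivate the previously deployed version and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier deployment to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def active_version(self):
        return self._history[-1] if self._history else None

    @property
    def active_model(self):
        version = self.active_version
        return self._versions[version] if version is not None else None
```

With this shape, the ML team can call `register` and `deploy` for a new algorithm version, and the application team can call `rollback` if the new version misbehaves.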
  32. Feedback Loop: Team Building Activities (for nerdy people)
    - Kaggle Challenge
    - Software engineers doing ML exercises with data scientists and vice versa.
    - Solving online challenges, etc.
    - Makes it easier to align with the ML team.
  33. Questions