ML Integrations on Application Development

Hacarus Inc.
September 19, 2019

Presentation slides at 如何開發可解釋的AI (How to Develop Explainable AI)
https://www.eventbrite.com/e/ai-aihacarus-tickets-68588170063

Transcript

  1. Call me Ninz!
    - Software Engineer @ Hacarus
    - Studying MSCS @ Ateneo de Manila University
    - Music enthusiast
    - Physics and astronomy are <3
    - I love data!
    - Curious, always.
  2. VS

  3. Software engineer:
    - Product delivery
    - Code quality
    - Maintainability
    - Architecture / Boilerplates
    - User oriented
    - App development
    Data scientist:
    - Product delivery
    - Model accuracy
    - Visualization of data
    - Notebooks
    - Data-driven
    - Mathematical computations
  4. The software engineer
    - Code from data scientists is sometimes not efficient and clean.
    - Models are based on limited training and testing data.
    - Data formats, input and output.
    - D O C U M E N T A T I O N !
  5. The Challenges
    - Making sure code is efficient and maintainable
    - Resource limitations
    - Code quality
    - Error handling
    - Data integrity for both input and results
    - Proper feedback loop for data scientists and developers
  6. Efficiency and Maintainability
    This is crucial for both the machine learning and the application development side. Usually the focus is on the following:
    - Resource Limitations
    - Code Quality
    - Error Handling
  7. Resource Limitations
    - Data scientists usually work in environments with limited resources.
    - This is good for creating and verifying models.
  8. Resource Limitations
    - Real-world applications should scale depending on user demands
    - Scale with the right amount of resources.
    - Some applications have specific memory and resource constraints
    - Software developers should cater to both
  9. Code Quality
    - Don't expect 100% code quality
    - The quality of the codebase falls to the software developers
    - Type hints are very useful
    - Tools that improve code readability are highly encouraged.
  10. Error Handling
    model.fit()
    model.predict()
    Code coming from data scientists is usually abstracted and high level. Data scientists and application developers must agree on how to handle errors.
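
As an illustration of such an agreement, here is a minimal sketch (all names are hypothetical, not Hacarus code) of an application-layer wrapper that converts low-level failures from the abstracted model calls into error types both teams have agreed on:

    # Minimal sketch: wrap the abstracted model calls and map low-level
    # failures to error types agreed on by both teams.
    class AgreedModelError(Exception):
        """Base class for errors the application layer knows how to handle."""

    class PredictionFailedError(AgreedModelError):
        """The underlying model could not produce a prediction."""

    def safe_predict(model, features):
        try:
            return model.predict(features)
        except ValueError as exc:      # e.g. malformed or wrongly shaped input
            raise PredictionFailedError(f"invalid input: {exc}") from exc
        except MemoryError as exc:     # resource limits hit at inference time
            raise PredictionFailedError("out of memory during prediction") from exc
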
  11. Data Integrity
    - Pre-processing of data inputs
    - Consistency between expected inputs and outputs
    - Making sure the right results are displayed on the application side
    - Making sure the right data is passed to the machine learning side
  12. Feedback loop
    - DOCUMENT as many things as you can
    - Agree on implementation key points such as:
      - Release versions and deployment
      - Data pipelines
      - Validation, etc.
    - Regular meetings with the data science team are a must!
  13. Proper Resource Handling
    - Memory vs CPU
      - CPU needed for training
      - Memory needed for model storage
    - Consider the kind of algorithm the model uses
    - Sparse modeling usually performs well in smaller resource setups
  14. Proper Resource Handling
    - Type of deployment
      - Mobile?
      - Cloud?
      - Local?
    - Multithreading VS multiprocessing (see the sketch below)
      - Usually there is a thin layer of Python interface in between.
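
A minimal sketch of the threading-vs-multiprocessing trade-off in Python: CPU-bound work (training, heavy preprocessing) benefits from separate processes because of the GIL, while I/O-bound work is usually fine with threads. The workload and function names here are made up for illustration:

    # Minimal sketch, hypothetical workload: use processes for CPU-bound steps
    # (the GIL blocks threads here) and threads for I/O-bound steps.
    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def heavy_preprocess(sample):
        # CPU-bound: runs in a separate process, sidestepping the GIL
        return sum(x * x for x in sample)

    def simulated_io(delay):
        # I/O-bound placeholder: a thread is enough while we wait
        time.sleep(delay)
        return delay

    if __name__ == "__main__":
        samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        with ProcessPoolExecutor() as procs:
            processed = list(procs.map(heavy_preprocess, samples))
        with ThreadPoolExecutor() as threads:
            waited = list(threads.map(simulated_io, [0.1, 0.2]))
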
  15. Code Quality
    - Data scientists and the application team should use linters and automatic code formatting tools.
    - Agree on conventions for function definitions and interfaces.
    - Code reviews
    - Use type hints and other tools that IDEs utilize
  16. Sample Interface

    from abc import ABC, abstractmethod
    from typing import Iterable

    # FeatureData and LabelData are project-specific type aliases defined elsewhere.

    class ModelInterface(ABC):
        @abstractmethod
        def fit(self, X: Iterable[FeatureData], y: Iterable[LabelData]) -> None:
            # Throw error when not implemented
            raise NotImplementedError()

        @abstractmethod
        def predict(self, X: Iterable[FeatureData]) -> Iterable[LabelData]:
            # Throw error when not implemented
            raise NotImplementedError()
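
To show how an application might consume this interface, here is a minimal sketch of a concrete class implementing it; the model itself is a hypothetical stand-in, not one of Hacarus' algorithms:

    # Minimal sketch: a trivial model that satisfies ModelInterface, so the
    # application only ever depends on fit() and predict().
    class MajorityLabelModel(ModelInterface):
        def __init__(self) -> None:
            self._majority = None

        def fit(self, X: Iterable[FeatureData], y: Iterable[LabelData]) -> None:
            labels = list(y)
            self._majority = max(set(labels), key=labels.count)

        def predict(self, X: Iterable[FeatureData]) -> Iterable[LabelData]:
            return [self._majority for _ in X]
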
  17. Error Handling
    - Standardize errors
    - Meaningful errors
    - Warnings vs errors vs fatal errors (see the sketch below)
    - Continuous integration and automated tests
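
A minimal sketch of the warning / error / fatal distinction on the application side; the validation function and its rules are hypothetical examples, not a prescribed policy:

    # Minimal sketch: recoverable issues become warnings, invalid input becomes
    # an error the caller can catch, unrecoverable state is treated as fatal.
    import warnings

    def validate_training_set(samples):
        if samples is None:
            # fatal: nothing the caller can do, stop the pipeline
            raise RuntimeError("no training data was provided at all")
        if any(s is None for s in samples):
            # error: the caller can filter or re-request the data
            raise ValueError("training set contains empty samples")
        if len(samples) < 100:
            # warning: recoverable, but worth surfacing to both teams
            warnings.warn("training set is very small; accuracy may suffer")
        return samples
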
  18. Specific Errors

    # List of errors thrown by models
    class NotLatestVersionException(Exception):
        """Version mismatch for algorithm"""

    class NotFittedException(Exception):
        """Model is not fitted"""

    class DataSizeException(Exception):
        """Invalid data size for training"""

    class NoTrainDataException(Exception):
        """No training data"""

    - Errors are clear and descriptive
    - Defined on a case-by-case basis
  19. Data Integrity
    - Create data classes for strict type implementation
    - Pre-processing should be atomic in nature
      - A single operation per piece of data only (see the sketch below)
    - Data outputs and results must be stored as granularly as possible
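
A minimal sketch of what "atomic" pre-processing can look like; the specific transforms are made up for illustration:

    # Minimal sketch: each step performs exactly one operation on one sample,
    # so steps can be tested, logged and re-ordered independently.
    def to_grayscale(rgb_pixels):
        return [sum(px) / 3 for px in rgb_pixels]

    def normalize(values):
        peak = max(values) if values else 1
        return [v / peak for v in values] if peak else values

    def preprocess(rgb_pixels):
        # compose atomic steps instead of one monolithic transform
        return normalize(to_grayscale(rgb_pixels))
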
  20. Data Class

    class TrainingData:
        """TrainingData represents a labeled dataset"""

        def __init__(self, data: Iterable, label: Label = None, metadata=None) -> None:
            """
            Parameters
            ----------
            data : Iterable (array or matrix)
            label : Label.GOOD | Label.BAD
            metadata : other info
            """
            self.data = data
            self.label = label
            self.metadata = metadata
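
A short usage sketch of the data class above; Label is assumed to be an enum with GOOD and BAD members defined elsewhere in the codebase, and the values are invented:

    # Minimal sketch: constructing a strictly typed training sample.
    sample = TrainingData(
        data=[[0.1, 0.2], [0.3, 0.4]],
        label=Label.GOOD,
        metadata={"source": "camera-1", "captured_at": "2019-09-19"},
    )
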
  21. Data Integrity: Granularity of Data Results
    - Raw Image
    - Annotations
    - Black and White Image
    - Processed Image
    - Raw Image with result overlay
  22. Feedback Loop
    Without a proper feedback loop and communication, it is very difficult to work with machine learning developers and data scientists.
  23. Feedback Loop: Proactive Documentation
    - When app developers notice something missing, we inform the data science team right away
    - Document in advance, even if the feature is still being developed
  24. Feedback Loop: Version Handling
    - ML libraries and applications use different versioning
    - One application might use a different version of the ML library than another (see the sketch below)
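
A minimal sketch of how the application can pin the algorithm version it expects, reusing NotLatestVersionException from the error list above; the versioning scheme and artifact layout are hypothetical:

    # Minimal sketch: refuse to load a model artifact whose version does not
    # match what the application was built against.
    EXPECTED_ALGORITHM_VERSION = "1.4.0"

    def load_model(artifact: dict):
        version = artifact.get("version")
        if version != EXPECTED_ALGORITHM_VERSION:
            raise NotLatestVersionException(
                f"application expects {EXPECTED_ALGORITHM_VERSION}, got {version}"
            )
        return artifact["model"]
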
  25. Feedback Loop: Deployment
    - Software developers must give the ML team the capability to deploy new versions of algorithms
    - Deployments must be reversible and backwards compatible (see the sketch below)
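
A hedged sketch of one way to keep deployments reversible; the registry layout and paths are made up for illustration:

    # Minimal sketch: keep the previous model artifact around so a bad release
    # can be rolled back quickly.
    import shutil
    from pathlib import Path

    def deploy_model(new_artifact: Path, registry: Path) -> None:
        current, previous = registry / "current", registry / "previous"
        if current.exists():
            if previous.exists():
                shutil.rmtree(previous)
            current.rename(previous)          # keep the old version for rollback
        shutil.copytree(new_artifact, current)

    def rollback(registry: Path) -> None:
        current, previous = registry / "current", registry / "previous"
        if previous.exists():
            if current.exists():
                shutil.rmtree(current)
            previous.rename(current)
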
  26. Feedback Loop: Team Building Activities (for nerdy people)
    - Kaggle challenges
    - Software engineers doing ML exercises with data scientists, and vice versa
    - Solving online challenges, etc.
    - Makes it easier to align with the ML team.