
(DRY) Python Stories & Machine Learning


Nikolay Fominykh (S7 Techlab / Software Engineer) @ Moscow Python Meetup 71
"There are currently dozens of machine learning frameworks. On top of that, modeling can be done in different environments. How do you avoid losing the results of your work? How do you abstract ML pipelines? That is what I want to tell you about."
Video: http://www.moscowpython.ru/meetup/71/dry-python-and-ml/

Moscow Python Meetup

December 26, 2019


Transcript

  1. DRY Python Stories & Machine Learning
    Nikolay Fominykh @ S7 Techlab


  2. Talk outline
    ● The problem
    ● How to agree on an interface with the analyst
    ● Fast Research and Stable Production
    ● How to abstract away from the execution environment


  3. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> ???:
        ...


  6. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> ???:
        ...


  7. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> Union[Model, None]:
        ...


  9. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> Predict ??!!!:
        ...


  10. Problem Statement
    def h4229d691b0(self, df: pandas.DataFrame) -> Predict:
        ...


  12. Problem Statement
    def dance(self, df: pandas.DataFrame) -> Predict:
        ...


  13. def dance -> Predict:
    ==
    def fit_model -> Predict:

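The naming problem on the slides above can be made concrete with a small typed sketch. Everything in it (`MeanModel`, the list-based inputs, the `predict` helper) is a hypothetical stand-in and not the speaker's code; it only illustrates one way to keep the name and the return type honest: `fit_model` returns a fitted model (or `None`), and prediction lives under its own name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MeanModel:
    """Hypothetical fitted model: predicts the mean of the training targets."""
    mean: float


def fit_model(targets: List[float]) -> Optional[MeanModel]:
    """Fit and return a model, or None when there is nothing to fit on."""
    if not targets:
        return None
    return MeanModel(mean=sum(targets) / len(targets))


def predict(model: MeanModel, n_rows: int) -> List[float]:
    """Prediction is a separate, explicitly named step."""
    return [model.mean] * n_rows


model = fit_model([1.0, 2.0, 3.0])
print(model)
if model is not None:
    print(predict(model, n_rows=2))
```

With this split, a reader can tell from the signature alone what each function produces.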

  14. “One of our difficulties will be the maintenance of
    an appropriate discipline, so that we do not lose
    track of what we are doing.”
    —Alan Turing, 1946, Lecture to the London Mathematical Society (AMT/C/32,
    p18, §2)


  15. Fit Model Done Right
    Ideally, fit model:
    ● Trains the model on prepared features
    ● Saves the trained model
    >>> Fit(model=Model(), storage=FileStorage()).run(df)
    'Your awesome model saved to /models/logreg_26_12_2019.pkl'


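  16. Fit Model Story
    import attr
    from stories import arguments, story
    @attr.s
    class Fit(object):
        # dependencies are injected through the constructor
        storage: IStorage = attr.ib()
        model: IModel = attr.ib()
        @story
        @arguments('train_data_frame')
        def run(I):
            I.fit
            I.save_model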

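A stdlib-only sketch of what the Fit story describes, written without the `stories` library: the dependencies (`model`, `storage`) come in through the constructor, and only the training data is passed to `run`. `MeanModel`, `InMemoryStorage`, the file name and the returned message are hypothetical stand-ins, not the speaker's implementations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class IModel(Protocol):
    def fit(self, rows: List[float]) -> None: ...


class IStorage(Protocol):
    def save(self, name: str, model: object) -> str: ...


@dataclass
class MeanModel:
    """Hypothetical model: remembers the mean of the training rows."""
    mean: float = 0.0

    def fit(self, rows: List[float]) -> None:
        self.mean = sum(rows) / len(rows)


@dataclass
class InMemoryStorage:
    """Hypothetical storage: keeps fitted models in a dict."""
    blobs: Dict[str, object] = field(default_factory=dict)

    def save(self, name: str, model: object) -> str:
        self.blobs[name] = model
        return name


@dataclass
class Fit:
    model: IModel
    storage: IStorage

    def run(self, train_rows: List[float]) -> str:
        # Step 1: fit the injected model on prepared data.
        self.model.fit(train_rows)
        # Step 2: persist the fitted model through the injected storage.
        path = self.storage.save("mean_model.pkl", self.model)
        return f"Model saved to {path}"


storage = InMemoryStorage()
print(Fit(model=MeanModel(), storage=storage).run([1.0, 2.0, 3.0]))
```

Swapping `InMemoryStorage` for another object with the same `save` method requires no change to `Fit.run` or its callers.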

  17. Fit Model Story - Possible Mistake
    @attr.s
    class Fit(object):
        # mistake: storage and model are dependencies, not business arguments
        @story
        @arguments('train_data_frame', 'storage', 'model')
        def run(I):
            I.fit
            I.save_model


  19. @arguments nightmare in the wild
    @arguments is meant only for business arguments.
    Otherwise...
    config: MLConfig = Configure(
        factors_config_path=ctx.factors_config_path,
    ).run(
        direction=ctx.direction, target_profiles=ctx.target_profiles,
        model_info=ctx.model_info, start_date=ctx.start_date,
        end_date=ctx.end_date, model_class=ctx.model_class,
        fit=ctx.fit, directions_data=ctx.directions_data,
        object_data_manager=ctx.object_data_manager, model_group_field=ctx.model_group_field,
        version=ctx.version, directions_model_path=ctx.directions_model_path, validator=ctx.validator,
    )
    … there will be too many arguments

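One possible way out of the argument explosion above, sketched with the standard library only: bundle the non-business values into a single object that is injected once, so that `run` receives just the business arguments. The `Infrastructure` class, its fields and the `MOW-LED` direction are hypothetical and chosen for illustration; they are not taken from the speaker's project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Infrastructure:
    """Hypothetical bundle of non-business dependencies."""
    model_path: str
    version: str


@dataclass
class Configure:
    infra: Infrastructure  # injected once, at construction time

    def run(self, direction: str, start_date: date, end_date: date) -> dict:
        # Only business arguments vary from call to call.
        if start_date > end_date:
            raise ValueError("start_date must not be after end_date")
        return {
            "direction": direction,
            "period_days": (end_date - start_date).days,
            "model_path": f"{self.infra.model_path}/{self.infra.version}",
        }


configure = Configure(Infrastructure(model_path="/models", version="v1"))
config = configure.run("MOW-LED", date(2018, 12, 20), date(2020, 1, 1))
print(config)
```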

  20. Putting the whole ML experiment together
    @attr.s
    class ExperimentForAircraft(object):
        fit = attr.ib()
        predict = attr.ib()
        prepare = attr.ib()
        @story
        @arguments('aircraft', 'start_date', 'end_date')
        def run(I):
            I.prepare
            I.fit
            I.predict


  21. ML Experiment - execution plan
    >>> ExperimentForAircraft(...).run
    ExperimentForAircraft.run
      prepare (Prepare.run)
        get_features
        get_target
        merge_features_and_targets
        split_train_test
      fit (Fit.run)
        fit
        save_model
      predict (Predict.run)
        load_model
        predict

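The nested plan above can be imitated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not how the `stories` library is implemented: the `Story` class, its `plan` method and the three toy steps are all hypothetical.

```python
from typing import Callable, List, Sequence, Union

Step = Union["Story", Callable[[dict], None]]


class Story:
    """Hypothetical minimal story: a named, ordered list of steps."""

    def __init__(self, name: str, steps: Sequence[Step]) -> None:
        self.name = name
        self.steps = list(steps)

    def run(self, ctx: dict) -> dict:
        # Execute steps in order; sub-stories share the same context.
        for step in self.steps:
            if isinstance(step, Story):
                step.run(ctx)
            else:
                step(ctx)
        return ctx

    def plan(self, indent: int = 0) -> List[str]:
        # Render the nested execution plan as indented lines.
        lines = [" " * indent + self.name]
        for step in self.steps:
            if isinstance(step, Story):
                lines.extend(step.plan(indent + 2))
            else:
                lines.append(" " * (indent + 2) + step.__name__)
        return lines


def get_features(ctx: dict) -> None:
    ctx["features"] = [1.0, 2.0, 3.0]


def fit(ctx: dict) -> None:
    ctx["model"] = sum(ctx["features"]) / len(ctx["features"])


def predict(ctx: dict) -> None:
    ctx["prediction"] = ctx["model"]


experiment = Story("Experiment.run", [
    Story("prepare", [get_features]),
    Story("fit", [fit]),
    Story("predict", [predict]),
])
print("\n".join(experiment.plan()))
print(experiment.run({}))
```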

  22. ML Experiment - running it
    experiment = ExperimentForAircraft(
        prepare=Prepare(source=Database()).run,
        fit=Fit(model=Model(), storage=ModelStorage()).run,
        predict=Predict(model=Model(), storage=ModelStorage()).run,
    )
    experiment.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')


  23. ML Experiment - running it via DI
    from dependencies import Injector, this
    class ExperimentRunner(Injector):
        storage = ModelStorage
        model = Model
        source = Database
        fit_story = Fit
        fit = this.fit_story.run
        predict_story = Predict
        predict = this.predict_story.run
        prepare_story = Prepare
        prepare = this.prepare_story.run
        experiment = ExperimentForAircraft
    ExperimentRunner.experiment.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')

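To show the idea behind the runner above without depending on any DI library, here is a deliberately tiny name-based container. It is an illustrative sketch only: `Container` and its `get` method are hypothetical and do not reproduce the `dependencies` API, and the `Model`, `ModelStorage` and `Fit` classes are toy stand-ins.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Dict


class Container:
    """Hypothetical name-based injector, not the `dependencies` API."""

    def __init__(self, **providers: Any) -> None:
        self._providers: Dict[str, Any] = providers

    def get(self, name: str) -> Any:
        provider = self._providers[name]
        if not inspect.isclass(provider):
            return provider  # plain values are returned as-is
        # Resolve constructor parameters by name, recursively;
        # parameters without a provider keep their defaults.
        params = inspect.signature(provider).parameters
        kwargs = {p: self.get(p) for p in params if p in self._providers}
        return provider(**kwargs)


@dataclass
class Model:
    name: str = "logreg"


@dataclass
class ModelStorage:
    root: str = "/models"


@dataclass
class Fit:
    model: Model
    storage: ModelStorage

    def run(self, aircraft: str) -> str:
        return f"{self.storage.root}/{self.model.name}_{aircraft}.pkl"


container = Container(model=Model, storage=ModelStorage, fit=Fit)
print(container.get("fit").run(aircraft="VP-BBQ"))
```

Registering a different class or value under the same name changes what gets built, while the calling code stays the same.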

  24. Python Dependency Injection Libraries
    In modern Python, the following libraries are used for DI:
    ● Google Pinject ( https://github.com/google/pinject )
    ● DRY Python Dependencies ( https://github.com/dry-python/dependencies )
    ● Punq ( https://github.com/bobthemighty/punq )
    ● Python-inject ( https://github.com/ivankorobkov/python-inject )


  25. Research => Production. Train Story
    @attr.s
    class FitForAircraft(object):
        prepare = attr.ib()
        fit = attr.ib()
        @story
        @arguments('aircraft', 'start_date', 'end_date')
        def run(I):
            I.prepare
            I.fit


  26. Research => Production. Train Runner
    class FitRunner(Injector):
        storage = ModelStorage
        model = Model
        source = Database
        fit_story = Fit
        fit = this.fit_story.run
        prepare_story = Prepare
        prepare = this.prepare_story.run
        background_fit = FitForAircraft
    FitRunner.background_fit.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')


  27. Research => Production. Summary
    It is important to use the same helper stories for research and production.

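The summary above can be illustrated with plain functions: the research pipeline and the production training pipeline are assembled from the very same `prepare` and `fit` steps, so they cannot drift apart. The `pipeline` helper and the toy steps are hypothetical and serve only as a sketch of the reuse idea.

```python
from typing import Callable, Dict

Step = Callable[[Dict], Dict]


def prepare(ctx: Dict) -> Dict:
    # Toy stand-in for feature preparation.
    return {**ctx, "train": [1.0, 2.0, 3.0]}


def fit(ctx: Dict) -> Dict:
    # Toy stand-in for model fitting: the "model" is the mean.
    return {**ctx, "model": sum(ctx["train"]) / len(ctx["train"])}


def predict(ctx: Dict) -> Dict:
    return {**ctx, "prediction": ctx["model"]}


def pipeline(*steps: Step) -> Step:
    """Compose steps into one callable that threads the context through."""
    def run(ctx: Dict) -> Dict:
        for step in steps:
            ctx = step(ctx)
        return ctx
    return run


research = pipeline(prepare, fit, predict)   # full experiment
production_fit = pipeline(prepare, fit)      # background training only

args = {"aircraft": "VP-BBQ"}
print(research(args)["model"] == production_fit(args)["model"])
```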

  28. Abstracting away from the execution environment
    Stories are easy to embed in:
    ● Jupyter Notebook
    ● Apache Airflow
    ● Netflix Metaflow


  29. Stories & Airflow
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path
    dag = DAG(dag_id='machine_learning_story', schedule_interval=None)
    def aircraft_fit(aircraft, start_date, end_date):
        FitRunner.background_fit.run(aircraft=aircraft, start_date=start_date, end_date=end_date)
    task = PythonOperator(
        task_id='aircraft_fit',
        python_callable=aircraft_fit,
        op_kwargs={'aircraft': 'VP-BBQ', 'start_date': '20.12.2018', 'end_date': '01.01.2020'},
        dag=dag,
    )

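The Airflow slide works because the operator only needs a plain callable plus keyword arguments. The sketch below imitates that contract with the standard library; `run_like_scheduler` is a hypothetical stand-in for a scheduler and is not an Airflow API, and the body of `aircraft_fit` is a placeholder for the real `FitRunner.background_fit.run(...)` call.

```python
from typing import Any, Callable, Dict


def aircraft_fit(aircraft: str, start_date: str, end_date: str) -> str:
    # Hypothetical placeholder for FitRunner.background_fit.run(...).
    return f"fit {aircraft} on {start_date}..{end_date}"


def run_like_scheduler(python_callable: Callable[..., Any],
                       op_kwargs: Dict[str, Any]) -> Any:
    # A scheduler-style runner only needs a callable and its keyword arguments.
    return python_callable(**op_kwargs)


kwargs = {
    "aircraft": "VP-BBQ",
    "start_date": "20.12.2018",
    "end_date": "01.01.2020",
}
# The same function works when called directly, e.g. from a notebook ...
direct = aircraft_fit(**kwargs)
# ... and when invoked by the runner.
scheduled = run_like_scheduler(aircraft_fit, kwargs)
print(direct == scheduled)
```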

  30. Stories & MetaFlow
    from metaflow import FlowSpec, retry, step
    class StoryFlow(FlowSpec):
        @retry
        @step
        def start(self):
            ExperimentRunner.experiment.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')
            self.next(self.end)
        @step
        def end(self):
            pass


  31. Thanks!
