(DRY) Python Stories & Machine Learning

Nikolay Fominykh (S7 Techlab / Software Engineer) @ Moscow Python Meetup 71
"There are currently dozens of machine learning frameworks. Moreover, modeling can be done in different environments. How do you avoid losing the results of your work? How do you abstract ML pipelines? That is what I want to tell you about."
Video: http://www.moscowpython.ru/meetup/71/dry-python-and-ml/

Moscow Python Meetup

December 26, 2019

Transcript

  1. DRY Python Stories &
    Machine Learning
    Nikolay Fominykh @ S7 Techlab

  2. Talk outline
    ● The problem
    ● How to agree on an interface with the analyst
    ● Fast Research and Stable Production
    ● How to abstract away from the execution environment

  3. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> ???:
        ...

  7. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> Union[Model, None]:
        ...

  9. Problem Statement
    def fit_model(self, df: pandas.DataFrame) -> Predict ??!!!:
        ...

  10. Problem Statement
    def h4229d691b0(self, df: pandas.DataFrame) -> Predict:
        ...

  12. Problem Statement
    def dance(self, df: pandas.DataFrame) -> Predict:
        ...

  13. def dance -> Predict:
    ==
    def fit_model -> Predict:

  14. “One of our difficulties will be the maintenance of
    an appropriate discipline, so that we do not lose
    track of what we are doing.”
    —Alan Turing, 1946, Lecture to the London Mathematical Society (AMT/C/32,
    p18, §2)

  15. Fit Model Done Right
    Ideally, fit model:
    ● Trains the model on prepared features
    ● Saves the trained model
    >>> Fit(model=Model(), storage=FileStorage()).run(df)
    'Your awesome model saved to /models/logreg_26_12_2019.pkl'
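
The two responsibilities listed on this slide (train, then save) can be sketched in plain Python without any framework. This is only an illustration under stated assumptions: `MeanModel` and `InMemoryStorage` are hypothetical stand-ins invented here, not the `Model` and `FileStorage` classes from the talk, and the training data is a plain list rather than a DataFrame.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


class MeanModel:
    """Hypothetical stand-in model: remembers the mean of the training targets."""

    def __init__(self) -> None:
        self.mean = None

    def fit(self, targets: List[float]) -> "MeanModel":
        self.mean = sum(targets) / len(targets)
        return self


@dataclass
class InMemoryStorage:
    """Hypothetical storage: keeps copies of models in a dict instead of on disk."""

    models: Dict[str, Any] = field(default_factory=dict)

    def save(self, name: str, model: Any) -> str:
        self.models[name] = copy.deepcopy(model)
        return name

    def load(self, name: str) -> Any:
        return self.models[name]


@dataclass
class Fit:
    model: Any    # dependency, supplied once at construction time
    storage: Any  # dependency, supplied once at construction time

    def run(self, targets: List[float], name: str = "model") -> str:
        fitted = self.model.fit(targets)       # step 1: train on prepared data
        key = self.storage.save(name, fitted)  # step 2: persist the trained model
        return f"Model saved to {key}"


storage = InMemoryStorage()
message = Fit(model=MeanModel(), storage=storage).run([1.0, 2.0, 3.0])
restored = storage.load("model")
```

The point of the shape is that `run` receives only data, while the model and the storage are swapped by constructing `Fit` differently.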

  16. Fit Model Story
    import attr
    from stories import arguments, story

    @attr.s
    class Fit(object):
        storage: IStorage = attr.ib()
        model: IModel = attr.ib()

        @arguments('train_data_frame')
        @story
        def run(I):
            I.fit
            I.save_model
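
To make the `I.fit` / `I.save_model` notation less mysterious, here is a toy re-implementation of the idea in plain Python. It is a sketch, not the `stories` library: `toy_story`, `_Recorder`, and the dict-based context are all assumptions made for illustration. The body of `run` is executed once against a recorder that notes which attribute names were touched, and those names are later called as methods in the same order.

```python
from typing import Any, Callable, Dict, List


class _Recorder:
    """Records every attribute name accessed on it, e.g. `I.fit` -> 'fit'."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def __getattr__(self, name: str) -> None:
        # Called only for names that are not real attributes of the recorder.
        self.steps.append(name)


def toy_story(plan: Callable[[Any], None]) -> Callable[..., Dict[str, Any]]:
    recorder = _Recorder()
    plan(recorder)  # run the declarative body once to collect step names
    step_names = list(recorder.steps)

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        ctx = dict(kwargs)  # shared context, seeded with the call arguments
        for step in step_names:
            ctx.update(getattr(self, step)(ctx) or {})
        return ctx

    run.steps = step_names  # expose the plan for inspection
    return run


class Fit:
    def __init__(self, model, storage):
        self.model = model
        self.storage = storage

    @toy_story
    def run(I):
        I.fit
        I.save_model

    def fit(self, ctx):
        return {"fitted": self.model(ctx["train_data"])}

    def save_model(self, ctx):
        self.storage.append(ctx["fitted"])
        return {"saved": True}


storage = []
result = Fit(model=sum, storage=storage).run(train_data=[1, 2, 3])
```

Here the built-in `sum` plays the role of a model and a list plays the role of storage, purely to keep the sketch self-contained.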

  17. Fit Model Story - Possible Mistake
    @attr.s
    class Fit(object):

        @arguments('train_data_frame', 'storage', 'model')
        @story
        def run(I):
            I.fit
            I.save_model

  19. @arguments nightmare in the wild
    @arguments should be used only for business arguments.
    Otherwise...
    config: MLConfig = Configure(
        factors_config_path=ctx.factors_config_path,
    ).run(
        direction=ctx.direction, target_profiles=ctx.target_profiles,
        model_info=ctx.model_info, start_date=ctx.start_date,
        end_date=ctx.end_date, model_class=ctx.model_class,
        fit=ctx.fit, directions_data=ctx.directions_data,
        object_data_manager=ctx.object_data_manager, model_group_field=ctx.model_group_field,
        version=ctx.version, directions_model_path=ctx.directions_model_path, validator=ctx.validator,
    )
    … you end up with too many arguments
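
One way to avoid the argument explosion shown above is to separate what is fixed when the object is built from what varies per call. The sketch below assumes a hypothetical `Settings` dataclass and a simplified `Configure` class; the field names and the returned dict are illustrative and are not taken from the talk's code base.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Settings:
    """Hypothetical infrastructure settings, fixed when the object is built."""

    model_class: str
    version: str
    model_path: str


@dataclass
class Configure:
    settings: Settings  # injected once through the constructor

    def run(self, direction: str, start_date: str, end_date: str) -> Dict[str, str]:
        # Only business arguments change from call to call.
        return {
            "direction": direction,
            "period": f"{start_date}..{end_date}",
            "model": f"{self.settings.model_class}:{self.settings.version}",
        }


configure = Configure(Settings(model_class="logreg", version="1", model_path="/models"))
result = configure.run(direction="north", start_date="20.12.2018", end_date="01.01.2020")
```

The call site now carries three business arguments instead of a dozen mixed ones, and the settings can be replaced in tests without touching any caller.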

  20. Putting the whole ML Experiment together
    @attr.s
    class ExperimentForAircraft(object):
        fit = attr.ib()
        predict = attr.ib()
        prepare = attr.ib()

        @arguments('aircraft', 'start_date', 'end_date')
        @story
        def run(I):
            I.prepare
            I.fit
            I.predict

  21. ML Experiment - execution plan
    >>> ExperimentForAircraft(...).run
    ExperimentForAircraft.run
      prepare (Prepare.run)
        get_features
        get_target
        merge_features_and_targets
        split_train_test
      fit (Fit.run)
        fit
        save_model
      predict (Predict.run)
        load_model
        predict
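
The nested plan on this slide is a tree: a story contains steps, and a step may itself be another story. A minimal sketch of rendering such a tree follows; the tuple-based representation and the `render` helper are assumptions made here and say nothing about how the `stories` library builds its own representation.

```python
from typing import List


def render(plan: tuple, indent: int = 0) -> List[str]:
    """Render a (name, steps) tree, indenting sub-stories by two spaces."""
    name, steps = plan
    lines = [" " * indent + name]
    for step in steps:
        if isinstance(step, tuple):  # a nested story
            lines.extend(render(step, indent + 2))
        else:                        # a plain step
            lines.append(" " * (indent + 2) + step)
    return lines


experiment = (
    "ExperimentForAircraft.run",
    [
        ("prepare (Prepare.run)", [
            "get_features",
            "get_target",
            "merge_features_and_targets",
            "split_train_test",
        ]),
        ("fit (Fit.run)", ["fit", "save_model"]),
        ("predict (Predict.run)", ["load_model", "predict"]),
    ],
)

lines = render(experiment)
print("\n".join(lines))
```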

  22. ML Experiment - running it
    experiment = ExperimentForAircraft(
        prepare=Prepare(source=Database()).run,
        fit=Fit(model=Model(), storage=ModelStorage()).run,
        predict=Predict(model=Model(), storage=ModelStorage()).run,
    )
    experiment.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')

  23. ML Experiment - running it via DI
    from dependencies import Injector, this

    class ExperimentRunner(Injector):
        storage = ModelStorage
        model = Model
        source = Database
        fit_story = Fit
        fit = this.fit_story.run
        predict_story = Predict
        predict = this.predict_story.run
        prepare_story = Prepare
        prepare = this.prepare_story.run
        experiment = ExperimentForAircraft

    ExperimentRunner.experiment.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')
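
The injector on this slide wires objects together by matching constructor argument names to names declared in the container. A toy version of that mechanism, using only the standard library, might look as follows. `ToyInjector` is a hypothetical class written for this sketch and is not the `dependencies` API; in particular it injects whole objects rather than bound methods such as `this.fit_story.run`, and it builds a fresh instance on every lookup.

```python
import inspect
from dataclasses import dataclass
from typing import Any


class ToyInjector:
    """Resolves constructor arguments by name from the registered providers."""

    def __init__(self, **providers: Any) -> None:
        self.providers = providers

    def get(self, key: str) -> Any:
        provider = self.providers[key]
        if not inspect.isclass(provider):
            return provider  # plain value: use as is
        params = inspect.signature(provider).parameters
        # Inject only parameters we have a provider for; others keep their defaults.
        kwargs = {p: self.get(p) for p in params if p in self.providers}
        return provider(**kwargs)


@dataclass
class Model:
    name: str = "logreg"


@dataclass
class ModelStorage:
    root: str = "/models"


@dataclass
class Fit:
    model: Model
    storage: ModelStorage

    def run(self, aircraft: str) -> str:
        return f"{self.model.name} for {aircraft} -> {self.storage.root}"


@dataclass
class FitForAircraft:
    fit: Fit

    def run(self, aircraft: str) -> str:
        return self.fit.run(aircraft)


injector = ToyInjector(model=Model, storage=ModelStorage, fit=Fit, background_fit=FitForAircraft)
message = injector.get("background_fit").run(aircraft="VP-BBQ")
```

Swapping `storage=ModelStorage` for another class changes the whole object graph in one place, which is the property the slide relies on.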

  24. Python Dependency Injection Libraries
    DI libraries used in modern Python:
    ● Google Pinject ( https://github.com/google/pinject )
    ● DRY Python Dependencies ( https://github.com/dry-python/dependencies )
    ● Punq ( https://github.com/bobthemighty/punq )
    ● Python-inject ( https://github.com/ivankorobkov/python-inject )

  25. Research => Production. Train Story
    @attr.s
    class FitForAircraft(object):
        prepare = attr.ib()
        fit = attr.ib()

        @arguments('aircraft', 'start_date', 'end_date')
        @story
        def run(I):
            I.prepare
            I.fit

  26. Research => Production. Train Runner
    class FitRunner(Injector):
        storage = ModelStorage
        model = Model
        source = Database
        fit_story = Fit
        fit = this.fit_story.run
        prepare_story = Prepare
        prepare = this.prepare_story.run
        background_fit = FitForAircraft

    FitRunner.background_fit.run(aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020')

  27. Research => Production. Summary
    It is important to use the same
    helper stories for
    research and production
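
The summary above can be illustrated with a small sketch in which the research experiment and the production training job are assembled from the very same step functions. Everything here is an assumption made for illustration: the `pipeline` helper, the dict context, and the toy `prepare`, `fit`, and `predict` steps are not the talk's stories.

```python
from typing import Callable, Dict


def prepare(ctx: Dict) -> Dict:
    # Hypothetical data preparation: a fixed training set.
    return {**ctx, "train": [1.0, 2.0, 3.0]}


def fit(ctx: Dict) -> Dict:
    # Hypothetical "model": the mean of the training data.
    return {**ctx, "model_mean": sum(ctx["train"]) / len(ctx["train"])}


def predict(ctx: Dict) -> Dict:
    return {**ctx, "prediction": ctx["model_mean"]}


def pipeline(*steps: Callable[[Dict], Dict]) -> Callable[..., Dict]:
    """Chain steps so that each one receives the context produced by the previous one."""

    def run(**kwargs) -> Dict:
        ctx = dict(kwargs)
        for step in steps:
            ctx = step(ctx)
        return ctx

    return run


experiment = pipeline(prepare, fit, predict)  # research: full experiment
background_fit = pipeline(prepare, fit)       # production: training only

research = experiment(aircraft="VP-BBQ")
production = background_fit(aircraft="VP-BBQ")
```

Because both pipelines share `prepare` and `fit`, a model trained in production is produced by exactly the code that was validated during research.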

  28. Abstracting away from the execution environment
    Stories are easy to embed in:
    ● Jupyter Notebook
    ● Apache Airflow
    ● Netflix Metaflow

  29. Stories & Airflow
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # airflow.operators.python_operator in Airflow 1.x

    dag = DAG(dag_id='machine_learning_story', schedule_interval=None)

    def aircraft_fit(aircraft, start_date, end_date):
        FitRunner.background_fit.run(
            aircraft=aircraft, start_date=start_date, end_date=end_date,
        )

    task = PythonOperator(
        task_id='aircraft_fit',
        python_callable=aircraft_fit,
        op_kwargs={'aircraft': 'VP-BBQ', 'start_date': '20.12.2018', 'end_date': '01.01.2020'},
        dag=dag,
    )

  30. Stories & MetaFlow
    from metaflow import FlowSpec, retry, step

    class StoryFlow(FlowSpec):

        @retry
        @step
        def start(self):
            ExperimentRunner.experiment.run(
                aircraft='VP-BBQ', start_date='20.12.2018', end_date='01.01.2020',
            )
            self.next(self.end)

        @step
        def end(self):
            pass