(DRY) Python Stories & Machine Learning

Nikolay Fominykh (S7 Techlab / Software Engineer) @ Moscow Python Meetup 71
"There are dozens of machine learning frameworks today. Moreover, modeling can be done in different environments. How do you avoid losing the results of your work? How do you abstract ML pipelines? That is what I want to tell you about."
Video: http://www.moscowpython.ru/meetup/71/dry-python-and-ml/

Moscow Python Meetup

December 26, 2019

Transcript

  1. Talk outline
    • The problem
    • How to agree on an interface with the analyst
    • Fast Research and Stable Production
    • How to abstract away from the execution environment
  2. “One of our difficulties will be the maintenance of an appropriate discipline, so that we do not lose track of what we are doing.”
    Alan Turing, 1946, Lecture to the London Mathematical Society (AMT/C/32, p18, §2)
  3. Fit Model Done Right
    Ideally, fit model:
    • Trains the model on prepared features
    • Saves the trained model
        >>> Fit(model=Model(), storage=FileStorage()).run(df)
        'Your awesome model saved to /models/logreg_26_12_2019.pkl'
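The "fit, then save" contract on this slide can be sketched in plain Python without any framework. This is only an illustration under stated assumptions: `MeanModel`, `InMemoryStorage`, and the exact shape of `IModel` and `IStorage` are hypothetical stand-ins, not the classes used in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class IModel(Protocol):
    """Hypothetical interface agreed with the analyst: anything that can fit."""

    def fit(self, rows: List[dict]) -> None: ...


class IStorage(Protocol):
    """Hypothetical interface: anything that can persist a fitted model."""

    def save(self, model: IModel) -> str: ...


@dataclass
class MeanModel:
    """Toy model: remembers the mean of the 'target' column."""

    mean: float = 0.0

    def fit(self, rows: List[dict]) -> None:
        self.mean = sum(row["target"] for row in rows) / len(rows)


@dataclass
class InMemoryStorage:
    """Toy storage: keeps models in a dict instead of writing files."""

    saved: Dict[str, object] = field(default_factory=dict)

    def save(self, model: IModel) -> str:
        key = f"models/{type(model).__name__}"
        self.saved[key] = model
        return key


@dataclass
class Fit:
    model: IModel
    storage: IStorage

    def run(self, rows: List[dict]) -> str:
        # Step 1: train the model on the prepared features.
        self.model.fit(rows)
        # Step 2: persist the trained model and report where it went.
        path = self.storage.save(self.model)
        return f"Model saved to {path}"
```

Because `Fit` only depends on the two interfaces, a file-based storage or a real estimator could be swapped in without touching `run`.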
  4. Fit Model Story
        @attr.s
        class Fit(object):
            storage: IStorage = attr.ib()
            model: IModel = attr.ib()

            @story
            @arguments('train_data_frame')
            def run(I):
                I.fit
                I.save_model
  5. @arguments nightmare in the wild
    @arguments should be used only for business arguments. Otherwise...
        config: MLConfig = Configure(
            factors_config_path=ctx.factors_config_path,
        ).run(
            direction=ctx.direction,
            target_profiles=ctx.target_profiles,
            model_info=ctx.model_info,
            start_date=ctx.start_date,
            end_date=ctx.end_date,
            model_class=ctx.model_class,
            fit=ctx.fit,
            directions_data=ctx.directions_data,
            object_data_manager=ctx.object_data_manager,
            model_group_field=ctx.model_group_field,
            version=ctx.version,
            directions_model_path=ctx.directions_model_path,
            validator=ctx.validator,
        )
    ... there will be too many arguments
  6. Putting the whole ML Experiment together
        @attr.s
        class ExperimentForAircraft(object):
            fit = attr.ib()
            predict = attr.ib()
            prepare = attr.ib()

            @story
            @arguments('aircraft', 'start_date', 'end_date')
            def run(I):
                I.prepare
                I.fit
                I.predict
  7. ML Experiment - execution plan
        >>> ExperimentForAircraft(...).run
        ExperimentForAircraft.run
          prepare (Prepare.run)
            get_features
            get_target
            merge_features_and_targets
            split_train_test
          fit (Fit.run)
            fit
            save_model
          predict (Predict.run)
            load_model
            predict
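The idea of an explicit, inspectable execution plan can be sketched without the stories library. The helper below is a hypothetical illustration, not how stories is implemented: it runs named steps in order over a shared context dict and records which steps ran. The step bodies and the toy data are assumptions made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], None]]


def run_steps(steps: List[Step], ctx: Dict) -> List[str]:
    """Run each step against the shared context and return the executed plan."""
    trace = []
    for name, step in steps:
        step(ctx)
        trace.append(name)
    return trace


def get_features(ctx: Dict) -> None:
    ctx["features"] = [1, 2, 3, 4]  # toy data, stands in for a database query


def get_target(ctx: Dict) -> None:
    ctx["target"] = [2, 4, 6, 8]


def merge_features_and_targets(ctx: Dict) -> None:
    ctx["rows"] = list(zip(ctx["features"], ctx["target"]))


def split_train_test(ctx: Dict) -> None:
    cut = len(ctx["rows"]) * 3 // 4  # first 75% of rows go to train
    ctx["train"], ctx["test"] = ctx["rows"][:cut], ctx["rows"][cut:]


# The "prepare" part of the plan from the slide, as an ordered list of steps.
PREPARE_PLAN: List[Step] = [
    ("get_features", get_features),
    ("get_target", get_target),
    ("merge_features_and_targets", merge_features_and_targets),
    ("split_train_test", split_train_test),
]
```

Calling `run_steps(PREPARE_PLAN, {})` returns the step names in execution order, which is the same information the plan printout on the slide conveys.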
  8. ML Experiment - running it
        experiment = ExperimentForAircraft(
            prepare=Prepare(source=Database()).run,
            fit=Fit(model=Model(), storage=ModelStorage()).run,
            predict=Predict(model=Model(), storage=ModelStorage()).run,
        )
        experiment.run(
            aircraft='VP-BBQ',
            start_date='20.12.2018',
            end_date='01.01.2020',
        )
  9. ML Experiment - running it via DI
        class ExperimentRunner(Injector):
            storage = ModelStorage
            model = Model
            source = Database
            fit_story = Fit
            fit = this.fit_story.run
            predict_story = Predict
            predict = this.predict_story.run
            prepare_story = Prepare
            prepare = this.prepare_story.run
            experiment = ExperimentForAircraft

        ExperimentRunner.experiment.run(
            aircraft='VP-BBQ',
            start_date='20.12.2018',
            end_date='01.01.2020',
        )
  10. Python Dependency Injection Libraries
    In modern Python, the following libraries are used for DI:
    • Google Pinject ( https://github.com/google/pinject )
    • DRY Python Dependencies ( https://github.com/dry-python/dependencies )
    • Punq ( https://github.com/bobthemighty/punq )
    • Python-inject ( https://github.com/ivankorobkov/python-inject )
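What these libraries automate can be illustrated with a very small, hypothetical container that resolves constructor arguments by parameter name. It is a sketch of the general idea only and does not reproduce the API or internals of any library listed above; `Container`, `Database`, `Prepare`, and `Experiment` here are toy classes invented for the example.

```python
import inspect
from dataclasses import dataclass


class Container:
    """Toy injector: builds objects by matching constructor parameter names
    against registered names (an assumption-level sketch, not a real library)."""

    def __init__(self, **registry):
        self._registry = registry

    def resolve(self, name):
        target = self._registry[name]
        if not inspect.isclass(target):
            return target  # ready-made instances are injected as-is
        params = inspect.signature(target).parameters
        kwargs = {param: self.resolve(param) for param in params}
        return target(**kwargs)


@dataclass
class Database:
    def load(self, aircraft):
        return [aircraft, "row-1", "row-2"]  # toy rows


@dataclass
class Prepare:
    source: Database

    def run(self, aircraft):
        return self.source.load(aircraft)


@dataclass
class Experiment:
    prepare: Prepare

    def run(self, aircraft):
        return len(self.prepare.run(aircraft))


container = Container(source=Database, prepare=Prepare, experiment=Experiment)
experiment = container.resolve("experiment")
```

Swapping `source=Database` for a fake instance changes what the whole object graph reads from, without editing `Prepare` or `Experiment`.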
  11. Research => Production. Train Story
        @attr.s
        class FitForAircraft(object):
            prepare = attr.ib()
            fit = attr.ib()

            @story
            @arguments('aircraft', 'start_date', 'end_date')
            def run(I):
                I.prepare
                I.fit
  12. Research => Production. Train Runner
        class FitRunner(Injector):
            storage = ModelStorage
            model = Model
            source = Database
            fit_story = Fit
            fit = this.fit_story.run
            prepare_story = Prepare
            prepare = this.prepare_story.run
            background_fit = FitForAircraft

        FitRunner.background_fit.run(
            aircraft='VP-BBQ',
            start_date='20.12.2018',
            end_date='01.01.2020',
        )
  13. Stories & Airflow
        dag = DAG(dag_id='machine_learning_story', schedule_interval=None)

        def aircraft_fit(aircraft, start_date, end_date):
            FitRunner.background_fit.run(
                aircraft=aircraft,
                start_date=start_date,
                end_date=end_date,
            )

        task = PythonOperator(
            task_id='aircraft_fit',
            python_callable=aircraft_fit,
            op_kwargs={
                'aircraft': 'VP-BBQ',
                'start_date': '20.12.2018',
                'end_date': '01.01.2020',
            },
            dag=dag,
        )
  14. Stories & MetaFlow
        class StoryFlow(FlowSpec):
            @retry
            @step
            def start(self):
                ExperimentRunner.experiment.run(
                    aircraft='VP-BBQ',
                    start_date='20.12.2018',
                    end_date='01.01.2020',
                )
                self.next(self.end)

            @step
            def end(self):
                pass
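The Airflow and MetaFlow slides share one pattern: the scheduler-specific code is a thin wrapper around a single entry point. As a further hedged illustration of that pattern, the same entry point could be exposed as a command-line tool. Here `fit_for_aircraft` is a hypothetical stand-in for `FitRunner.background_fit.run`, and the flag names are assumptions made for the example.

```python
import argparse


def fit_for_aircraft(aircraft, start_date, end_date):
    """Hypothetical stand-in for FitRunner.background_fit.run."""
    return f"fit {aircraft} from {start_date} to {end_date}"


def main(argv=None):
    """Thin CLI wrapper: parse arguments, then delegate to the entry point."""
    parser = argparse.ArgumentParser(description="Fit a model for one aircraft")
    parser.add_argument("--aircraft", required=True)
    parser.add_argument("--start-date", required=True)
    parser.add_argument("--end-date", required=True)
    args = parser.parse_args(argv)
    return fit_for_aircraft(args.aircraft, args.start_date, args.end_date)
```

The wrapper contains no ML logic, so moving between a shell, a DAG task, and a flow step only changes how the three arguments are supplied.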