Workflow Engines Meetup #1 Luigi

bwtakacy
March 11, 2017

Slides presented at Workflow Engines Meetup #1

https://connpass.com/event/50900/


Transcript

  1. 2.

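    Self-introduction
    Takashi Onishi (大西 高史)
    ▸ Joined Recruit Marketing Partners in 2017/10
    ▸ Previously worked at a systems integrator (SIer)
    ▸ Mainly responsible for developing and operating the data analysis platform
    ▸ @bwtakacy on GitHub, Qiita
    ▸ Contributes patches and maintains tools in the following projects

    (C) Recruit Marketing Partners Co.,Ltd. All rights reserved.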
  2. 6.
  3. 10.

    Daily processing
    ▸ Sync 50+ tables with Embulk
    ▸ Run 30+ Hive queries with Luigi
    ▸ Run 10+ Presto queries with TD Workflow (Digdag)
    ▸ Another 10+ scheduled queries exist on TD
    ▸ Provide 20+ reports to multiple departments
  4. 12.

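    Luigi
    Developed at Spotify and later released as OSS. The name comes from "the world's second most famous plumber".
    ▸ Controls jobs that combine multiple batch processes
    ▸ Specializes in resolving dependencies between processes and in scheduling
    ▸ Guarantees atomicity of processing
    ▸ Everything is written in Python
    ▸ Platform-independent, so many kinds of processing can be described in one place
        ▸ Hadoop, Hive, Pig, Spark
        ▸ MySQL, PostgreSQL, SQLAlchemy
        ▸ Treasure Data, BigQuery, Redshift
        ▸ SSH, FTP
        ▸ and more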
  5. 14.

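    Terminology
    ▸ Task
        ▸ The unit that does the actual work
        ▸ A building block of a workflow
        ▸ Can declare the Tasks that must have run beforehand
    ▸ Target
        ▸ Information indicating that a Task finished successfully
        ▸ A file on local disk, HDFS, an RDB, S3
    ▸ Parameter
        ▸ A variable that can be passed to a Task as an argument
        ▸ Example: the date in daily processing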
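The relationship between a Task, its Target, and the Tasks it requires can be illustrated with a small toy model. This sketch does not use Luigi itself: `ToyTask` and `build` are hypothetical names made up for illustration, and the write-then-rename step is only one simple way (assumed here) to make a file target appear atomically.

```python
import os


class ToyTask:
    """Toy stand-in for a task: it is complete when its output file exists."""

    def __init__(self, path, requires=()):
        self.path = path                # plays the role of the Target
        self.requires = list(requires)  # tasks that must be complete first

    def complete(self):
        return os.path.exists(self.path)

    def run(self):
        # Write under a temporary name, then rename, so the target only
        # becomes visible once the work has fully finished.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write("done\n")
        os.replace(tmp, self.path)


def build(task, executed=None):
    """Run `task` after its requirements, skipping anything already complete.

    Returns the list of target paths that were actually produced.
    """
    executed = [] if executed is None else executed
    if task.complete():
        return executed
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.path)
    return executed
```

Because completeness is decided by the target alone, calling `build` a second time on the same task does no work, which is why rerunning a workflow only picks up what is still missing.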
  6. 15.

    A simple example (examples/foo.py from the Luigi repository)

        import time

        import luigi


        class Foo(luigi.WrapperTask):
            task_namespace = 'examples'

            def run(self):
                print("Running Foo")

            def requires(self):
                # Foo depends on ten Bar tasks, one per value of num.
                for i in range(10):
                    yield Bar(i)


        class Bar(luigi.Task):
            task_namespace = 'examples'
            num = luigi.IntParameter()

            def run(self):
                time.sleep(1)
                # Creating the (empty) output file marks this task as complete.
                self.output().open('w').close()

            def output(self):
                time.sleep(1)
                return luigi.LocalTarget('/tmp/bar/%d' % self.num)


        if __name__ == "__main__":
            luigi.run(['examples.Foo', '--workers', '2', '--local-scheduler'])
  7. 16.

    A simple example (annotated)
    The same examples/foo.py code as on the previous slide, with these callouts:
    ▸ run(): the processing the Task performs
    ▸ requires(): the Tasks that must have run before this Task
    ▸ num = luigi.IntParameter(): the Task's Parameter
    ▸ output(): produces the Target that indicates the Task finished successfully
  8. 18.

    Installation and execution
    ‣ Install
        $ pip install luigi
    ‣ Run a workflow
        $ luigi --module foo examples.Foo --local-scheduler
    ‣ The Python script to launch must be located on sys.path
    ‣ --local-scheduler: a scheduler is started for each command invocation
        ‣ Not needed if a scheduler process is kept running independently
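Since `luigi --module foo` has to import `foo` by name, one quick way to check the sys.path requirement is to ask Python whether the module resolves. The following is a minimal sketch using only the standard library; `module_is_importable` and its `extra_paths` argument are hypothetical helpers, not part of Luigi.

```python
import importlib
import importlib.util
import sys


def module_is_importable(module_name, extra_paths=()):
    """Return True if `module_name` resolves on sys.path plus `extra_paths`.

    sys.path is restored afterwards, so the check has no lasting effect.
    """
    original = list(sys.path)
    try:
        # Prepend the candidate directories, as PYTHONPATH would.
        sys.path[:0] = list(extra_paths)
        importlib.invalidate_caches()
        return importlib.util.find_spec(module_name) is not None
    finally:
        sys.path[:] = original
```

For instance, `module_is_importable("foo", ["."])` mirrors running the command with `PYTHONPATH='.'`, which is how the `top_artists` run in this deck makes its module visible.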
  9. 21.

    $ PYTHONPATH='.' luigi --module top_artists AggregateArtists --local-scheduler --date-interval 2012-06
    DEBUG: Checking if AggregateArtists(date_interval=2012-06) is complete
    DEBUG: Checking if Streams(date=2012-06-01) is complete
    (omitted)
    DEBUG: Checking if Streams(date=2012-06-30) is complete
    INFO: Informed scheduler that task AggregateArtists_2012_06_7af15faabf has status PENDING
    INFO: Done scheduling tasks
    INFO: Running Worker with 1 processes
    DEBUG: Asking scheduler for work...
    DEBUG: Pending tasks: 31
    INFO: [pid 3501] Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) running Streams(date=2012-06-21)
    INFO: [pid 3501] Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) done Streams(date=2012-06-21)
    DEBUG: 1 running tasks, waiting for next task to finish
    INFO: Informed scheduler that task Streams_2012_06_21_b21eade591 has status DONE
    (omitted)
    DEBUG: There are no more tasks to run at this time
    INFO: Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) was stopped. Shutting down Keep-Alive thread
    INFO:
    ===== Luigi Execution Summary =====

    Scheduled 31 tasks of which:
    * 31 ran successfully:
        - 1 AggregateArtists(date_interval=2012-06)
        - 30 Streams(date=2012-06-01...2012-06-30)

    This progress looks :) because there were no failed tasks or missing external dependencies

    ===== Luigi Execution Summary =====
  10. 22.

    How to measure a Task's execution time: the PROCESSING_TIME event
    ▸ Luigi has a mechanism for registering callbacks against built-in events
    ▸ FAILURE event: emitted only when a Task fails
        ▸ Useful for cleanup processing, error notification, and so on
    ▸ PROCESSING_TIME event: emitted only when a Task finishes successfully; it carries the Task's execution time (in seconds)
        ▸ This is it!!
  11. 23.

    Example

        import luigi


        class TimeTaskMixin(object):
            '''
            A mixin that when added to a luigi task, will print out
            the tasks execution time to standard out, when the task is
            finished
            '''
            @luigi.Task.event_handler(luigi.Event.PROCESSING_TIME)
            def print_execution_time(self, processing_time):
                print('### PROCESSING TIME ###: ' + str(processing_time))

    https://gist.github.com/samuell/93cc7eb6803fa2790042
  12. 24.

    CASE 2. Parallel execution and the command's return value
    ▸ By default, the luigi command returns 0 even when a Task fails
        ▸ Error handling based on the return value is not possible
    ▸ Solution
        ▸ Define retcode settings in the configuration file and have the luigi command load it at startup
  13. 25.

    Luigi configuration: retcode
    ▸ A return value can be set for each kind of failure
    ▸ A careful read of the documentation turns up the following example configuration!

        [retcode]
        # The following return codes are the recommended exit codes for Luigi
        # They are in increasing level of severity (for most applications)
        already_running=10
        missing_data=20
        not_run=25
        task_failed=30
        scheduling_error=35
        unhandled_exception=40

    http://luigi.readthedocs.io/en/stable/configuration.html#retcode
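Once distinct return codes are configured, a calling script can branch on them. The sketch below uses only the standard library: it writes a config file containing a `[retcode]` section with the values quoted from the documentation, and maps a child process's exit code back to a reason name. The helper names (`write_retcode_config`, `describe_exit_code`, `run_and_describe`) are hypothetical, and where Luigi picks up the generated file depends on the deployment.

```python
import configparser
import subprocess

# Exit codes copied from the [retcode] example in the Luigi documentation.
RETCODES = {
    "already_running": 10,
    "missing_data": 20,
    "not_run": 25,
    "task_failed": 30,
    "scheduling_error": 35,
    "unhandled_exception": 40,
}


def write_retcode_config(path):
    """Write a config file that contains only a [retcode] section."""
    parser = configparser.ConfigParser()
    parser["retcode"] = {name: str(code) for name, code in RETCODES.items()}
    with open(path, "w") as f:
        parser.write(f)
    return path


def describe_exit_code(code):
    """Translate a process exit code into a reason name."""
    if code == 0:
        return "success"
    for name, value in RETCODES.items():
        if value == code:
            return name
    return "unknown"


def run_and_describe(command):
    """Run `command` (a list of arguments) and return (exit code, reason)."""
    completed = subprocess.run(command)
    return completed.returncode, describe_exit_code(completed.returncode)
```

With a real workflow, `command` would be the luigi invocation, for example `['luigi', '--module', 'foo', 'examples.Foo']`, and the caller could retry, alert, or abort depending on the returned reason.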
  14. 28.

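    Pitfall 2: parallel execution makes the return value go wrong
    ▸ Luigi can run the Tasks in a workflow in parallel
        ▸ Adding the --workers N option enables parallel execution
    ▸ One day...
        ▸ The workflow returned a failure code, but on an actual rerun the Task reported as failed had already succeeded, so the rerun finished without doing anything
        ▸ What on earth?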
  15. 31.

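    Choosing between Digdag and Luigi
    ▸ Luigi is heavyweight for describing simple workflows
    ▸ The following are being migrated to Treasure Workflow
        ▸ Workflows whose query dependencies are simple
        ▸ Workflows whose Tasks are TD queries only
    ▸ Use Luigi when there is processing that integrates with other systems, such as Embulk or FTP
    ▸ Use Luigi when the dependencies between Tasks look likely to become complex
    ▸ The GUI is better in Digdag/Treasure Workflow
        ▸ Luigi's GUI is...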
  16. 34.