Slide 1

Slide 1 text

A Talk About Using LUIGI
RECRUIT MARKETING PARTNERS
大西 高史

Slide 2

Slide 2 text

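Self-introduction
大西 高史
▸ Joined Recruit Marketing Partners in 2017/10
▸ Previously worked at an SIer (systems integrator)
▸ Mainly responsible for developing and operating the data analytics platform
▸ @bwtakacy on GitHub, Qiita
▸ Submits patches to and maintains tools for the projects shown on the slide
(C) Recruit Marketing Partners Co.,Ltd. All rights reserved.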

Slide 3

Slide 3 text

Today's talk: contents
▸ LUIGI in the Studysapuri data analytics platform
▸ LUIGI basics
▸ LUIGI and me, or: how I learned to stop worrying and get along with LUIGI

Slide 4

Slide 4 text

The Studysapuri data analytics platform

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

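Studysapuri
▸ Learning support built on lecture videos by instructors, drills, and textbooks
▸ More than 10,000 lectures covering 5 subjects and 18 courses, for elementary school, junior high school, high school, and university entrance exams
▸ From 980 yen per month (excluding tax)
▸ 420,000 paying members (cumulative total for fiscal 2016)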

Slide 8

Slide 8 text

From the Developer summit talk "The Data Analytics Platform Supporting Studysapuri: Design Essentials and Use Cases"

Slide 9

Slide 9 text

From the Developer summit talk "The Data Analytics Platform Supporting Studysapuri: Design Essentials and Use Cases"

Slide 10

Slide 10 text

Daily processing
▸ Embulk syncs 50+ tables
▸ Luigi runs 30+ Hive queries
▸ TD Workflow (Digdag) runs 10+ Presto queries
▸ 10+ scheduled queries also exist on TD
▸ 20+ reports are provided to multiple departments

Slide 11

Slide 11 text

LUIGI basics

Slide 12

Slide 12 text

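LUIGI
▸ Controls jobs that combine multiple batch processes
  ▸ Specializes in resolving dependencies between processes and scheduling them
  ▸ Guarantees the atomicity of each process
▸ Written entirely in Python
▸ Platform-independent, so many kinds of processing can be described in one place
  ▸ Hadoop, Hive, Pig, Spark
  ▸ MySQL, PostgreSQL, SQLAlchemy
  ▸ Treasure Data, BigQuery, Redshift
  ▸ SSH, FTP
  ▸ and so on
Developed at Spotify and later released as OSS
The name comes from "the world's second most famous plumber"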

Slide 13

Slide 13 text

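What is outside LUIGI's scope
▸ Not suited to real-time processing or to processes that run continuously for long periods
▸ Does not support distributed execution of processing
▸ Cannot launch processing on a schedule or on a trigger
  ▸ The Studysapuri data platform launches Luigi on a schedule from Jenkins
▸ Scalability is not a goal
  ▸ Chaining a few thousand processes is feasible, but tens of thousands is not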

Slide 14

Slide 14 text

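Terminology
▸ Task
  ▸ The unit that actually does the processing
  ▸ A building block of a workflow
  ▸ Can declare the Tasks that must have run beforehand
▸ Target
  ▸ Information that shows a Task completed successfully
  ▸ A file on local disk, HDFS, RDB, S3
▸ Parameter
  ▸ A variable that can be passed to a Task as an argument
  ▸ Example: the date in daily processing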

Slide 15

Slide 15 text

A simple example (from Luigi's examples/foo.py)

    import time  # needed for time.sleep() below

    import luigi


    class Foo(luigi.WrapperTask):
        task_namespace = 'examples'

        def run(self):
            print("Running Foo")

        def requires(self):
            for i in range(10):
                yield Bar(i)


    class Bar(luigi.Task):
        task_namespace = 'examples'
        num = luigi.IntParameter()

        def run(self):
            time.sleep(1)
            self.output().open('w').close()

        def output(self):
            time.sleep(1)
            return luigi.LocalTarget('/tmp/bar/%d' % self.num)


    if __name__ == "__main__":
        luigi.run(['examples.Foo', '--workers', '2', '--local-scheduler'])

Slide 16

Slide 16 text

A simple example (annotated)

    import time  # needed for time.sleep() below

    import luigi


    class Foo(luigi.WrapperTask):
        task_namespace = 'examples'

        # The processing this TASK performs
        def run(self):
            print("Running Foo")

        # The TASKs that must have run before this TASK
        def requires(self):
            for i in range(10):
                yield Bar(i)


    class Bar(luigi.Task):
        task_namespace = 'examples'
        # The TASK's PARAMETER
        num = luigi.IntParameter()

        def run(self):
            time.sleep(1)
            self.output().open('w').close()

        # Generates the TARGET that shows the TASK completed successfully
        def output(self):
            time.sleep(1)
            return luigi.LocalTarget('/tmp/bar/%d' % self.num)


    if __name__ == "__main__":
        luigi.run(['examples.Foo', '--workers', '2', '--local-scheduler'])

Slide 17

Slide 17 text

Usage at Studysapuri
From the Developer summit talk "The Data Analytics Platform Supporting Studysapuri: Design Essentials and Use Cases"
‣ Queries to Treasure Data are issued through the luigi_td module
‣ Embulk is run by simply building the command line in Python and executing it
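The "build the command line in Python and execute it" approach for Embulk could look roughly like the sketch below. The helper names `build_embulk_command` and `run_embulk` and the config file name are hypothetical and not taken from the talk; the sketch only assumes that an `embulk` executable is on the PATH and accepts `embulk run <config>`.

```python
import shlex
import subprocess


def build_embulk_command(config_path, embulk_bin="embulk"):
    """Build the argument list for `embulk run <config>` (hypothetical helper)."""
    return [embulk_bin, "run", config_path]


def run_embulk(config_path, embulk_bin="embulk"):
    """Run Embulk and raise CalledProcessError if it exits with a non-zero code."""
    command = build_embulk_command(config_path, embulk_bin)
    # Print a shell-safe rendering of the command for the job log.
    print("Running:", " ".join(shlex.quote(part) for part in command))
    subprocess.run(command, check=True)
```

A function like `run_embulk("load_table.yml")` could then be called from a Task's `run()` method, so that a failed Embulk run raises an exception and the Task is marked as failed.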

Slide 18

Slide 18 text

Installation and execution
‣ Installation
    $ pip install luigi
‣ Running a workflow
    $ luigi --module foo examples.Foo --local-scheduler
  ‣ The Python script being launched must be on sys.path
  ‣ local-scheduler: a scheduler is started for each command invocation
  ‣ Not needed if a scheduler process is already running independently

Slide 19

Slide 19 text

LUIGI and me

Slide 20

Slide 20 text

CASE 1. I want to know how long each TASK takes
▸ Running a workflow produces a fair amount of console log output
  ▸ Including DEBUG lines that do not seem particularly necessary
▸ However, the execution time of each TASK is not shown!

Slide 21

Slide 21 text

$ PYTHONPATH='.' luigi --module top_artists AggregateArtists --local-scheduler --date-interval 2012-06
DEBUG: Checking if AggregateArtists(date_interval=2012-06) is complete
DEBUG: Checking if Streams(date=2012-06-01) is complete
(omitted)
DEBUG: Checking if Streams(date=2012-06-30) is complete
INFO: Informed scheduler that task AggregateArtists_2012_06_7af15faabf has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 31
INFO: [pid 3501] Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) running Streams(date=2012-06-21)
INFO: [pid 3501] Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) done Streams(date=2012-06-21)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task Streams_2012_06_21_b21eade591 has status DONE
(omitted)
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=894400700, workers=1, host=oonishitk-MBP.local, username=bwtakacy, pid=3501) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 31 tasks of which:
* 31 ran successfully:
    - 1 AggregateArtists(date_interval=2012-06)
    - 30 Streams(date=2012-06-01...2012-06-30)

This progress looks :) because there were no failed tasks or missing external dependencies

===== Luigi Execution Summary =====

Slide 22

Slide 22 text

How to capture a TASK's execution time: the PROCESSING_TIME event
▸ Luigi provides built-in events and a mechanism for registering callbacks on them
  ▸ FAILURE event: emitted only when a TASK fails.
    ▸ Useful for cleanup processing, error notification, and so on
  ▸ PROCESSING_TIME event: emitted only when a TASK completes successfully. It carries the TASK's execution time (in seconds).
    ▸ This is it!!

Slide 23

Slide 23 text

Example

    import luigi


    class TimeTaskMixin(object):
        '''
        A mixin that when added to a luigi task, will print out
        the tasks execution time to standard out, when the task is
        finished
        '''
        @luigi.Task.event_handler(luigi.Event.PROCESSING_TIME)
        def print_execution_time(self, processing_time):
            print('### PROCESSING TIME ###: ' + str(processing_time))

https://gist.github.com/samuell/93cc7eb6803fa2790042

Slide 24

Slide 24 text

CASE 2. Parallel execution and the command's return value
▸ By default, the luigi command returns 0 even when a TASK fails
  ▸ Error handling based on the return value is not possible
▸ Solution
  ▸ Define the retcode settings in a configuration file and have the luigi command load it at startup

Slide 25

Slide 25 text

LUIGI configuration: RETCODE
▸ A return value can be configured for each kind of failure
▸ Reading the documentation carefully turns up the following example configuration!

    [retcode]
    # The following return codes are the recommended exit codes for Luigi
    # They are in increasing level of severity (for most applications)
    already_running=10
    missing_data=20
    not_run=25
    task_failed=30
    scheduling_error=35
    unhandled_exception=40

http://luigi.readthedocs.io/en/stable/configuration.html#retcode
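Once these return codes are in effect, a caller can branch on them. The following is a minimal sketch of such error handling, assuming the exact values from the [retcode] example on this slide are configured; `describe_retcode` and `run_workflow` are hypothetical helper names, not part of Luigi.

```python
import subprocess

# Exit codes from the [retcode] example on this slide; 0 means success.
RETCODE_MEANINGS = {
    0: "success",
    10: "already_running",
    20: "missing_data",
    25: "not_run",
    30: "task_failed",
    35: "scheduling_error",
    40: "unhandled_exception",
}


def describe_retcode(code):
    """Translate an exit code into the name of the retcode setting."""
    return RETCODE_MEANINGS.get(code, "unknown ({})".format(code))


def run_workflow(args):
    """Run a command (for example the luigi CLI) and return (exit code, meaning)."""
    completed = subprocess.run(args)
    return completed.returncode, describe_retcode(completed.returncode)
```

For example, `run_workflow(["luigi", "--module", "foo", "examples.Foo", "--local-scheduler"])` would return `(30, "task_failed")` if a Task failed and this retcode configuration was loaded.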

Slide 26

Slide 26 text

LUIGI's configuration file: LUIGI.CFG
▸ When the luigi command runs, configuration is loaded from the following locations (the further down the list, the higher the priority)
  ▸ /etc/luigi/client.cfg
  ▸ luigi.cfg (in the current directory)
  ▸ The path given by the environment variable LUIGI_CONFIG_PATH
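The "later locations win" rule can be illustrated with Python's standard configparser, which behaves the same way when given several files in order. This is only an illustration of the precedence rule, not Luigi's actual loader; the temporary files stand in for /etc/luigi/client.cfg and ./luigi.cfg, and the value 31 is made up for the demonstration.

```python
import configparser
import os
import tempfile


def load_layered_config(paths):
    """Read the files in order; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths)  # files that do not exist are silently skipped
    return parser


def demo_precedence():
    """Show that a setting in the second file overrides the first."""
    with tempfile.TemporaryDirectory() as tmp:
        system_cfg = os.path.join(tmp, "client.cfg")  # stand-in for /etc/luigi/client.cfg
        local_cfg = os.path.join(tmp, "luigi.cfg")    # stand-in for ./luigi.cfg
        with open(system_cfg, "w") as f:
            f.write("[retcode]\ntask_failed=30\nmissing_data=20\n")
        with open(local_cfg, "w") as f:
            f.write("[retcode]\ntask_failed=31\n")
        config = load_layered_config([system_cfg, local_cfg])
        return (config.getint("retcode", "task_failed"),
                config.getint("retcode", "missing_data"))
```

In this sketch `task_failed` comes from the second file, while `missing_data`, which only the first file defines, is still visible.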

Slide 27

Slide 27 text

Pitfall 1: it is easy to end up with a workflow that bypasses RETCODE
▸ Using the luigi command is fine, but the retcode settings are bypassed if, as in the bundled examples, the script contains

    if __name__ == '__main__':
        luigi.run()

  and is executed with the python command. It is OK if the script instead contains

    if __name__ == '__main__':
        luigi.cmdline.luigi_run()

‣ Aside: the luigi command is nothing more than a wrapper script that passes its arguments to luigi.cmdline.luigi_run()

Slide 28

Slide 28 text

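Pitfall 2: the return value goes wrong under parallel execution
▸ Luigi can run the TASKs within a workflow in parallel
  ▸ Adding the --workers N option enables parallel execution
▸ One day...
  ▸ The workflow returned a failure code, but when it was rerun, the TASK reported as failed had already succeeded, so the rerun finished without doing anything
  ▸ What is going on?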

Slide 29

Slide 29 text

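What was actually happening
▸ The TASK did fail once, but it was retried and succeeded
  ▸ By default Luigi retries a TASK after 15 minutes: this is normal behavior
▸ When a TASK is retried, the return value does not take the retry result into account: bug 1
▸ In addition, the Execution Summary in the log does not display the retry correctly either: bug 2
  ▸ Although the retry succeeded, the Execution Summary reports the TASK as failed
  ▸ How can that be...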

Slide 30

Slide 30 text

AFTER V2.5.0
Resolved by submitting a patch

Slide 31

Slide 31 text

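Choosing between DIGDAG and LUIGI
▸ Luigi is heavyweight for describing simple workflows
  ▸ The following are being migrated to Treasure Workflow
    ▸ Workflows whose query dependencies are simple
    ▸ Workflows whose TASKs are TD queries only
▸ Use Luigi when there is processing that integrates with other systems, such as Embulk/FTP
▸ Use Luigi when the dependencies between TASKs are likely to become complex
▸ Digdag/Treasure Workflow has the better GUI
  ▸ As for Luigi's GUI...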

Slide 32

Slide 32 text

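Summary: LUIGI
▸ Convenient if you use Python heavily for data analysis
▸ Convenient for combining processing across different platforms
▸ Convenient because it resolves dependencies between TASKs automatically, even when they become complex
▸ Convenient for version control and CI because workflows are just Python scripts
▸ Somewhat over-engineered for simple workflows, such as merely running Hive queries in sequence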

Slide 33

Slide 33 text

We are Hiring! We look forward to hearing from anyone interested in education x data :)

Slide 34

Slide 34 text

No content