I Wish I'd Had a Life Where I Came Up with Best Practices for Building Crawlers on AWS

masaaki.takeuchi

February 24, 2016

Transcript

  1. Crawler
     ✓ A rough definition (or rather, my own understanding)
       ✓ Fetches data from the web and structures it
       ✓ Its operation can be split into three main phases
       ✓ Crawl / download / parse

     +------------------------------------------------------------+
     |                                                            |
     | +-----------+        +------------+        +----------+    |   +----+
     | |           |  url   |            |  html  |          |    |   |    |
     | |  crawler  | +----> | downloader | +----> |  parser  | +----> | DB |
     | |           |        |            |        |          |    |   |    |
     | +-----------+        +------------+        +----------+    |   +----+
     |                                                            |
     +------------------------------------------------------------+
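The three phases on this slide can be sketched as three small functions chained together. This is only an illustration under stated assumptions: the names `crawl`, `download`, `parse`, `run_pipeline`, and `Record` are hypothetical, the crawl phase merely de-duplicates seed URLs, and the parser extracts just a `<title>`. The fetch function is passed in so the sketch runs without network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Record:
    """Structured output of the parse phase (hypothetical shape)."""
    url: str
    title: str


def crawl(seed_urls: Iterable[str]) -> List[str]:
    """Crawl phase: decide which URLs to visit (here: de-duplicate, keep order)."""
    seen = set()
    ordered = []
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def download(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Download phase: turn each URL into raw HTML using the injected fetch."""
    return {url: fetch(url) for url in urls}


TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def parse(pages: Dict[str, str]) -> List[Record]:
    """Parse phase: turn raw HTML into structured records."""
    records = []
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        records.append(Record(url=url, title=match.group(1).strip() if match else ""))
    return records


def run_pipeline(seed_urls: Iterable[str], fetch: Callable[[str], str]) -> List[Record]:
    """crawler -> downloader -> parser, as in the diagram; storing to a DB is left out."""
    return parse(download(crawl(seed_urls), fetch))
```

Passing `fetch` as a parameter keeps the download phase swappable, which is the kind of loose coupling the later slides find missing.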
  2. Crawler architecture using ElasticBeanstalk

                                   SQS           ElastiCache              S3
                              +-----------+      +----------+      +---------------+
                              |           |      |          |      |               |
                              | job queue |      |   lock   |      |  shared data  |
                              |           |      |          |      |               |
                              +----+------+      +----+-----+      +---+-----------+
                                   ^                  ^                ^
                                   |                  |                |
                                   |                  |                |
                       +-----------+------------------+----------------+------------+
                       |                                                            |
       Lambda          |    worker1              worker2              worker3       |        EC2
     +---------+       | +-----------+        +------------+        +----------+    |   +-----------+
     |         |       | |           |  url   |            |  html  |          |    |   |           |
     | Trigger | +---> | |  crawler  | +----> | downloader | +----> |  parser  | +----> |  mongoDB  |
     |         |       | |           |        |            |        |          |    |   |           |
     +---------+       | +-----------+        +------------+        +----------+    |   +-----------+
                       |                                                            |
                       +------------------------------------------------------------+
                                             Elastic Beanstalk
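How a worker in this diagram might combine the job queue, the lock, and the shared data can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `queue.Queue` stands in for SQS, the hypothetical `InMemoryLock` for an ElastiCache-backed lock, and a plain dict for S3. The deck does not show the actual worker code.

```python
import queue
from typing import Callable, Dict, Set


class InMemoryLock:
    """Hypothetical stand-in for a lock kept in ElastiCache."""

    def __init__(self) -> None:
        self._held: Set[str] = set()

    def acquire(self, key: str) -> bool:
        """Return True if the lock was taken, False if someone already holds it."""
        if key in self._held:
            return False
        self._held.add(key)
        return True

    def release(self, key: str) -> None:
        self._held.discard(key)


def run_worker(
    jobs: "queue.Queue[str]",
    lock: InMemoryLock,
    shared: Dict[str, str],
    handle: Callable[[str], str],
) -> int:
    """Drain the queue, skipping jobs that are locked elsewhere.

    Returns the number of jobs this worker actually processed.
    """
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        if not lock.acquire(job):
            continue  # another worker holds this job; skip it
        try:
            shared[job] = handle(job)  # write the result to the shared store
            processed += 1
        finally:
            lock.release(job)
```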
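  3. Problems
     ✓ Problems with the crawler
       ✓ crawler/downloader/parser are fairly tightly coupled
     ✓ Problems with the crawl targets (the sites)
       ✓ 47 different sets of logic are needed
       ✓ Some sites only allow page transitions via POST
       ✓ Some only allow navigating from page n as far as page n+10
       ✓ (Unrelated, but for some reason crawls often fail for Kagawa, Nagano, and Saitama)
     ✓ In short
       ✓ A fair amount of rebuilding is needed, but I don't want to face it head-on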
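The "page n can only reach page n+10" constraint means a distant page has to be reached in several hops. A minimal sketch of that arithmetic follows; the function name `plan_hops` and the `window` parameter are hypothetical, and the deck does not describe how the author actually handled this.

```python
from typing import List


def plan_hops(current: int, target: int, window: int = 10) -> List[int]:
    """List the pages to request, in order, to get from `current` to `target`
    when each page only links as far ahead as `current + window`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if target < current:
        raise ValueError("this sketch only plans forward navigation")
    hops = []
    page = current
    while page < target:
        page = min(page + window, target)  # jump as far as the site allows
        hops.append(page)
    return hops
```

For example, `plan_hops(1, 35)` yields `[11, 21, 31, 35]`: three full-width jumps followed by a short one.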
  4. A pattern that does not split the phases

     +-----------------------------------------------------+
     |                                                     |
     |  +----------+   +----------+          +----------+  |
     |  |          |   |          |          |          |  |
     |  | hokkaido |   |  aomori  |  ......  | okinawa  |  |
     |  |          |   |          |          |          |  |
     |  +----------+   +----------+          +----------+  |
     |     X                                               |
     |      X                                              |
     +-------X---------------------------------------------+
              X
               X
      +---------X-------------------------+
      |                                   |
      |  crawler -> downloader -> parser  |
      |                                   |
      +-----------------------------------+
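  5. Somehow...
     ✓ It feels like I've come full circle back to a naive architecture
     ✓ Even so, it should behave more simply than the previous architecture
     ✓ It looks like I can get by while touching the application side as little as possible
     ✓ I could just run cron on EC2, but...
       ✓ We are currently moving to container-based operations
       ✓ EC2 Container Service (ECS) Tasks can run batch jobs
     ✓ ECS
       ✓ A platform for running and managing Docker containers on AWS
     ✓ Task
       ✓ The unit of an application on ECS
       ✓ Creating a container configuration called a task definition and running it starts a container on ECS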
  6. Building the crawler on AWS, take 2

                                        EC2 Container Service
                       +-----------------------------------------------------+
       Lambda          |   container1     container2           container47   |
     +---------+       |  +----------+   +----------+          +----------+  |
     |         |       |  |          |   |          |          |          |  |
     | Trigger | +---> |  | hokkaido |   |  aomori  |  ......  | okinawa  |  |
     |         |       |  |          |   |          |          |          |  |
     +---------+       |  +----------+   +----------+          +----------+  |
                       |     X                                               |
                       |      X                                              |
                       +-------X---------------------------------------------+
                                X
                                 X
                        +---------X-------------------------+
                        |                                   |
                        |  crawler -> downloader -> parser  |
                        |                                   |
                        +-----------------------------------+

     ✓ Create a base task
     ✓ Run a task per prefecture from Lambda
     ✓ Override the command at run time
       ✓ Something like php exec.php hokkaido
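Running one task per prefecture from Lambda with a command override can be sketched with boto3's ECS `run_task` call, which accepts `overrides.containerOverrides[].command`. The cluster name `crawler-cluster`, the task definition name `crawler-task`, the container name `crawler`, and the helper names are all hypothetical. The prefecture list is abbreviated, and the deck does not show the author's actual Lambda code.

```python
from typing import Any, Dict, Iterable, List

# Abbreviated for the sketch; the deck runs one container per prefecture (47).
PREFECTURES = ["hokkaido", "aomori", "okinawa"]


def build_overrides(container_name: str, prefecture: str) -> Dict[str, Any]:
    """Build the run_task overrides that replace the container command."""
    return {
        "containerOverrides": [
            {
                "name": container_name,
                "command": ["php", "exec.php", prefecture],
            }
        ]
    }


def launch_all(
    ecs_client: Any,
    cluster: str,
    task_definition: str,
    container_name: str,
    prefectures: Iterable[str],
) -> List[Any]:
    """Start one ECS task per prefecture from the same base task definition."""
    responses = []
    for prefecture in prefectures:
        responses.append(
            ecs_client.run_task(
                cluster=cluster,
                taskDefinition=task_definition,
                count=1,
                overrides=build_overrides(container_name, prefecture),
            )
        )
    return responses


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Lambda entry point (resource names below are assumptions)."""
    import boto3  # imported lazily so the helpers above run without AWS access

    ecs = boto3.client("ecs")
    responses = launch_all(ecs, "crawler-cluster", "crawler-task", "crawler", PREFECTURES)
    return {"launched": len(responses)}
```

Taking the client as a parameter in `launch_all` lets the per-prefecture loop be exercised with a fake client, without credentials.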
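  7. Impressions
     ✓ It feels forced, but it works well
     ✓ The work is split per prefecture, and it behaves more simply than the architecture I originally had in mind
     ✓ It can scale out, more or less
       ✓ Though a prefecture is the smallest unit of execution
     ✓ Logs can be collected too, depending on configuration
       ✓ Logs inside a container disappear when the container exits
       ✓ Save them to a path mounted from the host
       ✓ Run a fluentd task on the same instance and ship the logs to cloudwatch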
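Saving logs to a path mounted from the host corresponds to a host volume plus a mount point in the ECS task definition. A minimal sketch of such a definition as a Python dict follows. The keys follow the ECS task definition schema (`volumes[].host.sourcePath`, `containerDefinitions[].mountPoints`). The family name, image, memory size, paths, and the helper `build_task_definition` are hypothetical, and the fluentd task that ships the files to cloudwatch is not shown.

```python
from typing import Any, Dict


def build_task_definition(
    family: str,
    image: str,
    host_log_dir: str,
    container_log_dir: str,
) -> Dict[str, Any]:
    """Build a task definition whose container writes logs to a host-mounted path,
    so the files survive after the container exits."""
    volume_name = "logs"
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": 256,  # MiB; an arbitrary value for the sketch
                "essential": True,
                "mountPoints": [
                    {
                        "sourceVolume": volume_name,
                        "containerPath": container_log_dir,
                    }
                ],
            }
        ],
        "volumes": [
            {
                "name": volume_name,
                "host": {"sourcePath": host_log_dir},
            }
        ],
    }
```

A dict of this shape could be passed as keyword arguments to boto3's `register_task_definition`. A fluentd task on the same instance would then read from the host directory.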