AWS Exercise (Not Conducted) - EMR / MapReduce

axi-sugiki
February 05, 2019

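These are materials used for a class exercise. The exercise fits into a single class period and is aimed at second-year students. First-year students receive no specialized education and are assigned to a faculty and department only from the second year, so the level is set accordingly.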

Transcript

  1. (title unrecoverable)
    ◦ Amazon EMR and S3
    ◦ Amazon EMR: MapReduce (Hadoop) provided by Amazon
    ◦ Amazon S3
    Figure: an Amazon EMR cluster (a master node and core nodes; m3.xlarge instances, 3 nodes) loads the input data and the Mapper and Reducer functions from Amazon S3 and writes the output results back to S3.
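
The figure describes the MapReduce flow that EMR runs on the core nodes: a Map step turns input lines into (word, 1) pairs, Hadoop sorts the pairs by key, and a Reduce step sums the counts for each key. The following single-process sketch only illustrates that flow. It assumes plain whitespace tokenization in place of MeCab, the function names are hypothetical, and it is not how EMR or Hadoop is implemented internally.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (word, 1) for every whitespace-separated token.

    Whitespace splitting is a stand-in for MeCab, which the real mapper uses.
    """
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle_sort(pairs):
    """Sort the pairs by key, as Hadoop does between the map and reduce phases."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Sum the counts in each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


def word_count(lines):
    """Run map, shuffle/sort and reduce in one process."""
    return dict(reduce_phase(shuffle_sort(map_phase(lines))))


if __name__ == "__main__":
    print(word_count(["to be or not to be", "to do"]))
```

For example, word_count(["to be or not to be", "to do"]) returns {'be': 2, 'do': 1, 'not': 1, 'or': 1, 'to': 3}. On a real cluster the three phases run on different nodes, and the sort happens between them.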
  2. Amazon S3
    ◦ Amazon Simple Storage Service (S3)
    ◦ Accessed through a REST API
    Figure: objects stored in a bucket
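
An S3 location such as s3://**-mecab/mapper/mapper.py, used later in this exercise, names a bucket followed by an object key. A minimal sketch of that split using only the standard library; the function name and the bucket name example-mecab in the usage line are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key).

    Keys containing '?' or '#' are not handled, because urlparse treats
    them as query and fragment delimiters.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URI: %r" % uri)
    path = parsed.path
    # Drop the single "/" that separates the bucket name from the key.
    key = path[1:] if path.startswith("/") else path
    return parsed.netloc, key
```

For example, split_s3_uri("s3://example-mecab/mapper/mapper.py") returns ("example-mecab", "mapper/mapper.py"). S3 has no real directories: "mapper/" is simply the leading part of the object key.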
  3. Amazon EMR
    ◦ Amazon Elastic MapReduce (EMR)
    ◦ Runs on Amazon EC2
    ◦ Apache Hadoop provided as a managed service by Amazon
    ◦ Available frameworks: Apache Hadoop, Spark, HBase, Presto, …
    ◦ EMR + EC2
    Figure: an Amazon EMR cluster between S3 buckets with objects
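
Besides the console, an EMR cluster can be requested from code through the EMR client in boto3 (the AWS SDK for Python). The sketch below only builds the keyword arguments for run_job_flow, assuming the 3-node m3.xlarge setup from the overview figure and the default EMR roles; the cluster name and release label are placeholders, and m3.xlarge is a previous-generation type that may not be offered for every release or region.

```python
def build_cluster_request(name, release_label, instance_type="m3.xlarge",
                          instance_count=3, applications=("Hadoop",)):
    """Build keyword arguments for boto3's EMR client run_job_flow()."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            # InstanceCount includes the master; the rest become core nodes.
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running so that steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default roles, e.g. as created by "aws emr create-default-roles".
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Usage (requires boto3 and AWS credentials; not executed here):
# import boto3
# emr = boto3.client("emr")
# response = emr.run_job_flow(**build_cluster_request("wc-cluster", "emr-x.y.z"))
# cluster_id = response["JobFlowId"]
```

Since EMR bills for the cluster on top of the underlying EC2 instances, a cluster created this way keeps costing money until it is terminated.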
  4. (title unrecoverable)
    ◦ netclouds 0
    ◦ netclouds 1
    ◦ netclouds 2
    ◦ netclouds 3
    ◦ netclouds 4
    ◦ netclouds 5 (EU)
    ◦ netclouds 6 (EU)
  5. EMR (remaining text unrecoverable)
  6. (text unrecoverable apart from the prefix "emr-")
  7. (title unrecoverable)
    ◦ Bootstrap script: s3://**-mecab/bootstrap/bootstrap.sh
    ◦ Used so that MeCab is available on the EMR cluster
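
A bootstrap action is a script that EMR runs on each node while the cluster starts, which is why it suits installing a tool such as MeCab that every mapper needs. The contents of bootstrap.sh are not shown in the slides, so this sketch only shows how the script location could be registered in a boto3 run_job_flow request. The bucket name is a hypothetical parameter standing in for the elided part of s3://**-mecab, and the action name "Install MeCab" is made up for illustration.

```python
def build_bootstrap_action(bucket, name="Install MeCab"):
    """Build one entry for the BootstrapActions list of run_job_flow()."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            # Script stored in the bucket's bootstrap/ folder.
            "Path": "s3://%s/bootstrap/bootstrap.sh" % bucket,
            "Args": [],
        },
    }
```

The result would be passed as BootstrapActions=[build_bootstrap_action("example-mecab")] together with the other run_job_flow arguments.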
  8. EMR (text unrecoverable apart from a reference to Amazon S3)
  9. Amazon S3
    ◦ Folders in the bucket:
      ◦ bootstrap: bootstrap script for EMR
      ◦ input: input data
      ◦ mapper: program for the Map step
      ◦ reducer: program for the Reduce step
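
In S3 these "folders" are key prefixes rather than real directories. A sketch of how local files could be assigned to the prefixes above before uploading; the mapping table and function name are hypothetical, and treating every other file as input text is an assumption.

```python
import os

# Hypothetical mapping from local file names to the folders (key prefixes) above.
FOLDER_FOR_FILE = {
    "bootstrap.sh": "bootstrap/",
    "mapper.py": "mapper/",
    "reducer.py": "reducer/",
}


def s3_key_for(local_path):
    """Return the object key under which a local file would be uploaded."""
    filename = os.path.basename(local_path)
    # Any other file (e.g. a text from Aozora Bunko) is treated as input data.
    folder = FOLDER_FOR_FILE.get(filename, "input/")
    return folder + filename
```

With boto3, each file could then be uploaded with boto3.client("s3").upload_file(local_path, bucket, s3_key_for(local_path)).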
  10. EMR and S3 (text unrecoverable)
  11. (title unrecoverable)
    ◦ Name: (…)-wc
    ◦ Mapper: s3://**-mecab/mapper/mapper.py
    ◦ Reducer: s3://**-mecab/reducer/reducer.py
    ◦ Input S3 location: s3://**-mecab/input/
    ◦ Output S3 location: s3://**-mecab/output-(…)
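
These fields match a Hadoop Streaming step. A sketch of an equivalent step definition for boto3 (usable with add_job_flow_steps), assuming an EMR release of 4.x or later, where command-runner.jar is available. The bucket, step name and output suffix are hypothetical parameters standing in for the elided parts of the paths above.

```python
def build_streaming_step(bucket, name, output_suffix):
    """Build one Steps entry that runs mapper.py/reducer.py via Hadoop Streaming."""
    base = "s3://%s" % bucket
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                # Ship both scripts to every node; they are then referenced by file name.
                "-files", "%s/mapper/mapper.py,%s/reducer/reducer.py" % (base, base),
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", "%s/input/" % base,
                # Hadoop refuses to write into an existing output directory.
                "-output", "%s/output-%s" % (base, output_suffix),
            ],
        },
    }
```

With a running cluster, the step could be submitted as emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[build_streaming_step(...)]). Because the job fails if the output directory already exists, each run needs a fresh output suffix.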
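  12. Mapper
    ◦ Splits each input line into words with MeCab and emits "word<TAB>1" for each word

```python
#!/usr/bin/python
# -*- coding:utf-8 -*-
# Hadoop Streaming mapper: prints "word<TAB>1" for every word MeCab finds.
import sys

import MeCab

m = MeCab.Tagger("-Ochasen")

for line in sys.stdin:
    node = m.parseToNode(line)
    while node:
        # node.feature is a comma-separated list (part of speech, ..., base form, ...).
        feature = node.feature.split(",")
        # feature[-3] is the base form, assuming the IPAdic feature layout;
        # BOS/EOS nodes have "*" there and are skipped.
        word = feature[-3]
        if word != "*":
            print("%s\t%d" % (word, 1))
        node = node.next
```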
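
The choice of feature[-3] depends on the dictionary's feature layout. A sketch that isolates that logic so it can be tried without MeCab, assuming the IPAdic layout (part of speech, three subcategories, conjugation type, conjugation form, base form, reading, pronunciation); the slides do not say which dictionary the bootstrap script installs. The feature strings in the test are hand-written examples in that format, not output captured from the exercise.

```python
def base_form(feature):
    """Return the base form from a MeCab feature string, or None if it is '*'.

    With the 9-field IPAdic layout, fields[-3] is the same as fields[6]
    (the base form). IPAdic unknown words typically carry only 7 fields,
    so fields[-3] lands on a '*' and the word is dropped.
    """
    fields = feature.split(",")
    word = fields[-3]
    return None if word == "*" else word
```

The same rule means unknown words, such as new proper nouns, are not counted by the mapper; using the surface form (node.surface) for them would be one alternative.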
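  13. Reducer
    ◦ Totals the count for each word and prints the words in descending order of count

```python
#!/usr/bin/python
# -*- coding:utf-8 -*-
# Hadoop Streaming reducer: sums the counts per word and prints them by descending count.
import sys
from collections import defaultdict
from operator import itemgetter

wc_dict = defaultdict(int)
for line in sys.stdin:
    # Each input line is "word<TAB>count"; split on the last tab only.
    parts = line.rstrip("\r\n").rsplit("\t", 1)
    if len(parts) != 2:
        continue  # skip empty or malformed lines
    word, count = parts
    wc_dict[word] += int(count)

for word, count in sorted(wc_dict.items(), key=itemgetter(1), reverse=True):
    print("{0}\t{1}".format(word, count))
```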
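
Hadoop sorts the mapper output by key before it reaches a reducer, so all lines for one word arrive next to each other. The reducer above does not rely on this and keeps every word in a dictionary; a streaming-style alternative that uses the sorted order and holds only one word at a time is sketched below. The function names are hypothetical, and the output is ordered by word rather than by count.

```python
from itertools import groupby


def parse(lines):
    """Yield (word, count) from "word<TAB>count" lines, skipping malformed ones."""
    for line in lines:
        parts = line.rstrip("\r\n").rsplit("\t", 1)
        if len(parts) == 2:
            yield parts[0], int(parts[1])


def reduce_sorted(lines):
    """Sum the counts per word, assuming the input is already sorted by word."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)
```

Used as a streaming reducer, the script would end with a loop such as `for word, total in reduce_sorted(sys.stdin): print("%s\t%d" % (word, total))`. Ranking by count then needs a separate step after the job.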
  14. (text unrecoverable)
  15. (title unrecoverable)
    ◦ The results are written to S3
    ◦ Hadoop writes one output file per Reducer, named part-****
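
Each Reducer sorts only the words it received, so the order inside one part-**** file is not a ranking over the whole input. A sketch for combining the files after copying the output folder from S3 to a local directory (for example with `aws s3 cp <output location> <local dir> --recursive`), assuming the word<TAB>count format printed by the reducer; the function name is hypothetical.

```python
import glob
import os
from collections import Counter


def merge_part_files(directory):
    """Combine Hadoop part-* output files and rank words by descending count."""
    totals = Counter()
    # Only part-* files are read; marker files such as _SUCCESS are ignored.
    for path in sorted(glob.glob(os.path.join(directory, "part-*"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip("\r\n").rsplit("\t", 1)
                if len(parts) == 2:
                    totals[parts[0]] += int(parts[1])
    # most_common() returns (word, count) pairs, highest count first.
    return totals.most_common()
```

Since Hadoop sends each word to exactly one Reducer, summing across files never double-counts; it simply makes the merge robust to any layout of the part files.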
  16. References
    ◦ Amazon EMR getting-started guide (Japanese): https://docs.aws.amazon.com/ja_jp/gettingstarted/latest/emr/getting-started-emr-overview.html
    ◦ Aozora Bunko: http://www.aozora.gr.jp/
    ◦ Project Gutenberg: http://www.gutenberg.org/