How to use AWS Lambda in Document Processing Pipeline

Full write-up is here:

https://gist.github.com/suzuken/6033c20a3a3c9e0f5354b88f405240f5

Kenta Suzuki

April 22, 2016

Transcript

  1. How to use AWS Lambda in Document Processing Pipeline @suzu_v

    VOYAGE GROUP 2016/04/22 at AWS Tokyo Office
  2. About me
    • Suzuken, https://github.com/suzuken, @suzu_v
    • I am a Gopher, but today I will talk about Java
    • At http://fluct.jp I work as a software engineer on ad delivery and the analytics platform
  3. Agenda
    • How Lambda is used in the document analysis platform of an ad delivery system
    • This is a Kinesis Stream + Lambda case study, not API Gateway + Lambda

    Feel free to ask questions during the talk!
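  4. Use case, requirements, background
    • For ad delivery, we want to analyze and classify the text inside pages and use the result when serving ads
    • Pages are obtained by crawling, and we want to classify them as soon as possible after the crawl
    • We want to fetch and analyze about 1 million pages per day
    • About 3 months from start of development to release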
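To put the 1 million pages per day requirement in perspective, here is a back-of-envelope sketch that converts it into an average per-second rate. The class and method names are hypothetical, and the uniform-arrival assumption is a simplification that is not from the talk; real crawl traffic is bursty.

```java
// Back-of-envelope sketch: average page rate implied by a daily volume.
// Assumes pages arrive uniformly over the day, which real crawls do not.
final class CrawlRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average pages per second for a given number of pages per day.
    static double pagesPerSecond(long pagesPerDay) {
        return (double) pagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // 1,000,000 pages/day is roughly 11.6 pages per second on average.
        System.out.println(pagesPerSecond(1_000_000L));
    }
}
```

The average is modest; it is the peak rate and page size, not the daily count, that drive capacity planning.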
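  5. Policy
    • We do not want to spend effort on operations, so use managed services wherever possible
    • Build analysis, classification, and document search so that different methods can be tried later
    • Each component runs independently, so that one component going down does not affect the whole system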

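  6. Components
    • Web crawler (EC2 / Go): a daemon that fetches content for a given URL
    • Main-text extractor (Lambda / Java8): extracts the part of the page estimated to be the main text
        • Java because we want to use Lucene / Kuromoji
    • Classifier (EC2 / Go): categorizes documents based on the main text and other information obtained from the page
    • Document store (EC2 / Elasticsearch): stores crawled and classified content and makes it searchable
    • API (EC2 + ELB / Go): an internal HTTP API that returns classification results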
  7. (No text on this slide)
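  8. Architecture

    We rely heavily on Kinesis Stream.
    • At peak, the crawler fetches content at ~100MB/s
    • The content is inserted directly into Kinesis Stream with PutRecords
    • The crawler is written in Go (with aws-sdk-go) and also handles write retries and buffering
    • Idempotency is ensured by Elasticsearch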
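The buffering mentioned above can be sketched as grouping payloads into batches before each PutRecords call. This is a minimal illustration in Java, not the crawler's actual Go code: the class name is hypothetical, and the limits of 500 records and 5 MiB per request are the documented PutRecords limits as I understand them. Partition key bytes also count toward the request size and are ignored here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split payloads into batches that respect PutRecords request limits.
// Hypothetical helper; partition key sizes are not counted in this sketch.
final class PutRecordsBatcher {
    static final int MAX_RECORDS = 500;           // records per PutRecords call
    static final int MAX_BYTES = 5 * 1024 * 1024; // payload bytes per call

    static List<List<byte[]>> batches(List<byte[]> payloads) {
        List<List<byte[]>> out = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] p : payloads) {
            boolean full = current.size() == MAX_RECORDS
                    || currentBytes + p.length > MAX_BYTES;
            // Close the current batch before it would exceed either limit.
            if (full && !current.isEmpty()) {
                out.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(p);
            currentBytes += p.length;
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }
}
```

A real writer would send each batch with PutRecords and re-queue only the entries reported as failed in the response, which is where the retry logic fits.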
  9. Why Kinesis Stream
    • PutRecords / GetRecords are stable
    • It is very useful as a data buffer for analyzing crawl results in real time
    • Combined with Lambda, stream processing applications are easy to write
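When a stream is used as the buffer, capacity is provisioned in shards. The sketch below estimates a lower bound on shard count from the peak write rate, using the documented per-shard ingest limits of 1 MB/s and 1,000 records/s. The class name is hypothetical, and it assumes the whole ~100MB/s peak is written to a single stream, which the talk does not state explicitly.

```java
// Sketch: lower bound on shard count from peak write load.
// Hypothetical helper; per-shard limits are the documented ingest limits.
final class ShardEstimate {
    static final double WRITE_MB_PER_SHARD = 1.0;         // MB/s per shard
    static final double WRITE_RECORDS_PER_SHARD = 1000.0; // records/s per shard

    static int minShards(double peakMbPerSec, double peakRecordsPerSec) {
        double byBytes = peakMbPerSec / WRITE_MB_PER_SHARD;
        double byRecords = peakRecordsPerSec / WRITE_RECORDS_PER_SHARD;
        // Whichever limit is hit first determines the shard count.
        return (int) Math.ceil(Math.max(byBytes, byRecords));
    }
}
```

For example, `ShardEstimate.minShards(100.0, 50.0)` gives 100; the records-per-second figure there is made up for illustration.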
  10. Why Lambda
    • Integration with Kinesis Stream is simple
    • For experiments, we can just click to create a new Lambda Function
    • The data stays in the shards of the Kinesis Stream, so testing against the same data is easy
    • Testing in Production (Data)
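  11. What is good about Lambda
    • Writing your own Kinesis Application means managing shards yourself, which is tedious
    • The wrapper on the Lambda side takes care of that nicely
    • Deployment is easy
    • Build the executable binary, put it on S3, and Lambda can use it
    • No need to think about daemon management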
  12. Implementation example in Java

  13. Implementation example in Java

    Create a POJO that mirrors the Kinesis record format:

        import java.io.Serializable;

        public class KinesisMessageModel implements Serializable {
            public String id;
            public String url;
            public String body;
            public String title;
            public String description;
            // ...
        }

    see: Example: Using POJOs for Handler Input/Output (Java) - AWS Lambda
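  14. Transform the data and send it to the next Kinesis Stream

        import com.amazonaws.services.kinesis.model.PutRecordsRequest;
        import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
        import com.amazonaws.services.kinesis.model.PutRecordsResult;
        import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
        import com.amazonaws.services.lambda.runtime.events.KinesisEvent.KinesisEventRecord;
        import com.fasterxml.jackson.databind.ObjectMapper;
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.util.ArrayList;
        import java.util.List;

        public class Boiler {
            // The fields (kinesis, kinesisOutputStreamName) and the helpers
            // (getPutRecordsRequest, toClass) are omitted on the slide.

            // Handler that receives data from the Kinesis Stream
            public void recordHandler(KinesisEvent event) throws IOException {
                PutRecordsRequest putRecordsRequest = getPutRecordsRequest(this.kinesisOutputStreamName);
                List<PutRecordsRequestEntry> putRecordsRequestEntryList = new ArrayList<>();
                // One event contains multiple records; the count is configurable via batch size.
                for (KinesisEventRecord rec : event.getRecords()) {
                    KinesisMessageModel record = toClass(rec);
                    PutRecordsRequestEntry putRecordsRequestEntry = new PutRecordsRequestEntry();
                    // Transform the record (in practice, main-text extraction happens here).
                    // ByteBuffer.wrap needs a byte[], so serialize to bytes rather than to a String.
                    ByteBuffer data = ByteBuffer.wrap(new ObjectMapper().writeValueAsBytes(record));
                    putRecordsRequestEntry.setData(data);
                    // getSomeKey() is a placeholder for whichever field is used as the partition key.
                    putRecordsRequestEntry.setPartitionKey(record.getSomeKey());
                    putRecordsRequestEntryList.add(putRecordsRequestEntry);
                }
                // Assemble the PutRecords request for the next Kinesis Stream and send it.
                putRecordsRequest.setRecords(putRecordsRequestEntryList);
                PutRecordsResult putRecordsResult = this.kinesis.putRecords(putRecordsRequest);
            }
        }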
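The partition key set in the handler determines which shard of the output stream receives each record: Kinesis takes the MD5 hash of the key as a 128-bit integer and matches it against the hash key ranges of the shards. The sketch below illustrates that mapping. The class name is hypothetical, and the assumption that shards split the hash space evenly is mine; a resharded stream may have uneven ranges.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: how a partition key maps to a shard.
// Hypothetical helper; assumes shards split the 128-bit hash space evenly.
final class PartitionKeyHash {
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the key, interpreted as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Index of the shard whose (evenly split) range contains the key's hash.
    static int shardIndex(String partitionKey, int shardCount) {
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .divide(HASH_SPACE)
                .intValueExact();
    }
}
```

A key with many distinct values, such as a document id or URL, spreads records across shards; a constant key would send everything to one shard.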
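  15. Impressions of implementing in Java
    • If you just want to write something quickly, node.js is easier
    • For Java there is no blueprint, and you cannot quickly try things out from the Lambda Console
    • Packaging is done with Maven; we build an uber jar with the Maven Shade Plugin
        • uber jar: a jar that bundles all dependency libraries into a single jar
    • The dictionary for morphological analysis is also included in the jar

    Lambda Function Handler (Java) - AWS Lambda
    Apache Maven Shade Plugin – Introduction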
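As a rough illustration of the packaging step, a Maven Shade Plugin entry in `pom.xml` might look like the fragment below. This is a sketch, not the project's actual configuration: the plugin version and the `createDependencyReducedPom` setting are assumptions, and the fragment belongs inside `<build><plugins>`.

```xml
<!-- Sketch only: version and configuration are assumptions, not from the talk. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <!-- Build the uber jar during "mvn package". -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn package` produces a single jar containing the handler and its dependencies, which is the artifact uploaded to S3.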
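  16. Things to watch out for when implementing
    • Error handling
        • If even one malformed record arrives, Lambda keeps retrying until the record expires on the Kinesis Stream side
        • Letting the function fail stalls processing, so implement it to skip bad records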
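The skip-on-error advice can be sketched as a per-record try/catch, so that one malformed record never makes the whole batch fail. This is a minimal illustration with hypothetical names, using plain Java types in place of the Kinesis event classes; the real handler would apply the same pattern inside its loop over `event.getRecords()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: process a batch record by record, skipping records that fail.
// Hypothetical helper; stands in for the loop inside a Lambda handler.
final class SkipBadRecords {
    // Applies transform to each record. A record whose transform throws
    // is logged and skipped instead of failing the whole batch.
    static <I, O> List<O> processAll(List<I> records, Function<I, O> transform) {
        List<O> out = new ArrayList<>();
        for (I rec : records) {
            try {
                out.add(transform.apply(rec));
            } catch (RuntimeException e) {
                // On Lambda, output written here ends up in CloudWatch Logs.
                System.err.println("skipping bad record: " + e);
            }
        }
        return out;
    }
}
```

For instance, `SkipBadRecords.processAll(Arrays.asList("1", "x", "3"), Integer::parseInt)` returns the two parsed values and logs the bad one. The trade-off is that skipped records are dropped, so logging enough detail to find them later matters.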

  17. Creating the Lambda function: aws-cli

        aws lambda create-function \
            --region ap-northeast-1 \
            --function-name my-lambda-function \
            --code S3Bucket=mybucket,S3Key=path/to/my.jar \
            --role arn:aws:iam::999999999999:role/lambda_kinesis_rw \
            --runtime java8 \
            --handler com.your.app.Handler::recordHandler \
            --description "my kinesis stream!" \
            --timeout 15 \
            --memory-size 512

        aws lambda create-event-source-mapping \
            --event-source-arn arn:aws:kinesis:ap-northeast-1:999999999999:stream/your-stream \
            --function-name my-lambda-function \
            --enabled \
            --batch-size 100 \
            --starting-position TRIM_HORIZON
  18. Deployment: aws-cli
    • Pull Request -> merge -> build (on Travis CI) -> S3
    • The uber jar is built on Travis CI
    • After that, update-function-code applies the new code

        aws lambda update-function-code \
            --function-name my-lambda-function \
            --s3-bucket mybucket \
            --s3-key path/to/my.jar
  19. Logging on Lambda
    • We use Log4j
    • Error logs and the like can be viewed in CloudWatch Logs
    • For bugs that cannot be reproduced locally, check CloudWatch Logs

    Accessing Amazon CloudWatch Logs for AWS Lambda - AWS Lambda
    Logging (Java) - AWS Lambda
  20. Summary
    • With Lambda + Kinesis Stream, we are now able to classify documents in real time
    • Lambda is quick to get started with, and I recommend it