
How to use AWS Lambda in Document Processing Pipeline


The full text is available here:

https://gist.github.com/suzuken/6033c20a3a3c9e0f5354b88f405240f5

Kenta Suzuki

April 22, 2016

Transcript

  1. How to use AWS Lambda
    in Document Processing
    Pipeline
    @suzu_v VOYAGE GROUP
    2016/04/22 at AWS Tokyo Office


  2. About me
    • Suzuken, https://github.com/suzuken,
    @suzu_v
    • I'm a Gopher, but today I'll be talking about Java
    • Software engineer working on ad delivery and the analytics
    platform at http://fluct.jp


  3. Agenda
    • How Lambda is used in the text analysis platform of our ad
    delivery system
    • This is a Kinesis Stream + Lambda case study, not
    API Gateway + Lambda
    Feel free to ask questions during the talk!


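  4. Use case, requirements, background
    • For ad delivery, we want to analyze and classify the text on
    each page and use the result when serving ads
    • Pages are fetched by crawling, and we want to classify them as
    soon as possible after the crawl
    • About 1 million pages per day need to be fetched and analyzed
    About 3 months from development to release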


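  5. Policy
    • Use managed services wherever possible, to keep operational
    effort low
    • Build analysis, classification and document search so that
    different methods can be tried later
    • Each component runs independently, so that one going down does
    not affect the whole system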


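  6. Components
    • Web crawler (EC2 / Go): daemon that fetches the content of a
    given URL
    • Body extractor (Lambda / Java8): extracts the part of the page
    estimated to be the main text
      • Because we wanted to use Lucene / Kuromoji
    • Classifier (EC2 / Go): categorizes documents based on the main
    text and other information obtained from the page
    • Document store (EC2 / Elasticsearch): stores crawled and
    classified content and makes it searchable
    • API (EC2 + ELB / Go): internal HTTP API that returns
    classification results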


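  7. Architecture
    We rely heavily on Kinesis Stream
    • The crawler fetches content at up to ~100MB/s at peak
    • The content is written directly into Kinesis Stream with
    PutRecords
    • The crawler is written in Go (with aws-sdk-go) and also handles
    write retries and buffering
    • Idempotency is ensured on the Elasticsearch side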
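Slide 7 says idempotency is ensured by Elasticsearch but does not say how. One common approach, shown here as a minimal sketch and not necessarily what this system does, is to derive the document ID deterministically from the page URL, so that a record delivered twice after a retry overwrites the same document instead of creating a duplicate. `DocumentIds` and `forUrl` are hypothetical names, and the choice of SHA-256 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not from the talk): derives a stable document ID
// from a page URL so that re-indexing the same page is idempotent.
final class DocumentIds {
    private DocumentIds() {}

    // Returns the lowercase hex SHA-256 digest of the URL (64 characters).
    static String forUrl(String url) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value so each byte becomes two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Indexing with `DocumentIds.forUrl(url)` as the explicit document ID would turn a repeated write into an overwrite of the same document.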


  8. Why Kinesis Stream
    • PutRecords / GetRecords are stable
    • Very useful as a data buffer for analyzing crawl results in
    real time
    • Combined with Lambda, stream processing applications are easy
    to write
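A writer that buffers records for PutRecords, as the crawler is described as doing, has to keep each request within the API's limits. The sketch below shows one way to split buffered payloads into request-sized batches. The crawler in the talk is written in Go; Java is used here only to match the other code in the deck. `PutRecordsBatcher` and `split` are hypothetical names. The defaults of 500 records and 5 MiB per request are the limits documented around the time of the talk and should be checked against the current Kinesis documentation. The sketch counts payload bytes only and ignores partition key sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the talk): groups buffered payloads into
// batches that respect per-request record-count and byte-size limits.
final class PutRecordsBatcher {
    // Assumed limits; verify against the current Kinesis documentation.
    static final int MAX_RECORDS = 500;
    static final int MAX_BYTES = 5 * 1024 * 1024;

    private PutRecordsBatcher() {}

    static List<List<byte[]>> split(List<byte[]> payloads, int maxRecords, int maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] payload : payloads) {
            boolean full = current.size() >= maxRecords
                    || currentBytes + payload.length > maxBytes;
            // Start a new batch when adding this payload would break a limit.
            // A single payload larger than maxBytes ends up alone in a batch.
            if (full && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += payload.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then become the record list of one PutRecords request.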


  9. Why Lambda
    • Easy integration with Kinesis Stream
    • Easy to experiment: just create a new Lambda Function with a
    few clicks
    • The data stays in the Kinesis Stream shards, so testing against
    the same data is easy too
    • Testing in Production (Data)


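  10. What's good about Lambda
    • Writing a Kinesis Application yourself means juggling shards,
    which is tedious
      • Lambda's wrapper takes care of that nicely
    • Deployment is easy
      • Build an executable binary and put it on S3, and Lambda can
      use it
    • No need to think about daemon management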


  11. Implementation example in Java


  12. Implementation example in Java
    Create a POJO that corresponds to the Kinesis record format
    import java.io.Serializable;

    public class KinesisMessageModel implements Serializable {
        public String id;
        public String url;
        public String body;
        public String title;
        public String description;
        // ...
    }
    see: Example: Using POJOs for Handler Input/Output
    (Java) - AWS Lambda


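  13. Process the data and send it to the next Kinesis Stream
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;
    import com.amazonaws.services.kinesis.model.PutRecordsRequest;
    import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
    import com.amazonaws.services.kinesis.model.PutRecordsResult;
    import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
    import com.amazonaws.services.lambda.runtime.events.KinesisEvent.KinesisEventRecord;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class Boiler {
        // The fields (kinesis, kinesisOutputStreamName) and the helpers
        // (getPutRecordsRequest, toClass) are omitted on the slide.

        // Handler that receives data from the Kinesis Stream
        public void recordHandler(KinesisEvent event) throws IOException {
            PutRecordsRequest putRecordsRequest
                = getPutRecordsRequest(this.kinesisOutputStreamName);
            List<PutRecordsRequestEntry> putRecordsRequestEntryList
                = new ArrayList<>();
            // One event contains multiple records; the number is set by the batch size.
            for (KinesisEventRecord rec : event.getRecords()) {
                KinesisMessageModel record = toClass(rec);
                PutRecordsRequestEntry putRecordsRequestEntry
                    = new PutRecordsRequestEntry();
                // Process the record (in practice, the body extraction happens here).
                // ByteBuffer.wrap needs a byte[], so serialize to bytes, not to a String.
                ByteBuffer data = ByteBuffer.wrap(new ObjectMapper().writeValueAsBytes(record));
                putRecordsRequestEntry.setData(data);
                // getSomeKey() is a placeholder for whatever key the record provides.
                putRecordsRequestEntry.setPartitionKey(record.getSomeKey());
                putRecordsRequestEntryList.add(putRecordsRequestEntry);
            }
            // Assemble and send the PutRecords request to the next Kinesis Stream
            putRecordsRequest.setRecords(putRecordsRequestEntryList);
            PutRecordsResult putRecordsResult
                = this.kinesis.putRecords(putRecordsRequest);
        }
    }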
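The handler on this slide stores the PutRecords result but does not inspect it. PutRecords can fail partially: the response contains one result per entry, in request order, and a failed entry carries an error code. The sketch below shows, under simplifying assumptions, how only the failed entries might be re-sent. `PartialFailureRetry` and `sendWithRetry` are hypothetical names, and the `send` function stands in for the real SDK call so that the example runs without AWS dependencies. It returns one error code per entry, with `null` meaning success. Backoff between attempts is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper (not from the talk): re-sends only the entries that
// failed in the previous attempt, up to a maximum number of attempts.
final class PartialFailureRetry {
    private PartialFailureRetry() {}

    // `send` models a PutRecords call: it returns one error code per entry
    // in request order, where null means the entry was written successfully.
    // Returns the entries that were still failing after the last attempt.
    static <T> List<T> sendWithRetry(List<T> entries,
                                     Function<List<T>, List<String>> send,
                                     int maxAttempts) {
        List<T> pending = new ArrayList<>(entries);
        for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> errorCodes = send.apply(pending);
            List<T> failed = new ArrayList<>();
            for (int i = 0; i < pending.size(); i++) {
                if (errorCodes.get(i) != null) {
                    failed.add(pending.get(i));
                }
            }
            pending = failed;
        }
        return pending;
    }
}
```

With the real SDK, `send` would wrap `putRecords` and map each result entry to its error code. A non-empty return value means some records were never written and need separate handling.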


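  14. Impressions of the Java implementation
    • For writing something quickly, node.js is easier
      • With Java there are no blueprints, and you cannot quickly
      try code from the Lambda Console
    • Packaging is done with Maven; we build an uber jar with the
    Maven Shade Plugin
      • uber jar: a jar that bundles all dependency libraries into a
      single jar
      • The dictionary for morphological analysis is also included
      in the jar
    Lambda Function Handler (Java) - AWS Lambda
    Apache Maven Shade Plugin – Introduction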


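  15. Things to watch out for when implementing
    • Error handling
      • If even one malformed record arrives, Lambda keeps retrying
      until the record expires on the Kinesis Stream side
      • Letting the function fail stalls processing, so implement
      the handler to skip such records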
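The skip-instead-of-fail advice can be sketched as a loop that isolates each record in its own try/catch, so that one bad record is logged and dropped while the rest of the batch is still processed. This is a minimal illustration, not the author's handler. `SkippingBatchProcessor` and `processAll` are hypothetical names, and the generic `transform` function stands in for the parse-and-extract step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper (not from the talk): processes a batch record by
// record and skips records that cannot be processed.
final class SkippingBatchProcessor {
    private SkippingBatchProcessor() {}

    // Applies `transform` to every record. A record whose transform throws
    // is logged to stderr and skipped, so the exception never escapes and
    // the batch as a whole is not reported as failed.
    static <I, O> List<O> processAll(List<I> records, Function<I, O> transform) {
        List<O> results = new ArrayList<>();
        for (I record : records) {
            try {
                results.add(transform.apply(record));
            } catch (RuntimeException e) {
                System.err.println("skipping record: " + e);
            }
        }
        return results;
    }
}
```

Because the handler returns normally, the skipped record is not retried, so the log line is the only trace of it. Writing skipped records somewhere for later inspection may be worth considering.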


  16. Creating the Lambda function: aws-cli
    aws lambda create-function --region ap-northeast-1 \
      --function-name my-lambda-function \
      --code S3Bucket=mybucket,S3Key=path/to/my.jar \
      --role arn:aws:iam::999999999999:role/lambda_kinesis_rw \
      --runtime java8 \
      --handler com.your.app.Handler::recordHandler \
      --description "my kinesis stream!" \
      --timeout 15 --memory-size 512
    aws lambda create-event-source-mapping \
      --event-source-arn arn:aws:kinesis:ap-northeast-1:999999999999:stream/your-stream \
      --function-name my-lambda-function \
      --enabled --batch-size 100 --starting-position TRIM_HORIZON


  17. Deployment: aws-cli
    • Pull Request -> merge -> build (on Travis CI) -> S3
      • The uber jar is built on Travis CI
    • After that, apply it with update-function-code
    aws lambda update-function-code \
      --function-name my-lambda-function \
      --s3-bucket mybucket --s3-key path/to/my.jar


  18. Logging in Lambda
    • We use Log4j
    • Error logs and the like can be viewed in CloudWatch Logs
    • For problems that do not reproduce locally, check
    CloudWatch Logs
    Accessing Amazon CloudWatch Logs for AWS Lambda - AWS Lambda
    Logging (Java) - AWS Lambda


  19. Summary
    • With Lambda + Kinesis Stream we can now classify documents in
    real time
    • Lambda is quick to get started with, and I recommend it
