How to use AWS Lambda in Document Processing Pipeline

The full text is available here:

https://gist.github.com/suzuken/6033c20a3a3c9e0f5354b88f405240f5

Kenta Suzuki

April 22, 2016

Transcript

  1. 1.

    How to use AWS Lambda in Document Processing Pipeline @suzu_v

    VOYAGE GROUP 2016/04/22 at AWS Tokyo Office
  2. 2.

    About me
    • Suzuken, https://github.com/suzuken, @suzu_v
    • I am a Gopher, but today I will talk about Java
    • Software engineer at http://fluct.jp, working on ad delivery and the analytics platform
  3. 6.

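    Components
    • Web crawler (EC2 / Go): a daemon that fetches the content of a given URL
    • Main-text extractor (Lambda / Java8): extracts the part of the page estimated to be the main text
      • Java, because we wanted to use Lucene / Kuromoji
    • Classifier (EC2 / Go): categorizes documents based on the main text and other information obtained from the page
    • Document store (EC2 / Elasticsearch): stores crawled and classified content and makes it searchable
    • API (EC2 + ELB / Go): an internal HTTP API that returns classification results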
  4. 7.
  5. 8.

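    Architecture
    We rely heavily on Kinesis Stream
    • At peak, the crawler fetches content at up to ~100MB/s
    • That content is inserted directly into a Kinesis Stream with PutRecords
    • The crawler is written in Go (with aws-sdk-go) and also handles write retries and buffering
    • Idempotency is ensured on the Elasticsearch side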
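The slide does not say how Elasticsearch provides idempotency. One common approach, shown here purely as an assumption, is to derive the document ID deterministically from the page URL, so that a record delivered or retried more than once overwrites the same document instead of creating a duplicate. The class name `DocumentIds` and the method `forUrl` are hypothetical and do not come from the talk.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not from the talk): builds a stable document ID from a URL.
final class DocumentIds {
    private DocumentIds() {}

    // Returns the lowercase hex SHA-256 digest of the URL (64 characters).
    // Indexing with this value as the document ID makes repeated writes of the
    // same page an overwrite rather than a new document.
    static String forUrl(String url) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

With IDs like this, a Lambda batch that is retried after a partial failure can safely be replayed from the start.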
  6. 10.

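    Why Lambda?
    • Integration with Kinesis Stream is easy
    • For experiments, you only need to click to create a new Lambda Function, so trying things out is easy
    • Data in a Kinesis Stream stays in the shards, so testing against the same data is also easy
    • Testing in Production (Data)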
  7. 11.
  8. 13.

    Implementation example in Java
    Create a POJO that corresponds to the Kinesis record format

        import java.io.Serializable;

        public class KinesisMessageModel implements Serializable {
            public String id;
            public String url;
            public String body;
            public String title;
            public String description;
            // ...
        }

    see: Example: Using POJOs for Handler Input/Output (Java) - AWS Lambda
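The handler on the following slide calls a `toClass` helper that is not shown in the transcript. As a rough sketch of its first step, assuming the crawler writes UTF-8 encoded JSON, the record payload (a `ByteBuffer`, as returned by `rec.getKinesis().getData()` in the Lambda Java events library) can be decoded to a string like this. The class name `RecordPayloads` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the talk): decodes a Kinesis record payload.
final class RecordPayloads {
    private RecordPayloads() {}

    // Decodes the payload as UTF-8 text. duplicate() gives an independent
    // position, so the caller's buffer is left untouched and can be read again.
    static String asUtf8String(ByteBuffer data) {
        return StandardCharsets.UTF_8.decode(data.duplicate()).toString();
    }
}
```

The resulting JSON string could then be mapped onto `KinesisMessageModel`, for example with Jackson's `ObjectMapper.readValue(json, KinesisMessageModel.class)`. Whether the original `toClass` works this way is an assumption.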
  9. 14.

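    Process the data and send it to the next Kinesis Stream

        import com.amazonaws.services.kinesis.model.PutRecordsRequest;
        import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
        import com.amazonaws.services.kinesis.model.PutRecordsResult;
        import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
        import com.amazonaws.services.lambda.runtime.events.KinesisEvent.KinesisEventRecord;
        import com.fasterxml.jackson.databind.ObjectMapper;
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.util.ArrayList;
        import java.util.List;

        public class Boiler {
            // The fields (kinesis, kinesisOutputStreamName) and the helpers
            // getPutRecordsRequest and toClass are omitted on the slide.

            // Handler that receives data from the Kinesis Stream
            public void recordHandler(KinesisEvent event) throws IOException {
                PutRecordsRequest putRecordsRequest = getPutRecordsRequest(this.kinesisOutputStreamName);
                List<PutRecordsRequestEntry> putRecordsRequestEntryList = new ArrayList<>();
                // A single event contains multiple records; the number is configurable via the batch size.
                for (KinesisEventRecord rec : event.getRecords()) {
                    KinesisMessageModel record = toClass(rec);
                    PutRecordsRequestEntry putRecordsRequestEntry = new PutRecordsRequestEntry();
                    // Process the record (in practice, main-text extraction happens here)
                    ByteBuffer data = ByteBuffer.wrap(new ObjectMapper().writeValueAsBytes(record));
                    putRecordsRequestEntry.setData(data);
                    putRecordsRequestEntry.setPartitionKey(record.getSomeKey());
                    putRecordsRequestEntryList.add(putRecordsRequestEntry);
                }
                // Assemble the PutRecords request and send it to the next Kinesis Stream
                putRecordsRequest.setRecords(putRecordsRequestEntryList);
                PutRecordsResult putRecordsResult = this.kinesis.putRecords(putRecordsRequest);
            }
        }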
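The handler above does not inspect `putRecordsResult`. PutRecords is not atomic: a call can succeed overall while individual records fail, and the response lists one result per request entry, in the same order, with an error code set on the failed ones. The following is a minimal sketch of picking out the entries to resend. It uses plain lists instead of the SDK types so that it stands alone, and the class name `PartialFailures` is hypothetical. With the AWS SDK for Java, the error codes would come from `PutRecordsResultEntry.getErrorCode()` on each element of `putRecordsResult.getRecords()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the talk): finds the entries of a PutRecords
// request that have to be sent again.
final class PartialFailures {
    private PartialFailures() {}

    // requestEntries: the entries that were sent, in request order.
    // errorCodes: one element per entry in the same order; null means success.
    static <T> List<T> entriesToRetry(List<T> requestEntries, List<String> errorCodes) {
        if (requestEntries.size() != errorCodes.size()) {
            throw new IllegalArgumentException("request and result sizes differ");
        }
        List<T> retry = new ArrayList<>();
        for (int i = 0; i < requestEntries.size(); i++) {
            if (errorCodes.get(i) != null) {
                retry.add(requestEntries.get(i));
            }
        }
        return retry;
    }
}
```

A caller could loop while `getFailedRecordCount()` is greater than zero, resending only the returned entries with a short backoff. Letting the handler throw instead makes Lambda retry the whole batch, which is only safe if downstream writes are idempotent.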
  10. 15.

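    Impressions of the Java implementation
    • For writing something quickly, node.js is easier
    • For Java there is no blueprint, and you cannot quickly try things out from the Lambda Console
    • Packaging is done with Maven; we build an uber jar with the Maven Shade Plugin
      • uber jar: a jar that bundles all dependency libraries and so on into a single jar
      • The dictionary for morphological analysis is also included in the jar
    Lambda Function Handler (Java) - AWS Lambda
    Apache Maven Shade Plugin – Introduction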
  11. 17.

    Creating the Lambda function: aws-cli

        aws lambda create-function \
            --region ap-northeast-1 \
            --function-name my-lambda-function \
            --code S3Bucket=mybucket,S3Key=path/to/my.jar \
            --role arn:aws:iam::999999999999:role/lambda_kinesis_rw \
            --runtime java8 \
            --handler com.your.app.Handler::recordHandler \
            --description "my kinesis stream!" \
            --timeout 15 \
            --memory-size 512

        aws lambda create-event-source-mapping \
            --event-source-arn arn:aws:kinesis:ap-northeast-1:999999999999:stream/your-stream \
            --function-name my-lambda-function \
            --enabled \
            --batch-size 100 \
            --starting-position TRIM_HORIZON
  12. 18.

    Deployment: aws-cli
    • Pull Request -> merge -> build (on Travis CI) -> S3
    • The uber jar is built on Travis CI
    • After that, the new code is applied with update-function-code

        aws lambda update-function-code \
            --function-name my-lambda-function \
            --s3-bucket mybucket \
            --s3-key path/to/my.jar
  13. 19.

    Logging in Lambda
    • We use Log4j
    • Error logs and the like can be viewed in CloudWatch Logs
    • When a problem cannot be reproduced locally, check CloudWatch Logs
    Accessing Amazon CloudWatch Logs for AWS Lambda - AWS Lambda
    Logging (Java) - AWS Lambda