Slide 2

Slide 2 text

Serverless Analytics and Monitoring Sebastian Hesse · K15t · @seeebiii

Slide 3

Slide 3 text

You build it. Containers, Serverless Functions, Databases, File Storage, Frontend, ...

Slide 4

Slide 4 text

You run it. “As long as I can use the app, everything is fine.”

Slide 5

Slide 5 text

You sell it. Fantastic! Your app grows!

Slide 6

Slide 6 text

Your app is doing well, but now you need to continuously satisfy everyone. Something great is ahead of you!

Slide 7

Slide 7 text

“I love your product! Can you add a feature for me?” Customer

Slide 8

Slide 8 text

“THIS S#!T IS NOT WORKING!!!!” Customer #2

Slide 9

Slide 9 text

“We need to scale the system!” Developer

Slide 10

Slide 10 text

“Was it worth spending three months on it?” Management

Slide 11

Slide 11 text

And now...?

Slide 12

Slide 12 text

Monitoring: Get insights into how your app is performing. Identify bottlenecks and improve your architecture.
Analytics: Know how your app is used. Make informed decisions about your next steps and evaluate the previous ones.

Slide 13

Slide 13 text

Raise your hands if you are using AWS! Compare Cloud Services: http://comparecloud.in/

Slide 14

Slide 14 text

Monitoring Failures 1.

Slide 15

Slide 15 text

Even big companies fail.
2019: Google recovers from outage that took down YouTube, Gmail and Snapchat. https://www.theverge.com/2019/6/2/18649635
2019: Azure global outage: Our DNS update mangled domain records, says Microsoft. https://www.zdnet.com/article/azure-global-outage-our-dns-update-mangled-domain-records-says-microsoft/
2017: AWS's S3 outage was so bad Amazon couldn't get into its own dashboard to warn the world. https://www.theregister.co.uk/2017/03/01/aws_s3_outage/

Slide 16

Slide 16 text

Your architecture: Webhook Processing, REST API, Frontend files. Previous Talk: Going Serverless https://youtu.be/Lihu4oyS_0I

Slide 17

Slide 17 text

Your architecture will fail! Webhook Processing, REST API, Frontend files. Just wait for it...

Slide 18

Slide 18 text

Monitoring Identify Metrics 2.

Slide 19

Slide 19 text

80+ AWS services are automatically publishing metrics to CloudWatch

Slide 20

Slide 20 text

750+ metrics available (Try: aws cloudwatch list-metrics)

Slide 21

Slide 21 text

Example: AWS Lambda
Invocations: The number of times a function is invoked.
Errors: The number of failed invocations, e.g. due to errors.
Duration: The elapsed time from function start until the execution ends.
Throttles: The number of throttled Lambda invocations.
http://bit.ly/aws-lambda-metrics

Slide 22

Slide 22 text

How to find the best metrics for your app?

Slide 23

Slide 23 text

Which feature is providing the greatest value to your app?

Slide 24

Slide 24 text

Focus: Start to monitor your key app metrics. Then monitor everything else. Webhook Processing, REST API, Frontend files.

Slide 25

Slide 25 text

Example: Backbone Issue Sync
Number of (un-)successful syncs: If we have a high percentage of unsuccessful syncs, this is an indicator that something is wrong.
Duration of a synchronization: If it takes too much time to synchronize data, our customers are not satisfied.
Response time of HTTP requests: If requests to Jira or to AWS services take too long, this increases the duration of a sync.
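The first key metric above, the percentage of unsuccessful syncs, comes down to a small calculation. The following is only a sketch: the class name `SyncHealth`, its methods and the threshold parameter are hypothetical and not part of the talk.

```java
/** Hypothetical helper: turns raw sync counts into the "unsuccessful syncs" key metric. */
final class SyncHealth {
    private SyncHealth() {}

    /** Share of unsuccessful syncs in the range [0, 1]; returns 0 when no sync ran. */
    static double failureRate(long successful, long failed) {
        long total = successful + failed;
        return total == 0 ? 0.0 : (double) failed / total;
    }

    /** True when the failure rate is strictly above the (assumed) acceptable maximum. */
    static boolean looksUnhealthy(long successful, long failed, double maxFailureRate) {
        return failureRate(successful, failed) > maxFailureRate;
    }
}
```

Publishing the raw counts and letting the monitoring tool compute the ratio is an equally valid choice. The helper only makes the definition explicit.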

Slide 26

Slide 26 text

Monitoring Tools 3.

Slide 27

Slide 27 text

CloudWatch

Slide 28

Slide 28 text

Amazon CloudWatch
Events: Listen to events in your AWS account.
Logs: Write and view log messages of your services.
Metrics: Use metrics to describe the (health) status of your services.
Alarms: Get notified if metrics are crossing defined thresholds.

Slide 29

Slide 29 text

Standard Metrics: Use if they already reflect your key app metrics. For example, if webhook processing is done by two contiguous Lambda functions.
Custom Metrics: Use if you cannot make use of the standard ones. For example, if you have custom error reports in your app.

Slide 30

Slide 30 text

CloudWatch: Custom Metrics
// create metric data
MetricDatum datum = new MetricDatum()
    .withMetricName("sync_error")
    .withUnit("Count")
    .withValue(1.0) // withValue expects a Double, so an int literal does not compile
    .withTimestamp(new Date());

Slide 31

Slide 31 text

CloudWatch: Custom Metrics
// create metric data
MetricDatum datum = ...;
// prepare request to CloudWatch
PutMetricDataRequest request = new PutMetricDataRequest()
    .withNamespace("Backbone Issue Sync")
    .withMetricData(datum);
// send request
cloudWatchClient.putMetricData(request);

Slide 32

Slide 32 text

Custom Dashboard
Add Widgets: Present the most important numbers.
Dynamic Metrics: Add mathematical expressions to your widgets.

Slide 33

Slide 33 text

What if... I don't look at this dashboard?

Slide 34

Slide 34 text

Configure Alarms
CloudWatch Alarms: Create alarms for the metrics you have created. Think about good thresholds.
Send alert via SNS: Then, configure notifications to be sent out to your developers or operations team using SNS.
Infrastructure as Code: Add the alarm configuration to your Infrastructure as Code template, e.g. CloudFormation.

Slide 35

Slide 35 text

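Add to CloudFormation
TooManyErrorsAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "TooManyErrorsAlarm"
    ActionsEnabled: true
    AlarmActions:
      # assumes SnsAlarmTopic is an AWS::SNS::Topic resource in the same template
      - !Ref SnsAlarmTopic
    Namespace: "Backbone Issue Sync"
    MetricName: "sync_error"
    ComparisonOperator: GreaterThanThreshold
    # percentiles such as p95 belong in ExtendedStatistic; Statistic only accepts
    # SampleCount, Average, Sum, Minimum or Maximum
    ExtendedStatistic: p95
    Threshold: 100
    # ...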
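To reason about thresholds before deploying an alarm, it can help to replay recent datapoints locally. The sketch below is a deliberately simplified model of a `GreaterThanThreshold` alarm: it fires only if each of the last `evaluationPeriods` datapoints is strictly above the threshold. It ignores missing data and "M out of N" settings, and the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical, simplified model of a GreaterThanThreshold alarm (not the CloudWatch implementation). */
final class AlarmSketch {
    private AlarmSketch() {}

    /**
     * Returns true when each of the most recent {@code evaluationPeriods} datapoints
     * is strictly greater than {@code threshold}. Returns false if there are too few datapoints.
     */
    static boolean inAlarm(List<Double> datapoints, double threshold, int evaluationPeriods) {
        if (evaluationPeriods <= 0 || datapoints.size() < evaluationPeriods) {
            return false;
        }
        List<Double> recent =
                datapoints.subList(datapoints.size() - evaluationPeriods, datapoints.size());
        return recent.stream().allMatch(value -> value > threshold);
    }
}
```

Feeding it a day of historical values with a few candidate thresholds gives a rough feeling for how often an alarm would have fired.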

Slide 36

Slide 36 text

That's great! What will I do if I receive an alarm?

Slide 37

Slide 37 text

X-Ray

Slide 38

Slide 38 text

Trace your service interdependencies with X-Ray, for example between Lambda, DynamoDB and S3 Buckets.

Slide 39

Slide 39 text

Enable X-Ray Tracing
# AWS Lambda in CloudFormation SAM template
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: index.myFunction
    Runtime: nodejs10.x
    CodeUri: target
    Tracing: 'Active'

Slide 40

Slide 40 text

X-Ray in your code
// automatic tracing for AWS SDK
AmazonDynamoDBClientBuilder
    .standard()
    .withRequestHandlers(new TracingHandler())
    .build();

Slide 41

Slide 41 text

X-Ray in your code
// custom tracing, e.g. when requesting Jira
AWSXRayRecorder xrayRecorder = ...;
String segmentName = "sendJiraRequest ...";
Subsegment subsegment = xrayRecorder.beginSubsegment(segmentName);
try {
    // send request to Jira
} catch (Exception ex) {
    subsegment.addException(ex);
    subsegment.setError(true);
} finally {
    xrayRecorder.endSubsegment();
}

Slide 42

Slide 42 text

Investigate Traces

Slide 43

Slide 43 text

Caution! Keep an eye on the pricing and the artifact size of your Lambda function. Pricing: $5 per 1 million traces.
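A quick back-of-the-envelope calculation shows how the trace price scales with traffic and sampling. This sketch uses only the $5 per 1 million traces figure from the slide. It ignores any free tier and other X-Ray charges, and the class name, method name and sampling-rate parameter are assumptions made for illustration.

```java
/** Hypothetical cost estimate based solely on the "$5 per 1 million traces" figure. */
final class XRayCostSketch {
    static final double USD_PER_MILLION_TRACES = 5.0;

    private XRayCostSketch() {}

    /**
     * Estimated monthly cost in USD.
     *
     * @param requestsPerMonth number of requests handled per month
     * @param samplingRate     fraction of requests that are traced, between 0 and 1
     */
    static double monthlyCostUsd(long requestsPerMonth, double samplingRate) {
        double tracedRequests = requestsPerMonth * samplingRate;
        return tracedRequests * USD_PER_MILLION_TRACES / 1_000_000.0;
    }
}
```

Under these assumptions, tracing every one of 10 million monthly requests costs about $50, while a 10% sampling rate brings that down to about $5.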

Slide 44

Slide 44 text

Dynamically Enable/Disable
String enabled = System.getenv("ENABLE_TRACING");
if ("true".equals(enabled)) {
    // build AWS clients with X-Ray
} else {
    // build AWS clients without
}

Slide 45

Slide 45 text

AWS Lambda

Slide 46

Slide 46 text

Scheduling Lambda Functions: A CloudWatch Rule triggers a Lambda function based on certain events or a schedule.

Slide 47

Slide 47 text

Check APIs Schedule a Lambda function to regularly call your own APIs and check that your service is available. Endless Options
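A scheduled availability check can be as small as one HTTP request plus a verdict. The sketch below uses only the JDK's `java.net.http` client (Java 11+). The class name, the "2xx within a latency budget" rule and the parameters are assumptions, and the Lambda handler wiring is left out on purpose.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical availability probe that a scheduled function could call. */
final class ApiHealthCheck {
    private ApiHealthCheck() {}

    /** Assumed rule: healthy means a 2xx status received within the latency budget. */
    static boolean isHealthy(int statusCode, Duration latency, Duration maxLatency) {
        return statusCode >= 200 && statusCode < 300 && latency.compareTo(maxLatency) <= 0;
    }

    /** Sends one GET request and applies the rule above; any I/O failure counts as unhealthy. */
    static boolean check(URI endpoint, Duration maxLatency) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(maxLatency).build();
        HttpRequest request = HttpRequest.newBuilder(endpoint).timeout(maxLatency).GET().build();
        long start = System.nanoTime();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            Duration latency = Duration.ofNanos(System.nanoTime() - start);
            return isHealthy(response.statusCode(), latency, maxLatency);
        } catch (IOException e) {
            return false; // includes timeouts and connection errors
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The result of `check` could then be published as a custom metric, so that the usual alarms apply to it.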

Slide 48

Slide 48 text

Caution! If your cloud provider goes down, you will not notice it.

Slide 49

Slide 49 text

Check APIs Schedule a Lambda function to regularly call your own APIs and check that your service is available. Check X-Ray data You can access the X-Ray API and check if traces reached a certain threshold or the error rate is higher than expected. Endless Options

Slide 50

Slide 50 text

GetServiceGraph
Retrieve per service:
- Error statistics
- Fault statistics
- Ok count
- Total count
http://bit.ly/xray-getservicegraph
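Once the per-service counts are available, checking them against a limit is simple arithmetic. The sketch below works on plain numbers rather than on the X-Ray SDK types. `ServiceStats`, its fields and the idea of combining errors and faults into one rate are assumptions made for illustration.

```java
/** Hypothetical value object holding the counts retrieved for one service. */
final class ServiceStats {
    final long okCount;
    final long errorCount;
    final long faultCount;
    final long totalCount;

    ServiceStats(long okCount, long errorCount, long faultCount, long totalCount) {
        this.okCount = okCount;
        this.errorCount = errorCount;
        this.faultCount = faultCount;
        this.totalCount = totalCount;
    }

    /** Share of requests that ended as an error or a fault; 0 when there was no traffic. */
    double problemRate() {
        return totalCount == 0 ? 0.0 : (double) (errorCount + faultCount) / totalCount;
    }

    /** True when the problem rate is strictly above the given maximum. */
    boolean exceeds(double maxProblemRate) {
        return problemRate() > maxProblemRate;
    }
}
```

A scheduled function could build one `ServiceStats` per service from the API response and raise a notification for every service where `exceeds` returns true.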

Slide 51

Slide 51 text

Check APIs Schedule a Lambda function to regularly call your own APIs and check that your service is available. Check X-Ray data You can access the X-Ray API and check if traces reached a certain threshold or the error rate is higher than expected. Check CloudWatch logs Attach a Lambda function to different CloudWatch log streams and analyze them - or send them to another service. Endless Options

Slide 52

Slide 52 text

CloudWatch Logs to Lambda: A Lambda function is triggered by a CloudWatch Log Event whenever new CloudWatch logs arrive.
Problem: One log group per Lambda function.
Solution: Listen to multiple log groups.
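A function subscribed to CloudWatch Logs receives its payload as a Base64-encoded, gzip-compressed JSON document in the `awslogs.data` field of the event. The sketch below shows only the decoding step with JDK classes. The class name is hypothetical, and the `encode` helper exists purely so the round trip can be tried locally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for the Base64 + gzip payload of a CloudWatch Logs event. */
final class LogEventDecoder {
    private LogEventDecoder() {}

    /** Decodes the Base64 string and unzips it into the JSON text it contains. */
    static String decode(String base64GzipData) {
        byte[] compressed = Base64.getDecoder().decode(base64GzipData);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inverse of decode, only meant for local experiments and tests. */
    static String encode(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }
}
```

The decoded JSON names the log group and contains the list of log events, so one function can serve several log groups and tell them apart.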

Slide 53

Slide 53 text

CloudWatch Insights

Slide 54

Slide 54 text

Select Sources Select source log groups which you want to query. Write Filter Write a filter query to retrieve the data you want. Matching Messages View all messages matching the filter query. Query Log Messages

Slide 55

Slide 55 text

Example Filter Queries
# Filter Expression Using a Trace Key
fields @message
| filter @message like "MyTraceKey"
| sort @timestamp ASC
# Auto JSON Detection
fields @message
| filter data.clientKey = "123-abc-..."
| sort @timestamp ASC

Slide 56

Slide 56 text

External Services

Slide 57

Slide 57 text

External Services

Slide 58

Slide 58 text

Checklist: Monitoring
- Monitor app metrics
- Get notified about alarms
- Further investigation possible
Status
- Serverless
- Low-cost
- Without code changes (if you want)

Slide 59

Slide 59 text

Analytics Status Quo 1.

Slide 60

Slide 60 text

Descriptive Describe your data with statistics and reports. Diagnostic Investigate your data and look for reasons. Predictive Identify patterns and prepare for the future. Prescriptive Optimize your actions and find new approaches. Types of Analytics

Slide 61

Slide 61 text

Web Analytics Analyze user interactions with your app, for example using Google Analytics. Business Analytics Find out about business statistics like sales numbers and similar. Data Analytics Investigate the data you have stored - configuration data, processing data and more. Thousands of Tools

Slide 62

Slide 62 text

Data Storage DynamoDB RDS Kinesis S3 Buckets

Slide 63

Slide 63 text

Analytics Asking Questions 2.

Slide 64

Slide 64 text

Questions 1) Which configurations or settings are popular among my users? 2) Which errors occur in my app? 3) Can I identify access patterns? 4) Who is producing which traffic within my app?

Slide 65

Slide 65 text

Questions 1) Which configurations or settings are popular among my users? 2) Which errors occur in my app? 3) Can I identify access patterns? 4) Who is producing which traffic within my app?

Slide 66

Slide 66 text

Naive Approach DynamoDB scan

Slide 67

Slide 67 text

Naive Approach: A Lambda function scans the DynamoDB table.

Slide 68

Slide 68 text

Pros: It's easy to set up. You are flexible. Serverless.
Cons: Scan operations cost a lot! It's time consuming, and Lambda functions can time out. You don't want to do this on your production database.

Slide 69

Slide 69 text

Naive Approach #2: DynamoDB → stream → Lambda. Your Lambda function is invoked for each updated entry in your database table. Then simply analyze it.
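"Simply analyze it" could mean, for example, keeping a running tally of which fields appear in the updated entries. The sketch below assumes the field IDs have already been extracted from the stream records. The class name, the method name and the input shape are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tally of field IDs seen in one batch of updated entries. */
final class StreamTally {
    private StreamTally() {}

    /** Counts how often each field ID occurs; keys are sorted alphabetically. */
    static Map<String, Long> countByField(List<String> updatedFieldIds) {
        Map<String, Long> counts = new TreeMap<>();
        for (String fieldId : updatedFieldIds) {
            counts.merge(fieldId, 1L, Long::sum);
        }
        return counts;
    }
}
```

Because each invocation sees only a small batch, the tallies still have to be merged somewhere persistent before they answer a question about all users.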

Slide 70

Slide 70 text

Pros: No negative impact on database. Serverless.
Cons: Only a few database updates per function call.

Slide 71

Slide 71 text

Better Approach: DynamoDB → stream → Lambda → push → Redshift. Disadvantage: Your own pricey cluster.

Slide 72

Slide 72 text

Not really satisfying...

Slide 73

Slide 73 text

Analytics Amazon Athena 3.

Slide 74

Slide 74 text

Amazon Athena "Start querying data instantly. Get results in seconds. Pay only for the queries you run." A serverless service to run SQL queries on your S3 data.

Slide 75

Slide 75 text

Athena - How It Works
Source: Define a source bucket or bucket folder to scan your data from.
Schema: Define the schema of your data. This schema will be used to create your data table.
Query: Now you can query your data based on the defined schema.

Slide 76

Slide 76 text

Amazon Athena: DynamoDB → stream → Lambda → push → S3 Bucket ← query ← Athena. Push CSV-like or JSON data.

Slide 77

Slide 77 text

Normalize Data (Lambda)
function normalize(config) {
  let entries = '';
  for (...) {
    entries += item1 + ';' + item2 + ... + '\n';
  }
  s3.putObject({
    Body: entries,
    // ...
  });
}
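The same normalization idea can be sketched in Java, the language of the other snippets in this deck. This is only an illustration: the column layout (config ID, field ID, mapping name), the `FieldMapping` type and the way separator characters are replaced are assumptions, and the S3 upload is left out.

```java
import java.util.List;

/** Hypothetical normalizer that flattens one configuration into semicolon-separated lines. */
final class ConfigNormalizer {
    private ConfigNormalizer() {}

    /** Assumed shape of one field mapping inside a configuration. */
    static final class FieldMapping {
        final String fieldId;
        final String mappingName;

        FieldMapping(String fieldId, String mappingName) {
            this.fieldId = fieldId;
            this.mappingName = mappingName;
        }
    }

    /** Produces one line per mapping: configId;fieldId;mappingName followed by a newline. */
    static String normalize(String configId, List<FieldMapping> mappings) {
        StringBuilder entries = new StringBuilder();
        for (FieldMapping mapping : mappings) {
            entries.append(clean(configId)).append(';')
                   .append(clean(mapping.fieldId)).append(';')
                   .append(clean(mapping.mappingName)).append('\n');
        }
        return entries.toString();
    }

    /** Replaces separator and line-break characters so that a value cannot break the row layout. */
    private static String clean(String value) {
        return value.replace(';', ' ').replace('\n', ' ').replace('\r', ' ');
    }
}
```

Replacing separators is the simplest possible choice and loses information. Quoting values or writing JSON lines instead avoids that, at the price of a slightly more involved table schema.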

Slide 78

Slide 78 text

Questions 1) Which configurations or settings are popular among my users? 2) Which errors occur in my app? 3) Can I identify access patterns? 4) Who is producing which traffic within my app?

Slide 79

Slide 79 text

Example
Which fields are usually used in a synchronization? Which field mappings are the most popular ones?
-- Search for popular fields
SELECT bac_config_data.fieldId fieldId, count(*) fieldCount
FROM bac_config_data
GROUP BY bac_config_data.fieldId
ORDER BY fieldCount DESC;
-- Search for less popular field mappings
-- (group by the column itself, since Athena does not resolve SELECT aliases in GROUP BY)
SELECT bac_config_data.fieldMapping mappingName, count(*) mappingCount
FROM bac_config_data
GROUP BY bac_config_data.fieldMapping
ORDER BY mappingCount ASC;

Slide 80

Slide 80 text

Important! Think about how you normalize your data.

Slide 81

Slide 81 text

Questions 1) Which configurations or settings are popular among my users? 2) Which errors occur in my app? 3) Can I identify access patterns? 4) Who is producing which traffic within my app?

Slide 82

Slide 82 text

Measure Traffic in Your App: Depends on your architecture. Diagram: request → REST API.

Slide 83

Slide 83 text

Measure Traffic in Your App: Depends on your architecture. Diagram: request → REST API; Webhooks → queue → Kinesis → process → Lambda.

Slide 84

Slide 84 text

Analytics Amazon Kinesis 4.

Slide 85

Slide 85 text

Amazon Kinesis A streaming service to collect a huge amount of data and process it.

Slide 86

Slide 86 text

How It Works: Webhooks → queue → Kinesis → process → two Lambda functions.

Slide 87

Slide 87 text

How It Works: Webhooks → queue → Kinesis → process → two Lambda functions; Kinesis → analyze → Kinesis Data Analytics. Kinesis Data Analytics: Run SQL queries on your streaming data.

Slide 88

Slide 88 text

How It Works: Webhooks → queue → Kinesis → process → two Lambda functions; Kinesis → analyze → Kinesis Data Analytics → forward → Lambda → forward → S3 Bucket.

Slide 89

Slide 89 text

How It Works: Webhooks → queue → Kinesis → process → two Lambda functions; Kinesis → analyze → Kinesis Data Analytics → forward → Lambda → forward → S3 Bucket ← query ← Athena.

Slide 90

Slide 90 text

Pricing: Kinesis is a service where you pay a price per hour.
Kinesis Data Streams: > $10/month/shard
Kinesis Data Analytics: > $80/month
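Hourly prices add up even when no data flows, which is where the monthly figures come from. The sketch below converts an hourly rate into a monthly one. The rate of $0.015 per shard hour used in the usage note is an assumption for illustration only (check the current price list for your region), as are the 30-day month and all names in the code.

```java
/** Hypothetical converter from an hourly price to an approximate monthly price. */
final class HourlyPricing {
    /** Assumes a 30-day month. */
    static final int HOURS_PER_MONTH = 24 * 30;

    private HourlyPricing() {}

    /** Monthly cost in USD for the given number of units, each billed per hour. */
    static double monthlyUsd(double usdPerHour, int units) {
        return usdPerHour * HOURS_PER_MONTH * units;
    }
}
```

With the assumed rate, `HourlyPricing.monthlyUsd(0.015, 1)` comes to roughly $10.80 for a single shard, in line with the "> $10/month/shard" figure, and every additional shard adds the same amount again.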

Slide 91

Slide 91 text

Checklist: Analytics
- Get app data insights
- Answer your custom questions
- Further optimization possible
Status
- Serverless
- Pay as you go
- Additional code required

Slide 92

Slide 92 text

Conclusion

Slide 93

Slide 93 text

Stakeholders
Customer: “I love your product! Can you add a feature for me?”
Customer #2: “THIS S#!T IS NOT WORKING!!!!”
Developer: “We need to scale the system!”
Management: “Was it worth spending three months on it?”

Slide 94

Slide 94 text

Goal Check
Monitoring: (Customer), Customer #2, Developer, Management
Analytics: Customer, (Customer #2), Developer, Management

Slide 95

Slide 95 text

Takeaways
Serverless: No need to manage anything by yourself. Pay as you go: if you do not use it, you do not pay.
S3 = Key Service: S3 is a key service in the AWS ecosystem. If you have data there, you can use it almost everywhere.
Extend It: You know the basics now. Use your data and add sugar services like Machine Learning to be one step ahead.

Slide 96

Slide 96 text

Need more serverless content? www.sebastianhesse.de / @seeebiii Thank you!
