Accelerate Insights and Streamline Ingestion with a Data Lake on AWS
Slides from a webinar with AWS looking at big data, data warehouses, data lakes, and lake houses, and how different types of data are treated to extract insights that lead to business outcomes.
This webinar explores how to accelerate insights and streamline ingestion with data lakes without compromising the productivity of your DevOps team. Presenters explain how a modern data lake can:
◦ Realize the elasticity of the cloud and reduce costs through consumption-based utilization
◦ Prepare your structured, unstructured, and semi-structured data for integration with analytics and machine learning tools
◦ Support data democratization and enable collaboration while maintaining security and data governance

Presenters:
Kanchan Waikar, Solutions Architect, AWS
Helen Beal, Chief Ambassador, DevOps Institute
Helen Beal is a ways of working coach, Chief Ambassador at DevOps Institute, and an ambassador for the Continuous Delivery Foundation. She is the Chair of the Value Stream Management Consortium and provides strategic advisory services to DevOps industry leaders such as Plutora and Moogsoft. She is also an analyst at Accelerated Strategies Group. She hosts the Day-to-Day DevOps webinar series for BrightTalk, speaks regularly on DevOps topics, is a DevOps editor for InfoQ, and writes for a number of other online platforms. She regularly appears in TechBeacon's DevOps Top 100 lists and was recognized as Top DevOps Evangelist 2020 in the DevOps Dozen awards. Herder of Humans. @bealhelen
Stream processing handles large data volumes and provides useful insights into the data prior to saving it to long-term storage.

Dimension                | Batch             | Stream
History                  | Traditional       | Modern
Data processing location | System of record  | Source
In event of failure      | Restart batch     | Retry increment
Pros                     | Simple and robust | Live, scalable, fault tolerant
Cons                     | Latency           | Complex, possibly expensive

Use cases for stream processing are found where systems handle big data volumes and where real-time results matter. If the value of the information contained in the data stream decreases rapidly as it ages, stream processing is appropriate, e.g.:
• Real-time analytics
• Anomaly, fraud, or pattern detection
• Complex event processing
• Real-time statistics and dashboards
• Real-time extract, transform, load (ETL)
• Implementing event-driven architectures (a minimal producer sketch follows this list)
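To make the streaming side concrete, here is a minimal sketch of publishing events to an Amazon Kinesis data stream with boto3. The stream name, region, and event shape are assumptions for illustration; the stream itself would be created separately.

```python
import json
import time
import boto3

# Kinesis client; the region is an assumption for this sketch
kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_event(user_id: str, action: str) -> None:
    """Publish one clickstream event to a hypothetical 'clickstream' stream."""
    record = {"user_id": user_id, "action": action, "ts": time.time()}
    kinesis.put_record(
        StreamName="clickstream",              # assumed, pre-created stream
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=user_id,                  # keeps each user's events ordered
    )

publish_event("u-123", "add_to_cart")
```

Downstream, a consumer such as Kinesis Data Analytics or a Lambda function would read these records for real-time analytics, anomaly detection, or streaming ETL.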
• Use batch jobs to prepare large, bulk datasets for downstream analytics
• Avoid just lifting and shifting batch processing to AWS; use the opportunity to improve the service
• Automate and orchestrate everywhere
• Use Spot Instances to save on flexible batch processing jobs (see the sketch after this list)
• Continuously monitor and improve batch processing
• Use Redshift for data warehousing needs
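As a minimal sketch of the automation point, the snippet below submits a containerized job to AWS Batch with boto3. The job queue and job definition names are hypothetical; in practice the queue would be backed by a Spot compute environment to realize the cost savings mentioned above.

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")  # region assumed

# Submit a job to a hypothetical Spot-backed queue; the job definition
# points at a container image that runs the actual preparation script.
response = batch.submit_job(
    jobName="nightly-dataset-prep",
    jobQueue="spot-etl-queue",             # assumed Spot-backed job queue
    jobDefinition="prep-bulk-dataset:1",   # assumed registered job definition
    containerOverrides={
        "command": ["python", "prepare.py", "--date", "2021-06-01"],
    },
)
print("Submitted job:", response["jobId"])
```

A scheduler such as EventBridge or Step Functions could invoke this nightly, covering the "automate and orchestrate everywhere" guidance.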
Semi-structured data does not fit the formal structure of relational database systems but contains tags or other markers separating different elements and enabling search (self-describing); think JSON, CSV, XML.

Dimension                | Structured                              | Unstructured
Format                   | Defined                                 | Undefined
Type                     | Quantitative                            | Qualitative
Usually stored in        | Data warehouses                         | Data lakes
Search and analyze       | Easy                                    | More work
Database                 | RDBMS                                   | NoSQL
Programming language     | SQL                                     | Various
Analysis                 | Regression, classification, clustering  | Data mining, data stacking
Customer insights        | High level                              | Deeper insights into sentiment and behavior
Share of enterprise data | ~20%                                    | 80%+

Unstructured data includes:
• Documents
• Publications
• Reports
• Emails
• Social media
• Videos
• Images
• Audio
• Mobile activity
• Satellite imagery
• Sensors

(A sketch of querying semi-structured data in place follows this list.)
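One common way to make semi-structured data in a lake searchable is to define a table over raw JSON in S3 and query it with Amazon Athena. A minimal sketch with boto3 follows; the database, table, and results bucket are assumptions for illustration.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region assumed

# Standard SQL over a table defined on JSON files in S3
# (e.g. via a Glue crawler); table and column names are hypothetical.
query = """
    SELECT sentiment, COUNT(*) AS mentions
    FROM social_media_events
    GROUP BY sentiment
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "datalake"},        # assumed database
    ResultConfiguration={
        "OutputLocation": "s3://my-athena-results/",       # assumed bucket
    },
)
print("Query execution id:", execution["QueryExecutionId"])
```

The results can then be polled with get_query_execution and fetched with get_query_results, feeding dashboards or ML feature pipelines.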
The team quickly builds a high-quality, device-friendly app on a cloud-based platform designed with developer and consumer usability in mind, and makes it available via self-service.

[Diagram: deep exploration, personalized insights, real-time queries, transparency]

Business users don't need to be data engineers or scientists to search, using natural language, for the answers they need and gain the insights that will enable them to make intelligent, data-driven business decisions.

[Diagram: context, anomaly detection, causal relationships, trend isolation, noise reduction, segmentation]
• Data needs to be mined and business intelligence analyzed at speed and with adaptability; systems from backlog to deployment must handle data needs.
• CI/CD and DevOps toolchains: Teams working with data need to leverage the power of automation to maximize throughput and stability, provide CI/CD capabilities, and limit blast radius.
• The Three Ways: We want to accelerate flow, amplify feedback, and use our data to drive experiments too. Monitoring and observability are key, with AI for feedback.
• A high-trust, collaborative culture: To build trust in a DevOps culture we have data-driven, not opinion-driven, conversations. Data must be available in real time, on demand, and via self-service.
• Value-stream-centric working: Truly understanding flow means all people in the value stream have a profound understanding of the end-to-end system, and this is driven by data insights.
• "We build it, we own it": Teams must be multifunctional, cross-skilling must be standard practice, and it must be quick and easy to get results from tools; choose those designed with usability in mind.
• Focus on value outcomes: Insights lead to decisions, which lead to measurable experience improvements for the customer; AI accelerates mean time to outcome (MTTO).
The 3 Vs:
• Big data comes from many different sources, in multiple formats, at variable speeds
• Remember the 3 Vs: volume, velocity, and variety; these demand scalability
• Different data has different needs

Data as a Service:
• Making data available centrally is key for efficient processing and access (see the sketch below)
• The objective is to make better business decisions
• Those business decisions must result in sublime customer experiences

Augmented Analytics:
• AI/ML and predictive analytics accelerate time to insight and time to outcome
• This makes more innovation time available to build differentiating features
• DataOps accelerates the data pipeline
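Making data available centrally usually starts with landing raw records in a partitioned S3 data lake so that Glue, Athena, and Redshift Spectrum can all find them. Below is a minimal ingestion sketch; the bucket name, prefix layout, and record shape are assumptions.

```python
import json
import uuid
import datetime
import boto3

s3 = boto3.client("s3")

def land_raw_record(source: str, payload: dict) -> None:
    """Write one raw record to a hypothetical central lake bucket,
    partitioned Hive-style by source and date for easy discovery."""
    today = datetime.date.today()
    key = (
        f"raw/source={source}/year={today.year}/"
        f"month={today.month:02d}/day={today.day:02d}/{uuid.uuid4()}.json"
    )
    s3.put_object(
        Bucket="my-data-lake",                  # assumed central bucket
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )

land_raw_record("crm", {"customer_id": 42, "event": "signup"})
```

Hive-style key prefixes (key=value) let a Glue crawler infer partitions automatically, which keeps downstream queries cheap by pruning to the relevant sources and dates.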