Who's in your Cloud? Cloud State Monitoring

When it comes to cloud operations, monitoring security and visibility are critical. Integration by other systems via Cloud APIs is one of the most powerful value drivers of the hyperscale cloud providers.

In this session, we will describe Cloud State Monitoring, including why it is important and who needs awareness in your organization. An explanation of the categories of Cloud APIs (including the management plane, control plane, and data plane) will give us background. Specific use cases across AWS, Azure, and GCP will dive deep into various changes you might not have considered monitoring.


Kevin Hakanson

December 15, 2020


  1. Who’s in your Cloud?

    Joint Chapter Meeting: MN ISSA + CSA MN, 15 December 2020
    Kevin Hakanson, Director of Customer Success & Principal Cloud Solutions Architect
    kevin.hakanson@opscompass.com
    https://www.linkedin.com/in/kevinhakanson/
  2. Objectives

    When it comes to cloud operations, monitoring security and visibility are critical.
    • Why is Cloud special?
    • Explanation of Cloud APIs: Management / Control Plane vs Data Plane
    • What is Cloud State Monitoring?
    • Why is Cloud State Monitoring important?
    • Who needs awareness about Cloud State Monitoring and Use Cases?
  3. Cloud and Cloud APIs

  4. Hyperscale Cloud Providers

    • Amazon Web Services (AWS)
    • Microsoft Azure (Azure)
    • Google Cloud Platform (GCP)

    Powerful Value Drivers:
    • Cost: Enormous investment into tools and services can reduce customer expense to build and operate workloads.
    • Scalability: Ability to quickly and seamlessly handle immense volumes of activity and data.
    • Interoperability: Integration by other systems via Cloud APIs expands capabilities and accelerates cloud adoption.

    Source: https://www.forbes.com/sites/peterbendorsamuel/2020/03/02/hyperscale-cloud-providers-shaping-the-platform-marketplace/
  5. Cloud Security Alliance

    • Domain 6 of the CSA Security Guidance for Critical Areas of Focus in Cloud Computing v4.0 includes these comments about the Management Plane:
      • Refers to the interfaces for managing your assets in the cloud
      • Key tool for enabling and enforcing separation and isolation in multitenancy
      • Delivered via APIs and web consoles (which often use the same APIs)

    Source: https://downloads.cloudsecurityalliance.org/assets/research/security-guidance/security-guidance-v4-FINAL.pdf
  6. Google Cloud Platform

    • Service Infrastructure is divided into three planes based on their functionality:
      • Management Plane – lets developers manage configurations of their services and their usage of services.
      • Data Plane – handles the data traffic between the clients and the services. The data plane can run in different environments and support both internal and external clients.
      • Control Plane – controls the data plane based on the configurations coming from the management plane, such as rate limiting.

    Source: https://cloud.google.com/service-infrastructure/docs/overview
  7. Microsoft Azure

    • Azure operations can be divided into two categories:
      • Control Plane – manage resources in your subscription
      • Data Plane – use capabilities exposed by your instance of a resource type
    • All requests for control plane operations are sent to the Azure Resource Manager URL.
      • For Azure global, the URL is https://management.azure.com
    • Data plane operations are sent to an endpoint that is specific to your instance and aren't limited to REST APIs.

    Source: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/control-plane-and-data-plane
  8. Microsoft Azure

    • Azure subscriptions and resource groups emit events related to resource changes or actions onto the Azure Event Grid
      • GET operations don't create events
      • Resource events are created for PUT, PATCH, POST, and DELETE operations that are sent to management.azure.com
    • Operations sent to the data plane don't create events
      • e.g. myaccount.blob.core.windows.net

    Source: https://docs.microsoft.com/en-us/azure/event-grid/event-schema-resource-groups
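
The event rules on this slide can be sketched as a small predicate. This is only an illustration of the stated rules, not Azure code: the function name `creates_resource_event` is hypothetical, and it assumes the Azure global control plane host named in the slides.

```python
from urllib.parse import urlparse

# Azure global control plane host, as named in the slides.
CONTROL_PLANE_HOST = "management.azure.com"
# HTTP methods the slide lists as producing resource events.
EVENT_METHODS = {"PUT", "PATCH", "POST", "DELETE"}


def creates_resource_event(method: str, url: str) -> bool:
    """Return True when a request would emit a resource event per the slide's rules.

    Hypothetical helper: only write-style operations sent to the control
    plane count; GET requests and data plane requests do not.
    """
    host = (urlparse(url).hostname or "").lower()
    return host == CONTROL_PLANE_HOST and method.upper() in EVENT_METHODS


if __name__ == "__main__":
    print(creates_resource_event("PUT", "https://management.azure.com/subscriptions/example"))
    print(creates_resource_event("PUT", "https://myaccount.blob.core.windows.net/container/blob"))
```

The practical consequence is that a monitor built only on these events sees control plane writes and nothing that happens on data plane endpoints.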
  9. Resiliency of Azure APIs

    The Azure REST APIs are designed for resiliency and continuous availability. Control plane operations (requests sent to management.azure.com) in the REST API are:
    • Distributed across regions. Some services are regional.
    • Distributed across Availability Zones in locations that have multiple Availability Zones.
    • Not dependent on a single logical data center.
    • Never taken down for maintenance activities.

    Source: https://docs.microsoft.com/en-us/rest/api/azure/#resiliency-of-azure-apis
  10. Amazon Web Services

    • AWS CloudTrail differentiates between data and management events based on planes:
      • Management events are also known as control plane operations.
      • Data events are also known as data plane operations and are often high-volume activities.

    Source: https://aws.amazon.com/premiumsupport/knowledge-center/cloudtrail-data-management-events/
  11. AWS Regions and Service Endpoints

    • Region – isolated and independent of the other Regions
    • Service Endpoint – URL of the entry point
    • If a service supports Regions, the resources in each Region are independent of similar resources in other Regions.
      • s3.us-east-1.amazonaws.com
      • rds.us-east-2.amazonaws.com
    • Some services, such as IAM, do not support Regions.
      • iam.amazonaws.com

    Source: https://docs.aws.amazon.com/general/latest/gr/rande.html
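
A minimal sketch of how the two endpoint shapes shown on this slide could be told apart. The helper name `parse_endpoint` is hypothetical, and it deliberately handles only the `service.region.amazonaws.com` and `service.amazonaws.com` forms from the examples; other endpoint shapes exist and are rejected here.

```python
from typing import Optional, Tuple

AWS_SUFFIX = ".amazonaws.com"


def parse_endpoint(host: str) -> Tuple[str, Optional[str]]:
    """Split an endpoint host name into (service, region).

    Hypothetical helper covering only the two shapes in the slide:
    a Regional endpoint returns a region, a global one returns None.
    """
    if not host.endswith(AWS_SUFFIX):
        raise ValueError(f"not an amazonaws.com endpoint: {host}")
    labels = host[: -len(AWS_SUFFIX)].split(".")
    if len(labels) == 1:
        return labels[0], None       # e.g. iam.amazonaws.com
    if len(labels) == 2:
        return labels[0], labels[1]  # e.g. s3.us-east-1.amazonaws.com
    raise ValueError(f"unsupported endpoint shape: {host}")


if __name__ == "__main__":
    for name in ("s3.us-east-1.amazonaws.com", "rds.us-east-2.amazonaws.com", "iam.amazonaws.com"):
        print(name, parse_endpoint(name))
```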
  12. Recap

    • No common definition for Management Plane, Control Plane, or Data Plane
    • Significant differences in architecture and infrastructure of Cloud APIs across the hyperscale cloud providers
  13. Cloud State Monitoring

  14. State (Computer Science)

    • A specialized definition of state is used for computer programs that operate sequentially on streams of data.
    • Information about previous data received is stored and used to affect the processing of the current data.
    • This is called a stateful protocol and the data carried over from the previous processing cycle is called the state.

    Source: https://en.wikipedia.org/wiki/State_(computer_science)
  15. Cloud State Monitoring

    • Treats cloud resource changes as a stream of data and intelligently monitors both the current data as well as changes from the previous data (state).
    • Examples:
      • Monitor current data for security misconfigurations
      • Monitor changes in data for configuration drift
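
The two example checks can be sketched against a pair of resource snapshots. This is a hedged illustration, not any product's implementation: the function names, the rule name, and the snapshot contents are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Resource = Dict[str, Any]


def find_misconfigurations(
    current: Resource, rules: Dict[str, Callable[[Resource], bool]]
) -> List[str]:
    """Check the current state: return the names of rules it fails."""
    return [name for name, is_compliant in rules.items() if not is_compliant(current)]


def find_drift(previous: Resource, current: Resource) -> Dict[str, Tuple[Any, Any]]:
    """Compare with the previous state: return {property: (old, new)} for changes."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


if __name__ == "__main__":
    # Hypothetical snapshots of one resource at two points in time.
    before = {"minimumTlsVersion": "TLS1_2", "owner": "platform"}
    after = {"minimumTlsVersion": "TLS1_0", "owner": "platform"}
    rules = {"tls-1.2-required": lambda r: r.get("minimumTlsVersion") == "TLS1_2"}
    print(find_misconfigurations(after, rules))
    print(find_drift(before, after))
```

Keeping the two checks separate mirrors the slide: one needs only the current data, the other needs the stored previous state.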
  16. Cloud Security Posture Management (CSPM)

    • Posture management is a set of new functions that realize many previously imagined or attempted ideas that were difficult, impossible, or extremely manual before the advent of the cloud.
    • Emerging discipline: Security posture management will disrupt many norms of the security organization in a healthy way with these new capabilities and may shift responsibilities among roles or create new roles.

    Source: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/organize/cloud-security-posture-management
  17. Source: https://opscompass.com/cspm/

  18. [Diagram of AWS and Azure API paths; only the labels survive]

    AWS labels: AWS Console, AWS CLI / PowerShell, AWS SDK, AWS CloudFormation, AWS CloudFormation Service, AWS IAM, AWS Config Rules, AWS CloudTrail, AWS CloudWatch Events, AWS Services
    Azure labels: Azure Portal, Azure CLI / PowerShell, Azure SDK, ARM Template, Azure Resource Manager, Azure RBAC, Azure Policy, Azure Activity Log, Azure Event Grid, Azure Services
    Shared labels: API, Events, Policy Enforcement Point (PEP)
  19. Who Needs Awareness?

  20. Modern cloud-first operating model

    Focus on self-service and democratization with centralized governance, security, platform, and automation
    • Cloud Center of Excellence (CCoE)
    • DevOps Teams
    • Architects
    • Risk & Compliance

    Source: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/organize/organization-structures
  21. Personas

    Personas: Developer, Cloud Architect, Risk & Compliance

    Questions and statements shown on the slide:
    • Do we have visibility into all our cloud resources and policies?
    • How can I intelligently track compliance when cloud resources are continuously changing?
    • Cloud Security Automation is our key to success.
    • Doesn’t the CI/CD Pipeline control all the cloud resource changes?
    • Have we drifted from the security and identity baselines?
  22. Who is Making Changes?

    • Customer (You)
      • Workload teams releasing software
      • Platform teams enhancing automation
      • Security teams updating policies
    • Cloud Providers
      • Adding new cloud-native service features
      • Updating versions of hosted open-source software
    • Continuous change is potentially creating risk
    • Shared Responsibility Model
  23. [Shared responsibility model diagrams]

    Source: https://docs.microsoft.com/en-us/azure/security/fundamentals/shared-responsibility
    Source: https://www.cisecurity.org/blog/shared-responsibility-cloud-security-what-you-need-to-know/
    Source: https://cloud.google.com/blog/products/containers-kubernetes/exploring-container-security-the-shared-responsibility-model-in-gke-container-security-shared-responsibility-model-gke

  24. Example: Azure Storage Account

  25. Azure Storage Account

    • Azure service that contains data objects:
      • blobs, files, queues, tables, disks
    • The Storage Resource Provider enables you to manage your storage account programmatically
    • API requires all requests to be versioned

    GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{accountName}?api-version={apiVersion}

    Source: https://docs.microsoft.com/en-us/rest/api/storagerp/
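
As a rough sketch, the versioned control plane request can be assembled from its parts. The function name `storage_account_url` is hypothetical; the path follows the Microsoft.Storage resource provider shape `.../providers/Microsoft.Storage/storageAccounts/{accountName}`, and the values in the usage line are placeholders.

```python
from urllib.parse import quote


def storage_account_url(
    subscription_id: str, resource_group: str, account_name: str, api_version: str
) -> str:
    """Build the GET URL for one storage account on the Azure global control plane.

    Hypothetical helper: every request carries an explicit api-version,
    so the same resource can look different depending on the version asked for.
    """
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.Storage/storageAccounts"
        f"/{quote(account_name, safe='')}"
        f"?api-version={quote(api_version, safe='')}"
    )


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    print(storage_account_url("my-subscription-id", "my-rg", "myaccount", "2019-06-01"))
```

Making `api_version` a required argument keeps the version choice visible to whoever writes the monitor.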
  26. Four Security Features Added

    PS /home/kevin> az rest --uri https://management.azure.com/subscriptions/9b32941b-395e-48af-815e-915dca2195e5/resourceGroups/20201022-storage-rg/providers/Microsoft.Storage/storageAccounts/20201022storageconsole?api-version={apiVersion}

    Values of {apiVersion} shown: ?api-version=2018-11-01 and ?api-version=2019-06-01

    • Encryption key type to be used for the encryption service. 'Account' key type implies that an account-scoped encryption key will be used. 'Service' key type implies that a default service key is used.
    • List of private endpoint connections associated with the specified storage account.
    • Set the minimum TLS version to be permitted on requests to storage. The default interpretation is TLS 1.0 for this property.
    • Allow or disallow public access to all blobs or containers in the storage account. The default interpretation is true for this property.
  27. Azure Storage TLS Version

    • Azure Storage supports TLS 1.0, TLS 1.1, and TLS 1.2
    • By default, Azure Storage accounts permit clients to use the oldest version of TLS (TLS 1.0)
    • The MinimumTlsVersion property is not set by default and does not return a value until you explicitly set it.
    • If the property value is null, then the storage account will permit requests sent with TLS version 1.0 or greater.
    • Configuring the minimum TLS version requires version 2019-04-01 or later of the Azure Storage resource provider.

    Source: https://docs.microsoft.com/en-us/azure/storage/common/transport-layer-security-configure-minimum-version?tabs=template
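
The null-means-TLS-1.0 rule is easy to get wrong in a monitor, so here is a minimal sketch of it. The helper names are hypothetical, and the property name `minimumTlsVersion` with values `TLS1_0`, `TLS1_1`, and `TLS1_2` is assumed to match what the REST API returns; check the API version you query.

```python
from typing import Optional

# Ordering of the TLS values assumed for the minimumTlsVersion property.
TLS_ORDER = {"TLS1_0": 0, "TLS1_1": 1, "TLS1_2": 2}


def effective_minimum_tls(minimum_tls_version: Optional[str]) -> str:
    """Treat a missing or null property as TLS 1.0, per the documented default."""
    return minimum_tls_version if minimum_tls_version is not None else "TLS1_0"


def meets_baseline(properties: dict, baseline: str = "TLS1_2") -> bool:
    """Return True when the account's effective minimum TLS is at least the baseline."""
    effective = effective_minimum_tls(properties.get("minimumTlsVersion"))
    return TLS_ORDER[effective] >= TLS_ORDER[baseline]


if __name__ == "__main__":
    print(meets_baseline({}))                               # property absent
    print(meets_baseline({"minimumTlsVersion": "TLS1_2"}))  # explicitly set
```

A check that only tests whether the property equals an insecure value would pass an account where the property is absent, which is the case this sketch guards against.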
  28. [Figure with numbered callouts 1 to 5]

    “DevOps style” Infrastructure as Code (IaC) deployment using a JSON based Azure Resource Manager template

    Should we consider the JSON based output from the REST API also to be Infrastructure as Code?

    minimumTlsVersion ?
  29. Example: Amazon RDS

  30. Amazon RDS “auto upgrades”

    “If you want Amazon RDS to upgrade the DB engine version of a database automatically, you can enable auto minor version upgrades for the database.”

        "DBInstanceArn": "arn:aws:rds:us-east-2:123456789012:db:example",
        "LatestRestorableTime": "2020-11-21T14:10:00.000Z",
        "EngineVersion": "5.7.28",
        "AutoMinorVersionUpgrade": true,

    Version        General Availability   AWS RDS Support
    MySQL 5.7.31   2020-07-13             2020-10-01
    MySQL 5.7.30   2020-04-27             2020-06-25
    MySQL 5.7.29   2020-01-13             n/a
    MySQL 5.7.28   2019-10-14             2020-02-20

    Sources:
    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.Upgrading.html
    https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/WhatsNew.html
  31. AWS DescribeDBEngineVersions API

        {
          "DBEngineVersions": [
            {
              "Engine": "mysql",
              "EngineVersion": "5.7.28",
              "DBParameterGroupFamily": "mysql5.7",
              "DBEngineDescription": "MySQL Community Edition",
              "DBEngineVersionDescription": "MySQL 5.7.28",
              "ValidUpgradeTarget": [
                {
                  "Engine": "mysql",
                  "EngineVersion": "5.7.30",
                  "Description": "MySQL 5.7.30",
                  "AutoUpgrade": false,
                  "IsMajorVersionUpgrade": false
                },
                {
                  "Engine": "mysql",
                  "EngineVersion": "5.7.31",
                  "Description": "MySQL 5.7.31",
                  "AutoUpgrade": false,
                  "IsMajorVersionUpgrade": false
                },

    Note: This is effectively another cloud configuration that needs to be monitored.

    Source: https://docs.aws.amazon.com/cli/latest/reference/rds/describe-db-engine-versions.html
  32. Amazon RDS MySQL Versions

    “For some RDS for MySQL major versions in some AWS Regions, one minor version is designated by RDS as the automatic upgrade version.”

    “RDS doesn't automatically set newer released minor versions as the automatic upgrade version.”

    MySQL Version     us-east-1   us-east-2   us-west-1   us-west-2
    5.5.46 – 5.5.62   n/a         5.5.61      n/a         n/a
    5.6.34 – 5.6.49   n/a         5.6.44      5.6.44      n/a
    5.7.16 – 5.7.31   n/a         5.7.26      5.7.26      n/a
    8.0.11 – 8.0.21   n/a         8.0.17      8.0.15      n/a

    Source: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.MySQL.html
  33. Recap

    • AutoMinorVersionUpgrade originally appeared to be a simple Boolean to monitor
    • When it is paired with a change in EngineVersion, that might indicate an AWS-initiated change
    • However, the absence of an EngineVersion change when newer versions are available is also interesting to monitor
    • The set of available versions is an AWS “resource” that should be monitored
    • AWS's lack of Region consistency introduces an additional dimension of complexity
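
The recap points can be sketched as a comparison between a DB instance description and a DescribeDBEngineVersions response. The field names are the ones shown in the RDS slides; the function names and the returned keys are hypothetical, and versions are assumed to be dotted numbers such as 5.7.28.

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '5.7.28' into (5, 7, 28) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_findings(instance: Dict, engine_versions: Dict) -> Dict:
    """Summarize what a monitor might flag for one DB instance (hypothetical shape)."""
    current = instance["EngineVersion"]
    targets: List[Dict] = []
    for entry in engine_versions.get("DBEngineVersions", []):
        if entry.get("EngineVersion") == current:
            targets.extend(entry.get("ValidUpgradeTarget", []))
    minor = [t for t in targets if not t.get("IsMajorVersionUpgrade", False)]
    newer = sorted(
        (t["EngineVersion"] for t in minor
         if version_key(t["EngineVersion"]) > version_key(current)),
        key=version_key,
    )
    auto = [t["EngineVersion"] for t in minor if t.get("AutoUpgrade")]
    enabled = bool(instance.get("AutoMinorVersionUpgrade"))
    return {
        "auto_minor_version_upgrade": enabled,
        "newer_minor_versions": newer,
        "auto_upgrade_targets": auto,
        # Enabled, newer versions exist, yet none is marked as an auto target.
        "stale_despite_auto_upgrade": enabled and bool(newer) and not auto,
    }


if __name__ == "__main__":
    instance = {"EngineVersion": "5.7.28", "AutoMinorVersionUpgrade": True}
    versions = {"DBEngineVersions": [{
        "Engine": "mysql", "EngineVersion": "5.7.28",
        "ValidUpgradeTarget": [
            {"EngineVersion": "5.7.30", "AutoUpgrade": False, "IsMajorVersionUpgrade": False},
            {"EngineVersion": "5.7.31", "AutoUpgrade": False, "IsMajorVersionUpgrade": False},
        ],
    }]}
    print(upgrade_findings(instance, versions))
```

Because the automatic upgrade version differs by Region, a real check would need to fetch the engine version list per Region rather than once per account.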
  34. Example: Azure RBAC

  35. Azure Role-Based Access Control (RBAC)

    Fine-grained access control to the Azure “control plane”
    Grant access by assigning a Security Principal a Role at a Scope
    Assignments are inherited down the resource hierarchy
    • If the user doesn't have a role with the Action at the requested Scope, access is not granted.
    • If a deny assignment applies, access is blocked.
    • Otherwise, access is granted.

    Source: https://docs.microsoft.com/en-us/azure/role-based-access-control/overview
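
The three evaluation bullets can be sketched as follows. This is a simplified model, not Azure's implementation: the `Assignment` type and function names are hypothetical, role definitions are flattened into action patterns on the assignment, and NotActions, data actions, groups, and conditions are ignored.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Assignment:
    """Hypothetical flattened assignment: who, which action patterns, at which scope."""
    principal: str
    actions: Tuple[str, ...]  # wildcard patterns such as "*/read"
    scope: str


def _scope_applies(assigned_scope: str, requested_scope: str) -> bool:
    # Assignments are inherited down the hierarchy: a parent scope covers its children.
    assigned = assigned_scope.rstrip("/").lower()
    requested = requested_scope.rstrip("/").lower()
    return requested == assigned or requested.startswith(assigned + "/")


def _matches(a: Assignment, principal: str, action: str, scope: str) -> bool:
    return (
        a.principal == principal
        and _scope_applies(a.scope, scope)
        and any(fnmatchcase(action.lower(), pattern.lower()) for pattern in a.actions)
    )


def is_access_granted(
    principal: str,
    action: str,
    scope: str,
    role_assignments: Iterable[Assignment],
    deny_assignments: Iterable[Assignment] = (),
) -> bool:
    # 1. No role with the action at the requested scope: not granted.
    if not any(_matches(a, principal, action, scope) for a in role_assignments):
        return False
    # 2. A deny assignment applies: blocked.
    if any(_matches(d, principal, action, scope) for d in deny_assignments):
        return False
    # 3. Otherwise: granted.
    return True


if __name__ == "__main__":
    reader = Assignment("alice", ("*/read",), "/subscriptions/s1")
    target = "/subscriptions/s1/resourceGroups/rg"
    print(is_access_granted("alice", "Microsoft.Storage/storageAccounts/read", target, [reader]))
```

Since the outcome depends entirely on the set of assignments in scope, any change to that set is a change in who can act on the control plane.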
  36. Azure roleDefinitions / roleAssignments

    roleDefinitions and roleAssignments are examples of extension resources: resource types that are applied to another resource and extend that resource's capabilities. They are therefore also important to monitor.

    Source: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/extension-resource-types
  37. Azure Roles vs. Azure AD Roles

    At a high level, Azure roles control permissions to manage Azure resources, while Azure AD roles control permissions to manage Azure Active Directory resources.

    Source: https://docs.microsoft.com/en-us/azure/role-based-access-control/rbac-and-directory-admin-roles
  38. Azure Policy

    • Helps enforce organizational standards at-scale and provide governance for resource consistency, regulatory compliance, security, cost, and management.
    • Evaluates resources in Azure by comparing the properties of those resources to business rules.
    • Compared to Azure RBAC:
      • Doesn't restrict actions (also called operations) but ensures that resource state is compliant to your business rules without concern for who made the change or who has permission to make a change.
      • Azure RBAC focuses on managing user actions at different scopes for when control of an action is required.
      • The combination of Azure RBAC and Azure Policy provides full scope control in Azure.

    Source: https://docs.microsoft.com/en-us/azure/governance/policy/overview
  39. Resource Tags

    • A tag is a label consisting of a user-defined key and value attached to resources as metadata
    • Tags help you organize your resources and can enable cost allocation, automation, and access control
    • Tags can be IT aligned
      • Workload, application, function, or environment
    • Tags can be Business aligned
      • Accounting, business ownership, or business criticality
    • Some services still lack tags, don’t support tag-on-create, or have other limitations
  40. Azure Policy – Tag Policies

    • Processes requests to create or update resources before handing to the appropriate Resource Provider
    • Can enforce tagging rules and conventions by automatically applying the needed tags during deployment
    • Example built-in policies that could affect a pipeline deployment from an upstream DevOps team

    Name                                                  Effects
    Add a tag to resources                                modify
    Add or replace a tag on resources                     modify
    Append a tag and its value from the resource group    append
    Require a tag on resources                            deny
    Require a tag and its value on resources              deny
    …                                                     …

    Source: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/tag-policies
  41. Example: AWS IAM Policy

  42. AWS Policy Evaluation Logic

    Source: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html

  43. AWS Managed Policies

    • An AWS managed policy is a standalone policy that is created and administered by AWS.
    • Standalone policy means that the policy has its own Amazon Resource Name (ARN) that includes the policy name.
      • For example, arn:aws:iam::aws:policy/SecurityAudit is an AWS managed policy.
    • You cannot change the permissions defined in AWS managed policies. AWS occasionally updates the permissions defined in an AWS managed policy.

    Source: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html
  44. Monitor AWS Managed IAM Policies

    • Periodically grabs the AWS managed IAM policies and checks whether AWS development teams have made any changes.
    • Useful for seeing policy fixes and product launches a few minutes or hours before the official announcement.
    • To follow along, activate the Releases Only feature of GitHub or follow the dedicated Twitter account, @mamip_aws.

    Source: https://github.com/z0ph/aws_managed_policies
  45. arn:aws:iam::aws:policy/SecurityAudit

    • Used by Cloud Security Posture Management (CSPM) solutions.
    • Grants permissions to view configuration data for many AWS services and to review their logs.
    • Customers cannot set the default version for an AWS managed policy.
  46. AWS Access Levels

    • List: Permission to list resources within the service to determine whether an object exists.
    • Read: Permission to read but not edit the contents and attributes of resources in the service.
    • Write: Permission to create, delete, or modify resources in the service.
    • Permissions management: Permission to grant or modify resource permissions in the service.
      • Tip: To improve the security of your AWS account, restrict or regularly monitor policies that include the Permissions management access level classification.
    • Tagging: Permission to perform actions that only change the state of resource tags

    Source: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_understand-policy-summary-access-level-summaries.html
  47. AWS Attribute-Based Access Control (ABAC)

    • Authorization strategy that defines permissions based on attributes. In AWS, these attributes are called tags.
    • Tags can be attached to IAM principals (users or roles) and to AWS resources.
    • Using tag condition keys, ABAC policies grant permissions when the principal's tag matches the resource tag.

    Source: https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html
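
A minimal sketch of the matching idea behind ABAC, outside of any real IAM evaluation. The function name `abac_allows` and the tag key `project` are hypothetical; real policies express this with tag condition keys rather than application code.

```python
from typing import Dict


def abac_allows(principal_tags: Dict[str, str], resource_tags: Dict[str, str], tag_key: str) -> bool:
    """Allow only when the principal carries the tag and the resource's value matches it.

    Hypothetical helper: a missing tag on either side never matches.
    """
    value = principal_tags.get(tag_key)
    return value is not None and resource_tags.get(tag_key) == value


if __name__ == "__main__":
    developer = {"project": "blue"}
    print(abac_allows(developer, {"project": "blue"}, "project"))
    print(abac_allows(developer, {"project": "red"}, "project"))
```

The sketch shows why tags stop being mere metadata under ABAC: editing a tag on either the principal or the resource changes the access decision.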
  48. Monitor AWS Tag-Based Access Control

    • If you use tag-based access control, the Tagging access level is effectively elevated to Permissions management and should be monitored.
    • Amazon CloudWatch Events can monitor for changes to tags and track the tag state on AWS resources.
    • CloudWatch Event rules can be built to match related tag changes and perform:
      • Automated workflows using AWS Lambda functions
      • Human workflows using your security team to audit changes

    Source: https://aws.amazon.com/blogs/mt/monitor-tag-changes-on-aws-resources-with-serverless-workflows-and-amazon-cloudwatch-events/
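
A rough sketch of the filtering step such a workflow might perform on a tag change event. It assumes the event carries `source` set to `aws.tag`, a `detail-type` of `Tag Change on Resource`, and a `detail.changed-tag-keys` list; verify these field names against the linked AWS post before relying on them. The function name and the watched key are hypothetical.

```python
from typing import Dict, Iterable, List


def changed_watched_tags(event: Dict, watched_keys: Iterable[str]) -> List[str]:
    """Return the watched tag keys that this event reports as changed.

    Assumed event shape (to be verified): source 'aws.tag',
    detail-type 'Tag Change on Resource', detail['changed-tag-keys'].
    """
    if event.get("source") != "aws.tag":
        return []
    if event.get("detail-type") != "Tag Change on Resource":
        return []
    changed = event.get("detail", {}).get("changed-tag-keys", [])
    return sorted(set(changed) & set(watched_keys))


if __name__ == "__main__":
    # Hypothetical event trimmed to the fields this sketch reads.
    sample = {
        "source": "aws.tag",
        "detail-type": "Tag Change on Resource",
        "detail": {"changed-tag-keys": ["project", "owner"]},
    }
    print(changed_watched_tags(sample, {"project"}))
```

A non-empty result is the point at which either an automated or a human review workflow would be triggered.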
  49. Closing Thoughts

    • Cloud APIs are one of the most powerful value drivers of hyperscale cloud providers.
    • Cloud resources can be changed by a variety of actors.
    • It is important to monitor the current state of a cloud resource for security and other misconfigurations.
    • A change in state of a cloud resource can also provide information related to your security posture.
  50. Thank You

    Kevin Hakanson, Director of Customer Success & Principal Cloud Solutions Architect
    kevin.hakanson@opscompass.com
    https://www.linkedin.com/in/kevinhakanson/