Azure Storage - Learnings

Azure Storage underpins many services and is used extensively to provision machines and store data. Developers therefore need to understand its performance and availability challenges and make the right choices.

Govind Kanshi

November 06, 2014


  1. Agenda
    • Where is storage
    • Storage options
    • Factors
      • Availability/Performance/Elasticity/Cost
  2. Questions
    • Do websites use storage?
    • How much throughput can I get from disks?
    • How do I upload data?
    • How do I track issues?
  3. Where is Azure storage
    • More than 20 trillion stored objects
    • 2+ million requests/sec on average (Build 2014)
    • VM Store
    • Data disk
    • Logs
    • Media (photo/video/audio/docs)
    • Table store
    • SSD Disk
    • CDP-B382
  4. Virtual Machine Storage Architecture
    • Azure Virtual Machine
      • C:\ (OS Disk)
      • E:\, F:\, etc. (Data Disks)
      • D:\ (Temporary Disk)
      • Disk Cache
      • G:\, H:\, etc. (SMB Share)
  5. Azure Files
    • SMB 2.1
    • Shared settings, diagnostic share
    • Lift and Shift Applications
    • Diagram labels: Azure VM (three instances)
  6. Performance of storage
    • Disk
      • Linux: fio
      • Windows: SQLIO, CrystalDiskMark
    • Striped vs unstriped (same storage account)
      • RAID 1-0: heavy data volume, double the cost
      • Windows: Storage Pool
    • Gates for performance
      • Stated throughput goal of the storage account
      • Machine size
      • Connectivity to storage/pipe (max)
  7. Scalability Targets – Partition
    • Single Queue (Account Name + Queue Name)
      • Up to 2,000 (1KB) messages per second
    • Single Table Partition (Account Name + Table Name + PartitionKey value)
      • Up to 2,000 (1KB) entities per second
    • Single Blob (Account Name + Container Name + Blob Name)
      • Up to 60 MBytes/sec
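
The per-partition targets above translate directly into a sizing estimate. The following is a minimal sketch, assuming a workload spreads evenly across partitions; the helper name `partitions_needed` and the example workload figures are hypothetical, and only the limits come from the slide.

```python
import math

# Per-partition targets quoted on the slide (1 KB messages/entities).
QUEUE_MSGS_PER_SEC = 2000
TABLE_PARTITION_ENTITIES_PER_SEC = 2000
BLOB_MBYTES_PER_SEC = 60


def partitions_needed(target_per_sec, per_partition_limit):
    """Smallest number of partitions that keeps each one at or below its target.

    Assumes the load is spread evenly across partitions (an idealization).
    """
    if target_per_sec <= 0:
        return 0
    return math.ceil(target_per_sec / per_partition_limit)


# Hypothetical workload: 9,000 queue messages/sec and 15,000 table entities/sec.
queues = partitions_needed(9000, QUEUE_MSGS_PER_SEC)
table_partitions = partitions_needed(15000, TABLE_PARTITION_ENTITIES_PER_SEC)
print(queues, table_partitions)
```

In practice load is rarely even, so the result is a lower bound rather than a recommendation.
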
  8. Think of performance
    • Performance gate(s)
      • (storage) VM disk, attached disk
      • Per VM
    • Network
      • Gateway (200 Mbps) across VPN
      • ExpressRoute 1 Gbps – 10 Gbps
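
To see why the network gate matters, it can help to estimate how long a bulk transfer takes at each link speed named above. This is a rough sketch that assumes the link runs at its full nominal rate with no protocol overhead; the function name `transfer_hours` and the 1 TB example size are illustrative choices, not figures from the deck.

```python
def transfer_hours(size_gb, link_mbps):
    """Hours to move size_gb (decimal gigabytes) over a link of link_mbps megabits/sec.

    Idealized: assumes the full nominal line rate and ignores protocol overhead.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


# Link speeds taken from the slide; the labels are for display only.
LINKS_MBPS = {
    "VPN gateway (200 Mbps)": 200,
    "ExpressRoute 1 Gbps": 1000,
    "ExpressRoute 10 Gbps": 10000,
}

for name, mbps in LINKS_MBPS.items():
    print(f"{name}: {transfer_hours(1000, mbps):.2f} h for 1 TB")
```

Real throughput will be lower, and the storage account and VM size gates may bind before the network does.
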
  9. Blob Service - Best Practices
    • How do I upload a folder the fastest?
      • Upload multiple blobs simultaneously
    • How do I upload a blob the fastest?
      • Use parallel block upload
      • Aspera/send file over mail
    • Distribute the load across the namespace
    • Prefer block upload size of 1 - 4MB range unless read requests are for small ranges
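
Parallel block upload can be sketched as: split the data into blocks, upload the blocks concurrently, then commit the ordered list of block IDs. The example below is a minimal illustration against an in-memory stand-in, not the Azure client library; `FakeBlobStore`, `put_block`, `commit_block_list`, `make_block_id` and `parallel_upload` are hypothetical names chosen to mirror the block-blob pattern.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # upper end of the 1 - 4MB range from the slide


class FakeBlobStore:
    """In-memory stand-in (hypothetical) for a blob service with block semantics."""

    def __init__(self):
        self.staged = {}      # block_id -> bytes, uploaded but not yet committed
        self.committed = b""  # blob content after the block list is committed

    def put_block(self, block_id, data):
        self.staged[block_id] = data

    def commit_block_list(self, block_ids):
        self.committed = b"".join(self.staged[b] for b in block_ids)


def make_block_id(index):
    # Fixed-width, base64-encoded IDs so every ID in the blob has the same length.
    return base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")


def parallel_upload(store, data, block_size=BLOCK_SIZE, workers=8):
    """Upload data as blocks in parallel, then commit them in order."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [make_block_id(i) for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all uploads and re-raises the first upload error, if any.
        list(pool.map(store.put_block, ids, blocks))
    store.commit_block_list(ids)  # the order of this list defines the blob layout
    return ids


store = FakeBlobStore()
payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
block_ids = parallel_upload(store, payload, block_size=1000, workers=4)
print(len(block_ids), store.committed == payload)
```

Blocks may finish in any order; correctness depends only on the order of the committed ID list. The same thread pool pattern applies one level up when uploading many blobs from a folder at once.
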
  10. Premium storage
    • Up to 32 TB of storage per VM
    • >50,000 IOPS per VM
    • Less than 1ms read latency
  11. Storage Options
    Table columns as extracted: Cost, Latency, Size of entity, Store, Size range, Availability, Data
    • Ephemeral low low type 1 TB 3 local, geo any
    • Azure Storage (page) lowest more 512-byte pages 1 TB 3 local/geo r/w data
    • Azure Table Low more
    • Azure Storage (block) Low More 4-MB 200 GB 3 local/geo Block data (backup delta), files
    • SQL Azure Flexible medium Datatype 1 GB-500 GB 3 local, 1 geo, RO Structured data
    • Azure Redis Flexible Low Datatype Up to 53 GB Master-slave cache
    • DocumentDB Flexible Lowest 256 KB Xx Terabytes Auto Any (json)
    • Azure Search Flexible Low Datatype GBs Auto Any
    • HDInsight Flexible More Custom TBs Auto Any
  12. 3 Types of Durability offered for Azure Storage
    • LRS
      • Stores 3 replicas of the data within a single zone (facility) in a single region
      • Provides data durability for disk, node and rack failures
    • ZRS (available only for block blobs)
      • Stores 3 replicas of the data across multiple zones (facilities)
      • Designed to keep all 3 replicas across zones within a single region, but may span across two regions
      • Provides additional durability to protect data against zone failures (e.g., fire in a facility)
    • GRS
      • Stores 6 replicas of the data across two regions (3 in each region)
      • Provides additional durability to protect data against major regional disasters (e.g., tornado, hurricane, earthquake, etc.)
  13. Geo Redundant Storage (GRS)
    • Data geo-replicated across regions hundreds of miles apart
      • Provide data durability in face of potential major regional disasters
      • Provided for Blob, Tables and Queues
    • User chooses primary region during account creation
      • Each primary region has a predefined secondary region
    • Asynchronous geo-replication
      • Off critical path of live requests
    • Map labels: US West, US East, US North, US South, US Central, US East 2, Europe North, Europe West, Asia East, Asia South East, China North, China South, Japan East, Japan West, South Brazil, US South
  14. Read-Only Access to GRS (RA-GRS) – Scenarios
    • Read-only access to secondary data even if primary is unavailable
    • Access to an eventually consistent copy of the data in the other region
    • For these, the application semantics need to allow for eventually consistent reads
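
The read path in these scenarios can be sketched as "try the primary, fall back to the secondary, and tell the caller the result may be stale". This is an illustration only: `read_with_fallback` and the two reader callables are hypothetical, and treating `ConnectionError` as the failure signal is an assumption of the sketch, not something the deck specifies.

```python
def read_with_fallback(read_primary, read_secondary):
    """Return (value, source), falling back to the secondary if the primary read fails.

    read_primary / read_secondary are caller-supplied callables (hypothetical).
    """
    try:
        return read_primary(), "primary"
    except ConnectionError:
        # The secondary is read-only and eventually consistent: the value may be stale,
        # so the source is returned for the application to decide how to treat it.
        return read_secondary(), "secondary"


def primary_down():
    raise ConnectionError("primary unavailable")


def secondary_copy():
    return {"order": 42, "status": "shipped"}


value, source = read_with_fallback(primary_down, secondary_copy)
print(source, value)
```

Returning the source alongside the value is one way to let application code honor the "eventually consistent reads" requirement, for example by flagging the data as possibly out of date.
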
  15. Storage Analytics
    • Turn on storage analytics with retention on
    • Send client request id with data that needs to be tracked
    • Logs can be analyzed to retrieve information and aggregate based on it
    • Logs can be used to determine hot spots
    • Logs are not sorted in a blob and clock skew needs to be factored in
    • Look at minute & hourly metrics to understand usage and performance
    • Look for throttling errors
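
The log-handling advice above (sort before reading, allow for clock skew, find hot spots, look for throttling) can be sketched on already-parsed records. The record layout here is a simplified assumption (start time, object key, status), not the actual Storage Analytics log format, and the status string `ThrottlingError` is used as an assumed example value.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (request start time, object key, status).
records = [
    (datetime(2014, 11, 6, 10, 0, 5), "/acct/photos/a.jpg", "Success"),
    (datetime(2014, 11, 6, 10, 0, 1), "/acct/photos/a.jpg", "ThrottlingError"),
    (datetime(2014, 11, 6, 10, 0, 3), "/acct/docs/b.pdf", "Success"),
    (datetime(2014, 11, 6, 10, 0, 2), "/acct/photos/a.jpg", "Success"),
]


def in_window(records, start, end, skew=timedelta(seconds=5)):
    """Records within [start, end], widened by a clock-skew allowance, sorted by time."""
    lo, hi = start - skew, end + skew
    return sorted(r for r in records if lo <= r[0] <= hi)


def hot_spots(records, top=3):
    """Most frequently requested object keys."""
    return Counter(key for _, key, _ in records).most_common(top)


def throttled(records):
    """Records whose status mentions throttling (assumed naming)."""
    return [r for r in records if "Throttl" in r[2]]


window = in_window(
    records,
    datetime(2014, 11, 6, 10, 0, 2),
    datetime(2014, 11, 6, 10, 0, 3),
    skew=timedelta(seconds=1),
)
print(len(window), hot_spots(records)[0], len(throttled(records)))
```

Widening the window by the skew allowance trades a few extra records for not missing a request whose client and server clocks disagree.
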
  16. Choosing the Right Authentication Method
    • Symmetric Shared Key Authentication
      • Trusted service that owns the storage accounts
    • Shared Access Signature (SAS)
      • 3rd party services
      • Mobile device applications
      • Restricted access for services
      • Allow client applications to directly communicate with Storage rather than scaling a proxy web service
      • Proxy used for authentication and providing SAS tokens
    • Public (Blob service only)
      • CDN access
      • Content accessed via browsers
  17. Designing Your Service For Security (1 of 2)
    • How to store Secret keys/Shared Access Signature (SAS) tokens?
      • Persist only encrypted key/token
      • Use cert to decrypt the encrypted key in the application
      • Certs available only on required nodes
    • How to transfer SAS tokens?
      • Use HTTPS to transfer SAS tokens
    • How often should I change my Secret keys/SAS tokens?
      • Automate the process to enable changing it frequently
      • Always be ready to revoke SAS tokens or change Secret keys/SAS tokens
  18. Designing Your Service For Security (2 of 2)
    • How do I rotate Secret Keys/SAS tokens?
      • Two 512-bit keys provided
      • Push configuration change to all services to use one of them
      • Other key can be changed using service management API
  19. Shared Access Signatures – Best Practices (1 of 2)
    • Authenticate the service requesting SAS token
    • Use HTTPS
    • Token provider and consumer need to agree on storage REST version
      • Semantics for SAS Token can change from version to version
      • Token generating service should be capable of generating multiple versions of tokens and consumer can select the version it can understand
    • Clock Skew
      • Sufficient buffer for start time and end time because of clock skew
      • Avoid setting start time if access should start right away
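
The clock-skew guidance can be expressed as a small helper that computes a token's validity window. This is a sketch under stated assumptions: the five-minute buffer, the function name `sas_validity_window` and its parameters are hypothetical, and the code only computes timestamps; it does not generate or sign a token.

```python
from datetime import datetime, timedelta, timezone

CLOCK_SKEW = timedelta(minutes=5)  # assumed buffer; tune to your environment


def sas_validity_window(lifetime, start_at=None, now=None):
    """Return (start, expiry) for a token.

    start is None when access should begin right away, so no start time is set
    and a client with a slightly different clock is not rejected as "not yet valid".
    """
    now = now or datetime.now(timezone.utc)
    if start_at is None:
        return None, now + lifetime + CLOCK_SKEW
    # Deferred access: pad both ends of the window by the skew buffer.
    return start_at - CLOCK_SKEW, start_at + lifetime + CLOCK_SKEW


issued = datetime(2014, 11, 6, 12, 0, tzinfo=timezone.utc)
start, expiry = sas_validity_window(timedelta(hours=1), now=issued)
print(start, expiry.isoformat())
```

Padding the expiry lengthens the real exposure of the token, so the buffer should stay small relative to the intended lifetime.
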
  20. Summary
    • Azure Storage
      • Durable, scalable and highly available cloud storage
      • Auto load balances to meet scale needs
      • Performance from local, SSD, striped, provisioned
    • Storage Durability Options – LRS, ZRS, and GRS
    • RA-GRS
      • Provides higher availability as applications can read from secondary when primary is not available
      • Client library retries provide this capability out of the box
    • Details on internals can be found in the SOSP paper:
      • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
  21. What is New?
    • Increased Scale Targets for Storage Accounts
      • Each storage account can hold up to 500TBs for all regions
      • Increased BW for US regions per storage account
        • 10Gbps Ingress and 20Gbps Egress
    • Improved Versioning for Shared Access Signatures
    • Client Libraries & Tools
      • .NET Library Desktop, Phone and Runtime with support for Files and REST Version 2014-02-14
      • Java 1.0 RTM
      • Android 0.1 CTP
      • C++ Library CTP
      • AzCopy for Files CTP
      • PowerShell for Files CTP
    • Azure Files Preview