Azure Storage underpins many services and is used extensively to provision machines and store data. Developers therefore need to understand its performance and availability challenges and make the right choices.
Windows
• Sqlio, crystal disk
• Striped vs unstriped (same storage account)
• RAID 1-0 – heavy data volume, double the cost
• Windows – Storage Pool
• Gates for performance
• Stated throughput goal of Storage account
• Machine Size
• Connectivity to storage/pipe – max
Scalability Targets – Partition
• 2,000 (1KB) messages per second
• Single Table Partition – Account Name + Table Name + PartitionKey value: up to 2,000 (1KB) entities per second
• Single Blob – Account Name + Container Name + Blob Name: up to 60 MBytes/sec
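The per-partition targets above lend themselves to quick capacity arithmetic. The following is a minimal sketch, not an official calculator: the constants are the figures quoted on the slide, and the function names are hypothetical helpers introduced only for illustration.

```python
# Per-partition targets as quoted on the slide (for 1KB items).
TABLE_PARTITION_ENTITIES_PER_SEC = 2000
BLOB_MBYTES_PER_SEC = 60


def table_partition_id(account, table, partition_key):
    """A table partition is identified by account + table + PartitionKey."""
    return (account, table, partition_key)


def min_partitions_needed(entities_per_sec):
    """Fewest table partitions that keep each one within its target."""
    # Ceiling division using integers only.
    return -(-entities_per_sec // TABLE_PARTITION_ENTITIES_PER_SEC)


def min_seconds_for_blob(size_mbytes):
    """Lower bound on transfer time for one blob at the single-blob target."""
    return size_mbytes / BLOB_MBYTES_PER_SEC
```

For example, a workload of 10,000 entities per second needs its keys spread over at least five PartitionKey values to stay within the stated target.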
folder the fastest?
• Upload multiple blobs simultaneously
• How do I upload a blob the fastest?
• Use parallel block upload
• Aspera/send file over mail
• Distribute the load across the namespace
• Prefer a block upload size in the 1 - 4MB range unless read requests are for small ranges
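A parallel block upload can be sketched roughly as follows. This is an illustration under stated assumptions rather than a real client: `put_block` and `commit_block_list` are hypothetical callables standing in for the actual storage calls, and the block IDs are generated as equal-length base64 strings, which is what the block blob API expects.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # upper end of the 1 - 4MB range from the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Return (block_id, chunk) pairs; IDs are equal-length base64 strings."""
    blocks = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block_id = base64.b64encode(f"{index:08d}".encode()).decode()
        blocks.append((block_id, data[offset:offset + block_size]))
    return blocks


def parallel_block_upload(data, put_block, commit_block_list,
                          workers=8, block_size=BLOCK_SIZE):
    """Upload blocks concurrently, then commit them in their original order.

    put_block(block_id, chunk) and commit_block_list(block_ids) are
    hypothetical stand-ins for the real storage operations.
    """
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(put_block, block_id, chunk)
                   for block_id, chunk in blocks]
        for future in futures:
            future.result()  # re-raises any upload error
    block_ids = [block_id for block_id, _ in blocks]
    commit_block_list(block_ids)
    return block_ids
```

Blocks may finish uploading in any order; only the committed list determines the final layout of the blob.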
3 Types of Durability offered for Azure Storage
zone (facility) in a single region
• Provides data durability for disk, node and rack failures
ZRS (available only for block blobs)
• Stores 3 replicas of the data across multiple zones (facilities)
• Designed to keep all 3 replicas across zones within a single region, but may span across two regions
• Provides additional durability to protect data against zone failures (e.g., fire in a facility)
GRS
• Stores 6 replicas of the data across two regions (3 in each region)
• Provides additional durability to protect data against major regional disasters (e.g., tornado, hurricane, earthquake, etc.)
miles apart
• Provide data durability in face of potential major regional disasters
• Provided for Blob, Tables and Queues
User chooses primary region during account creation
• Each primary region has a predefined secondary region
Asynchronous geo-replication
• Off critical path of live requests
Regions shown: US West, US East, US North, US South, US Central, US East 2, Europe North, Europe West, Asia East, Asia South East, China North, China South, Japan East, Japan West, South Brazil, US South
to secondary data even if primary is unavailable
• Access to an eventually consistent copy of the data in the other region
• For these, the application semantics need to allow for eventually consistent reads
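Reading from the secondary when the primary is unavailable might look like the sketch below. It assumes the usual `<account>.<service>.core.windows.net` host shape, with the secondary reached by appending `-secondary` to the account name; `fetch` is a hypothetical callable, not a client library function.

```python
def secondary_host(primary_host):
    """Derive the secondary host by appending '-secondary' to the account name.

    Assumes the '<account>.<service>.core.windows.net' host shape.
    """
    account, rest = primary_host.split(".", 1)
    return f"{account}-secondary.{rest}"


def read_with_fallback(fetch, primary_host, path, retries=2):
    """Try the primary first, then fall back to the secondary.

    fetch(host, path) is a hypothetical callable that returns the data or
    raises ConnectionError. Returns (data, served_from_secondary).
    """
    for _ in range(retries):
        try:
            return fetch(primary_host, path), False
        except ConnectionError:
            continue
    # The secondary copy is eventually consistent, so the caller is told.
    return fetch(secondary_host(primary_host), path), True
```

Returning the `served_from_secondary` flag lets the application decide whether a possibly stale read is acceptable for that request.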
Storage Analytics
id with data that needs to be tracked
• Logs can be analyzed to retrieve information and aggregate based on it
• Logs can be used to determine hot spots
• Logs are not sorted in a blob and clock skew needs to be factored in
• Look at minute & hourly metrics to understand usage and performance
• Look for throttling errors
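The log analysis described above can be sketched as follows. The record shape (timestamp, object key, status) is a simplified assumption for illustration and not the actual Storage Analytics log format, and the status strings in the test data are illustrative only.

```python
from collections import Counter
from datetime import datetime


def summarize(records, top=3):
    """Sort log records by time, find hot spots, and collect throttling errors.

    records: iterable of (iso_timestamp, object_key, status) tuples.
    This tuple shape is an assumption made for the sketch.
    """
    # Log entries are not sorted within a blob, so order them first.
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r[0]))
    # The most frequently requested keys are candidate hot spots.
    hot = Counter(key for _, key, _ in ordered).most_common(top)
    # Any status mentioning throttling is flagged for follow-up.
    throttled = [r for r in ordered if "Throttling" in r[2]]
    return ordered, hot, throttled
```

Sorting handles out-of-order entries but not clock skew between nodes, so comparisons of closely spaced timestamps should still be treated with caution.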
service that owns the storage accounts
Shared Access Signature (SAS)
• 3rd party services
• Mobile device applications
• Restricted access for services
• Allow client applications to directly communicate with Storage rather than scaling a proxy web service
• Proxy used for authentication and providing SAS tokens
Public (Blob service only)
• CDN access
• Content accessed via browsers
store Secret keys/Shared Access Signature (SAS) tokens?
• Persist only encrypted key/token
• Use cert to decrypt the encrypted key in the application
• Certs available only on required nodes
How to transfer SAS tokens?
• Use HTTPS to transfer SAS tokens
How often should I change my Secret keys/SAS tokens?
• Automate the process to enable changing it frequently
• Always be ready to revoke SAS tokens or change Secret keys/SAS tokens
I rotate Secret Keys/SAS tokens?
• Two 512-bit keys provided
• Push configuration change to all services to use one of them
• Other key can be changed using service management API
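The rotation steps above can be sketched as a two-step procedure. Both `push_config` and `regenerate` are hypothetical callables: the first stands in for whatever mechanism pushes configuration to the services, the second for the service management API call that changes a key.

```python
def rotate_unused_key(keys, keep, push_config, regenerate):
    """Move all services onto one key, then change the other one.

    keys: dict with 'primary' and 'secondary' entries.
    keep: name of the key every service should use during rotation.
    push_config(key_value): hypothetical; pushes the key to all services.
    regenerate(name): hypothetical; returns the new value for that key.
    Returns the name of the key that was changed.
    """
    other = "secondary" if keep == "primary" else "primary"
    push_config(keys[keep])          # 1. every service now uses the kept key
    keys[other] = regenerate(other)  # 2. the unused key can be changed safely
    return other
```

Calling it a second time with `keep` set to the freshly changed key would rotate the remaining one, so both keys can be replaced without downtime.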
the service requesting SAS token
• Use HTTPS
Token provider and consumer need to agree on storage REST version
• Semantics for SAS tokens can change from version to version
• Token generating service should be capable of generating multiple versions of tokens, so the consumer can select the version it understands
Clock Skew
• Sufficient buffer for start time and end time because of clock skew
• Avoid setting start time if access should start right away
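The clock-skew advice above can be sketched as a small helper that computes a token's validity window. The 15-minute buffer is an assumed value chosen for illustration, and `sas_window` is a hypothetical function that does not generate or sign a token.

```python
from datetime import datetime, timedelta, timezone

CLOCK_SKEW = timedelta(minutes=15)  # assumed buffer; tune to your environment


def sas_window(now, lifetime, start_at=None, skew=CLOCK_SKEW):
    """Return (start, expiry) for a token; start is None when it is omitted.

    now and start_at are timezone-aware datetimes; lifetime is a timedelta.
    """
    if start_at is None:
        # Access should begin right away: leave the start time unset so
        # a skewed clock cannot make the token appear not yet valid.
        return None, now + lifetime + skew
    # Scheduled access: widen both ends of the window by the skew buffer.
    return start_at - skew, start_at + lifetime + skew
```

For a one-hour token issued at 12:00 UTC with immediate access, this yields no start time and an expiry of 13:15 UTC.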
Summary
Storage
• Auto load balances to meet scale needs
• Performance from local, SSD, striped, provisioned
• Storage Durability Options – LRS, ZRS, and GRS
• RA-GRS
• Provides higher availability as applications can read from secondary when primary is not available
• Client library retries provide this capability out of the box
• Details on internals can be found in the SOSP paper:
• “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
What is New?
hold up to 500TBs for all regions
• Increased BW for US regions per storage account
• 10Gbps Ingress and 20Gbps Egress
• Improved Versioning for Shared Access Signatures
• Client Libraries & Tools
• .NET Library Desktop, Phone and Runtime with support for Files and REST Version 2014-02-14
• Java 1.0 RTM
• Android 0.1 CTP
• C++ Library CTP
• AzCopy for Files CTP
• PowerShell for Files CTP
• Azure Files Preview