Govind Kanshi

Most applications require some kind of data store. Azure provides various options for storing data. This session looks at hosted and host-your-own options from the perspective of relational and non-relational databases.

Govind Kanshi

March 20, 2014
Transcript

  1. Windows Azure Conference 2014
    Data Storage Options on Windows Azure
    Govind Kanshi
    MTC

  2. Way to skin cat store
    • Hosting options
    • What you need to worry about
      – Availability
      – Performance
      – Scale...
    • Where do I store data

  3. Hosting options
    • Hosted
    • Host your own
    • What you need to worry about
      – Availability
      – Performance (more compute, more bandwidth, better storage)
      – Scale (throughput, latency, storage)
      – Management and monitoring
      – Cost

  4. Hosting options path
    • Hosted (the "not my headache" option)
      – No admin work (setup and maintenance are the majority of it)
      – Availability: better and cheaper
      – Very little planning or spend on machine size and resources
      – Focus on the application, not on admin/management issues

  5. Hosting options path
    • Host your own (my headache)
      – Flexibility (use jobs, use replication, use broker)
      – Roll your own availability, performance, upgrades, patching
      – Plan your scale and spend
      – Plan for admin: have in-house expertise

  6. Offerings
    • Relational
      – Hosted
        • SQL Azure
      – Host your own
        • SQL Server, Oracle, MySQL, Postgres
    • Non-relational
      – Hosted
        • Table Storage (key/value), Blob/Page store
        • Mongo
      – Host your own
        • Cassandra, Mongo, Redis

  7. Availability
    • Hosted
      – SQL Azure
        • Local transparent failover; no direct access to replicas
        • Remote replicas? In future (backup and restore)
        • Read-only replicas? In future (local vs. across data centers)
      – Azure Storage
        • Local transparent failover; no direct access to replicas
        • Remote replication (no SLA guarantee, but usually within minutes)
    • Host your own
      – Availability sets
      – Need to set up a virtual network
      – Need to create a sync mechanism
      – Need to set up a failover mechanism
        • AlwaysOn for SQL Server; other databases need to get it right the way SQL Server does (GG/DG)
        • Use Azure storage: push backups (log + data) via Azure or on your own
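
Whichever hosting path is chosen, a client usually sees a failover as a short burst of failed calls. A minimal sketch of how an application might ride through that window is below. It assumes the store signals the condition with an exception; `TransientStoreError` and `call_with_retry` are hypothetical names invented for this illustration, not part of any Azure SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientStoreError(Exception):
    """Hypothetical error standing in for a call that failed during failover."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on TransientStoreError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientStoreError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable only so the backoff schedule can be checked without real waiting; production code would leave it at its default.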

  8. Performance
    • Hosted
      – Azure provides various options
        • SQL Azure Premium vs. regular (removes the noisy-neighbor issue)
        • Pretty soon other services will distinguish themselves by performance (think H)
      – SQL Azure Premium provides reserved IOPS
    • Host your own
      – Choose better compute
      – Choose better storage
        • Good news on more options soon
      – At the end of the day, you need to set up monitoring, fixing, and planning yourself

  9. Scale (up/out)
    • Hosted
      – SQL Azure
        • Web/Business (storage) vs. SQL Premium (isolated perf)
      – HDInsight
        • Scale-out vs. scale-up of nodes (disruptive)
      – Table Storage, Azure Blob, Queues; Service Bus (a little different)
        • Unlimited storage (overall 200 TB); no explicit limit (no scale-up SKU)
    • Host your own
      – Need to plan provisioning of storage/compute based on the offering (Redis vs. Cassandra vs. HBase). Monitoring, handling failover, etc. are extra effort.

  10. Management/Monitoring
    • Hosted
      – API or dashboard (mostly)
      – Everything is abstracted: cost and operations matter more than OS, memory, etc.
      – Mostly auto-managed/healed, with the backend taking care of many things
      – No worries about patch management, backup schedules, etc.
    • Host your own
      – Roll your own (time vs. what to expose, use, and act upon); cloud-aware software needed. System Center can do x things
      – The backend can take care of, say, compute failover or storage, but the rest needs to be built on top.

  11. Cost
    • Hosted
      – Generally easy (volume stored, units processed/sent)
      – For ISVs, billing is still an exercise; should become better
    • Host your own
      – Roll your own: basically, you pay for what you use
      – Plus licensing
      – Plus dedicated people (sometimes a hierarchy: one for day-to-day jobs, another to help business/dev)
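
The two cost shapes on this slide can be put side by side with a little arithmetic. The sketch below is only an illustration: the rate fields and every number passed in are placeholders chosen for this example, not real Azure prices, and `HostedRates`, `hosted_monthly_cost`, and `self_hosted_monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedRates:
    """Placeholder unit prices for a hosted store; NOT real Azure pricing."""
    per_gb_stored: float
    per_million_requests: float
    per_gb_sent: float


def hosted_monthly_cost(
    rates: HostedRates, gb_stored: float, requests: int, gb_sent: float
) -> float:
    """Hosted: pay for volume stored, units processed, and data sent."""
    return (
        rates.per_gb_stored * gb_stored
        + rates.per_million_requests * requests / 1_000_000
        + rates.per_gb_sent * gb_sent
    )


def self_hosted_monthly_cost(
    vm_cost: float, license_cost: float, staff_cost: float
) -> float:
    """Host your own: what you use, plus licensing, plus dedicated people."""
    return vm_cost + license_cost + staff_cost
```

Keeping staff cost as an explicit argument makes it harder to forget the "dedicated people" line when comparing the two options.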

  12. What to check for in host your own
    • License portability
    • Certification
    • Support
    • Preferred usage
      – Dev/test vs. production

  13. Why different kinds of store
    • Data is complex: struct of struct of maps
    • Data is changing shape
    • Lots of data is collected: scale of storage
      – Time series
        • Sensors
        • Audit events
      – Data is schema?
        • Easy to add new fields, and even completely change the structure of a model
    • Need a query model over shape rather than just key/value or pseudo-mapping to the relational world
    • Low latency, high volume
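
To make "struct of struct of maps" and "data is changing shape" concrete, here is a small sketch using plain Python dictionaries and JSON. The sensor fields, the values, and the `get_path` helper are all made up for this illustration; a real document store would provide its own query language.

```python
import json

# A sensor reading as a nested document: a struct holding structs and maps.
reading_v1 = {
    "sensor_id": "s-17",
    "ts": "2014-03-20T10:00:00Z",
    "location": {"site": "plant-a", "coords": {"lat": 12.97, "lon": 77.59}},
    "values": {"temp_c": 21.5},
}

# A later reading adds fields; nothing has to be migrated to store it.
reading_v2 = {
    **reading_v1,
    "ts": "2014-03-20T10:01:00Z",
    "values": {"temp_c": 21.7, "humidity_pct": 40},
    "firmware": "1.2",
}


def get_path(doc, path, default=None):
    """Query by shape: walk a dotted path through nested dictionaries."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Both shapes serialize to JSON unchanged, old and new side by side.
stored = [json.dumps(reading_v1), json.dumps(reading_v2)]
```

Readers of the older shape simply get the default for fields that did not exist yet, for example `get_path(reading_v1, "firmware")`.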

  14. What kind of data
    • What is my scenario
      – Caching: Velocity, Memcached, Redis, Riak
      – Counters/speed/writes: Velocity, Redis, Cassandra
      – Transactions: database, SQL Azure (federation)
      – Documents/JSON-ified class/shape: MongoDB, RavenDB, Riak *
      – Writing large amounts of data with throughput: Cassandra, Azure Storage
      – Full-text search: Solr/Elasticsearch, Sphinx
      – Storing data for scale-out compute: Hadoop
      – Storing data on a specialized appliance: PDW
    * Wish we could query shape data rather than fitting it into the relational world of columns/rows
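
The scenario list above can be treated as a lookup table. The sketch below copies the store names from the slide; the dictionary keys and the `stores_for` helper are hypothetical names chosen for this example, and the result is a starting shortlist, not a recommendation.

```python
# Scenario -> candidate stores, transcribed from the slide.
SCENARIO_STORES = {
    "caching": ["Velocity", "Memcached", "Redis", "Riak"],
    "counters": ["Velocity", "Redis", "Cassandra"],
    "transactions": ["Database", "SQL Azure (federation)"],
    "documents": ["MongoDB", "RavenDB", "Riak"],
    "bulk_writes": ["Cassandra", "Azure Storage"],
    "full_text_search": ["Solr", "ElasticSearch", "Sphinx"],
    "scale_out_compute": ["Hadoop"],
    "appliance": ["PDW"],
}


def stores_for(scenarios):
    """Return the stores listed under every requested scenario.

    Order follows the first scenario's list on the slide.
    """
    if not scenarios:
        return []
    first, *rest = scenarios
    candidates = list(SCENARIO_STORES[first])
    for name in rest:
        allowed = set(SCENARIO_STORES[name])
        candidates = [store for store in candidates if store in allowed]
    return candidates
```

For instance, `stores_for(["caching", "counters"])` narrows the list to the stores that appear under both headings, which is one way to avoid running two systems where one would do.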

  15. Where do I store my data: location
    • Low-latency local memory: reference data (in-node cache)
    • Low-latency shared memory: session data (Azure Cache)
    • Dedicated machine: transactional data (relational DB)
    • Shared high-throughput storage: transactional data (SQL Azure, relational DB)
    • Shared entity storage: entity data (Azure Table)
    • Shared raw, batch, long-term storage: data lake / store everything (HDInsight)

  16. Or another way to think
    • Will I write a lot of data and need to store and query it?
    • Will I need very low latency?
    • Can I compromise on consistency?
    • What are my business needs (how fast are we growing)? Can I afford to take a break and roll in a new store?

  17. How will we get/store the data
    • Query
      – SQL, LINQ, ORM (mapping to every language is a challenge), or REST
      – Custom (query format, compression, serialization)
    • Tunable consistency
      – Out of 5 nodes, consider the data written only when 3 acknowledge
      – Out of 5 nodes, take the value once 2 respond
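
The two tunable-consistency bullets can be sketched as quorum checks over N replicas, with W acknowledgements needed for a write and R responses needed for a read. This is a simplified model with hypothetical function names, not the behavior of any particular store. Note that with the slide's numbers (N=5, W=3, R=2), W + R equals N, so a read is not guaranteed to touch a replica holding the latest write; that guarantee requires W + R > N.

```python
from typing import Optional, Sequence, Tuple

Versioned = Tuple[int, str]  # (version number, value) as reported by one replica


def write_succeeded(acks: Sequence[bool], w: int) -> bool:
    """A write counts as done once at least w replicas acknowledged it."""
    return sum(1 for ack in acks if ack) >= w


def read_value(replies: Sequence[Optional[Versioned]], r: int) -> Optional[str]:
    """Take the newest value among the first r replicas that responded.

    None entries are replicas that did not respond. Returns None when
    fewer than r replicas answered.
    """
    answered = [reply for reply in replies if reply is not None]
    if len(answered) < r:
        return None
    newest = max(answered[:r], key=lambda reply: reply[0])
    return newest[1]


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum must share a replica with every write quorum."""
    return w + r > n
```

Lowering W or R trades consistency for latency and availability, which is the compromise the previous slide asks about.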

  18. Stores: hosted vs. host your own
    Columns: Hosted (Microsoft), Hosted (Non Microsoft/Partner), Host your own (Microsoft/Partner), Host your own (Non Microsoft)
    Relational: SQLAzure Sql Server, Access Oracle, SAP, My
    Caching: Azure Cache Memcache Redis, Memcache
    K-v/Column store: Azure Table Cassandra, Riak, Hbase
    Document store: AzureTable? Mongo MongoDB
    Graph Store: Neo4j
    VL-Scaleout: HDInsight HortonWorks HDP Cloudera?
    In-Memory DS: Azure Cache Redis
    Streaming/Queue/EAI: Azure queue, Notification, Biztalk StreamInsight, MSMQ, Biztalk Storm, Kafka
    Long term: Azure Storage Build your own
    Text: Azure Table Solr SQL server Solr, Elastic Search

  19. End

  20. Compare them: summary (evolving)
    Columns: Key Value, Document, Column, Graph
    Persistence (JSON): * * *
    ACID: # # #
    Query mode: API/REST, API/REST, API, SPARQL/REST/Java
    Scale: Horizontal, Horizontal, Horizontal, Vertical scale
    Replication: Async, Async/tunable, Tunable, NA
    Schema free: * * + *
    MapReduce: # # # NA
    Node addition/deletion: + Manual # * NA
    Indexing: Primary key, Attributes, #, *
    Legend: * = most of them support, # = specific product support, + = partial support
