Building multi-million document applications with scalability in mind

What does it mean to work with a huge quantity of data? We know that sharding and replication are our friends, but there are also other things that we should consider. In this session I will show you how we have used RavenDB with scalability in mind.

Manuel Scapolan

September 18, 2014

Transcript

  1. Building multi-million document applications with scalability in mind: a case study
    Manuel Scapolan: Software Architect at ServiziCGN, RavenDB User
    Mauro Servienti: CTO at Managed Designs, RavenDB Contributor
  2. “The Italian Guys”
    Manuel Scapolan (@manuelscapolan): Software Architect at ServiziCGN, RavenDB User, 1nn0va community speaker
    Mauro Servienti (@mauroservienti): CTO at Managed Designs, RavenDB Contributor, Microsoft MVP - Visual C#
  3. Servizi CGN
    A leading company in services for accountants
    •  Fiscal and tax advice
    •  E-learning and classroom learning
    •  Dedicated software to fill in tax return statements and to run company accounting
  4. A strange Italian fact…
    [Diagram: The Revenue Agency; Employees and retired people; Tax return statement]
  5. Fiscal Assistance Centers (CAF)
    [Diagram: The Revenue Agency; CAF (Fiscal Assistance Center); Employees and retired people]
  6. Servizi CGN
    The first CAF in Italy run as a private business
    •  More than 1,260,000 tax return statements sent this year alone
    •  Thanks to a network of 35,000 professionals who use its software for free to fill in and send their clients' tax return statements
  7. The case study
    Porting the applications that allow accountants to fill in tax return statements and send them to the Revenue Agency
  8. A tax return statement
    •  A huge form composed of different fields that are filled in either by the user or automatically
    •  Some fields are validated by rules
    •  Once correctly filled in, the statement is sent to the Revenue Agency
  9. Our first attempt
    •  3-layer architecture
    •  Data persistence on SQL Server
    •  EntityFramework as ORM
    [Diagram: Presentation Layer; Business Layer; Data Layer; SQL Server]
  10. Aggregate Data Model
    “A data model is the model through which we perceive and manipulate our data.” Martin Fowler, NoSQL Distilled
    •  An Aggregate is a cluster of associated objects that are treated as a unit for the purpose of data changes
    •  Document databases are the best way to store aggregates (Aggregate = Document)
  11. Why RavenDB?
    •  All our developers are familiar with the .NET Framework and the C# language
    •  We work with sensitive data, so we need ACID transactions
    •  Technical support is provided by Managed Designs
  12. The architecture
    Bertrand Meyer’s Command Query Separation Principle
    [Diagram: UI; Query-side: query, Thin Data Layer, DTOs; Command-side: command, Command Handling, result, Domain Model (load, change and persist); RavenDB 2.5]
  13. The Command-side: how it works
    •  The user fills in the fields of a section
    •  A save command is sent to the server
    •  On the server the statement is loaded and, through an object mapper, filled with the data in the command
    •  All the rules are applied
    •  Finally, all the changes are saved to the database
    [Diagram: the architecture from slide 12]
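The command-side steps of slide 13 can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: the `Statement`, `SaveSectionCommand`, `CommandResult` and `SaveSectionHandler` types, the `FiscalCode` rule and the in-memory dictionary standing in for RavenDB are all hypothetical, and skipping the save when a rule fails is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregate: a statement reduced to a bag of named fields.
public sealed class Statement
{
    public string Id { get; set; }
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

// Hypothetical save command carrying the fields of one section.
public sealed class SaveSectionCommand
{
    public string StatementId { get; set; }
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

public sealed class CommandResult
{
    public List<string> Errors { get; } = new List<string>();
    public bool Saved { get { return Errors.Count == 0; } }
}

public sealed class SaveSectionHandler
{
    private readonly Dictionary<string, Statement> _store;  // in-memory stand-in for the database
    private readonly List<Func<Statement, string>> _rules;  // each rule returns an error message or null

    public SaveSectionHandler(Dictionary<string, Statement> store, List<Func<Statement, string>> rules)
    {
        _store = store;
        _rules = rules;
    }

    public CommandResult Handle(SaveSectionCommand command)
    {
        // 1. Load the statement and work on a copy, so a failed rule leaves the stored one untouched.
        Statement stored = _store[command.StatementId];
        var working = new Statement { Id = stored.Id };
        foreach (var field in stored.Fields) working.Fields[field.Key] = field.Value;

        // 2. Map the command data onto the statement (the role of the object mapper).
        foreach (var field in command.Fields) working.Fields[field.Key] = field.Value;

        // 3. Apply all the rules.
        var result = new CommandResult();
        foreach (var rule in _rules)
        {
            string error = rule(working);
            if (error != null) result.Errors.Add(error);
        }

        // 4. Persist only when every rule passed (an assumption of this sketch).
        if (result.Saved) _store[working.Id] = working;
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new Dictionary<string, Statement> { ["statements/1"] = new Statement { Id = "statements/1" } };
        var rules = new List<Func<Statement, string>>
        {
            // Hypothetical validation rule.
            s => s.Fields.ContainsKey("FiscalCode") && s.Fields["FiscalCode"].Length == 16
                ? null : "FiscalCode must be 16 characters"
        };
        var handler = new SaveSectionHandler(store, rules);

        var command = new SaveSectionCommand { StatementId = "statements/1" };
        command.Fields["FiscalCode"] = "RSSMRA80A01H501U";
        Console.WriteLine(handler.Handle(command).Saved); // prints True
    }
}
```

Working on a copy keeps the stored statement unchanged when a rule rejects the command, which mirrors the "no changes are saved" behaviour described for the persistence layer.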
  14. The Persistence layer: UnitOfWork + Repository
    •  The UnitOfWork manages business transactions
    •  It persists all the changes made within the transaction in an atomic operation
    •  If there are mistakes or exceptions no changes are saved
    BeginUnitOfWork() ... EndUnitOfWork():
        using (var tran = new TransactionScope()) { tran.Complete(); }
  15. The Persistence layer: UnitOfWork + Repository
    •  The Repository mediates between the domain and data layer using a collection-like interface for accessing our aggregates
    •  Changes will be persisted on storage in a completely transparent way
    •  With RavenDB we don’t have to write a mapping layer
    BeginUnitOfWork() ... EndUnitOfWork():
        using (var tran = new TransactionScope()) { Repository.GetById(id); tran.Complete(); }
    Repository.GetById(id):
        var s = session.Load<Statement>(id); // Changes occur in the aggregate root
    Repository.Update():
        session.SaveChanges();
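The all-or-nothing behaviour of the UnitOfWork described on slides 14 and 15 can be illustrated with a small in-memory sketch. It assumes nothing about the real implementation: `InMemoryStore`, this `UnitOfWork` class and the `TotalIncome` property are hypothetical stand-ins, and no RavenDB or `TransactionScope` call is involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregate root; the real Statement type is not shown in the slides.
public sealed class Statement
{
    public string Id { get; set; }
    public decimal TotalIncome { get; set; }
    public Statement Clone() { return new Statement { Id = Id, TotalIncome = TotalIncome }; }
}

// In-memory stand-in for the document database; it stores and returns copies only.
public sealed class InMemoryStore
{
    private readonly Dictionary<string, Statement> _documents = new Dictionary<string, Statement>();

    public void Write(Statement statement) { _documents[statement.Id] = statement.Clone(); }
    public Statement Read(string id) { return _documents[id].Clone(); }
}

// Tracks working copies and writes them back only if Complete() was called before Dispose().
public sealed class UnitOfWork : IDisposable
{
    private readonly InMemoryStore _store;
    private readonly Dictionary<string, Statement> _tracked = new Dictionary<string, Statement>();
    private bool _completed;

    public UnitOfWork(InMemoryStore store) { _store = store; }

    // Repository-style accessor: the same working copy is returned for the same id.
    public Statement GetById(string id)
    {
        Statement statement;
        if (!_tracked.TryGetValue(id, out statement))
        {
            statement = _store.Read(id);
            _tracked[id] = statement;
        }
        return statement;
    }

    public void Complete() { _completed = true; }

    public void Dispose()
    {
        if (!_completed) return; // an exception or a missing Complete() discards every change
        foreach (var statement in _tracked.Values) _store.Write(statement);
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new InMemoryStore();
        store.Write(new Statement { Id = "statements/1", TotalIncome = 100m });

        using (var unitOfWork = new UnitOfWork(store))
        {
            unitOfWork.GetById("statements/1").TotalIncome = 250m;
            unitOfWork.Complete();
        }
        Console.WriteLine(store.Read("statements/1").TotalIncome); // prints 250

        try
        {
            using (var unitOfWork = new UnitOfWork(store))
            {
                unitOfWork.GetById("statements/1").TotalIncome = 999m;
                throw new InvalidOperationException("a rule failed");
            }
        }
        catch (InvalidOperationException) { }
        Console.WriteLine(store.Read("statements/1").TotalIncome); // prints 250
    }
}
```

The caller only changes the aggregate root; persistence happens once, at the end of the unit of work, which is the "completely transparent" behaviour the slide describes.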
  16. The Query-side: how it works
    •  We load the entire statement
    •  With the reverse of the save operation, a view model is populated with only the data of the section to display
    •  The view model is then sent to the client serialized as JSON
    [Diagram: the architecture from slide 12]
  17. What about the indexes?
    •  We didn’t use indexes in this case because:
       1.  All the data we needed was a subset of the statement document
       2.  We did not want to handle the asynchronous update of the indexes
    •  We used indexes for reports, look-up lists and summary information
  18. What does scalability mean?
    •  The ability of a system or a software solution to handle increased loads of work
    •  A solution that can scale out can usually grow to larger loads in a more convenient way
    SCALE UP: growing by using stronger hardware
    SCALE OUT: growing by adding more hardware
  19. Aggregate boundaries
    •  Put together the information that needs to be consistent, so that it can be easily distributed without compromising performance
    •  Obviously this “indivisible whole of consistency” is an aggregate
    •  So, first of all, define your aggregate boundaries
  20. Data-Sharding
    •  The most effective solution is to divide the data per accountant
    •  Each statement can be prepared by only one accountant
    •  So we will not need to run queries on multiple servers at the same time
  21. The ShardStrategy
    •  We assign to every new accountant a serial number (a simple auto-increment integer)
    •  This number has a high cardinality and so it can be a good shard-key
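To make the shard-key idea of slide 21 concrete, here is a small sketch under stated assumptions. The deck does not say how a serial number is mapped to a node or how document ids are built, so the range-based `NodeFor` mapping, the `accountantsPerNode` parameter and the `accountants/{serial}/statements/{n}` id convention are all hypothetical, and the code does not use RavenDB's own sharding API.

```csharp
using System;

public static class AccountantShardResolver
{
    // Hypothetical id convention that embeds the shard key (the accountant serial number).
    public static string StatementId(int accountantSerial, int statementNumber)
    {
        return "accountants/" + accountantSerial + "/statements/" + statementNumber;
    }

    // Recovers the shard key from a document id built with the convention above.
    public static int SerialFromId(string documentId)
    {
        string[] parts = documentId.Split('/');
        if (parts.Length < 2 || parts[0] != "accountants")
            throw new FormatException("Unexpected document id: " + documentId);
        return int.Parse(parts[1]);
    }

    // Range-based mapping (an assumption): consecutive serial numbers land on the same node,
    // so all documents of one accountant are always served by a single node.
    public static int NodeFor(int accountantSerial, int accountantsPerNode)
    {
        if (accountantSerial < 1) throw new ArgumentOutOfRangeException("accountantSerial");
        if (accountantsPerNode < 1) throw new ArgumentOutOfRangeException("accountantsPerNode");
        return (accountantSerial - 1) / accountantsPerNode;
    }
}

public static class Program
{
    public static void Main()
    {
        string id = AccountantShardResolver.StatementId(1001, 7);
        Console.WriteLine(id); // prints accountants/1001/statements/7

        int serial = AccountantShardResolver.SerialFromId(id);
        Console.WriteLine(AccountantShardResolver.NodeFor(serial, 1000)); // prints 1
    }
}
```

Because the key is part of every document id, a request for one accountant's statement can be routed to a single node without querying the others.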
  22. The ShardStrategy
    •  Each accountant lives in its own shard
    •  From the support/back office point of view we need cross-accountant queries
    •  Accountant affinity requires us to keep “related” accountants as near as possible
    [Diagram: Node 0, Node 1, ... Node N, each hosting Shard(s) DB]
  23. High Availability
    •  The accountant’s life is often lived on a razor’s edge
    •  If the software is unavailable in those moments, it can seriously harm our clients
    •  Master-slave replication can be crucial to ensure the high availability of the application
  24. High Availability through Master-Master Replication
    •  Patch/deploy/update & rebuild on the logical-slave
    •  Switch
       –  logical-slave becomes the logical-master
       –  logical-master switches to logical-slave
    •  VIP: switch the application to use the logical-master
  25. Fault tolerance
    •  Write assurance: wait for data to be replicated to at least n servers in the replica set
    •  Deploy RavenDB server on VMs on SAN
       –  Each SAN shadow-copies the VM at each change, amazingly fast
       –  If the hardware fails, restore the SAN snapshot to a replicated SAN and move on
  26. The Good stuff
    •  Simplifies the data layer development
    •  Ready to run in 5 minutes
    •  It just works, “zero” administration
    •  Safe by default
    •  Easy scale-out
  27. The Bad stuff
    •  Being an early adopter
    •  Lack of monitoring tools
    •  Running queries on the fly
    •  Everything about eventual consistency
  28. Can RavenDB be the right choice?
    •  Are your aggregate boundaries well defined?
    •  Are you looking for scalability and lack of verbosity?
    •  …
    •  RavenDB may just be the technology you are looking for!