Slide 1

Slide 1 text

Building multi-million document applications with scalability in mind: a case study
Manuel Scapolan, Software Architect at ServiziCGN, RavenDB User
Mauro Servienti, CTO at Managed Designs, RavenDB Contributor

Slide 2

Slide 2 text

“The Italian Guys”
Manuel Scapolan (@manuelscapolan): Software Architect at ServiziCGN, RavenDB User, 1nn0va community speaker
Mauro Servienti (@mauroservienti): CTO at Managed Designs, RavenDB Contributor, Microsoft MVP - Visual C#

Slide 3

Slide 3 text

Servizi CGN
A leading company in services for accountants:
•  Fiscal and tax advice
•  E-learning and classroom training
•  Dedicated software for filling in tax return statements and running company accounting

Slide 4

Slide 4 text

A strange Italian fact…
Diagram labels: The Revenue Agency, Employees and retired people, Tax return statement

Slide 5

Slide 5 text

Fiscal Assistance Centers (CAF)
Diagram labels: The Revenue Agency, CAF (Fiscal Assistance Center), Employees and retired people

Slide 6

Slide 6 text

Servizi CGN
The No. 1 privately owned CAF in Italy:
•  More than 1,260,000 tax return statements sent this year alone
•  A network of 35,000 professionals who use its software free of charge to fill in and send their clients' tax return statements

Slide 7

Slide 7 text

The case study
Porting the applications that accountants use to fill in tax return statements and send them to the Revenue Agency

Slide 8

Slide 8 text

A tax return statement
•  A huge form made up of many fields, filled in either by the user or automatically
•  Some fields are validated by rules
•  Once filled in correctly, the statement is sent to the Revenue Agency

Slide 9

Slide 9 text

Our first attempt
•  3-layer architecture
•  Data persistence on SQL Server
•  Entity Framework as ORM
Diagram: Presentation Layer, Business Layer, Data Layer, SQL Server

Slide 10

Slide 10 text

Good, but… JOIN  JOIN  JOIN  JOIN  JOIN  JOIN  JOIN  

Slide 11

Slide 11 text

Aggregate Data Model
“A data model is the model through which we perceive and manipulate our data.” Martin Fowler, NoSQL Distilled
•  An Aggregate is a cluster of associated objects that is treated as a unit for the purpose of data changes
•  Document databases are a natural fit for storing aggregates (Aggregate = Document)
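As a rough illustration of "Aggregate = Document", the sketch below models a statement as a single object graph that would be stored and loaded as one document. The class and property names (`TaxReturnStatement`, `Section`, `AccountantNumber`) are hypothetical and are not taken from the actual application.

```csharp
using System.Collections.Generic;

// Hypothetical aggregate root: the whole statement is one unit of change,
// so it would be persisted as a single document, with no JOINs to rebuild it.
public class TaxReturnStatement
{
    public string Id { get; set; }
    public int AccountantNumber { get; set; }
    public List<Section> Sections { get; } = new List<Section>();
}

// Hypothetical child object: it lives inside the aggregate and has no
// identity of its own outside the statement.
public class Section
{
    public string Name { get; set; }
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}
```

Because sections are nested inside the root, loading the statement brings back everything needed to validate and change it in one read.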

Slide 12

Slide 12 text

Why RavenDB?
•  All our developers are familiar with the .NET Framework and the C# language
•  We handle sensitive data, so we need ACID transactions
•  Managed Designs provides technical support

Slide 13

Slide 13 text

The architecture
Bertrand Meyer’s Command Query Separation Principle, on RavenDB 2.5
•  Command-side: UI → command → Command Handling → Domain Model (load, change and persist) → result
•  Query-side: UI → query → Thin Data Layer → DTOs

Slide 14

Slide 14 text

The Command-side
How it works
•  The user fills in the fields of a section
•  A save command is sent to the server
•  On the server, the statement is loaded and an object mapper fills it with the data in the command
•  All the rules are applied
•  Finally, all the changes are saved to the database
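The steps above can be sketched as a small command handler. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the database, a hand-written loop stands in for the object mapper, and all type names (`SaveSectionCommand`, `Statement`, `SaveSectionHandler`) are hypothetical. Returning the rule messages as the command result is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command sent by the UI when the user saves one section.
public class SaveSectionCommand
{
    public string StatementId { get; set; }
    public string SectionName { get; set; }
    public Dictionary<string, string> Values { get; set; }
}

// Hypothetical aggregate: sections keyed by name, each a bag of field values.
public class Statement
{
    public string Id { get; set; }
    public Dictionary<string, Dictionary<string, string>> Sections { get; }
        = new Dictionary<string, Dictionary<string, string>>();
}

public class SaveSectionHandler
{
    private readonly IDictionary<string, Statement> _store;   // stand-in for the database
    private readonly IList<Func<Statement, string>> _rules;   // a rule returns an error message, or null

    public SaveSectionHandler(IDictionary<string, Statement> store,
                              IList<Func<Statement, string>> rules)
    {
        _store = store;
        _rules = rules;
    }

    public IList<string> Handle(SaveSectionCommand command)
    {
        // 1. Load the whole statement.
        var statement = _store[command.StatementId];

        // 2. Map the command data onto the target section.
        Dictionary<string, string> section;
        if (!statement.Sections.TryGetValue(command.SectionName, out section))
        {
            section = new Dictionary<string, string>();
            statement.Sections[command.SectionName] = section;
        }
        foreach (var pair in command.Values)
        {
            section[pair.Key] = pair.Value;
        }

        // 3. Apply every rule and collect the messages.
        var errors = new List<string>();
        foreach (var rule in _rules)
        {
            var error = rule(statement);
            if (error != null) errors.Add(error);
        }

        // 4. Persist the changed aggregate and return the result.
        _store[statement.Id] = statement;
        return errors;
    }
}
```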

Slide 15

Slide 15 text

Persistence Ignorance

Slide 16

Slide 16 text

The Persistence layer
UnitOfWork + Repository
•  The UnitOfWork manages business transactions
•  It persists all the changes made within the transaction in one atomic operation
•  If an error or exception occurs, no changes are saved
using (var tran = new TransactionScope())   // BeginUnitOfWork()
{
    tran.Complete();
}                                           // EndUnitOfWork()

Slide 17

Slide 17 text

The Persistence layer
UnitOfWork + Repository
•  The Repository mediates between the domain and the data layer, using a collection-like interface to access our aggregates
•  Changes are persisted to storage in a completely transparent way
•  With RavenDB we don’t have to write a mapping layer
using (var tran = new TransactionScope())   // BeginUnitOfWork()
{
    var s = session.Load(id);               // Repository.GetById(id)
    // Changes occur in the aggregate root
    session.SaveChanges();                  // Repository.Update()
    tran.Complete();
}                                           // EndUnitOfWork()
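To make the commit-or-discard behaviour concrete without depending on RavenDB or `TransactionScope`, here is a minimal in-memory sketch. `IRepository<T>` and `InMemoryUnitOfWork<T>` are hypothetical names, and the dictionary is only a stand-in for real storage, so it shows the shape of the pattern rather than true atomicity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical collection-like interface over aggregates.
public interface IRepository<T>
{
    T GetById(string id);
    void Update(string id, T aggregate);
}

// Hypothetical unit of work: buffers changes and applies them only if
// Complete() was called before the scope ends.
public class InMemoryUnitOfWork<T> : IRepository<T>, IDisposable
{
    private readonly IDictionary<string, T> _storage;
    private readonly Dictionary<string, T> _pending = new Dictionary<string, T>();
    private bool _completed;

    public InMemoryUnitOfWork(IDictionary<string, T> storage)
    {
        _storage = storage;
    }

    public T GetById(string id)
    {
        // Prefer a change made earlier in this unit of work.
        T pending;
        return _pending.TryGetValue(id, out pending) ? pending : _storage[id];
    }

    public void Update(string id, T aggregate)
    {
        _pending[id] = aggregate;
    }

    public void Complete()
    {
        _completed = true;
    }

    public void Dispose()
    {
        // An exception inside the using block skips Complete(),
        // so every pending change is dropped.
        if (!_completed) return;
        foreach (var change in _pending)
        {
            _storage[change.Key] = change.Value;
        }
    }
}
```

A caller wraps its work in `using (var uow = new InMemoryUnitOfWork<string>(storage)) { ...; uow.Complete(); }`, the same shape as the `TransactionScope` block on the slide.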

Slide 18

Slide 18 text

The Query-side
How it works
•  We load the entire statement
•  Reversing the save operation, a view model is populated with only the data of the section to display
•  The view model is then serialized to JSON and sent to the client
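The query path can be sketched in the same spirit: take a loaded statement, copy one section into a view model, and serialize it. The types `StatementDocument`, `SectionViewModel` and `SectionQuery` are hypothetical, and `System.Text.Json` is used here only for convenience; the original application may well have used a different serializer.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shape of the loaded document.
public class StatementDocument
{
    public string Id { get; set; }
    public Dictionary<string, Dictionary<string, string>> Sections { get; set; }
}

// Hypothetical view model: only the section being displayed.
public class SectionViewModel
{
    public string StatementId { get; set; }
    public string SectionName { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}

public static class SectionQuery
{
    // Project one section of the full statement into a view model.
    public static SectionViewModel ToViewModel(StatementDocument statement, string sectionName)
    {
        return new SectionViewModel
        {
            StatementId = statement.Id,
            SectionName = sectionName,
            // Copy the fields so the view model does not alias the document.
            Fields = new Dictionary<string, string>(statement.Sections[sectionName])
        };
    }

    // Serialize the view model for the client.
    public static string ToJson(SectionViewModel viewModel)
    {
        return JsonSerializer.Serialize(viewModel);
    }
}
```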

Slide 19

Slide 19 text

What about the indexes?
•  We didn’t use indexes in this case because:
   1.  All the data we needed was a subset of the statement document
   2.  We did not want to handle the asynchronous update of the indexes
•  We used indexes for reports, look-up lists and summary information

Slide 20

Slide 20 text

Scalability

Slide 21

Slide 21 text

What does scalability mean?
•  The ability of a system or a software solution to handle increased loads of work
•  A solution that can scale out can usually grow to larger loads in a more convenient way
SCALE UP: growing by using stronger hardware
SCALE OUT: growing by adding more hardware

Slide 22

Slide 22 text

What is the worst enemy of horizontal scalability for a database?
DATA CONSISTENCY

Slide 23

Slide 23 text

Aggregate boundaries
•  Group together the information that needs to be consistent, so that it can be distributed easily without compromising performance
•  This “indivisible whole of consistency” is an aggregate
•  So, first of all, define your aggregate boundaries

Slide 24

Slide 24 text

Data-Sharding

Slide 25

Slide 25 text

Data-Sharding
•  The most effective solution for us is to partition the data by accountant
•  Each statement belongs to exactly one accountant
•  So we will not need to run queries on multiple servers at the same time

Slide 26

Slide 26 text

The ShardStrategy
•  We assign every new accountant a serial number (a simple auto-incrementing integer)
•  This number has high cardinality, which makes it a good shard key
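One way to picture a serial number acting as a shard key is a small resolver like the one below. It is a sketch only: the modulo mapping, the `shard-N` naming and the `AccountantShardResolver` class are assumptions made for illustration, not the strategy actually used, and this simple mapping takes no account of keeping related accountants together.

```csharp
using System;

// Hypothetical resolver: maps an accountant's serial number to a shard name.
public static class AccountantShardResolver
{
    public static string ShardFor(int accountantNumber, int shardCount)
    {
        if (accountantNumber < 0)
            throw new ArgumentOutOfRangeException(nameof(accountantNumber));
        if (shardCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(shardCount));

        // Assumption: plain modulo distribution. The same accountant always
        // resolves to the same shard, so all of their statements stay together.
        return "shard-" + (accountantNumber % shardCount);
    }
}
```

For example, with three shards, accountant 7 resolves to `shard-1`, and every statement carrying that accountant number would be routed to the same place.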

Slide 27

Slide 27 text

The ShardStrategy
•  Each accountant lives in its own shard
•  From the support/back-office point of view, we need cross-accountant queries
•  Accountant affinity requires us to keep “related” accountants as close together as possible
Diagram: Node 0, Node 1 through Node N, each hosting one or more shard DBs

Slide 28

Slide 28 text

Replication

Slide 29

Slide 29 text

Master-Slave Replication
Diagram: one MASTER and three SLAVEs

Slide 30

Slide 30 text

High Availability
•  Accountants often work right up against deadlines
•  If the software is unavailable at those moments, it can seriously harm our clients
•  Master-slave replication can be crucial to ensuring high availability of the application

Slide 31

Slide 31 text

High Availability through Master-Master Replication
•  Patch/deploy/update & rebuild on the logical-slave
•  Switch
   –  logical-slave becomes the logical-master
   –  logical-master switches to logical-slave
•  VIP: switch the application to use the logical-master

Slide 32

Slide 32 text

Fault tolerance
•  Write assurance: wait for data to be replicated to at least n servers in the replica set
•  Deploy RavenDB server on VMs on a SAN
   –  Each SAN shadow-copies the VM at each change, amazingly fast
   –  If hardware fails, restore the SAN snapshot to a replicated SAN and move on

Slide 33

Slide 33 text

Pros & Cons

Slide 34

Slide 34 text

The Good stuff
•  Simplifies the data layer development
•  Ready to run in 5 minutes
•  It just works, “zero” administration
•  Safe by default
•  Easy scale-out

Slide 35

Slide 35 text

The Bad stuff
•  Being an early adopter
•  Lack of monitoring tools
•  Running ad hoc queries on the fly
•  It's all about eventual consistency

Slide 36

Slide 36 text

Can RavenDB be the right choice?
•  Are your aggregate boundaries well defined?
•  Are you looking for scalability and less verbosity?
•  …
•  RavenDB may be just the technology you are looking for!

Slide 37

Slide 37 text

Thank you @mauroservienti @manuelscapolan