Building multi-million document applications with scalability in mind

A case study
Manuel Scapolan: Software Architect at ServiziCGN, RavenDB User
Mauro Servienti: CTO at Managed Designs, RavenDB Contributor

“The Italian Guys”
Manuel Scapolan (@manuelscapolan)
• Software Architect at ServiziCGN
• RavenDB User
• 1nn0va community speaker
Mauro Servienti (@mauroservienti)
• CTO at Managed Designs
• RavenDB Contributor
• Microsoft MVP - Visual C#

Servizi CGN
A leading company in services for accountants
• Fiscal and tax advice
• E-learning and classroom training
• Dedicated software for filling in tax return statements and running company accounting

An Italian strange fact…
[Diagram labels: The Revenue Agency; Employees and retired people; Tax return statement]

Fiscal Assistance Centers (CAF)
[Diagram labels: The Revenue Agency; CAF (Fiscal Assistance Center); Employees and retired people]

Servizi CGN
The number one privately run CAF in Italy
• More than 1,260,000 tax return statements sent this year alone
• Thanks to a network of 35,000 professionals who use its software for free to fill in and send their clients' tax return statements

The case study
Porting the applications that accountants use to fill in tax return statements and send them to the Revenue Agency

A tax return statement
• A huge form composed of many different fields, filled in either by the user or automatically
• Some fields are validated by rules
• Once correctly filled in, the statement is sent to the Revenue Agency

Our first attempt
• 3-layer architecture
• Data persistence on SQL Server
• EntityFramework as ORM
[Diagram labels: Presentation Layer; Business Layer; Data Layer; SQL Server]

Good, but…
JOIN JOIN JOIN JOIN JOIN JOIN JOIN

Aggregate Data Model
• An Aggregate is a cluster of associated objects that are treated as a unit for the purpose of data changes
• Document databases are a natural fit for storing aggregates (Aggregate = Document)
“A data model is the model through which we perceive and manipulate our data.”
Martin Fowler, NoSQL Distilled

Why RavenDB?
• All our developers are familiar with the .NET Framework and the C# language
• We work with sensitive data, so we need ACID transactions
• Managed Designs provides technical support

The architecture
Bertrand Meyer’s Command Query Separation Principle
[Diagram labels: UI; Command Handling; command; result; Thin Data Layer; DTOs; query; load, change and persist; Domain Model; Query-side; Command-side; RavenDB 2.5]

The Command-side: how it works
• The user fills in the fields of a section
• A save command is sent to the server
• On the server the statement is loaded and an object mapper fills it with the data in the command
• All the rules are applied
• Finally, all the changes are saved to the database
[Architecture diagram repeated]
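The steps above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: `Statement`, `SaveSectionCommand`, `InMemoryStatementStore` and `SaveSectionHandler` are hypothetical names, the in-memory store stands in for RavenDB, and the "object mapper" is reduced to a plain loop. The talk does not say whether rule violations block the save, so this sketch saves regardless and returns the rule messages as the result.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregate root: one tax return statement and its fields.
public class Statement
{
    public string Id { get; set; }
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

// Hypothetical command sent by the UI when the user saves a section.
public class SaveSectionCommand
{
    public string StatementId { get; set; }
    public Dictionary<string, string> Values { get; } = new Dictionary<string, string>();
}

// Hypothetical stand-in for the real persistence (RavenDB in the talk).
public class InMemoryStatementStore
{
    private readonly Dictionary<string, Statement> _documents = new Dictionary<string, Statement>();

    public Statement Load(string id) => _documents[id];

    public void Save(Statement statement) => _documents[statement.Id] = statement;
}

public class SaveSectionHandler
{
    private readonly InMemoryStatementStore _store;
    private readonly List<Func<Statement, string>> _rules;

    // Each rule returns an error message, or null when the statement passes.
    public SaveSectionHandler(InMemoryStatementStore store, List<Func<Statement, string>> rules)
    {
        _store = store;
        _rules = rules;
    }

    // Returns the messages produced by the rules (empty when every rule passes).
    public List<string> Handle(SaveSectionCommand command)
    {
        // 1. Load the whole aggregate.
        var statement = _store.Load(command.StatementId);

        // 2. Map the command data onto the aggregate.
        foreach (var pair in command.Values)
            statement.Fields[pair.Key] = pair.Value;

        // 3. Apply all the rules.
        var messages = new List<string>();
        foreach (var rule in _rules)
        {
            var message = rule(statement);
            if (message != null) messages.Add(message);
        }

        // 4. Persist the changes.
        _store.Save(statement);
        return messages;
    }
}
```

A caller would build a `SaveSectionCommand` from the posted section, pass it to `Handle`, and return the resulting messages to the UI.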

Persistence Ignorance

The Persistence layer: UnitOfWork + Repository
• The UnitOfWork manages business transactions
• It persists all the changes made within the transaction in one atomic operation
• If an error or exception occurs, no changes are saved
BeginUnitOfWork() / EndUnitOfWork()
    using (var tran = new TransactionScope())
    {
        tran.Complete();
    }

The Persistence layer: UnitOfWork + Repository
• The Repository mediates between the domain and the data layer, using a collection-like interface for accessing our aggregates
• Changes are persisted to storage in a completely transparent way
• With RavenDB we don’t have to write a mapping layer
BeginUnitOfWork() / EndUnitOfWork()
    using (var tran = new TransactionScope())
    {
        Repository.GetById(id);
        // Changes occur in the aggregate root
        Repository.Update();
        tran.Complete();
    }
RavenDB calls: var s = session.Load<Statement>(id); session.SaveChanges();
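To make the UnitOfWork idea concrete, here is a minimal in-memory sketch. It is not the authors' implementation and does not use RavenDB or `TransactionScope`: the `UnitOfWork` and `UnitOfWorkRunner` classes and the `TotalIncome` property are hypothetical, and a plain dictionary stands in for the database. The point it illustrates is that aggregates are changed on tracked copies and written back together only if the work finishes without an exception.

```csharp
using System;
using System.Collections.Generic;

public class Statement
{
    public string Id { get; set; }
    public decimal TotalIncome { get; set; }   // hypothetical field, for illustration only
}

// Hypothetical unit of work: tracks loaded aggregates and writes them back together.
public class UnitOfWork
{
    private readonly Dictionary<string, Statement> _storage;   // stands in for the database
    private readonly Dictionary<string, Statement> _tracked = new Dictionary<string, Statement>();

    public UnitOfWork(Dictionary<string, Statement> storage)
    {
        _storage = storage;
    }

    // Repository-style access: hands out a tracked copy, so storage is untouched until Commit.
    public Statement GetById(string id)
    {
        if (!_tracked.TryGetValue(id, out var statement))
        {
            var stored = _storage[id];
            statement = new Statement { Id = stored.Id, TotalIncome = stored.TotalIncome };
            _tracked[id] = statement;
        }
        return statement;
    }

    // Writes every tracked aggregate back to storage.
    public void Commit()
    {
        foreach (var pair in _tracked)
            _storage[pair.Key] = pair.Value;
        _tracked.Clear();
    }
}

public static class UnitOfWorkRunner
{
    // Runs the work and commits only if it completes without throwing.
    public static void Run(Dictionary<string, Statement> storage, Action<UnitOfWork> work)
    {
        var unitOfWork = new UnitOfWork(storage);
        work(unitOfWork);      // an exception here skips Commit, so nothing is saved
        unitOfWork.Commit();
    }
}
```

Calling `UnitOfWorkRunner.Run(storage, uow => uow.GetById(id).TotalIncome = 20m)` changes the stored statement, while a delegate that throws halfway leaves storage as it was.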

The Query-side: how it works
• We load the entire statement
• In the reverse of the save operation, a view model is populated with only the data of the section to display
• The view model is then serialized to JSON and sent to the client
[Architecture diagram repeated]
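The query side described above might look roughly like this. The `Sections` layout, `SectionViewModel` and `SectionQuery` are hypothetical names chosen for the sketch, and `System.Text.Json` is an assumption made here for self-containment; the talk does not say which serializer was used.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Statement
{
    public string Id { get; set; }

    // Hypothetical layout: section name -> (field name -> value).
    public Dictionary<string, Dictionary<string, string>> Sections { get; } =
        new Dictionary<string, Dictionary<string, string>>();
}

// Carries only the data of the section being displayed.
public class SectionViewModel
{
    public string StatementId { get; set; }
    public string Section { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}

public static class SectionQuery
{
    // Projects one section of the fully loaded statement into a view model.
    public static SectionViewModel ToViewModel(Statement statement, string section)
    {
        return new SectionViewModel
        {
            StatementId = statement.Id,
            Section = section,
            // Copy the fields so the view model does not share state with the aggregate.
            Fields = new Dictionary<string, string>(statement.Sections[section])
        };
    }

    // Serializes the view model for the client.
    public static string ToJson(SectionViewModel viewModel) => JsonSerializer.Serialize(viewModel);
}
```

Because the whole statement is a single document, the projection needs one load and no index, which matches the reasoning on the next slide.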

What about the indexes?
• We didn’t use indexes in this case because:
  1. All the data we needed was a subset of the statement document
  2. We did not want to handle the asynchronous update of the indexes
• We did use indexes for reports, look-up lists and summary information

Scalability

What does scalability mean?
• The ability of a system or a software solution to handle increased loads of work
• A solution that can scale out can usually grow to larger loads in a more convenient way
SCALE UP: growing by using stronger hardware
SCALE OUT: growing by adding more hardware

What is the worst enemy of horizontal scalability for a database?
DATA CONSISTENCY

Aggregate boundaries
• Put together the information that needs to be consistent, so that it can be easily distributed without compromising performance
• Obviously this “indivisible whole of consistency” is an aggregate
• So, first of all, define your aggregate boundaries

Data-Sharding

Data-Sharding
• The most effective solution is to divide the data per accountant
• Each statement is made by one accountant only
• So we never need to run queries on multiple servers at the same time

The ShardStrategy
• We assign every new accountant a serial number (a simple auto-increment integer)
• This number has high cardinality, so it makes a good shard key

The ShardStrategy
• Each accountant lives in its own shard
• From the support/back-office point of view we need cross-accountant queries
• Accountant affinity requires us to keep “related” accountants as near to each other as possible
[Diagram labels: Node 0, Node 1, … Node N, each with Shard(s) and a DB]
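One way to turn the accountant serial number into a shard key is sketched below. It is an assumption, not the strategy used in the talk: the deck does not say how numbers map to shards or what makes accountants “related”, so this sketch simply groups consecutive serial numbers into fixed-size ranges and spreads the ranges over the nodes. `AccountantShardResolver` and its parameters are hypothetical, and no RavenDB sharding API is used.

```csharp
using System;

// Hypothetical range-based resolver: consecutive accountant numbers share a shard.
public class AccountantShardResolver
{
    private readonly int _accountantsPerShard;
    private readonly int _nodeCount;

    public AccountantShardResolver(int accountantsPerShard, int nodeCount)
    {
        if (accountantsPerShard <= 0) throw new ArgumentOutOfRangeException(nameof(accountantsPerShard));
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));
        _accountantsPerShard = accountantsPerShard;
        _nodeCount = nodeCount;
    }

    // Serial numbers are assumed to start at 1; shard indexes start at 0.
    public int ShardFor(int accountantNumber)
    {
        if (accountantNumber < 1) throw new ArgumentOutOfRangeException(nameof(accountantNumber));
        return (accountantNumber - 1) / _accountantsPerShard;
    }

    // Shards are assigned to nodes round-robin, so a node can host several shards.
    public int NodeFor(int accountantNumber) => ShardFor(accountantNumber) % _nodeCount;
}
```

With 1000 accountants per shard and 3 nodes, accountants 1 to 1000 land in shard 0 on node 0, and accountant 1001 lands in shard 1 on node 1. Since every statement belongs to one accountant, all of an accountant's documents resolve to the same node and day-to-day queries stay on a single server.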

Replication

Master-Slave Replication
[Diagram: one MASTER replicating to three SLAVEs]

High Availability
• The accountant's working life is often on a knife edge
• Unavailability of the software at those moments can seriously harm our clients
• Master-slave replication can be crucial to keeping the application highly available

High Availability through Master-Master Replication
• Patch/deploy/update & rebuild on the logical-slave
• Switch
  – logical-slave becomes the logical-master
  – logical-master switches to logical-slave
• VIP: switch the application to use the logical-master

Fault tolerance
• Write assurance: wait for data to be replicated to at least n servers in the replica set
• Deploy the RavenDB server on VMs on a SAN
  – Each SAN shadow-copies the VM at each change, amazingly fast
  – If the hardware fails, restore the SAN snapshot to a replicated SAN and move on
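The write assurance idea can be illustrated with a tiny, purely conceptual sketch. It does not use RavenDB's replication API: each replica is modelled as a hypothetical delegate that returns true when it has stored the document, and the `WriteAssurance` class name is invented for the example. A write counts as assured only when at least `requiredAcks` replicas confirm it.

```csharp
using System;
using System.Collections.Generic;

public static class WriteAssurance
{
    // Sends the document to every replica and reports whether enough of them confirmed it.
    public static bool Write(string document, IEnumerable<Func<string, bool>> replicas, int requiredAcks)
    {
        var acks = 0;
        foreach (var replica in replicas)
        {
            bool stored;
            try
            {
                stored = replica(document);
            }
            catch (Exception)
            {
                stored = false;   // an unreachable replica counts as no acknowledgement
            }

            if (stored) acks++;
        }

        return acks >= requiredAcks;
    }
}
```

A real implementation would also need timeouts and a policy for what to do when too few replicas answer; both are left out here.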

Pros & Cons

The Good stuff
• Simplifies the data layer development
• Ready to run in 5 minutes
• It just works, “zero” administration
• Safe by default
• Easy scale-out

The Bad stuff
• Being an early adopter
• Lack of monitoring tools
• Running ad hoc queries on the fly
• It is all about eventual consistency

Can RavenDB be the right choice?
• Are your aggregate boundaries well defined?
• Are you looking for scalability and lack of verbosity?
• …
• RavenDB may just be the technology you are looking for!

Thank you @mauroservienti @manuelscapolan
