Offline-First: Architektur-Patterns & Implementierung für APIs & Backends

Offline-First: Architektur-Patterns & Implementierung für APIs & Backends

"Keine Internetverbindung" – ein Satz, den wir alle aus vielen Apps kennen, sei es Desktopanwendung oder Smartphone App. Oftmals sind die Clients auf dem jeweiligen Zielgerät installiert, bedienen sich aber einer externen Datenquelle, bspw. in Form eines HTTP APIs. Wie man sein Backend um eine offline-fähige Architektur erweitert, um es bspw. von einer SPA/PWA nutzen zu können, zeigen Thomas Hilzendegen und Manuel Rauber von Thinktecture in diesem Workshop. Denn bei einer offline-fähigen Implementierung sind nicht nur Lese-, sondern vor allem auch Schreibzugriffe wichtig. Sprich, die Daten müssen in beide Richtungen synchronisiert werden können, sobald die Internetverbindung wiederhergestellt wurde. Allgemein werden Themen im API Design, u.a. Synchronisationsstrategien von Anwendungs- und großen Binärdaten, Konfliktmanagement oder Datenbankarchitektur – sowohl für neue als auch bestehende Anwendungen – diskutiert. Am Beispiel einer PWA mit einem Backend in Java als auch in .NET Core werden Client und Server zur Umsetzung der besprochenen Konzepte und Patterns gezeigt.

GitHub: https://github.com/thinktecture/thinktecture-boardist

667fbca1f58bc0215c744b5ae8f8e5d2?s=128

Manuel Rauber

December 11, 2019
Tweet

Transcript

  1. Thomas Hilzendegen @hilzendegent Consultant Manuel Rauber @ManuelRauber Consultant Architektur-Patterns &

    Implementierung Offline First APIs & Backends
  2. Thomas Hilzendegen Speakers Manuel Rauber thomas.hilzendegen@thinktecture.com @hilzendegent manuel.rauber@thinktecture.com @manuelrauber https://manuel-rauber.com

    Microsoft MVP
  3. Timetable Time Doing 09:00 - 10:30 Part I 10:30 -

    11:00 ☕ 11:00 - 12:30 Part II
  4. • Progressive Web App • Offline availability • Offline synchronization

    • Change tracking • Security considerations • Pitfalls Agenda
  5. Intro

  6. Offline… but why?

  7. • Just because the phone has a connection, e.g. Edge,

    it does not mean we are online • Depends on the use case which connection quality is required to determine, if the app is reliable online • Connection quality could be measured by the time an exclusive request takes • Duration < 150 ms: online • Duration !>=150 ms: bad connection What does “offline” mean?
  8. • A lot of reasons to be offline • Traveling,

    Train, Flights (bad or no signal) • Server is not available • Routing problems • Roaming • Costs Motivation
  9. Demo Application

  10. • A little application to manage boardgames • Angular, .NET

    Core, MS SQL Server • Brownfield application, started as a pure online application • As typical for any good demo: no security • Available on Azure: https://tt-boardist.azurewebsites.net • GitHub: https://github.com/thinktecture/thinktecture-boardist Demo Application “Thinktecture Boardist”
  11. • Backend • .NET Core 3.0 • SQL Server •

    Frontend • Angular 8.2.13 Demo Application Specs
  12. Demo Application Database Scheme

  13. Live Demo

  14. Now let’s go offline!

  15. Does it work in flight mode? NOPE!

  16. Progressive Web App

  17. Live Demo

  18. Does it work offline now? STILL NOPE!

  19. Real offline availability

  20. • ServiceWorker is only able to take data offline which

    has been requested by the application • If all the URLs are known beforehand, the ServiceWorker could cache them all • Data which was not requested, is not available offline (no real offline synchronisation) • But what about … the Background Sync API? What’s the problem?
  21. • Name is misleading • Does not offer any data

    synchronisation possibilities • Just raises an event periodically or after a delay (optionally based on the network type) • It’s totally up to the developer what to do in the onsync event • Since the sync is done in the ServiceWorker, the page can be closed, the sync will be fulfilled anyway • It does not help anything with syncing your actual data! Background Sync API
  22. Background Sync API

  23. • Cookies (not meant for large data or binaries) •

    Web Storage like session storage or local storage (not meant for large data or binaries) • IndexedDB • Cache Storage (based on request/response) Storage capabilities
  24. User can clear anything at any time by “Remove temporary

    internet files” or “Clear website data”
  25. • Key-value database of the browser • Stores data permanently

    • ServiceWorker and Web App share access to the same IndexedDB • Possibility of scenarios, where the ServiceWorker (or Web App) stores synchronised data in the IndexedDB and the Web App reads the data • Like everything based on the origin IndexedDB
  26. IndexedDB Availability

  27. • The standard API of IndexedDB is inconvenient to use

    (lots of callback) • Dexie.js is a minimalistic wrapper for IndexedDB • Operations are promise-based • Near native performance (even for bulk inserts) • Open Source @ GitHub: https://github.com/dfahlander/Dexie.js IndexedDB API
  28. Offline Sync

  29. • Offline Sync means to download all data available to

    client into a persistent offline storage, without the user having to explicitly request the data • Depending on the scenario, client can do CRUD on the offline data • Data will be synced back to the server, whenever a connection is possible Offline Sync Basics
  30. • Online/offline recognition • Conflict management • Binary data •

    Local changes • Deleting data • Update interval (incoming new data) • Error handling • Primary key generation Offline Sync Challenges
  31. • Having Edge may not mean that we are online

    • Being online could depend on several factors • Connection quality (Edge, 3G, 4G, 5G) • Connection speed (latency to your backend) • Reachability of backend systems (database, 3rd party system, storages) Offline Sync Challenges - Online/offline recognition
  32. • Client needs to be online for write operations •

    Locks the data, so no other client can overwrite it • Data stays locked, until the client either saves or discards changes • Last One (Write) Wins • Visual conflict management (diffing like in Git, SVN, etc.) Offline Sync Challenges - Conflict management
  33. • Downloading large binary data may not be possible on

    all platforms • A browser does not have a big enough storage for storing the data • Native apps (Cordova, Electron) could be a solution to directly access the platform’s file system • Uploading large binary data • Server needs to support chunk based upload with connection interruptions • Client needs permanent access to the binary file being uploaded, which could be hard in a browser-only scenario Offline Sync Challenges - Binary data
  34. • User opens an edit form and makes changes to

    data • Incoming sync from server would change the data the user is editing • Depending on your use-case, decide what to do • Inform the user about the data change? • Show a visual diff? • Override the local changed data? • “Do nothing” and override the server data on next sync Offline Sync Challenges - Local changes
  35. • Mark “deleted” data with a flag only • Use

    a trigger to keep the deleted ID in a separate table • Client needs some info about “deleted” data • Simple list of IDs “deleted” since the last sync Offline Sync Challenges - Deleting data
  36. • Decide when you need to sync which data •

    Not all data needs to be synced in the same interval • An additional real time connection (WebSocket) could be established • The server can send a signal when some data needs to be synchronised Offline Sync Challenges - Update interval
  37. • Show it, don’t hide them! • The user needs

    to know that something has occurred which shouldn’t • If you can’t sync to the client anymore, you may need to re-sync everything • Depending on your use case, you may want to write all data in one transaction on the server to not have partial data written into the database Offline Sync Challenges - Error handling
  38. Primary key generation

  39. • Inserted offline data needs to be available in the

    client immediately (e.g. the users reloads the application) • Behaves like “real synced data”, even if it was not written to server yet • Client is able to route to offline inserted data • Therefor it needs some kind of primary key/id to route to • Primary key can be generated on server-side and on client-side Primary key generation - challenge
  40. • Server is responsible for generating the primary key •

    Client will generate a temporary offline primary key • After syncing, the client needs to replace the temporary offline key with the server generated primary key • Allows to use sequential GUIDs on the server side (MSSQL) Primary key generation - server-side
  41. • Client is responsible for generating the primary key (GUIDs

    only) • Server inserts the client generated key into the database as the primary key • Server additionally needs to have a clustered key • Otherwise inserting a client non-sequential GUID will lead to a reclustering of the database, every time a client syncs new data Primary key generation - client-side
  42. Change Tracking

  43. • Possibilities to determine the client’s current state • Use

    highest rowversion within result • Determine by result set or additional query • Result of @@DBTS • Returns the last used rowversion of the database • Result of MIN_ACTIVE_ROWVERSION() • Returns the lowest (active) rowversion of the database Change Tracking - MS SQL Server’s ROWVERSION
  44. • If you do a million updates a second, every

    second, the timestamp will wrap around in about 585000 years • Could be used for concurrency checks as well (see Conflict Management) • Before writing data into a row, you could check if it still as the client’s saved row version • If yes: data was not modified, saving is easily possible • If no: data was modified, decide per use case what to do! Change Tracking - MS SQL Server’s ROWVERSION
  45. Change Tracking - Highest ROWVERSION within result Transaction #1 ID

    ROWVERSION 1 0x01 2 0x02 … … 41 0x41 Transaction #2 ID ROWVERSION 1 0x01 2 0x02 … … 41 0x41 42 0x42 43 0x43 SELECT MAX(ROWVERSION) FROM TABLE Client’s state Last known rowversion 41 43 NOT RELIABLE!
  46. • A later started transaction but committed first results in

    a queryable rowversion higher than the pending ones • Data will be lost (never seen by the client’s delta request) Change Tracking - Highest ROWVERSION within result
  47. Change Tracking - Query @@DBTS Transaction #1 ID ROWVERSION 1

    0x01 2 0x02 … … 41 0x41 Transaction #2 ID ROWVERSION 1 0x01 2 0x02 … … 41 0x41 42 0x42 43 0x43 SELECT @@DBTS Client’s state Last known rowversion 41 42 43 NOT RELIABLE!
  48. • Value is database global • Running transaction updates the

    value instantly (before commit or rollback) • Data will be lost (never seen by the client’s delta request) Change Tracking - Query @@DBTS
  49. Change Tracking - Query MIN_ACTIVE_ROWVERSION() Transaction #1 ID ROWVERSION 1

    0x01 2 0x02 … … 41 0x41 Transaction #2 ID ROWVERSION 1 0x01 2 0x02 … … 41 0x41 42 0x42 43 0x43 SELECT MIN_ACTIVE_ROWVERSION() Client’s state Last known rowversion 41 42 44 RELIABLE!
  50. • Value is database global • Running transaction updates the

    value instantly (before commit or rollback) • Data will be safe • Needs to be compared with greater or equal • May query data multiple times Change Tracking - Query MIN_ACTIVE_ROWVERSION()
  51. Backend Preparation

  52. • All syncable entities need to have rowversion column •

    rowversion is updated by MS SQL Server automatically whenever the row is changed (created & updated) • For deleted entities • Either set a IsDeleted flag to true (never delete any rows physically) • Or save the deleted IDs of the entities somewhere else (by trigger) Backend Preparation - MS SQL Server
  53. • Use equivalents of rowversion and triggers • Manual implement

    mechanism in business logic (error-prone!) • Update tracking column manually by incrementing a database global number (during one transaction!) • Will be very hard für multi-row updates/inserts • Manual implement mechanism in triggers (if available) Backend Preparation - Other Database Systems
  54. Frontend Preparation

  55. • Choose storage area for data (e.g. IndexedDB) • Write

    all the code • Periodic data synchronization • Binary synchronization when data changes • Tracking of timestamps Frontend Preparation
  56. Security

  57. Typical token-based security Security - API level Browser Identity Provider

    Web API
  58. • User is only able to see data based on

    his security level • Permissions • Roles • Policies • What happens, if the user permission change is leading to different data visible to the user? • What about data, which he does not see due to rights, but is connected to other data? • Remove frontend data after logout Security - Data Level
  59. Pitfalls

  60. • De-normalize the relational data (document style) • Results in

    multiple rowversion for one entry (use most recent one) • Multiplies the data (more traffic) • Consistent data for one entry • Keep the relations up to the frontend • Explicit rowversion for each entry • Partial consistency (related data may not be synced yet) Relational Data
  61. • Keep some relational data and de-normalize some of it

    • De-normalize many-to-many relations • Needs trigger or business logic to change main entry’s rowversion when relation changes • Partial consistency (related data may not be synced yet) Relational Data
  62. • PWA helps taking offline the application, but not the

    data • Online != Edge is available • Offline data may be “temporary” (“Remove temporary internet files”) • Use MIN_ACTIVE_ROWVERSION() and “greater or equal than” operator • Think about data level security (permissions, roles, etc.) • Think about conflict management • Depends heavily on your use case Summary
  63. And… now? • Slides: https://speakerdeck.com/manuelrauber • Repository: https://github.com/thinktecture/thinktecture-boardist