
Data Mesh, Big Data Redis Mongodb Dynamodb Neo4J, Sharding

Building Cloud-Native App Series - Part 5 of 15
Microservices Architecture Series
Distributed Cache
Redis, Hazelcast, EHCache, Oracle Coherence
NoSQL vs. SQL
Redis, MongoDB, AWS DynamoDB, Neo4J
Big Data Design Patterns
Data Mesh, Data Lake, Warehouse, Data Mart
Sharding, Partitions
Multi-Tenancy

Araf Karsh Hamid

June 01, 2022

Transcript

  1. @arafkarsh arafkarsh ARAF KARSH HAMID, Co-Founder / CTO, MetaMagic Global Inc., NJ, USA

     8 years network & security, 6+ years microservices and blockchain, 8 years cloud computing, 8 years distributed computing; architecting & building apps. A tech presentorial: a combination of presentation & tutorial.
     Microservice Architecture Series, Part 5 of 15, To Build Cloud Native Apps Using Composable Enterprise Architecture: Distributed Cache (Hazelcast, Redis, EHCache); NoSQL Vs. SQL (Redis / MongoDB / DynamoDB); Scalability (Shards and Partitions); Multi-Tenancy (DB, Schema, Table); Compliance and Data Security; Data Lake, Warehouse, Mart, Data Mesh.
  2. @arafkarsh arafkarsh 2 Slides are color coded based on the

     topic colors:
     1. Distributed Cache: EHCache, Hazelcast, Redis, Coherence
     2. NoSQL Vs. SQL: Redis, MongoDB, DynamoDB, Neo4J; Data Mesh / Lake
     3. Scalability: Sharding & Partitions
     4. Multi-Tenancy: Compliance, Data Security
  3. @arafkarsh arafkarsh Developer Journey 3

     Monolithic: Waterfall to Agile Scrum (4-6 weeks), 6/12-month releases; Domain-Driven Design, Event Sourcing and CQRS as optional design patterns; Continuous Integration (CI); Enterprise Service Bus; Relational Database [SQL] / NoSQL; separate Development, QA / QC, and Ops teams.
     Microservices: Scrum / Kanban (1-5 days); Domain-Driven Design, Event Sourcing and CQRS as mandatory design patterns; infrastructure design patterns; CI / CD; DevOps; Event Streaming / Replicated Logs; SQL and NoSQL; Container Orchestrator; Service Mesh.
  4. @arafkarsh arafkarsh Application Modernization – 3 Transformations 4

     1. Architecture: Monolithic > SOA > Microservice
     2. Infrastructure: Physical Server > Virtual Machine > Cloud
     3. Delivery: Waterfall > Agile > DevOps
     Source: IBM: Application Modernization > https://www.youtube.com/watch?v=RJ3UQSxwGFY
  5. @arafkarsh arafkarsh Distributed Cache Feature Set 6

     1. Language Support: Refers to the programming languages for which the distributed caching solution provides APIs or client libraries.
     2. Partitioning & Replication: The ability to partition data across multiple nodes and maintain replicas for fault tolerance and availability.
     3. Eviction Policies: Strategies to remove data from the cache when it reaches capacity. Standard policies include Least Recently Used (LRU) and Least Frequently Used (LFU).
     4. Persistence: Storing cached data on disk allows cache recovery in case of node failure or restart.
     5. Querying: Support for querying cached data using a query language or API.
     6. Transactions: The ability to perform atomic operations and maintain data consistency across cache operations.
     7. High Availability & Fault Tolerance: Support for redundancy and automatic failover to ensure the cache remains operational in case of node failures.
     8. Performance: A measure of the cache's ability to handle read and write operations with low latency and high throughput.
     9. Data Structures: The types of data structures supported by the caching solution.
     10. Open Source: Whether the caching solution is open source and freely available for use and modification.
  6. @arafkarsh arafkarsh Distributed Cache Comparison 7

     Feature | EHCache | Hazelcast | Coherence | Redis
     Language Support | Java | Java, .NET, C++, Python, Node.js, etc. | Java | Java, Python, .NET, C++, etc.
     Partitioning & Replication | Terracotta integration (limited) | Native support | Native support | Native support (Redis Cluster)
     Eviction Policies | LRU, FIFO, custom | LRU, LFU, custom | LRU, custom | LRU, LFU, volatile, custom
     Persistence | Disk-based persistence | Disk-based persistence | Disk-based persistence | Disk-based and in-memory persistence
     Querying | Limited support | SQL-like querying (Predicate API) | SQL-like querying (Filter API) | Limited querying support
     Transactions | Limited support | Native support | Native support | Native support
     High Availability & Fault Tolerance | Limited (with Terracotta) | Native support | Native support | Native support (via replication and clustering)
     Performance | Moderate | High | High | High
     Data Structures | Key-value pairs | Key-value pairs, queues, topics, etc. | Key-value pairs, caches, and services | Strings, lists, sets, hashes, etc.
     Open Source | Yes | Yes | No (proprietary) | Yes
  7. @arafkarsh arafkarsh Operational In-Memory Computing 8

     Cache Topology
     • Standalone: This setup consists of a single node containing all the cached data. It is equivalent to a single-node cluster and does not collaborate with other running instances.
     • Distributed: Data is spread across multiple nodes in a cache such that only a single node is responsible for fetching a particular entry. This is possible by distributing/partitioning the cluster in a balanced manner (i.e., all the nodes have the same number of entries and are hence load balanced). Failover is handled via configurable backups on each node.
     • Replicated: Data is spread across multiple nodes in a cache such that each node holds the complete cache data. Since each cluster node contains all the data, failover is not a concern.

     Caching Strategies
     • Read Through: A process by which a missing cache entry is fetched from the integrated backend store.
     • Write Through: A process by which changes to a cache entry (create, update, delete) are pushed into the backend data store.
     It is important to note that the business logic for Read-Through and Write-Through operations for a specific cache is confined within the caching layer itself. Hence, your application remains insulated from the specifics of the cache and its backing system-of-record.

     Caching Mode
     • Embedded: When the cache and the application co-exist within the same JVM, the cache can be said to be operating in embedded mode. The cache lives and dies with the application JVM. Use this when tight coupling between your application and the cache is not a concern, and the application host has enough capacity (memory) to accommodate the demands of the cache.
     • Client / Server: In this setup, the application acts as the client to a standalone (remote) caching layer. Use this when the caching infrastructure and the application need to evolve independently, or when multiple applications use a unified caching layer which can be scaled up without affecting client applications.

     Java Cache API: JSR 107 [ Distributed Caching / Distributed Computing / Distributed Messaging ]
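Since the deck points to the Java Cache API (JSR 107) as the common abstraction over these products, here is a minimal JCache sketch added for illustration. It assumes a JSR-107 provider (EHCache, Hazelcast, or Coherence, for example) is on the classpath; the cache name healthCareCache and the patient key are placeholders that follow the deck's later examples.

```java
import java.util.concurrent.TimeUnit;

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

public class JCacheDemo {
    public static void main(String[] args) {
        // The actual implementation comes from whichever JSR-107 provider is on the classpath.
        CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();

        // Entries expire two minutes after creation.
        MutableConfiguration<String, String> config =
                new MutableConfiguration<String, String>()
                        .setTypes(String.class, String.class)
                        .setExpiryPolicyFactory(
                                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 2)));

        Cache<String, String> cache = cacheManager.createCache("healthCareCache", config);
        cache.put("patient:1", "John Doe");
        System.out.println(cache.get("patient:1"));
    }
}
```

With a provider that supports it, the same MutableConfiguration can also enable the read-through / write-through strategies described above via setReadThrough / setWriteThrough and a CacheLoader / CacheWriter factory.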
  8. @arafkarsh arafkarsh Cache Deployment Models 9

     Diagram: four deployment models. (1) Standalone embedded cache: the cache runs inside the application JVM. (2) Embedded distributed/replicated cache: each application node (Node 1, Node 2) embeds a cache that joins a distributed or replicated cache cluster. (3) Standalone client/server cache: the application uses a client API to talk to a standalone remote cache running in its own JVM. (4) Client/server distributed/replicated cache: the application uses a client API to talk to a cluster of remote cache nodes.
  9. @arafkarsh arafkarsh Spring Cache Example • Service definition with Cacheable

    Annotation • With Complex Object • With Custom Key Generator 10
  10. @arafkarsh arafkarsh Cache Simple Example 11

     import org.springframework.cache.annotation.Cacheable;
     import org.springframework.stereotype.Service;

     @Service
     public class MyCacheService {

         @Cacheable(value = "healthCareCache", key = "#name")
         public String getGreeting(String name) {
             // Simulating an expensive operation
             try {
                 Thread.sleep(7000);
             } catch (InterruptedException e) {
                 e.printStackTrace();
             }
             return "Hello, " + name + "! How are you today?";
         }
     }

     The @Cacheable annotation is part of the Spring Cache abstraction. This annotation indicates that the result of a method invocation should be cached so that subsequent invocations with the same arguments can return the result from the cache.
     1. Before the method execution, Spring generates a cache key based on the method arguments and the specified cache name.
     2. Spring checks if the cache contains a value associated with the generated cache key.
     3. If a cached value is found, it is returned directly, and the method is not executed.
     4. The method is executed if no cached value is found, and the result is stored in the cache with the generated key.
     5. The result of the method is returned to the caller.
  11. @arafkarsh arafkarsh Cache Annotations 12

     @Service
     @CacheConfig(cacheNames = "healthCareCache")
     public class PatientService {

         @Cacheable(key = "#id")
         public Patient findPatientById(String id) {
             // Code to Fetch Data
         }

         @CachePut(key = "#patient.id")
         public Patient updatePatient(Patient patient) {
             // Code to Update data
             // Cache is also updated
         }

         @CacheEvict(key = "#id")
         public void deletePatient(String id) {
             // Code to Delete data
             // Cache is Evicted
         }
     }

     @CacheEvict: The @CacheEvict annotation is used to remove one or more entries from the cache. You can specify the cache entry key to evict or use the allEntries attribute to remove all entries from the specified cache.
     @CachePut: The @CachePut annotation is used to update the cache with the result of the method execution. Unlike @Cacheable, which only executes the method if the result is not present in the cache, @CachePut always executes the method and then updates the cache with the returned value. This annotation is useful when you want to update the cache after modifying data.
  12. @arafkarsh arafkarsh Cache Complex Example 13 public class Patient {

    private String firstName, lastName, dateOfBirth, gender, maritalStatus; private String phone, email, address; public Patient(String firstName, String lastName, String dateOfBirth, String gender, String maritalStatus, String phone, String email, String address) { this.firstName = firstName; this.lastName = lastName; this.dateOfBirth = dateOfBirth; this.gender = gender; this.maritalStatus = maritalStatus; this.phone = phone; this.email = email; this.address = address; } // Getters omitted for brevity }
  13. @arafkarsh arafkarsh Cache Complex Example 14 import org.springframework.cache.annotation.Cacheable; import org.springframework.stereotype.Service;

     @Service
     public class PatientService {

         @Cacheable(value = "patient1Cache", key = "#patient.firstName")
         public Patient findPatientByFirstName(Patient patient) {
             // Simulate a time-consuming operation
             try {
                 Thread.sleep(3000);
             } catch (InterruptedException e) {
                 e.printStackTrace();
             }
             return patient;
         }

         @Cacheable(value = "patient2Cache", key = "#firstName + '_' + #lastName")
         public Patient findPatientByFirstAndLastName(String firstName, String lastName) {
             try {
                 Thread.sleep(3000);
             } catch (InterruptedException e) {
                 e.printStackTrace();
             }
             return new Patient(firstName, lastName, "01-01-1995", "Female", "Single",
                     "98765-12345", "[email protected]", "123 Main St");
         }
     }
  14. @arafkarsh arafkarsh Cache Example: Custom Key Generator 15 import org.springframework.cache.interceptor.KeyGenerator;

    import org.springframework.web.context.request.RequestContextHolder; import org.springframework.web.context.request.ServletRequestAttributes; import javax.servlet.http.HttpServletRequest; import java.lang.reflect.Method; public class CustomKeyGenerator implements KeyGenerator { @Override public Object generate(Object target, Method method, Object... params) { HttpServletRequest request = ((ServletRequestAttributes) RequestContextHolder.currentRequestAttributes()).getRequest(); String customHeaderValue = request.getHeader("X-Custom-Header"); StringBuilder keyBuilder = new StringBuilder(); keyBuilder.append(method.getName()).append("-"); if (customHeaderValue != null) { keyBuilder.append(customHeaderValue).append("-"); } for (Object param : params) { keyBuilder.append(param.toString()).append("-"); } return keyBuilder.toString(); } }
  15. @arafkarsh arafkarsh Cache Example: Key Generator Bean 16 import org.springframework.cache.annotation.EnableCaching;

    import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; @Configuration @EnableCaching public class CacheConfig { @Bean("customKeyGenerator") public CustomKeyGenerator customKeyGenerator() { return new CustomKeyGenerator(); } }
  16. @arafkarsh arafkarsh Cache Complex Example 17 import org.springframework.beans.factory.annotation.Autowired; import org.springframework.beans.factory.annotation.Qualifier;

     import org.springframework.cache.annotation.Cacheable;
     import org.springframework.cache.interceptor.KeyGenerator;
     import org.springframework.stereotype.Service;

     @Service
     public class PatientService {

         @Autowired
         @Qualifier("customKeyGenerator")
         private KeyGenerator customKeyGenerator;

         @Cacheable(value = "patientCache", keyGenerator = "customKeyGenerator")
         public Patient findPatientByFirstAndLastName(String firstName, String lastName) {
             try {
                 Thread.sleep(3000);
             } catch (InterruptedException e) {
                 e.printStackTrace();
             }
             return new Patient(firstName, lastName, "01-01-1995", "Female", "Single",
                     "98765-12345", "[email protected]", "123 Main St");
         }
     }
  17. @arafkarsh arafkarsh EHCache Setup 19 <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-cache</artifactId> </dependency> <dependency>

     <groupId>org.ehcache</groupId>
         <artifactId>ehcache</artifactId>
     </dependency>

     POM File

     <ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="http://www.ehcache.org/schema/ehcache-core.xsd">
         <defaultCache maxElementsInMemory="100"
                       eternal="false"
                       timeToIdleSeconds="120"
                       timeToLiveSeconds="120"
                       overflowToDisk="true"/>
     </ehcache>

     Configuration – ehcache.xml
  18. @arafkarsh arafkarsh EHCache Properties 20 • name: The unique name

    of the cache. • maxElementsInMemory: The maximum number of elements that can be stored in memory. Once this limit is reached, elements can be evicted or overflow to disk, depending on the configuration. The value is a positive integer. • eternal: A boolean value that indicates whether the elements in the cache should never expire. If set to true, the timeToIdleSeconds and timeToLiveSeconds attributes are ignored. The value is either true or false. • timeToIdleSeconds: The maximum number of seconds an element can be idle (not accessed) before it expires. A value of 0 means there's no limit on the idle time. The value is a non-negative integer. • timeToLiveSeconds: The maximum number of seconds an element can exist in the cache, regardless of idle time. A value of 0 means there's no limit on the element's lifespan. The value is a non-negative integer. • overflowToDisk: A boolean value that indicates whether elements can be moved from memory to disk when the maxElementsInMemory limit is reached. This attribute is deprecated in EHCache 2.10 and removed in EHCache 3.x. Instead, you should use the diskPersistent attribute or configure a disk store element. The value is either true or false.
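For reference, the same settings can be expressed programmatically. The following is a hedged sketch using the Ehcache 2.x API (the XML shown on the EHCache Setup slide is the deck's approach; this block is only an illustration, and the cache name and key are made up).

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class EhCacheProgrammaticConfig {
    public static void main(String[] args) {
        // Programmatic equivalent of the XML attributes described above (Ehcache 2.x API).
        CacheConfiguration config = new CacheConfiguration("healthCareCache", 100) // name, maxElementsInMemory
                .eternal(false)              // entries may expire
                .timeToIdleSeconds(120)      // expire after 120s without access
                .timeToLiveSeconds(120);     // expire 120s after creation

        CacheManager cacheManager = CacheManager.getInstance();
        cacheManager.addCache(new Cache(config));

        Cache cache = cacheManager.getCache("healthCareCache");
        cache.put(new Element("patient:1", "John Doe"));
        System.out.println(cache.get("patient:1").getObjectValue());
    }
}
```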
  19. @arafkarsh arafkarsh EHCache Setup for Distributed 21 <dependency> <groupId>org.ehcache.modules</groupId> <artifactId>ehcache-clustered</artifactId>

     <version>get-latest-version</version> <!-- Replace with the desired version -->
     </dependency>

     POM File

     <config xmlns='http://www.ehcache.org/v3'
             xmlns:tc='http://www.ehcache.org/v3/clustered'>
         <tc:clustered>
             <tc:cluster-uri>terracotta://localhost:9410/my-cache-manager</tc:cluster-uri>
         </tc:clustered>
         <cache-template name="myTemplate">
             <resources>
                 <tc:clustered-dedicated unit="MB">100</tc:clustered-dedicated>
             </resources>
         </cache-template>
         <cache alias="healthCareCache" uses-template="myTemplate"/>
     </config>

     Configuration – config.xml
  20. @arafkarsh arafkarsh EHCache Spring Boot Config 22 import org.springframework.cache.annotation.EnableCaching; import

    org.springframework.cache.ehcache.EhCacheCacheManager; import org.springframework.cache.ehcache.EhCacheManagerFactoryBean; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.core.io.ClassPathResource; @Configuration @EnableCaching public class CacheConfiguration { @Bean public EhCacheCacheManager cacheManager() { return new EhCacheCacheManager(ehCacheCacheManager().getObject()); } @Bean public EhCacheManagerFactoryBean ehCacheCacheManager() { EhCacheManagerFactoryBean factory = new EhCacheManagerFactoryBean(); factory.setConfigLocation(new ClassPathResource("ehcache.xml")); factory.setShared(true); return factory; } }
  21. @arafkarsh arafkarsh Hazelcast Setup 24 <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-cache</artifactId> </dependency> <dependency>

     <groupId>com.hazelcast</groupId>
         <artifactId>hazelcast</artifactId>
     </dependency>
     <dependency>
         <groupId>com.hazelcast</groupId>
         <artifactId>hazelcast-spring</artifactId>
     </dependency>

     POM File

     <hazelcast xmlns="http://www.hazelcast.com/schema/config"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.hazelcast.com/schema/config
                                    http://www.hazelcast.com/schema/config/hazelcast-config.xsd">
         <cache name="healthCareCache">
             <eviction size="100" max-size-policy="ENTRY_COUNT" eviction-policy="LRU"/>
             <expiry-policy-factory>
                 <timed-expiry-policy-factory expiry-policy-type="TOUCHED"
                                              duration-amount="120" time-unit="SECONDS"/>
             </expiry-policy-factory>
         </cache>
     </hazelcast>

     Configuration – hazelcast.xml
  22. @arafkarsh arafkarsh Hazelcast Properties 25 1. cache name: The unique

    name of the cache. 2. eviction: • size: The maximum number of elements in the cache before eviction occurs. The value is a positive integer. • max-size-policy: The cache size policy. The possible values are ENTRY_COUNT, USED_HEAP_SIZE, and USED_HEAP_PERCENTAGE. • eviction-policy: The eviction policy for the cache. The possible values are LRU (Least Recently Used), LFU (Least Frequently Used), RANDOM, and NONE. 3. expiry-policy-factory: Configures the cache's expiration policy. • timed-expiry-policy-factory: A factory for creating a timed expiry policy. • expiry-policy-type: The type of expiry policy. Possible values are CREATED, ACCESSED, MODIFIED, and TOUCHED. CREATED expires based on the creation time, ACCESSED expires based on the last access time, MODIFIED expires based on the last modification time, and TOUCHED expires based on the last access or modification time. • duration-amount: The duration amount for the expiry policy. The value is a positive integer. • time-unit: The time unit for the duration amount. Possible values are NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, and DAYS.
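As an illustration only, the eviction and idle-expiry settings described above can also be built in code. The sketch below assumes the Hazelcast 5.x API and configures an IMap rather than a JCache cache element; the map name and keys are placeholders.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionConfig;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MaxSizePolicy;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class HazelcastProgrammaticConfig {
    public static void main(String[] args) {
        // LRU eviction above 100 entries per node, plus a 120-second max-idle expiry.
        MapConfig mapConfig = new MapConfig("healthCareCache")
                .setEvictionConfig(new EvictionConfig()
                        .setEvictionPolicy(EvictionPolicy.LRU)
                        .setMaxSizePolicy(MaxSizePolicy.PER_NODE)
                        .setSize(100))
                .setMaxIdleSeconds(120);

        Config config = new Config();
        config.addMapConfig(mapConfig);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        IMap<String, String> cache = hz.getMap("healthCareCache");
        cache.put("patient:1", "John Doe");
        System.out.println(cache.get("patient:1"));
    }
}
```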
  23. @arafkarsh arafkarsh Hazelcast Setup for Distributed 26 <hazelcast xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

    xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config.xsd"> <network> <join> <multicast enabled="true"/> <!-- or use TCP/IP for discovering other nodes --> <!— <tcp-ip enabled="true"> <member-list> <member>machine1:5701</member> <member>machine2:5701</member> </member-list> </tcp-ip> --> </join> </network> <map name=”healthCareCache"> <eviction size="100" max-size-policy="ENTRY_COUNT" eviction-policy="LRU" /> <max-idle-seconds>120</max-idle-seconds> </map> </hazelcast> Configuration – hazelcast.xml
  24. @arafkarsh arafkarsh Hazelcast Spring Boot Config 27

     import com.hazelcast.config.ClasspathXmlConfig;
     import com.hazelcast.config.Config;
     import com.hazelcast.core.Hazelcast;
     import com.hazelcast.core.HazelcastInstance;
     import com.hazelcast.spring.cache.HazelcastCacheManager;
     import org.springframework.cache.CacheManager;
     import org.springframework.cache.annotation.EnableCaching;
     import org.springframework.context.annotation.Bean;
     import org.springframework.context.annotation.Configuration;

     @Configuration
     @EnableCaching
     public class CacheConfiguration {

         @Bean
         public CacheManager cacheManager() {
             return new HazelcastCacheManager(hazelcastInstance());
         }

         @Bean
         public HazelcastInstance hazelcastInstance() {
             // Loads hazelcast.xml from the classpath
             Config config = new ClasspathXmlConfig("hazelcast.xml");
             return Hazelcast.newHazelcastInstance(config);
         }
     }
  25. @arafkarsh arafkarsh Oracle Coherence Setup 29 <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-cache</artifactId> </dependency>

    <!-- https://mvnrepository.com/artifact/com.oracle.coherence.sp ring/coherence-spring-boot-starter --> <dependency> <groupId>com.oracle.coherence.spring</groupId> <artifactId>coherence-spring-boot-starter</artifactId> <version>3.3.2</version> </dependency> POM File <?xml version="1.0"?> <cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config" xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd"> <caching-scheme-mapping> <cache-mapping> <cache-name> healthCareCache </cache-name> <scheme-name>example-distributed</scheme-name> </cache-mapping> </caching-scheme-mapping> <caching-schemes> <distributed-scheme> <scheme-name>example-distributed</scheme-name> <backing-map-scheme> <local-scheme/> </backing-map-scheme> <autostart>true</autostart> </distributed-scheme> </caching-schemes> </cache-config> Configuration – coherence.xml
  26. @arafkarsh arafkarsh Coherence Properties 30

     1. cache-config: The root element for the Coherence cache configuration.
     2. caching-scheme-mapping: Contains the mapping of cache names to caching schemes.
        1. cache-mapping: Defines the mapping between a cache name and a caching scheme.
           1. cache-name: The unique name of the cache.
           2. scheme-name: The name of the caching scheme that this cache should use.
     3. caching-schemes: Contains the caching scheme definitions.
        1. distributed-scheme: The distributed caching scheme. This scheme provides a distributed cache, partitioned across the cluster members.
           1. scheme-name: The name of the distributed caching scheme.
           2. backing-map-scheme: The backing map scheme that defines the storage strategy for the distributed cache.
              1. local-scheme: A local backing map scheme that stores the cache data in the local member's memory.
           3. autostart: A boolean value that indicates whether the cache should start automatically when the cache service starts. The value is either true or false.
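A minimal usage sketch, assuming the coherence.xml shown earlier is picked up by the Coherence runtime: CacheFactory.getCache() resolves the cache name to the example-distributed scheme through the cache mapping. The key and value are placeholders.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CoherenceCacheDemo {
    public static void main(String[] args) {
        // Looks up the cache by name; the cache-mapping in coherence.xml binds
        // "healthCareCache" to the example-distributed scheme.
        NamedCache<String, String> cache = CacheFactory.getCache("healthCareCache");
        cache.put("patient:1", "John Doe");
        System.out.println(cache.get("patient:1"));
        CacheFactory.shutdown();
    }
}
```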
  27. @arafkarsh arafkarsh Oracle Coherence Spring Boot Config 31 import com.tangosol.net.CacheFactory;

     import com.tangosol.net.ConfigurableCacheFactory;
     import com.tangosol.net.NamedCache;
     import org.springframework.cache.CacheManager;
     import org.springframework.cache.annotation.EnableCaching;
     import org.springframework.cache.support.SimpleCacheManager;
     import org.springframework.context.annotation.Bean;
     import org.springframework.context.annotation.Configuration;
     import org.springframework.core.io.ClassPathResource;
     import java.util.Collections;

     @Configuration
     @EnableCaching
     public class CacheConfiguration {

         @Bean
         public CacheManager cacheManager() {
             NamedCache<String, String> cache = getCache("healthCareCache");
             SimpleCacheManager cacheManager = new SimpleCacheManager();
             cacheManager.setCaches(Collections.singletonList(cache));
             return cacheManager;
         }

         @Bean
         public ConfigurableCacheFactory configurableCacheFactory() {
             return CacheFactory.getCacheFactoryBuilder()
                     .getConfigurableCacheFactory(new ClassPathResource("coherence.xml").getFile());
         }

         private NamedCache<String, String> getCache(String cacheName) {
             return configurableCacheFactory().ensureCache(cacheName, null);
         }
     }
  28. @arafkarsh arafkarsh Redis Setup Standalone & Distributed 33 <dependency> <groupId>org.springframework.boot</groupId>

     <artifactId>spring-boot-starter-data-redis</artifactId>
     </dependency>
     <dependency>
         <groupId>org.springframework.boot</groupId>
         <artifactId>spring-boot-starter-cache</artifactId>
     </dependency>

     POM File

     Standalone Configuration – application.properties
     spring.redis.host=127.0.0.1
     spring.redis.port=6379

     Distributed Configuration – application.properties
     spring.redis.cluster.nodes=node1:6379,node2:6380

     redis.conf (update this for each Redis instance; nodes.conf is automatically created by Redis when running in cluster mode)
     cluster-enabled yes
     cluster-config-file nodes.conf
     cluster-node-timeout 5000
  29. @arafkarsh arafkarsh Redis Spring Boot Config 34 import org.springframework.cache.CacheManager; import

    org.springframework.cache.annotation.EnableCaching; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.data.redis.cache.RedisCacheConfiguration; import org.springframework.data.redis.cache.RedisCacheManager; import org.springframework.data.redis.connection.RedisConnectionFactory; import org.springframework.data.redis.core.RedisTemplate; import java.time.Duration; @Configuration @EnableCaching public class CacheConfiguration { @Bean public CacheManager cacheManager(RedisConnectionFactory redisConnectionFactory) { RedisCacheConfiguration configuration = RedisCacheConfiguration.defaultCacheConfig() .entryTtl(Duration.ofSeconds(120)); // Set the time-to-live for the cache entries return RedisCacheManager.builder(redisConnectionFactory) . cacheDefaults(configuration) .build(); } @Bean public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory redisConnectionFactory) { RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>(); redisTemplate.setConnectionFactory(redisConnectionFactory); return redisTemplate; } }
  30. @arafkarsh arafkarsh 2 NoSQL Databases o Cap Theorem o Sharding

    / Partitioning o Geo Partitioning o Oracle Sharding and Geo Partitioning 35
  31. @arafkarsh arafkarsh ACID Vs. BASE 36 # Property ACID BASE

    1 Acronym Atomicity, Consistency, Isolation, Durability Basically Available, Soft state, Eventual consistency 2 Focus Strong consistency, data integrity, transaction reliability High availability, partition tolerance, high scalability 3 Applicability Traditional relational databases (RDBMS) Distributed NoSQL databases 4 Transactions Ensures all-or-nothing transactions Allows partial transactions, more flexible 5 Consistency Guarantees strong consistency Supports eventual consistency 6 Isolation Ensures transactions are isolated from each other Transactions may not be fully isolated 7 Durability Guarantees data is permanently stored once committed Data durability may be delayed, relying on eventual consistency 8 Latency Higher latency due to stricter consistency constraints Lower latency due to relaxed consistency constraints 9 Use Cases Financial systems, inventory management, etc. Social networks, recommendation systems, search engines, etc.
  32. @arafkarsh arafkarsh CAP Theorem by Eric Allen Brewer 37 Pick

    Any 2!! Say NO to 2 Phase Commit ☺ Source: https://en.wikipedia.org/wiki/CAP_theorem | http://en.wikipedia.org/wiki/Eric_Brewer_(scientist) CAP 12 years later: How the “Rules have changed” “In a network subject to communication failures, it is impossible for any web service to implement an atomic read / write shared memory that guarantees a response to every request.” Partition Tolerance (Key in Cloud) The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. Consistency Every read receives the most recent write or an error. Availability Every request receives a (non-error) response – without guarantee that it contains the most recent write. Old Single Node RDBMS
  33. @arafkarsh arafkarsh Databases that Support CA 38 Aster (Teradata Aster

    Database): A parallel, distributed, and columnar database is designed to perform advanced analytics and manage large- scale data. It provides high performance and availability but does not explicitly focus on partition tolerance. Greenplum: It is an open-source MPP database that is based on PostgreSQL. It is designed for handling large-scale analytical workloads and provides high performance and availability. Greenplum is also designed for fault tolerance and can recover from failures; however, it does not explicitly focus on partition tolerance. Vertica is a columnar MPP (Massively Parallel Processing) database designed for high-performance analytics and large-scale data management. Vertica offers high availability through data replication and automated failover, ensuring the system's resilience in case of node failures. However, Vertica does not explicitly focus on partition tolerance. Traditional RDBMS (Single Node Implementation) 1. DB2 2. MS SQL 3. MySQL 4. Oracle 5. PostgreSQL
  34. @arafkarsh arafkarsh Databases that support both AP / CP 39

    1. MongoDB 2. Cassandra 3. Amazon DynamoDB 4. Couchbase 5. Riak 6. ScyllaDB • Network partitions are considered inevitable in modern distributed systems, and most databases and systems now prioritize partition tolerance by default. • The challenge is to find the right balance between consistency and availability in the presence of partitions.
  35. @arafkarsh arafkarsh MongoDB: Consistency / Partition Tolerance 40 import com.mongodb.MongoClientSettings;

     import com.mongodb.ConnectionString;
     import com.mongodb.ReadConcern;
     import com.mongodb.WriteConcern;
     import com.mongodb.client.MongoClient;
     import com.mongodb.client.MongoClients;
     import com.mongodb.client.MongoDatabase;

     public class MongoDBExample {
         public static void main(String[] args) {
             MongoClientSettings settings = MongoClientSettings.builder()
                     .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
                     .readConcern(ReadConcern.MAJORITY)
                     .writeConcern(WriteConcern.MAJORITY)
                     .build();
             MongoClient mongoClient = MongoClients.create(settings);
             MongoDatabase exampleDb = mongoClient.getDatabase("healthcare_db");
         }
     }
  36. @arafkarsh arafkarsh MongoDB: Availability / Partition Tolerance 41 import com.mongodb.MongoClientSettings;

     import com.mongodb.ConnectionString;
     import com.mongodb.ReadConcern;
     import com.mongodb.WriteConcern;
     import com.mongodb.client.MongoClient;
     import com.mongodb.client.MongoClients;
     import com.mongodb.client.MongoDatabase;

     public class MongoDBExample {
         public static void main(String[] args) {
             MongoClientSettings settings = MongoClientSettings.builder()
                     .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
                     .readConcern(ReadConcern.LOCAL)
                     .writeConcern(WriteConcern.W1)
                     .build();
             MongoClient mongoClient = MongoClients.create(settings);
             MongoDatabase exampleDb = mongoClient.getDatabase("healthcare_db");
         }
     }
  37. @arafkarsh arafkarsh Cassandra: Consistency / Partition Tolerance 42 import com.datastax.driver.core.Cluster;

     import com.datastax.driver.core.Session;
     import com.datastax.driver.core.ConsistencyLevel;
     import com.datastax.driver.core.SimpleStatement;
     import com.datastax.driver.core.Statement;

     public class CassandraCPExample {
         public static void main(String[] args) {
             Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("healthcare_keyspace");

             Statement writeStatement = new SimpleStatement("INSERT INTO diagnosis_t (id, value) VALUES (1, 'test')");
             writeStatement.setConsistencyLevel(ConsistencyLevel.QUORUM);
             session.execute(writeStatement);

             Statement readStatement = new SimpleStatement("SELECT * FROM diagnosis_t WHERE id = 1");
             readStatement.setConsistencyLevel(ConsistencyLevel.QUORUM);
             session.execute(readStatement);
         }
     }
  38. @arafkarsh arafkarsh Cassandra: Availability / Partition Tolerance 43 import com.datastax.driver.core.Cluster;

     import com.datastax.driver.core.Session;
     import com.datastax.driver.core.ConsistencyLevel;
     import com.datastax.driver.core.SimpleStatement;
     import com.datastax.driver.core.Statement;

     public class CassandraAPExample {
         public static void main(String[] args) {
             Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("healthcare_keyspace");

             Statement writeStatement = new SimpleStatement("INSERT INTO diagnosis_t (id, value) VALUES (1, 'test')");
             writeStatement.setConsistencyLevel(ConsistencyLevel.ONE);
             session.execute(writeStatement);

             Statement readStatement = new SimpleStatement("SELECT * FROM diagnosis_t WHERE id = 1");
             readStatement.setConsistencyLevel(ConsistencyLevel.ONE);
             session.execute(readStatement);
         }
     }
  39. @arafkarsh arafkarsh AWS DynamoDB: Consistency / Partition Tolerance 44 import

     com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
     import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
     import com.amazonaws.services.dynamodbv2.document.DynamoDB;
     import com.amazonaws.services.dynamodbv2.document.Item;
     import com.amazonaws.services.dynamodbv2.document.spec.GetItemSpec;
     import com.amazonaws.services.dynamodbv2.document.spec.PutItemSpec;

     public class DynamoDBCPExample {
         public static void main(String[] args) {
             AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
             DynamoDB dynamoDB = new DynamoDB(client);

             // Write an item (CP-like behavior)
             PutItemSpec putItemSpecCP = new PutItemSpec()
                     .withItem(new Item().withPrimaryKey("id", 1)
                             .withString("value", "test"));
             dynamoDB.getTable("diagnosis_t").putItem(putItemSpecCP);

             // Read with strongly consistent read (CP-like behavior)
             GetItemSpec getItemSpecStronglyConsistent = new GetItemSpec()
                     .withPrimaryKey("id", 1)
                     .withConsistentRead(true);
             Item itemCP = dynamoDB.getTable("diagnosis_t")
                     .getItem(getItemSpecStronglyConsistent);
             System.out.println("CP: " + itemCP);
         }
     }
  40. @arafkarsh arafkarsh AWS DynamoDB: Availability / Partition Tolerance 45 import

     com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
     import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
     import com.amazonaws.services.dynamodbv2.document.DynamoDB;
     import com.amazonaws.services.dynamodbv2.document.Item;
     import com.amazonaws.services.dynamodbv2.document.spec.GetItemSpec;
     import com.amazonaws.services.dynamodbv2.document.spec.PutItemSpec;

     public class DynamoDBAPExample {
         public static void main(String[] args) {
             AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
             DynamoDB dynamoDB = new DynamoDB(client);

             // Write an item
             PutItemSpec putItemSpec = new PutItemSpec()
                     .withItem(new Item().withPrimaryKey("id", 1)
                             .withString("value", "test"));
             dynamoDB.getTable("diagnosis_t").putItem(putItemSpec);

             // Read with eventually consistent read (AP-like behavior)
             GetItemSpec getItemSpecEventuallyConsistent = new GetItemSpec()
                     .withPrimaryKey("id", 1)
                     .withConsistentRead(false);
             Item itemAP = dynamoDB.getTable("diagnosis_t")
                     .getItem(getItemSpecEventuallyConsistent);
             System.out.println("AP: " + itemAP);
         }
     }
  41. @arafkarsh arafkarsh NoSQL Databases 47

     Database | Type | ACID | Query | Use Case
     Couchbase | Doc Based, Key-Value (Open Source) | Yes | N1QL | Financial Services, Inventory, IoT
     Cassandra | Wide Column (Open Source) | No | CQL | Social Analytics, Retail, Messaging
     Neo4J | Graph (Open Source / Commercial) | Yes | Cypher | AI, Master Data Mgmt, Fraud Protection
     Redis | Key-Value (Open Source) | Yes | Many languages | Caching, Queuing
     MongoDB | Doc Based (Open Source / Commercial) | Yes | JS | IoT, Real-Time Analytics, Inventory
     Amazon DynamoDB | Key-Value, Doc Based (Vendor) | Yes | DQL | Gaming, Retail, Financial Services
     Source: https://searchdatamanagement.techtarget.com/infographic/NoSQL-database-comparison-to-help-you-choose-the-right-store
  42. @arafkarsh arafkarsh SQL Vs NoSQL 48

     Database Type: SQL is relational; NoSQL is non-relational.
     Schema: SQL uses a pre-defined schema; NoSQL uses a dynamic schema.
     Database Category: SQL is table based; NoSQL covers 1. Documents 2. Key-Value Stores 3. Graph Stores 4. Wide Column Stores.
     Queries: SQL supports complex queries (standard SQL for all relational databases); NoSQL needs a special query language for each type of NoSQL DB.
     Hierarchical Storage: Not a good fit for SQL; a perfect fit for NoSQL.
     Scalability: SQL scales well for traditional applications; NoSQL scales well for modern, heavily data-oriented applications.
     Query Language: SQL is a standard language across all the databases; NoSQL query languages are non-standard, as each NoSQL DB is different.
     ACID Support: Yes for SQL; for NoSQL, only some databases (e.g., MongoDB).
     Data Size: SQL is good for traditional applications; NoSQL handles massive amounts of data for modern app requirements.
  43. @arafkarsh arafkarsh SQL Vs NoSQL (MongoDB) 49 1. In MongoDB

     transactional properties are scoped at the document level. 2. One or more fields can be atomically written in a single operation, 3. with updates to multiple sub-documents, including nested arrays. 4. Any error results in the entire operation being rolled back. 5. This is on par with the data integrity guarantees provided by traditional databases.
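As a small illustration of that document-level atomicity, the hedged sketch below updates a top-level field and pushes into a nested array in one operation using the MongoDB Java driver; the database, collection, and field names are made up.

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.push;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class SingleDocumentAtomicUpdate {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> patients =
                    client.getDatabase("healthcare_db").getCollection("patients");

            // The top-level field and the nested array element are written in one
            // updateOne() call, so they succeed or fail together.
            patients.updateOne(eq("_id", "P1"),
                    combine(set("address.city", "New York"),
                            push("visits", new Document("doctor", "D1")
                                    .append("date", "2023-05-01"))));
        }
    }
}
```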
  44. @arafkarsh arafkarsh Multi Table / Doc ACID Transactions 50 Examples

    – Systems of Record or Line of Business (LoB) Applications 1. Finance 1. Moving funds between Bank Accounts, 2. Payment Processing Systems 3. Trading Platforms 2. Supply Chain • Transferring ownership of Goods & Services through Supply Chains and Booking Systems – Ex. Adding Order and Reducing inventory. 3. Billing System 1. Adding a Call Detail Record and then updating Monthly Plan. Source: ACID Transactions in MongoDB
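For the supply-chain example above (adding an order and reducing inventory), a multi-document transaction with the MongoDB Java sync driver might look like the following hedged sketch. It assumes a replica set or sharded cluster, since MongoDB transactions require one, and the database, collections, and values are placeholders.

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class OrderInventoryTransaction {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017");
             ClientSession session = client.startSession()) {
            MongoDatabase db = client.getDatabase("retail_db");
            MongoCollection<Document> orders = db.getCollection("orders");
            MongoCollection<Document> inventory = db.getCollection("inventory");

            session.startTransaction();
            try {
                // Add the order and reduce inventory as one atomic unit.
                orders.insertOne(session,
                        new Document("orderId", "O1").append("item", "book1").append("qty", 1));
                inventory.updateOne(session, eq("item", "book1"), inc("qty", -1));
                session.commitTransaction();
            } catch (RuntimeException e) {
                session.abortTransaction();
                throw e;
            }
        }
    }
}
```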
  45. @arafkarsh arafkarsh Redis • Data Structures • Design Patterns 51

     In-Memory Databases ranking (2020 rank / 2019 rank, NoSQL Database, Model): 1 / 1 Redis (Key-Value, Multi-Model); 2 / 2 Amazon DynamoDB (Multi-Model); 3 / 3 Microsoft Cosmos (Multi-Model); 4 / 4 Memcached (Key-Value)
  46. @arafkarsh arafkarsh Why do you need In-Memory Databases 52

     1. Users: 1 Million+
     2. Data Volume: Terabytes to Petabytes
     3. Locality: Global
     4. Performance: Microsecond Latency
     5. Request Rate: Millions Per Second
     6. Access: Mobile, IoT, Devices
     7. Economics: Pay as you go
     8. Developer Access: Open API
     Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
  47. @arafkarsh arafkarsh Tables / Docs (JSON) – Why Redis is

     different? 53 • Redis is a multi-data-model key store • Commands operate on keys • The data type of a key can change over time Source: https://www.youtube.com/watch?v=ELk_W9BBTDU
  48. @arafkarsh arafkarsh Keys, Values & Data Types 54

     Example: Key Name = movie:StarWars, Value = “Sold Out”
     Basic Data Types: String, Hash, List, Set, Sorted Set
     Key Properties
     • Unique
     • Binary Safe (Case Sensitive)
     • Max Size = 512 MB
     Expiration / TTL
     • By Default – Keys are retained
     • Time in seconds, milliseconds, or Unix epoch
     • Added / Removed from a Key
     ➢ SET movie:StarWars "Sold Out" EX 5000 (expires in 5000 seconds)
     ➢ PEXPIRE movie:StarWars 5 (expires in 5 milliseconds)
     https://redis.io/commands/set
  49. @arafkarsh arafkarsh Redis – Remote Dictionary Server 55 Distributed In-Memory

     Data Store
     String: standard string data
     Hash: { A: “John Doe”, B: “New York”, C: “USA” }
     List: [ A -> B -> C -> D -> E ]
     Set: { A, B, C, D, E }
     Sorted Set: { A:10, B:12, C:14, D:20, E:32 }
     Stream: … msg1, msg2, msg3
     Pub / Sub: … msg1, msg2, msg3
     https://redis.io/topics/data-types
  50. @arafkarsh arafkarsh Data Type: Hash 56 movie:The-Force-Awakens Value J. J.

    Abrams L. Kasdan, J. J. Abrams, M. Arndt Dan Mindel ➢ HGET movie:The-Force-Awakens Director “J. J. Abrams” • Field & Value Pairs • Single Level • Add and Remove Fields • Set Operations • Intersect • Union https://redis.io/topics/data-types https://redis.io/commands#hash Key Name Director Writer Cinematography Field Use Cases • Session Cache • Rate Limiting
  51. @arafkarsh arafkarsh Data Type: List 57 movies Key Name “Force

    Awakens, The” “Last Jedi, The” “Rise of Skywalker, The” ➢ LPOP movies “Force Awakens, The” ➢ LPOP movies “Last Jedi, The” ➢ RPOP movies “Rise of Skywalker, The” ➢ RPOP movies “Last Jedi, The” • Ordered List (FIFO or LIFO) • Duplicates Allowed • Elements added from Left or Right or By Position • Max 4 Billion elements per List Type of Lists • Queues • Stacks • Capped List https://redis.io/topics/data-types https://redis.io/commands#list Use Cases • Communication • Activity List
  52. @arafkarsh arafkarsh Data Type: Set 58 movies Member / Element

    “Force Awakens, The” “Last Jedi, The” “Rise of Skywalker, The” ➢ SMEMBERS movies “Force Awakens, The” “Last Jedi, The” “Rise of Skywalker, The” • Un-Ordered List of Unique Elements • Set Operations • Difference • Intersect • Union https://redis.io/topics/data-types https://redis.io/commands#set Key Name Use Cases • Unique Visitors
  53. @arafkarsh arafkarsh Data Type: Sorted Set 59 movies Value “Force

    Awakens, The” “Last Jedi, The” “Rise of Skywalker, The” ➢ ZRANGE movies 0 1 “Last Jedi, The” “Rise of Skywalker, The” • Ordered List of Unique Elements • Set Operations • Intersect • Union https://redis.io/topics/data-types https://redis.io/commands#set Key Name 3 1 2 Score Use Cases • Leaderboard • Priority Queues
  54. @arafkarsh arafkarsh Redis: Transactions 60 • Transactions are • Atomic

     • Isolated • Redis commands are queued • All the queued commands are executed sequentially as an atomic unit ➢ MULTI ➢ SET movie:The-Force-Awakens:Review Good ➢ INCR movie:The-Force-Awakens:Rating ➢ EXEC
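The same MULTI / EXEC block can be issued from Java. This is a hedged sketch using the Jedis client (not something the deck prescribes); the host, port, and keys are placeholders.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class RedisTransactionDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Commands between multi() and exec() are queued on the server
            // and then executed sequentially as one atomic unit.
            Transaction tx = jedis.multi();
            tx.set("movie:The-Force-Awakens:Review", "Good");
            tx.incr("movie:The-Force-Awakens:Rating");
            tx.exec();
        }
    }
}
```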
  55. @arafkarsh arafkarsh Redis In-Memory Data Store Use cases 61 Machine

    Learning Message Queues Gaming Leaderboards Geospatial Session Store Media Streaming Real-time Analytics Caching
  56. @arafkarsh arafkarsh Use Case: Sorted Set – Leader Board 62

    • Collection of Sorted Distinct Entities • Set Operations and Range Queries based on Score value: John score: 610 value : Jane score: 987 value : Sarah score: 1597 value : Maya score: 144 value : Fred score: 233 value : Ann score: 377 Game Scores ➢ ZADD game:1 987 Jane 1597 Sarah 377 Maya 610 John 144 Ann 233 Fred ➢ ZREVRANGE game:1 0 3 WITHSCORES. (Get top 4 Scores) • Sarah 1597 • Jane 987 • John 610 • Ann 377 Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg https://redis.io/commands/zadd
  57. @arafkarsh arafkarsh Use Case: Geospatial 63 • Compute distance between

     members • Find all members within a radius
     Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
     ➢ GEOADD cities -87.6298 41.8781 Chicago
     ➢ GEOADD cities -122.3321 47.6062 Seattle
     ➢ ZRANGE cities 0 -1
       • “Chicago”
       • “Seattle”
     ➢ GEODIST cities Chicago Seattle mi
       • “1733.4089”
     ➢ GEORADIUS cities -122.4194 37.7749 1000 mi WITHDIST
       • “Seattle”
       • “679.4848”
     Units: m for meters, km for kilometres, mi for miles, ft for feet
     https://redis.io/commands/geodist
  58. @arafkarsh arafkarsh Use Case: Streams 64 • Ordered collection of

     Data • Efficient for consuming from the tail • Multiple-consumer support, similar to Kafka
     Diagram: a stream of order events such as { “order”: “xy2123adbcd”, “item”: “book1”, “qty”: 1 } flowing from START to END, read by independent consumers (Consumer 1 .. Consumer n) and by consumers G1 and G2 inside Consumer Group G.
     Producer:
     ➢ XADD orderStream * orderId order1 item item1 qty 1
     ➢ XADD orderStream * orderId order2 item item1 qty 2
     (* auto-generates the unique entry ID)
     Consumer:
     ➢ XREAD BLOCK 20 STREAMS orderStream $
       • orderId order2, item item1, qty 2
     https://redis.io/commands/xadd
  59. @arafkarsh arafkarsh MongoDB: Design Patterns 1. Prefer Embedding 2. Embrace

    Duplication 3. Know when Not to Embed 4. Relationships and Join 65
  60. @arafkarsh arafkarsh MongoDB Docs – Prefer Embedding 66

     Use structure to keep related data within a single document. Include bounded arrays to hold multiple related records.
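A hedged sketch of what such an embedded document could look like when inserted with the MongoDB Java driver; the collection name, field names, and the bounded items array are illustrative only.

```java
import java.util.Arrays;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class EmbeddedOrderDocument {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("retail_db").getCollection("orders");

            // Customer details and the bounded items array live inside the order
            // document itself, so one read returns the whole aggregate.
            Document order = new Document("orderId", "xy2123adbcd")
                    .append("customer", new Document("name", "John Doe").append("city", "New York"))
                    .append("items", Arrays.asList(
                            new Document("itemId", "book1").append("qty", 1),
                            new Document("itemId", "book2").append("qty", 2)));
            orders.insertOne(order);
        }
    }
}
```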
  61. @arafkarsh arafkarsh MongoDB Docs – Embrace Duplication 67 Field Info

    Duplicated from Customer Profile Address Duplicated from Customer Profile
  62. @arafkarsh arafkarsh Know When Not to Embed 68 As Item

     is used outside of Order, you don’t need to embed the whole object here; instead, store the Item reference ID (know when not to embed). The name is duplicated to decouple the order from the Item (Product) service (embrace duplication).
  63. @arafkarsh arafkarsh Relationships and Joins 69 Reviews are joined to

    Product Collection using Item UUID Bi-Directional Joins are also supported
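The join described above can be expressed as a $lookup aggregation stage. The sketch below uses the MongoDB Java driver's Aggregates.lookup helper; the reviews collection and the itemUuid field names are assumptions, not the deck's schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Aggregates;
import org.bson.Document;

public class ProductReviewJoin {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> products =
                    client.getDatabase("retail_db").getCollection("products");

            // Left-outer join: attach matching review documents to each product
            // using the shared item UUID field.
            List<Document> productsWithReviews = products.aggregate(Arrays.asList(
                    Aggregates.lookup("reviews", "itemUuid", "itemUuid", "reviews")))
                    .into(new ArrayList<>());

            productsWithReviews.forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}
```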
  64. @arafkarsh arafkarsh MongoDB – Tips & Best Practices 70 1.

     MongoDB will abort any multi-document transaction that runs for more than 60 seconds.
     2. No more than 1,000 documents should be modified within a transaction.
     3. Developers need to add retry logic in case a transaction is aborted due to a network error.
     4. Transactions that affect multiple shards incur a greater performance cost, as operations are coordinated across multiple participating nodes over the network.
     5. Performance will be impacted if a transaction runs against a collection that is subject to rebalancing.
  65. @arafkarsh arafkarsh Amazon DynamoDB Concept

     Diagram: a single Catalogue table holding Customer, Catalogue, Cart, Order, and Payments items; example attributes include Customer (ID, Name, Category, State), Product (ID, Name, Value, Description, Image), Cart item (Item ID, Quantity, Value, Currency), and a composite key such as User ID + Item ID.
     1. A single table holds multiple entities (Customer, Catalogue, Cart, Order, etc.), aka Items.
     2. An Item contains a collection of Attributes.
     3. The Primary Key plays a key role in performance, scalability, and avoiding joins (in the typical RDBMS way).
     4. The Primary Key contains a Partition Key and an optional Sort Key.
     5. The Item data model is JSON, and an Attribute can be a field or a custom object.
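As an illustration of a composite primary key (point 4 above), the hedged sketch below writes one item using the DynamoDB Document API; the table name app_table and the PK / SK attribute names and values are placeholders.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class SingleTablePutExample {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
        Table table = new DynamoDB(client).getTable("app_table");

        // Partition key (PK) identifies the customer; sort key (SK) identifies
        // the order stored under that customer.
        table.putItem(new Item()
                .withPrimaryKey("PK", "CUSTOMER#C1", "SK", "ORDER#O1")
                .withString("status", "SHIPPED"));
    }
}
```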
  66. @arafkarsh arafkarsh DynamoDB – Under the Hood

     Diagram: one single table containing multiple entities with multiple documents (records, in RDBMS style), e.g., 1 Org record and 2 Employee records sharing the same partition key.
     1. The DynamoDB structure is JSON (document model); however, it has no resemblance to MongoDB in terms of DB implementation or schema design patterns.
     2. Multiple entities are part of the single table, and this helps to avoid expensive joins. For example, PK = ORG#Magna will retrieve all 3 records: 1 record from the Org entity and 2 records from the Employee entity.
     3. The Partition Key helps in sharding and horizontal scalability.
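A hedged sketch of the single-table access pattern described in point 2: one query on the partition key value ORG#Magna returns the Org item plus both Employee items. The table and attribute names are assumptions.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class SingleTableQueryExample {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
        Table table = new DynamoDB(client).getTable("app_table");

        // One query on the partition key returns the Org item and both Employee
        // items stored under the same PK value.
        QuerySpec spec = new QuerySpec()
                .withKeyConditionExpression("PK = :pk")
                .withValueMap(new ValueMap().withString(":pk", "ORG#Magna"));

        ItemCollection<QueryOutcome> items = table.query(spec);
        for (Item item : items) {
            System.out.println(item.toJSONPretty());
        }
    }
}
```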
  67. @arafkarsh arafkarsh Features 75 In a Graph Database, data is

    represented as nodes (also called vertices) and edges (also called relationships or connections). • Nodes represent entities or objects, while • Edges represent the relationships or connections between those entities. • Both nodes and edges can have properties (key-value pairs) that store additional information about the entities or their relationships. The main components of a Graph Database are: 1. Nodes: The fundamental units representing entities, such as people, products, or locations. 2. Edges: The connections between nodes, representing relationships, such as "friends with," "purchased," or "lives in." 3. Properties: Key-value pairs that store additional information about nodes or edges, such as names, ages, or timestamps.
  68. @arafkarsh arafkarsh Advantages 76 1. Flexibility: Graph Databases can quickly

    adapt to changes in the data model and accommodate the addition or removal of nodes, edges, or properties without significant disruption. 2. Performance: Graph Databases are optimized for querying connected data, allowing for faster traversal of relationships compared to traditional relational databases. 3. Intuitive Representation: Graph Databases represent data in a way that closely mirrors real-world entities and their relationships, making it easier to understand and work with the data.
  69. @arafkarsh arafkarsh Example: Healthcare App 77 Nodes: 1. Patients 2.

    Doctors 3. Hospitals 4. Diagnoses 5. Treatments 6. Medications 7. Insurances Edges: The edges would represent relationships between these entities, such as: 1. Patient 'visited' Doctor 2. Doctor 'works at a' Hospital 3. Patient 'diagnosed with' Diagnosis 4. Diagnosis 'treated with' Treatment 5. Treatment 'involves' Medication 6. Patient 'covered by’ Insurance Properties: Nodes and edges could have properties to store additional information about the entities or their relationships, such as: 1. Patient: name, age, gender, medical history 2. Doctor: name, specialty, experience, ratings 3. Hospital: name, location, facilities, ratings 4. Diagnosis: name, description, prevalence, risk factors 5. Treatment: name, type, duration, success rate 6. Medication: name, dosage, side effects, interactions 7. Insurance: company, coverage, premium, limitations 8. Visited: Date & Time, Doctors Name, Hospital
  70. @arafkarsh arafkarsh How the data is represented in Graph DB 78

     Diagram: nodes with properties and edges with properties. Patient nodes P1 and P2 (Name, DOB, Phone), Doctor nodes D1 and D2 (Name, Phone, Specialty), Lab node L1 (Name, Phone, Specialty), and Pharmacy node M1 (Phone, Type: Internal), connected by edges that carry their own properties (e.g., p2.d2.v1 with Date and Clinic on the visit edge between P2 and D2).
  71. @arafkarsh arafkarsh Sample Code 79

     CREATE (p:Patient {patient_id: "P1", name: "John Doe"})
     CREATE (d:Doctor {doctor_id: "D1", name: "Dr. Smith"})
     CREATE (diag:Diagnosis {diagnosis_id: "Dg1", name: "Flu"})
     CREATE (l:Lab {lab_id: "L1", name: "X-Ray Lab"})

     MATCH (p:Patient {patient_id: "P1"}), (d:Doctor {doctor_id: "D1"})
     CREATE (p)-[:VISITED {date: "2023-05-01"}]->(d)

     MATCH (p:Patient {patient_id: "P1"}), (diag:Diagnosis {diagnosis_id: "Dg1"})
     CREATE (p)-[:DIAGNOSED {date: "2023-05-02"}]->(diag)

     MATCH (p:Patient {patient_id: "P1"}), (l:Lab {lab_id: "L1"})
     CREATE (p)-[:HAD_XRAY {date: "2023-05-03"}]->(l)
  72. @arafkarsh arafkarsh Neo4J – Graph Data Science (GDS) Library 80

    1. Graph traversal algorithms: 1. Depth-First Search (DFS) 2. Breadth-First Search (BFS) 2. Shortest path algorithms: 1. Dijkstra's algorithm 2. A* (A-Star) algorithm 3. All Pairs Shortest Path (APSP) 3. Centrality algorithms: 1. Degree Centrality: Computes a node's incoming and outgoing number of relationships. 2. Closeness Centrality: Measures how central a node is to its neighbors. 3. Betweenness Centrality: Measures the importance of a node based on the number of shortest paths passing through it. 4. PageRank: A popular centrality algorithm initially designed to rank web pages based on the idea that more important nodes will likely receive more connections from other nodes.
  73. @arafkarsh arafkarsh Neo4J – Graph Data Science (GDS) Library 81

    4. Community detection algorithms: 1. Label Propagation: A fast algorithm for detecting communities within a graph based on propagating labels to form clusters. 2. Louvain Modularity: An algorithm for detecting communities by optimizing a modularity score. 3. Weakly Connected Components: Identifies groups of nodes where each node is reachable from any other node within the same group, disregarding the direction of relationships. 5.Similarity algorithms: 1. Jaccard Similarity: Measures the similarity between two sets by comparing their intersection and union. 2. Cosine Similarity: Measures the similarity between two vectors based on the cosine of the angle between them. 3. Pearson Similarity: Measures the similarity between two vectors based on their Pearson correlation coefficient. 6.Pathfinding algorithms: 1. Minimum Weight Spanning Tree: Computes the minimum weight spanning tree for a connected graph, using algorithms like Kruskal's or Prim's.
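As one hedged example of calling these algorithms, the sketch below runs PageRank through the Neo4j Java driver. It assumes the Graph Data Science plugin is installed (GDS 2.x procedure names such as gds.graph.project and gds.pageRank.stream) and reuses the healthcare graph's Patient / Doctor labels and VISITED relationship; the connection details are placeholders.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class PageRankExample {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Project Patient and Doctor nodes plus VISITED relationships into an in-memory graph.
            session.run("CALL gds.graph.project('care', ['Patient', 'Doctor'], 'VISITED')");

            // Stream PageRank scores for the projected graph, highest first.
            Result result = session.run(
                    "CALL gds.pageRank.stream('care') YIELD nodeId, score " +
                    "RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC");
            result.forEachRemaining(record ->
                    System.out.println(record.get("name").asString() + " " + record.get("score").asDouble()));
        }
    }
}
```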
  74. @arafkarsh arafkarsh Data Mesh o Introduction to Data Mesh and

    Key Principles o Problems Data Mesh Solves o Real-World Use Cases for Data Mesh o Case Study: Banking, Retail o Building a Data Mesh 82
  75. @arafkarsh arafkarsh Comparison Data Lake / Warehouse / Mart 83

     Storage for Raw Data
     • Data Lake: Data lakes store raw, unprocessed data in its native format, including structured data, semi-structured data (like logs or XML), and unstructured data (such as emails and documents).
     • Warehouse: Data warehouses store data that has been processed and structured into a defined schema.
     • Data Mart: Also does not store raw data, similar to data warehouses; it stores processed and refined data specific to a particular business function.
     Scalability
     • Data Lake: Typically built on scalable cloud platforms or Hadoop, data lakes can handle massive volumes of data.
     • Warehouse: Moderately scalable, traditionally limited by hardware when on-premises, but modern cloud-based solutions offer considerable scalability.
     • Data Mart: Least scalable due to its focused and limited scope, typically designed to serve specific departmental needs.
     Performance
     • Data Lake: Performance can vary. While it is excellent for big data processing and machine learning tasks, it might not perform as well for quick, ad-hoc query scenarios compared to structured systems.
     • Warehouse: Optimized for high performance in query processing, especially for complex queries across large datasets. Designed for speed and efficiency in retrieval operations.
     • Data Mart: Generally offers high performance for its limited scope and targeted queries, enabling faster response times for the specific business area it serves.
  76. @arafkarsh arafkarsh Comparison Data Lake / Warehouse / Mart 84

     Flexibility
     • Data Lake: Extremely flexible in terms of the types of data it can store and how data can be used. It allows for the exploration and manipulation of data in various formats.
     • Warehouse: Less flexible as it requires data to fit into a predefined schema, which might limit the types of data that can be easily integrated and queried.
     • Data Mart: Also has limited flexibility, tailored to specific business functions with data structured for particular uses.
     Purpose
     • Data Lake: Ideal for data discovery, data science, and machine learning where access to large and diverse data sets is necessary.
     • Warehouse: Designed for business intelligence, analytics, and reporting, where fast, reliable, and consistent data retrieval is crucial.
     • Data Mart: Serves specific departmental needs by providing data that is relevant and quickly accessible to business users within a department.
     Data Integrity & Consistency
     • Data Lake: Data integrity and consistency can be a challenge due to the variety and volume of raw and unprocessed data.
     • Warehouse: High integrity and consistency. Data is processed, cleansed, and conformed to ensure reliability and accuracy, which is critical for decision-making processes.
     • Data Mart: Similar to data warehouses, data marts ensure high data integrity and consistency within their focused scope, as the data often originates from a data warehouse.
  77. @arafkarsh arafkarsh Data Mesh in a Nutshell 86 Data mesh

    is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments— within or across organizations. Source: Dehghani, Zhamak. Data Mesh (p. 46). O'Reilly Media. Zhamak Dehghani
  78. @arafkarsh arafkarsh 4 Principles of Data Mesh 87 1. Domain-Oriented

    Decentralized Data Ownership and Architecture: Data is managed by domain-specific teams that treat their data as a product. These teams are responsible for their own data pipelines and outputs. 2. Data as a Product: Data is treated as a product with a focus on the consumers' needs. This includes clear documentation, SLAs, and a user-friendly interface for accessing the data. 3. Self-Serve Data Infrastructure as a Platform: This principle aims to empower domain teams by providing them with a self-serve data infrastructure, which helps them handle their data products with minimal central oversight. 4. Federated Computational Governance: Governance is applied across domains through a federated model, ensuring that data quality, security, and access controls are maintained without stifling innovation. Source: Dehghani, Zhamak. Data Mesh (p. 56). O'Reilly Media.
  79. @arafkarsh arafkarsh Problems Data Mesh Solves 89 1. Elimination of

    Silos: By empowering domain teams to manage their own data, Data Mesh breaks down silos and encourages a more collaborative approach to data management. 2. Scalability Issues: Traditional data platforms often struggle to scale effectively as data volume and complexity grow. Data Mesh's decentralized approach allows more scalable solutions by distributing the data workload. 3. Over-centralization of Data Teams: Centralized data teams can become bottlenecks. Data Mesh decentralizes this by making domain teams responsible for their data, thus distributing workload and responsibility. 4. Adaptability and Agility: Data Mesh allows organizations to be more adaptable and agile in their data strategy, as changes can be made more swiftly and efficiently at the domain level.
  80. @arafkarsh arafkarsh Real-World Use Cases 90 1. Financial Services: In

    a banking scenario, different domains such as loans, credit cards, and customer service can independently manage their data, enabling faster innovation and personalized customer experiences while maintaining compliance through federated governance. 2. E-commerce: Large e-commerce platforms manage diverse data from inventory, sales, customer feedback, and logistics. Each domain can optimize its data management and analytics, improving service delivery and operational efficiency. 3. Healthcare: Different departments such as clinical data, patient records, and insurance processing can manage their data as discrete products, enhancing data privacy, compliance, and patient outcomes through more tailored data usage. 4. Manufacturing: Domains like production, supply chain, and maintenance in a manufacturing enterprise can leverage Data Mesh to optimize their operations independently, using real-time data streaming via Kafka for immediate responsiveness and decision-making.
  81. @arafkarsh arafkarsh Banking Data Products 91 1. Customer Segmentation Data

    Product o Purpose: To segment customers based on various parameters like income, spending habits, life stages, and financial goals to offer personalized banking services. o Contents: Demographics, transaction history, account types, customer interactions, and feedback. o Usage: Marketing campaigns, personalized product offerings, customer retention strategies. 2. Risk Assessment Data Product o Purpose: To assess and predict the risk associated with lending to individuals or businesses. o Contents: Credit scores, repayment history, current financial obligations, economic conditions, and employment status. o Usage: Credit scoring, loan approvals, setting interest rates, provisioning for bad debts. 3. Fraud Detection Data Product o Purpose: To detect and prevent fraudulent transactions in real-time. o Contents: Transaction patterns, flagged transactions, account holder's historical data, IP addresses, and device information. o Usage: Real-time fraud monitoring, alerting systems, and improving security measures.
  82. @arafkarsh arafkarsh Banking Data Products 92 4. Regulatory Compliance Data

    Product o Purpose: To ensure all banking operations comply with local and international regulatory requirements. o Contents: Transaction records, customer data, audit trails, compliance check results. o Usage: Reporting to regulatory bodies, internal audits, compliance monitoring. 5. Investment Insights Data Product o Purpose: To provide customers and bank advisors with insights into investment opportunities. o Contents: Market data, economic indicators, historical investment performance, news feeds, and predictive analytics. o Usage: Investment advisory services, customer dashboards, portfolio management.
  83. @arafkarsh arafkarsh Retail Data Products 93 1. Customer Behavior Data

    Product o Purpose: To understand customer preferences, buying patterns, and engagement across channels. o Contents: Purchase history, online browsing logs, loyalty card data, customer feedback. o Usage: Personalized marketing, product recommendations, customer experience enhancement. 2. Inventory Optimization Data Product o Purpose: To manage stock levels efficiently across all retail outlets and warehouses. o Contents: Stock levels, sales velocity, supplier lead times, seasonal trends. o Usage: Stock replenishment, markdown management, warehouse space optimization. 3. Sales Performance Data Product o Purpose: To track and analyze sales performance across various dimensions such as geography, product line, and time period. o Contents: Sales data, promotional campaigns effectiveness, customer demographics, product returns data.
  84. @arafkarsh arafkarsh Retail Data Products 94 4. Supplier Performance Data

    Product o Purpose: To evaluate and manage supplier relationships based on performance metrics. o Contents: Delivery times, quality metrics, cost analysis, compliance data. o Usage: Supplier negotiations, procurement strategy, risk management. 5. Market Trend Analysis Data Product o Purpose: To capture and analyze market trends and consumer sentiment. o Contents: Social media data, market research reports, competitor analysis, economic indicators. o Usage: New product development, competitive strategy, pricing strategies.
  85. @arafkarsh arafkarsh How? 95 Source: https://www.datamesh-architecture.com/ From the Centralized Data

Team and Data to a Distributed, Decentralized Model. Source: https://www.datamesh-architecture.com/
  86. @arafkarsh arafkarsh Data Mesh Architecture 96 Source: https://www.datamesh-architecture.com/ A data

mesh architecture is a decentralized approach that enables domain teams to perform cross-domain data analysis on their own.
  87. @arafkarsh arafkarsh Building Data Mesh: 1 of 4 97 1.

    Define Requirements and Assess Current Capabilities o Assess Current Data Usage and Needs: Analyze current data flows, storage needs, and processing requirements. Identify pain points in your existing infrastructure. o Forecast Future Needs: Estimate future data growth based on business goals. Consider not only the volume but also the complexity and diversity of data that will need to be managed. o Compliance and Security Needs: Ensure that your infrastructure will comply with applicable data protection regulations (like GDPR, HIPAA, PCI) and security standards. 2. Choose the Right Data Storage Solutions o Diverse Data Storage Options: Use a combination of storage solutions (SQL databases, NoSQL databases, data warehouses, and data lakes) to cater to different types of data and access patterns. o Elastic Scalability: Opt for cloud-based solutions such as Amazon S3, Google Cloud Storage, or Azure Blob Storage for elastic scalability and durability.
  88. @arafkarsh arafkarsh Building Data Mesh: 2 of 4 98 3.

Implement Data Processing Frameworks o Batch Processing: Implement batch processing systems for large-scale analytics and reporting. Apache Hadoop and Spark are popular choices for handling massive amounts of data with fault tolerance. o Stream Processing: For real-time data processing needs, use tools like Apache Kafka, Apache Flink, and Apache Storm. These tools can handle high throughput and low-latency processing (a minimal Kafka producer sketch follows after step 8 below). o Hybrid Processing: Consider hybrid models that combine batch and stream processing for more flexibility. 4. Ensure Data Integration and Orchestration o Data Integration Tools: Use robust ETL (Extract, Transform, Load) tools or more modern ELT approaches to integrate data from various sources. Tools like Apache Kafka Connect, Talend, Apache NiFi, or Stitch can automate these processes. o Workflow Orchestration: Use workflow orchestration tools like Apache Airflow or Dagster to manage dependencies and scheduling of data processing jobs across multiple platforms.
  89. @arafkarsh arafkarsh Building Data Mesh: 3 of 4 99 5.

    Adopt a Microservices Architecture o Decoupled Services: Implement microservices to break down your data infrastructure into smaller, manageable, and independently scalable services. o Containerization: Use Docker containers to encapsulate microservices, making them portable and easier to manage. o Orchestration Platforms: Utilize Kubernetes or Docker Swarm for managing containerized services, ensuring they scale properly with demand. 6. Use Data Management and Monitoring Tools o Data Cataloging: Implement data catalogue tools to manage metadata and ensure data is findable and accessible. Tools like Apache Atlas or Collibra can be useful. o Monitoring and Logging: Use monitoring tools to track the performance and health of your data systems. Service Mesh, Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana) stacks are effective for monitoring and visualizing metrics.
  90. @arafkarsh arafkarsh Building Data Mesh: 4 of 4 100 7.

    Ensure Scalability and Reliability o Load Balancing: Use load balancers to distribute workloads evenly across servers, preventing any single point of failure. (Kubernetes/Kafka/Flink) o Data Redundancy and Backup: Implement data replication and backup strategies to ensure data durability and recoverability. o Scalable Architecture Design: Design your infrastructure to scale out (adding more machines) or scale up (adding more power to existing machines) based on demand. (Kubernetes/Kafka/Flink) 8. Foster a Culture of Continuous Improvement o Regular Audits and Updates: Regularly review and upgrade your infrastructure to incorporate new technologies and improvements. o Training and Development: Keep your team updated with the latest data technologies and best practices through continuous training and development.
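Step 3 above names Apache Kafka (alongside Flink and Storm) for stream processing. As a minimal, hedged sketch of the ingestion side, the Java snippet below publishes a domain event with the standard Kafka client; the broker address, topic name, key, and payload are assumptions for illustration, not values from this deck.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                                   // wait for all in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by order id keeps all events of one order in the same partition (ordering)
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders-events", "order-1001", "{\"status\":\"CREATED\"}");
            producer.send(record, (metadata, ex) -> {
                if (ex != null) {
                    ex.printStackTrace();
                } else {
                    System.out.printf("Published to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}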
  91. @arafkarsh arafkarsh Popular Data Mesh Tech Stacks 101 o Google

    Cloud BigQuery o AWS S3 and Athena o Azure Synapse Analytics o dbt and Snowflake o Databricks (How To Build a Data Product with Databricks) o MinIO and Trino o SAP o Kafka and RisingWave Source: https://www.datamesh-architecture.com/ Data Mesh User Stories Data mesh is primarily an organizational approach, and that's why you can't buy a data mesh from a vendor.
  92. @arafkarsh arafkarsh Google Data Mesh Stack 102 BigQuery is the

central component for storing analytical data. BigQuery is a columnar data store and can perform efficient JOIN operations with large data sets. BigQuery supports both batch ingestion and streaming ingestion. When the operational system architecture relies on Apache Kafka, streaming through the Kafka Connect Google BigQuery Sink Connector is recommended. Source: Google Cloud BigQuery
  93. @arafkarsh arafkarsh Amazon Data Mesh Stack 103 AWS S3 is

the central component for storing analytical data. S3 is a file-based object store and data can be stored in many formats, such as CSV, JSON, Avro, or Parquet. S3 buckets are used for all stages: raw files, aggregated data, and even data products. Every domain team typically has its own AWS S3 buckets to store its own data products. Analytical queries are executed through AWS Athena, which queries data stored in many locations, including files on S3, with standard SQL and performs cross-dataset join operations. Athena uses Presto, a distributed query engine. Source: AWS S3 and Athena
  94. @arafkarsh arafkarsh Azure Data Mesh Stack 104 Source: Azure Synapse

    Analytics Microsoft offers Azure Synapse Analytics, along with both Data Lake Storage Gen2 and SQL database, as the central components for implementing a data mesh architecture.
  95. @arafkarsh arafkarsh DBT Snowflake Data Mesh Stack 105 Source: dbt

and Snowflake dbt is a framework to transform, clean, and aggregate data within your data warehouse. Transformations are written as plain SQL statements and result in models that are SQL views, materialized views, or tables, without the need to define their structure using DDL upfront. dbt embraces tests to verify data when running any transformation, both for sources and results. Snowflake stores data in tables that are logically organized in databases and schemas.
  96. @arafkarsh arafkarsh SAP Data Mesh Stack 106 Source: SAP SAP

Datasphere comes with an exceptional integration into SAP applications, allowing you to reuse the rich business semantic and data entity models for building data products. SAP Datasphere integrates out of the box with SAP S/4HANA tables and supports replication as well as federation. The integration is based on the VDM (virtual data model), which forms the basis for data access in SAP S/4HANA. SAP HANA Cloud and SAP HANA Cloud Data Lake can be fully leveraged for data stored within SAP Datasphere.
  97. @arafkarsh arafkarsh Kafka Streaming Data Mesh Stack 107 Source: Kafka

and RisingWave Kafka already has its place in many "classical" implementations of data mesh, namely as an ingestion layer for streaming data into data products. This tech stack extends the scope of Kafka far beyond merely serving as an ingestion layer. Here, data products are not just ingested from Kafka but live on Kafka, enabling bi-directional interactions between the operational systems and data products (see the Kafka Streams sketch below). Open-Source Stack: "Classical" data mesh implementations firmly put their data products on the analytical plane, either in data warehouses (such as Snowflake or BigQuery), data lakes (S3, MinIO) or data lakehouses (Databricks).
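To make the "data products live on Kafka" idea concrete, here is a minimal Kafka Streams sketch that derives a small data product (order counts per customer) from an input topic and publishes it back to Kafka. The topic names, application id, and serdes are assumptions for illustration and are not taken from the referenced stack.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class OrdersPerCustomerDataProduct {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-per-customer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Input: operational order events keyed by customer id (assumed topic)
        KStream<String, String> orders = builder.stream("orders-events");
        // Derived data product: a continuously updated count of orders per customer
        KTable<String, Long> ordersPerCustomer = orders.groupByKey().count();
        // Publish the data product to its own topic for other domains to consume
        ordersPerCustomer.toStream()
                .to("dp-orders-per-customer", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}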
  98. @arafkarsh arafkarsh Challenges of Implementing Data Mesh 108 o Cultural

    Shift: Adopting Data Mesh requires significant changes in organizational culture and mindset, particularly the shift towards viewing data as a product. o Technical Heterogeneity: Implementing a self-serve data infrastructure that can accommodate diverse technologies and systems across domains can be challenging. o Governance Complexity: Balancing autonomy with oversight requires sophisticated governance mechanisms that can be complex to implement and maintain.
  99. @arafkarsh arafkarsh Benefits of Data Mesh 109 o Scalability: By

    decentralizing data ownership and management, Data Mesh can scale more effectively as organizations grow. o Agility: Domains can quickly adapt and respond to changes and needs within their specific areas, leading to faster innovation. o Enhanced Collaboration: Data Mesh fosters a collaborative environment by encouraging domains to share their data products across the organization, enhancing cross-functional projects and innovation. o Improved Data Quality and Accessibility: With domain experts managing their own data, the overall quality and relevance of data improve, making it more accessible and useful to end users.
  100. @arafkarsh arafkarsh Data Mesh Summary 110 o Data Lakes are

best for flexible, scalable storage of raw data and are ideal for exploratory work and big data applications.
o Data Warehouses excel in performance, data integrity, and consistency, making them suitable for structured business intelligence tasks.
o Data Marts provide targeted performance and data consistency, optimized for department-specific analytic needs.
Each system has its strengths and ideal use cases, and organizations often benefit from employing a combination of these structures to meet different needs across various aspects of their operations.
Data Mesh is an innovative and strategic framework for managing and accessing data across large and complex organizations. It shifts from a centralized model of data management to a decentralized one, treating data as a product and emphasizing domain-specific ownership and accountability.
The four pillars: Domain-Oriented Decentralized Data, Data as a Product, Self-Serve Data Infrastructure, Federated Computational Governance.
  101. @arafkarsh arafkarsh Scalability: Sharding / Partitions • Scale Cube •

    eBay Case Study • Sharding and Partitions 111 3
  102. @arafkarsh arafkarsh App Scalability based on micro services architecture Source:

The NewStack. Based on The Art of Scalability by Martin Abbott & Michael Fisher 113
  103. @arafkarsh arafkarsh Scale Cube and Micro Services 114 1. Functional

    Decomposition 2. Avoid locks by Database Sharding 3. Cloning Services
  104. @arafkarsh arafkarsh Scalability Best Practices : Lessons from Best Practices

Highlights
#1 Partition By Function • Decouple the unrelated functionalities. • Selling functionality is served by one set of applications, bidding by another, search by yet another. • 16,000 App Servers in 220 different pools • 1,000 logical databases, 400 physical hosts
#2 Split Horizontally • Break the workload into manageable units. • eBay’s interactions are stateless by design • All App Servers are treated equal and none retains any transactional state • Data partitioning based on specific requirements
#3 Avoid Distributed Transactions • 2-Phase Commit is a pessimistic approach that comes with a big COST • CAP Theorem (Consistency, Availability, Partition Tolerance): you can fully achieve only two of the three at any point in time. • @ eBay: No distributed transactions of any kind and NO 2-Phase Commit.
#4 Decouple Functions Asynchronously • If Component A calls component B synchronously, then they are tightly coupled. For such systems, to scale A you need to scale B also. • If the call is asynchronous, A can move forward irrespective of the state of B • SEDA (Staged Event Driven Architecture)
#5 Move Processing to Asynchronous Flow • Move as much processing as possible to the asynchronous side • Anything that can wait should wait
#6 Virtualize at All Levels • Virtualize everything. eBay created their own O/R layer for abstraction
#7 Cache Appropriately • Cache slow-changing, read-mostly data: metadata, configuration, and static data.
115 Source: http://www.infoq.com/articles/ebay-scalability-best-practices
  105. @arafkarsh arafkarsh Sharding & Partitions • Horizontal Sharding • Vertical

    Sharding • Partitioning (Vertical) • Geo Partitioning 116
  106. @arafkarsh arafkarsh Sharding / Partitioning 117

    Method, Scalability, What is split, Resulting schema:
o Sharding, Horizontal: splits Rows; each shard keeps the Same Schema with Unique Rows
o Sharding, Vertical: splits Columns; each shard has a Different Schema
o Partition, Vertical: splits Rows; each partition keeps the Same Schema with Unique Rows
1. Optimize the Database
2. Separate Rows or Columns into multiple smaller tables
3. Each table has either the Same Schema with Unique Rows
4. Or has a Schema that is a subset of the Original
Example (Original Table columns: Customer ID, Customer Name, DOB, City; customers 1 ABC Bengaluru, 2 DEF Tokyo, 3 JHI Kochi, 4 KLM Pune):
o Horizontal Sharding: Shard 1 holds customers 1 and 2, Shard 2 holds customers 3 and 4; both shards keep the full schema.
o Vertical Sharding: Shard 1 holds (Customer ID, Customer Name, DOB), Shard 2 holds (Customer ID, City).
(A shard-routing sketch follows below.)
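As a minimal sketch of horizontal sharding (and of the consistent-hash idea used in the Oracle example a few slides later), the Java snippet below routes a customer row to one of N shards by hashing its key. The shard count and JDBC URLs are hypothetical, and a plain modulo scheme like this does not handle re-sharding; it only illustrates the routing decision.

import java.util.List;

public class ShardRouter {
    // Hypothetical shard endpoints: one database per horizontal shard, all with the same schema
    private final List<String> shardUrls = List.of(
            "jdbc:postgresql://shard0:5432/customers",
            "jdbc:postgresql://shard1:5432/customers");

    // The same customer id always maps to the same shard
    public String shardFor(long customerId) {
        int index = Math.floorMod(Long.hashCode(customerId), shardUrls.size());
        return shardUrls.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        System.out.println("Customer 1 -> " + router.shardFor(1L));
        System.out.println("Customer 2 -> " + router.shardFor(2L));
    }
}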
  107. @arafkarsh arafkarsh Sharding Scenarios 118 1. Horizontal Scaling: Single Server

is unable to handle the load even after partitioning the data sets. 2. Data can be partitioned in such a way that specific server(s) can serve the search query based on the partition. For example, in an e-Commerce application, searching the data based on: 1. Product Type 2. Product Brand 3. Seller's Region (for Local Shipping) 4. Orders based on Year / Month
  108. @arafkarsh arafkarsh Geo Partitioning 119 • Geo-partitioning is the ability

    to control the location of data at the row level. • CockroachDB lets you control which tables are replicated to which nodes. But with geo-partitioning, you can control which nodes house data with row-level granularity. • This allows you to keep customer data close to the user, which reduces the distance it needs to travel, thereby reducing latency and improving user experience. Source: https://www.cockroachlabs.com/blog/geo-partition-data-reduce-latency/
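CockroachDB implements geo-partitioning declaratively inside the database; as a contrast, the sketch below shows the same latency goal pursued at the application level by routing each request to a region-local endpoint. The region names and URLs are assumptions, and this is not how CockroachDB's row-level feature works internally.

import java.util.Map;

public class GeoRouter {
    // Hypothetical region -> regional database endpoint mapping
    private static final Map<String, String> REGION_DB = Map.of(
            "EU", "jdbc:postgresql://eu-db.example.com:5432/app",
            "US", "jdbc:postgresql://us-db.example.com:5432/app",
            "APAC", "jdbc:postgresql://apac-db.example.com:5432/app");

    // Keep a customer's reads and writes close to the customer to reduce round-trip latency
    public static String dataSourceFor(String userRegion) {
        return REGION_DB.getOrDefault(userRegion, REGION_DB.get("US"));
    }

    public static void main(String[] args) {
        System.out.println(dataSourceFor("EU"));    // routed to the EU endpoint
        System.out.println(dataSourceFor("APAC"));  // routed to the APAC endpoint
    }
}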
  109. @arafkarsh arafkarsh Oracle Sharding and Geo 121 CREATE SHARDED TABLE

customers (
  cust_id NUMBER NOT NULL,
  name VARCHAR2(50),
  address VARCHAR2(250),
  geo VARCHAR2(20),
  class VARCHAR2(3),
  signup_date DATE,
  CONSTRAINT cust_pk PRIMARY KEY (geo, cust_id)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO (
  PARTITIONSET AMERICA VALUES ('AMERICA') TABLESPACE SET tbs1,
  PARTITIONSET ASIA VALUES ('ASIA') TABLESPACE SET tbs2
);
Linear Scalability (Primary Shards / Standby Shards: Read-Write Tx/Second, Read-Only Tx/Second):
o 25 / 25: 1.18 Million, 1.62 Million
o 50 / 50: 2.11 Million, 3.26 Million
o 75 / 75: 3.57 Million, 5.05 Million
o 100 / 100: 4.38 Million, 6.82 Million
Source: https://www.oracle.com/a/tech/docs/sharding-wp-12c.pdf
  110. @arafkarsh arafkarsh MongoDB Replication 124 Application (Client App Driver) Replica

Set (Replica Set1, RS 2, RS 3 mongos members): one Primary server and two Secondary servers, with Replication from the Primary to the Secondaries and a Heartbeat between members. Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
✓ Provides redundancy and High Availability.
✓ Provides Fault Tolerance: multiple copies of the data on different database servers ensure that the loss of a single database server will not affect the Application.
What the Primary does:
1. Receives all write operations.
What the Secondaries do:
1. Replicate the primary's oplog and
2. Apply the operations to their data sets such that the secondaries' data sets reflect the primary's data set.
3. The secondaries apply the operations to their data sets asynchronously.
Replica Set Connection Configuration:
mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl
Use a Secure Connection:
mongodb://myDBReader:D1fficultP%40ssw0rd@mongodb0.example.com:27017
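The connection strings above are written for the mongo shell; the same replica-set options can be passed to the MongoDB Java driver (mongodb-driver-sync). A hedged sketch follows; the host names come from the slide, while the database and collection names are assumptions.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ReplicaSetClient {
    public static void main(String[] args) {
        // List the members, name the replica set, prefer secondaries for reads,
        // and keep retryable writes on (the default for MongoDB 4.2+ drivers).
        String uri = "mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,"
                + "mongodb2.example.com:27017/?replicaSet=myRepl"
                + "&readPreference=secondaryPreferred&retryWrites=true";

        try (MongoClient client = MongoClients.create(uri)) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");   // assumed names
            orders.insertOne(new Document("orderId", 1001).append("status", "CREATED"));
            System.out.println("Documents: " + orders.countDocuments());
        }
    }
}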
  111. @arafkarsh arafkarsh MongoDB Replication: Automatic Failover 125 Source: MongoDB Replication

https://docs.mongodb.com/manual/replication/
✓ If the Primary is NOT reachable for more than the configured electionTimeoutMillis (default 10 seconds), then one of the Secondaries will become the Primary after an election process.
✓ The most up-to-date Secondary will become the next Primary.
✓ The election should not take more than 12 seconds to elect a Primary.
During failover, the Secondaries (Replica Set1, RS 2, RS 3 mongos members) detect the missing Heartbeat and hold an election for a new Primary; once one is promoted, replication resumes from the new Primary.
✓ Write operations are blocked until the new Primary is selected.
✓ The Secondaries can serve Read operations while the election is in progress, provided they are configured for that.
✓ MongoDB 4.2+ compatible drivers enable retryable writes by default.
✓ MongoDB 4.0 and 3.6-compatible drivers must explicitly enable retryable writes by including retryWrites=true in the connection string.
  112. @arafkarsh arafkarsh MongoDB Replication: Arbiter 126 Application (Client App Driver)

    Replica Set1 (mongos) RS 2 (mongos) Arbiter (mongos) Secondary Servers Primary Server Replication ✓ An Arbiter can be used to save the cost of adding an additional Secondary Server. ✓ Arbiter will handle only the election process to select a Primary. Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
  113. @arafkarsh arafkarsh MongoDB Replication: Secondary Reads 127 Replica Set1 (mongos)

RS 2 (mongos), RS 3 (mongos): a Primary server with Secondary servers, Replication, and Heartbeat. Source: MongoDB Replication https://docs.mongodb.com/manual/core/read-preference/
✓ Asynchronous replication to secondaries means that reads from secondaries may return data that does not reflect the state of the data on the primary.
✓ Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
Write to the Primary and Read from a Secondary (Application / Client App Driver):
$ > mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega&readPreference=secondary'
  114. @arafkarsh arafkarsh MongoDB – Deploy Replica Set 128 mongod --replSet

"rsOmega" --bind_ip localhost,<hostname(s)|ip address(es)>
1 Use the command line (above) to set the replica details on each Mongo instance, or use a config file:
Config File:
replication:
  replSetName: "rsOmega"
net:
  bindIp: localhost,<hostname(s)|ip address(es)>
$ > mongod --config <path-to-replica-config>
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
  115. @arafkarsh arafkarsh MongoDB – Deploy Replica Set 129 mongo $

>
2 Connect to Mongo DB (mongo shell).
3 Initiate the Replica Set:
> rs.initiate({
    _id: "rsOmega",
    members: [
      { _id: 0, host: "mongodb0.host.com:27017" },
      { _id: 1, host: "mongodb1.host.com:27017" },
      { _id: 2, host: "mongodb2.host.com:27017" }
    ]
  })
Run rs.initiate() on just one and only one mongod instance for the replica set.
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
  116. @arafkarsh arafkarsh MongoDB – Deploy Replica Set 130

    4 Show the Replica Config:
> rs.conf()
5 Ensure that the replica set has a primary:
> rs.status()
6 Connect to the Replica Set:
$ > mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega'
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
  117. @arafkarsh arafkarsh MongoDB Sharding 131 Application (Client App Driver) Config

Server replica set: three Config (mongos) members. Two Routers (mongos). Three Shards (Shard 1, Shard 2, Shard 3), each a replica set of its own with a Primary server and Secondary servers (Replica Set1, RS 2, RS 3). The Application (Client App Driver) sends queries to the mongos Routers, which use the Config Servers' metadata to route them to the right shard.
  119. @arafkarsh arafkarsh Types of Multi-Tenancy 135

    1. Separate Database: the App routes each tenant (Tenant A, Tenant B) to its own dedicated database.
2. Shared Database, Separate Schema: the App uses one Shared DB that holds a dedicated schema per tenant (Tenant A Schema, Tenant B Schema).
3. Shared Database, Shared Entity: the App uses one Shared DB with shared tables; each Entity row is tagged as belonging to Tenant A or Tenant B.
  120. @arafkarsh arafkarsh Hibernate: 1. Separate DB 136 import org.hibernate.SessionFactory; import

org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {
    public static SessionFactory getSessionFactory(String tenantId) {
        Configuration configuration = new Configuration();
        configuration.configure("hibernate.cfg.xml");
        configuration.setProperty("hibernate.connection.url",
                "jdbc:postgresql://localhost:5432/" + tenantId);
        configuration.setProperty("hibernate.default_schema", tenantId);
        StandardServiceRegistryBuilder builder = new StandardServiceRegistryBuilder()
                .applySettings(configuration.getProperties());
        return configuration.buildSessionFactory(builder.build());
    }
}
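A short usage sketch for the separate-database HibernateUtil above: resolve the tenant (hard-coded here, in practice taken from the request context) and open a session against that tenant's database. The tenant id is hypothetical, and in a real application the SessionFactory should be built once per tenant and cached, since building it is expensive.

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class SeparateDatabaseDemo {
    public static void main(String[] args) {
        // "tenant_a" doubles as the database name in the HibernateUtil above
        SessionFactory factory = HibernateUtil.getSessionFactory("tenant_a");
        try (Session session = factory.openSession()) {
            session.beginTransaction();
            // Every read and write in this session goes to tenant_a's dedicated database
            session.getTransaction().commit();
        }
    }
}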
  121. @arafkarsh arafkarsh Hibernate: 2. Shared DB Separate Schema 137 import

org.hibernate.boot.model.naming.Identifier;
import org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl;
import org.hibernate.engine.jdbc.env.spi.JdbcEnvironment;
import org.hibernate.SessionFactory;
import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;

public class TenantAwareNamingStrategy extends PhysicalNamingStrategyStandardImpl {
    private String tenantId;

    public TenantAwareNamingStrategy(String tenantId) {
        this.tenantId = tenantId;
    }

    @Override
    public Identifier toPhysicalTableName(Identifier name, JdbcEnvironment context) {
        String tableName = tenantId + "_" + name.getText();
        return new Identifier(tableName, name.isQuoted());
    }
}

public class HibernateUtil {
    public static SessionFactory getSessionFactory(String tenantId) {
        Configuration configuration = new Configuration();
        configuration.configure("hibernate.cfg.xml");
        configuration.setPhysicalNamingStrategy(new TenantAwareNamingStrategy(tenantId));
        StandardServiceRegistryBuilder builder = new StandardServiceRegistryBuilder()
                .applySettings(configuration.getProperties());
        return configuration.buildSessionFactory(builder.build());
    }
}
  122. @arafkarsh arafkarsh Hibernate: 3. Shared DB Shared Entity 138 import

javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Table;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;
import org.hibernate.SessionFactory;
import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;

@Entity
@Table(name = "employees")
@FilterDef(name = "tenantFilter", parameters = {@ParamDef(name = "tenantId", type = "string")})
@Filter(name = "tenantFilter", condition = "tenant_id = :tenantId")
public class Employee {
    @Column(name = "tenant_id", nullable = false)
    private String tenantId;
    // other columns ... getters and setters
}

public class HibernateUtil {
    public static SessionFactory getSessionFactory() {
        Configuration configuration = new Configuration();
        configuration.configure("hibernate.cfg.xml");
        StandardServiceRegistryBuilder builder = new StandardServiceRegistryBuilder()
                .applySettings(configuration.getProperties());
        return configuration.buildSessionFactory(builder.build());
    }
}
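The @Filter above stays inactive until it is enabled on the session, so each request must switch it on for the current tenant. A minimal usage sketch (the tenant id value is hypothetical):

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class SharedEntityDemo {
    public static void main(String[] args) {
        SessionFactory factory = HibernateUtil.getSessionFactory();
        try (Session session = factory.openSession()) {
            // Narrow every query in this session to the current tenant's rows
            session.enableFilter("tenantFilter").setParameter("tenantId", "tenant_a");
            Long count = session.createQuery("select count(e) from Employee e", Long.class)
                                .getSingleResult();
            System.out.println("Employees visible to tenant_a: " + count);
        }
    }
}

Note that Hibernate filters are applied to queries but not to direct loads by id, and writes must still set the tenant_id column explicitly.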
  123. @arafkarsh arafkarsh Multi-Tenancy in MongoDB, Redis, Cassandra 139

    1. Separate databases / instances per tenant
o MongoDB: Separate databases per tenant. Strong data isolation, individual tenant backup and restoration.
o Redis: Separate instances per tenant. Complete data isolation, more resources and management overhead.
o Cassandra: Separate keyspaces per tenant. Strong data isolation, simplifies tenant-specific backup and restoration.
2. Shared database / instance with separate collections / tables / namespaces per tenant
o MongoDB: Shared database with separate collections per tenant. Balances data isolation and resource utilization.
o Redis: Shared instance with separate namespaces per tenant. Some level of data isolation, better resource utilization.
o Cassandra: Shared keyspace with separate tables per tenant. Balances data isolation and resource utilization.
3. Shared database / instance and shared collections / tables
o MongoDB: Shared database and shared collections. Optimizes resource utilization, requires careful implementation.
o Redis: Not applicable.
o Cassandra: Shared keyspace and shared tables. Optimizes resource utilization, requires careful implementation to avoid data leaks.
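For the "separate databases per tenant" option in the MongoDB column above, a minimal Java driver sketch: one client, one logical database per tenant, resolved from the tenant id. The connection string, tenant ids, and collection name are assumptions.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class TenantDatabases {
    private final MongoClient client = MongoClients.create("mongodb://localhost:27017");

    // One logical database per tenant; MongoDB creates it lazily on first write
    public MongoDatabase databaseFor(String tenantId) {
        return client.getDatabase("tenant_" + tenantId);
    }

    public static void main(String[] args) {
        TenantDatabases tenants = new TenantDatabases();
        tenants.databaseFor("acme").getCollection("employees")
               .insertOne(new Document("name", "ABC").append("city", "Bengaluru"));
        tenants.databaseFor("globex").getCollection("employees")
               .insertOne(new Document("name", "DEF").append("city", "Tokyo"));
        // tenant_acme and tenant_globex now hold fully isolated employees collections
    }
}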
  124. @arafkarsh arafkarsh Best Practices: Health Care App 141 1. Understand

    the regulations: Familiarize yourself with HIPAA rules, including the Privacy Rule, Security Rule, and Breach Notification Rule. These rules outline the standards and requirements for protecting sensitive patient data, known as Protected Health Information (PHI). 2. Implement Access Controls: Ensure only authorized users can access PHI by implementing robust authentication mechanisms like multi-factor authentication (MFA), role-based access controls (RBAC), and proper password policies. 3. Encrypt Data: Use encryption for data at rest and in transit. Implement encryption technologies like SSL/TLS for data in transit and AES-256 for data at rest. Store encryption keys securely and separately from the data they protect. 4. Regular Audits and Monitoring: Regularly audit and monitor your systems for security vulnerabilities and potential breaches. Implement logging mechanisms to track access to PHI and use intrusion detection systems to monitor for unauthorized access. 5. Data Backups and Disaster Recovery: Implement a robust data backup and disaster recovery plan to ensure the availability and integrity of PHI in case of data loss or system failures.
  125. @arafkarsh arafkarsh Best Practices: Health Care App 142 6. Regular

    Risk Assessments: Conduct regular risk assessments to identify potential risks to PHI's confidentiality, integrity, and availability. Develop a risk management plan to address these risks and ensure continuous improvement of your security posture. 7. Implement a Privacy Policy: Develop and maintain a privacy policy that clearly outlines how your organization handles and protects PHI. This policy should be easily accessible to users and should be updated regularly to reflect changes in your organization’s practices or regulations. 8. Employee Training: Train employees on HIPAA regulations, your organization’s privacy policy, and security best practices. Regularly update and reinforce this training to ensure continued compliance. 9. Business Associate Agreements (BAAs): Ensure that any third-party vendors, contractors, or partners who have access to PHI sign a Business Associate Agreement (BAA) that outlines their responsibilities for maintaining the privacy and security of PHI. 10. Incident Response Plan: Develop an incident response plan to handle potential data breaches or security incidents. This plan should include procedures for identifying, containing, and mitigating breaches and notifying affected individuals and relevant authorities.
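Item 3 on the previous slide calls for AES-256 encryption of data at rest. The sketch below uses the standard javax.crypto API with AES in GCM mode; it is only a minimal illustration, since the hard part in a HIPAA context is key management (KMS/HSM, rotation, keeping keys separate from the data), which is out of scope here. The sample plaintext is hypothetical.

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldEncryption {
    public static void main(String[] args) throws Exception {
        // 256-bit AES key; in production this comes from a KMS/HSM, never from code
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // GCM provides confidentiality plus integrity; use a fresh 12-byte IV per encryption
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("PHI: patient record 42".getBytes(StandardCharsets.UTF_8));

        // Store the IV alongside the ciphertext; decrypt with the same key and IV
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        String plaintext = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        System.out.println(plaintext);
    }
}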
  126. @arafkarsh arafkarsh Best Practices: eCommerce App 143 1. Build and

    Maintain a Secure Network: 1. Install and maintain a firewall configuration to protect cardholder data. 2. Change vendor-supplied defaults for system passwords and other security parameters. 2. Protect Cardholder Data: 1. Encrypt transmission of cardholder data across open, public networks using SSL/TLS or other encryption methods. 2. Avoid storing cardholder data whenever possible. If you must store data, use strong encryption and access controls, and securely delete data when no longer needed. 3. Maintain a Vulnerability Management Program: 1. Regularly update and patch all systems, including operating systems, applications, and security software, to protect against known vulnerabilities. 2. Use and regularly update anti-virus software or programs. 4. Implement Strong Access Control Measures: 1. Restrict access to cardholder data by business need-to-know by implementing role-based access controls (RBAC). 2. Implement robust authentication methods, such as multi-factor authentication (MFA), for all users with access to cardholder data. 3. If applicable, restrict physical access to cardholder data by implementing physical security measures like secure storage, access controls, and surveillance.
  127. @arafkarsh arafkarsh Best Practices: eCommerce App 144 5. Regularly Monitor

    and Test Networks: 1. Track and monitor all access to network resources and cardholder data using logging mechanisms and monitoring tools. 2. Regularly test security systems and processes, including vulnerability scans, penetration tests, and file integrity monitoring. 6. Maintain an Information Security Policy: 1. Develop, maintain, and disseminate a comprehensive information security policy that addresses all PCI-DSS requirements and is reviewed and updated at least annually. 7. Use Tokenization or Third-Party Payment Processors: 1. Consider using tokenization or outsourcing payment processing to a PCI-DSS-compliant third-party provider. This can reduce the scope of compliance and protect cardholder data by minimizing the exposure of sensitive information within your systems. 8. Educate and Train Employees: 1. Train employees on PCI-DSS requirements, your company's security policies, and best practices for handling cardholder data securely. Regularly reinforce and update this training. 9. Regularly Assess and Update Security Measures: 1. Conduct regular risk assessments and security audits to identify potential risks and vulnerabilities. Update your security measures accordingly to maintain compliance and ensure the continued protection of cardholder data.
  128. @arafkarsh arafkarsh Oracle Data Security 146 • Row-level security: Oracle's

    Virtual Private Database (VPD) feature enables row-level security by adding a dynamic WHERE clause to SQL statements. You can use this feature to restrict access to specific rows based on user roles or attributes. • Column-level security: Oracle provides column-level security through the use of column masking with Data Redaction. This feature allows you to mask sensitive data in specific columns for unauthorized users, ensuring only authorized users can view the data. • Encryption: Oracle's Transparent Data Encryption (TDE) feature automatically encrypts data at rest without requiring changes to the application code. TDE encrypts data within the database files and automatically decrypts it when accessed by authorized users.
  129. @arafkarsh arafkarsh PostgreSQL Data Security 147 • Row-level security: PostgreSQL

    supports row-level security using Row Security Policies. These policies allow you to define access rules for specific rows based on user roles or attributes. • Column-level security: PostgreSQL provides column-level security using column privileges. You can grant or revoke specific privileges (e.g., SELECT, INSERT, UPDATE) on individual columns to control access to sensitive data. • Encryption: PostgreSQL does not have built-in transparent data encryption like Oracle. However, you can use third-party solutions, such as pgcrypto, to encrypt data within the database. You can use file-system level encryption or full-disk encryption for data at rest.
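A hedged sketch of the Row Security Policies mentioned above, driven from Java over JDBC. The table, policy, connection details, and the app.current_tenant setting are assumptions; also note that RLS does not apply to superusers or the table owner unless FORCE ROW LEVEL SECURITY is enabled.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RowLevelSecurityDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/app";   // assumed database
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             Statement st = conn.createStatement()) {

            // One-time DDL, normally applied by a migration (shown here as a comment):
            //   ALTER TABLE employees ENABLE ROW LEVEL SECURITY;
            //   CREATE POLICY tenant_isolation ON employees
            //       USING (tenant_id = current_setting('app.current_tenant'));

            // Per session: declare which tenant this connection acts for
            st.execute("SET app.current_tenant = 'tenant_a'");

            // The policy now appends the tenant predicate to every query transparently
            try (ResultSet rs = st.executeQuery("SELECT count(*) FROM employees")) {
                rs.next();
                System.out.println("Rows visible to tenant_a: " + rs.getLong(1));
            }
        }
    }
}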
  130. @arafkarsh arafkarsh MySQL Data Security 148 • Row-level security: MySQL

    does not have built-in row-level security features like Oracle and PostgreSQL. However, you can implement row-level security by adding appropriate WHERE clauses to SQL statements in your application code, restricting access to specific rows based on user roles or attributes. • Column-level security: MySQL supports column-level security through column privileges. Like PostgreSQL, you can grant or revoke specific privileges on individual columns to control access to sensitive data. • Encryption: MySQL Enterprise Edition provides Transparent Data Encryption (TDE) to encrypt data at rest automatically. For the Community Edition, you can use file-system level encryption or full-disk encryption to protect data at rest. Data in transit can be encrypted using SSL/TLS.
  131. @arafkarsh arafkarsh 149 100s Microservices 1,000s Releases / Day 10,000s

Virtual Machines 100K+ User actions / Second 81 M Customers Globally 1 B Time series Metrics 10 B Hours of video streaming every quarter Source: Netflix: https://www.youtube.com/watch?v=UTKIT6STSVM 10s OPs Engineers 0 NOC 0 Data Centers So what does Netflix think about DevOps? No DevOps. Don't do a lot of Process / Procedures. Freedom for Developers & be Accountable. Trust people you Hire. No Controls / Silos / Walls / Fences. Ownership – You Build it, You Run it.
  132. @arafkarsh arafkarsh 150 50M Paid Subscribers 100M Active Users 60

    Countries Cross Functional Team Full, End to End ownership of features Autonomous 1000+ Microservices Source: https://microcph.dk/media/1024/conference-microcph-2017.pdf 1000+ Tech Employees 120+ Teams
  133. @arafkarsh arafkarsh 151 Design Patterns are solutions to general problems

    that software developers faced during software development. Design Patterns
  134. @arafkarsh arafkarsh 152 Thank you DREAM | AUTOMATE | EMPOWER

    Araf Karsh Hamid : India: +91.999.545.8627 http://www.slideshare.net/arafkarsh https://speakerdeck.com/arafkarsh https://www.linkedin.com/in/arafkarsh/ https://www.youtube.com/user/arafkarsh/playlists http://www.arafkarsh.com/ @arafkarsh arafkarsh
  135. @arafkarsh arafkarsh References 155 1. July 15, 2015 – Agile

    is Dead : GoTo 2015 By Dave Thomas 2. Apr 7, 2016 - Agile Project Management with Kanban | Eric Brechner | Talks at Google 3. Sep 27, 2017 - Scrum vs Kanban - Two Agile Teams Go Head-to-Head 4. Feb 17, 2019 - Lean vs Agile vs Design Thinking 5. Dec 17, 2020 - Scrum vs Kanban | Differences & Similarities Between Scrum & Kanban 6. Feb 24, 2021 - Agile Methodology Tutorial for Beginners | Jira Tutorial | Agile Methodology Explained. Agile Methodologies
  136. @arafkarsh arafkarsh References 156 1. Vmware: What is Cloud Architecture?

    2. Redhat: What is Cloud Architecture? 3. Cloud Computing Architecture 4. Cloud Adoption Essentials: 5. Google: Hybrid and Multi Cloud 6. IBM: Hybrid Cloud Architecture Intro 7. IBM: Hybrid Cloud Architecture: Part 1 8. IBM: Hybrid Cloud Architecture: Part 2 9. Cloud Computing Basics: IaaS, PaaS, SaaS 1. IBM: IaaS Explained 2. IBM: PaaS Explained 3. IBM: SaaS Explained 4. IBM: FaaS Explained 5. IBM: What is Hypervisor? Cloud Architecture
  137. @arafkarsh arafkarsh References 157 Microservices 1. Microservices Definition by Martin

    Fowler 2. When to use Microservices By Martin Fowler 3. GoTo: Sep 3, 2020: When to use Microservices By Martin Fowler 4. GoTo: Feb 26, 2020: Monolith Decomposition Pattern 5. Thought Works: Microservices in a Nutshell 6. Microservices Prerequisites 7. What do you mean by Event Driven? 8. Understanding Event Driven Design Patterns for Microservices
  138. @arafkarsh arafkarsh References – Microservices – Videos 158 1. Martin

    Fowler – Micro Services : https://www.youtube.com/watch?v=2yko4TbC8cI&feature=youtu.be&t=15m53s 2. GOTO 2016 – Microservices at NetFlix Scale: Principles, Tradeoffs & Lessons Learned. By R Meshenberg 3. Mastering Chaos – A NetFlix Guide to Microservices. By Josh Evans 4. GOTO 2015 – Challenges Implementing Micro Services By Fred George 5. GOTO 2016 – From Monolith to Microservices at Zalando. By Rodrigue Scaefer 6. GOTO 2015 – Microservices @ Spotify. By Kevin Goldsmith 7. Modelling Microservices @ Spotify : https://www.youtube.com/watch?v=7XDA044tl8k 8. GOTO 2015 – DDD & Microservices: At last, Some Boundaries By Eric Evans 9. GOTO 2016 – What I wish I had known before Scaling Uber to 1000 Services. By Matt Ranney 10. DDD Europe – Tackling Complexity in the Heart of Software By Eric Evans, April 11, 2016 11. AWS re:Invent 2016 – From Monolithic to Microservices: Evolving Architecture Patterns. By Emerson L, Gilt D. Chiles 12. AWS 2017 – An overview of designing Microservices based Applications on AWS. By Peter Dalbhanjan 13. GOTO Jun, 2017 – Effective Microservices in a Data Centric World. By Randy Shoup. 14. GOTO July, 2017 – The Seven (more) Deadly Sins of Microservices. By Daniel Bryant 15. Sept, 2017 – Airbnb, From Monolith to Microservices: How to scale your Architecture. By Melanie Cubula 16. GOTO Sept, 2017 – Rethinking Microservices with Stateful Streams. By Ben Stopford. 17. GOTO 2017 – Microservices without Servers. By Glynn Bird.
  139. @arafkarsh arafkarsh References 159 Domain Driven Design 1. Oct 27,

    2012 What I have learned about DDD Since the book. By Eric Evans 2. Mar 19, 2013 Domain Driven Design By Eric Evans 3. Jun 02, 2015 Applied DDD in Java EE 7 and Open Source World 4. Aug 23, 2016 Domain Driven Design the Good Parts By Jimmy Bogard 5. Sep 22, 2016 GOTO 2015 – DDD & REST Domain Driven API’s for the Web. By Oliver Gierke 6. Jan 24, 2017 Spring Developer – Developing Micro Services with Aggregates. By Chris Richardson 7. May 17. 2017 DEVOXX – The Art of Discovering Bounded Contexts. By Nick Tune 8. Dec 21, 2019 What is DDD - Eric Evans - DDD Europe 2019. By Eric Evans 9. Oct 2, 2020 - Bounded Contexts - Eric Evans - DDD Europe 2020. By. Eric Evans 10. Oct 2, 2020 - DDD By Example - Paul Rayner - DDD Europe 2020. By Paul Rayner
  140. @arafkarsh arafkarsh References 160 Event Sourcing and CQRS 1. IBM:

    Event Driven Architecture – Mar 21, 2021 2. Martin Fowler: Event Driven Architecture – GOTO 2017 3. Greg Young: A Decade of DDD, Event Sourcing & CQRS – April 11, 2016 4. Nov 13, 2014 GOTO 2014 – Event Sourcing. By Greg Young 5. Mar 22, 2016 Building Micro Services with Event Sourcing and CQRS 6. Apr 15, 2016 YOW! Nights – Event Sourcing. By Martin Fowler 7. May 08, 2017 When Micro Services Meet Event Sourcing. By Vinicius Gomes
  141. @arafkarsh arafkarsh References 161 Kafka 1. Understanding Kafka 2. Understanding

    RabbitMQ 3. IBM: Apache Kafka – Sept 18, 2020 4. Confluent: Apache Kafka Fundamentals – April 25, 2020 5. Confluent: How Kafka Works – Aug 25, 2020 6. Confluent: How to integrate Kafka into your environment – Aug 25, 2020 7. Kafka Streams – Sept 4, 2021 8. Kafka: Processing Streaming Data with KSQL – Jul 16, 2018 9. Kafka: Processing Streaming Data with KSQL – Nov 28, 2019
  142. @arafkarsh arafkarsh References 162 Databases: Big Data / Cloud Databases

    1. Google: How to Choose the right database? 2. AWS: Choosing the right Database 3. IBM: NoSQL Vs. SQL 4. A Guide to NoSQL Databases 5. How does NoSQL Databases Work? 6. What is Better? SQL or NoSQL? 7. What is DBaaS? 8. NoSQL Concepts 9. Key Value Databases 10. Document Databases 11. Jun 29, 2012 – Google I/O 2012 - SQL vs NoSQL: Battle of the Backends 12. Feb 19, 2013 - Introduction to NoSQL • Martin Fowler • GOTO 2012 13. Jul 25, 2018 - SQL vs NoSQL or MySQL vs MongoDB 14. Oct 30, 2020 - Column vs Row Oriented Databases Explained 15. Dec 9, 2020 - How do NoSQL databases work? Simply Explained! 1. Graph Databases 2. Column Databases 3. Row Vs. Column Oriented Databases 4. Database Indexing Explained 5. MongoDB Indexing 6. AWS: DynamoDB Global Indexing 7. AWS: DynamoDB Local Indexing 8. Google Cloud Spanner 9. AWS: DynamoDB Design Patterns 10. Cloud Provider Database Comparisons 11. CockroachDB: When to use a Cloud DB?
  143. @arafkarsh arafkarsh References 163 Docker / Kubernetes / Istio 1.

    IBM: Virtual Machines and Containers 2. IBM: What is a Hypervisor? 3. IBM: Docker Vs. Kubernetes 4. IBM: Containerization Explained 5. IBM: Kubernetes Explained 6. IBM: Kubernetes Ingress in 5 Minutes 7. Microsoft: How Service Mesh works in Kubernetes 8. IBM: Istio Service Mesh Explained 9. IBM: Kubernetes and OpenShift 10. IBM: Kubernetes Operators 11. 10 Consideration for Kubernetes Deployments Istio – Metrics 1. Istio – Metrics 2. Monitoring Istio Mesh with Grafana 3. Visualize your Istio Service Mesh 4. Security and Monitoring with Istio 5. Observing Services using Prometheus, Grafana, Kiali 6. Istio Cookbook: Kiali Recipe 7. Kubernetes: Open Telemetry 8. Open Telemetry 9. How Prometheus works 10. IBM: Observability vs. Monitoring
  144. @arafkarsh arafkarsh References 164 1. Feb 6, 2020 – An

    introduction to TDD 2. Aug 14, 2019 – Component Software Testing 3. May 30, 2020 – What is Component Testing? 4. Apr 23, 2013 – Component Test By Martin Fowler 5. Jan 12, 2011 – Contract Testing By Martin Fowler 6. Jan 16, 2018 – Integration Testing By Martin Fowler 7. Testing Strategies in Microservices Architecture 8. Practical Test Pyramid By Ham Vocke Testing – TDD / BDD
  145. @arafkarsh arafkarsh 165 1. Simoorg : LinkedIn’s own failure inducer

framework. It was designed to be easy to extend and most of the important components are pluggable. 2. Pumba : A chaos testing and network emulation tool for Docker. 3. Chaos Lemur : Self-hostable application to randomly destroy virtual machines in a BOSH-managed environment, as an aid to resilience testing of high-availability systems. 4. Chaos Lambda : Randomly terminate AWS ASG instances during business hours. 5. Blockade : Docker-based utility for testing network failures and partitions in distributed applications. 6. Chaos-http-proxy : Introduces failures into HTTP requests via a proxy server. 7. Monkey-ops : Monkey-Ops is a simple service implemented in Go, which is deployed into an OpenShift V3.X and generates some chaos within it. Monkey-Ops seeks some OpenShift components like Pods or Deployment Configs and randomly terminates them. 8. Chaos Dingo : Chaos Dingo currently supports performing operations on Azure VMs and VMSS deployed to an Azure Resource Manager-based resource group. 9. Tugbot : Testing in Production (TiP) framework for Docker. Testing tools
  146. @arafkarsh arafkarsh References 166 CI / CD 1. What is

    Continuous Integration? 2. What is Continuous Delivery? 3. CI / CD Pipeline 4. What is CI / CD Pipeline? 5. CI / CD Explained 6. CI / CD Pipeline using Java Example Part 1 7. CI / CD Pipeline using Ansible Part 2 8. Declarative Pipeline vs Scripted Pipeline 9. Complete Jenkins Pipeline Tutorial 10. Common Pipeline Mistakes 11. CI / CD for a Docker Application
  147. @arafkarsh arafkarsh References 167 DevOps 1. IBM: What is DevOps?

    2. IBM: Cloud Native DevOps Explained 3. IBM: Application Transformation 4. IBM: Virtualization Explained 5. What is DevOps? Easy Way 6. DevOps?! How to become a DevOps Engineer??? 7. Amazon: https://www.youtube.com/watch?v=mBU3AJ3j1rg 8. NetFlix: https://www.youtube.com/watch?v=UTKIT6STSVM 9. DevOps and SRE: https://www.youtube.com/watch?v=uTEL8Ff1Zvk 10. SLI, SLO, SLA : https://www.youtube.com/watch?v=tEylFyxbDLE 11. DevOps and SRE : Risks and Budgets : https://www.youtube.com/watch?v=y2ILKr8kCJU 12. SRE @ Google: https://www.youtube.com/watch?v=d2wn_E1jxn4
  148. @arafkarsh arafkarsh References 168 1. Lewis, James, and Martin Fowler.

“Microservices: A Definition of This New Architectural Term”, March 25, 2014. 2. Miller, Matt. “Innovate or Die: The Rise of Microservices”. The Wall Street Journal, October 5, 2015. 3. Newman, Sam. Building Microservices. O’Reilly Media, 2015. 4. Alagarasan, Vijay. “Seven Microservices Anti-patterns”, August 24, 2015. 5. Cockcroft, Adrian. “State of the Art in Microservices”, December 4, 2014. 6. Fowler, Martin. “Microservice Prerequisites”, August 28, 2014. 7. Fowler, Martin. “Microservice Tradeoffs”, July 1, 2015. 8. Humble, Jez. “Four Principles of Low-Risk Software Release”, February 16, 2012. 9. Zuul Edge Server, Ketan Gote, May 22, 2017 10. Ribbon, Hysterix using Spring Feign, Ketan Gote, May 22, 2017 11. Eureka Server with Spring Cloud, Ketan Gote, May 22, 2017 12. Apache Kafka, A Distributed Streaming Platform, Ketan Gote, May 20, 2017 13. Functional Reactive Programming, Araf Karsh Hamid, August 7, 2016 14. Enterprise Software Architectures, Araf Karsh Hamid, July 30, 2016 15. Docker and Linux Containers, Araf Karsh Hamid, April 28, 2015
  149. @arafkarsh arafkarsh References 169 16. MSDN – Microsoft https://msdn.microsoft.com/en-us/library/dn568103.aspx 17.

    Martin Fowler : CQRS – http://martinfowler.com/bliki/CQRS.html 18. Udi Dahan : CQRS – http://www.udidahan.com/2009/12/09/clarified-cqrs/ 19. Greg Young : CQRS - https://www.youtube.com/watch?v=JHGkaShoyNs 20. Bertrand Meyer – CQS - http://en.wikipedia.org/wiki/Bertrand_Meyer 21. CQS : http://en.wikipedia.org/wiki/Command–query_separation 22. CAP Theorem : http://en.wikipedia.org/wiki/CAP_theorem 23. CAP Theorem : http://www.julianbrowne.com/article/viewer/brewers-cap-theorem 24. CAP 12 years how the rules have changed 25. EBay Scalability Best Practices : http://www.infoq.com/articles/ebay-scalability-best-practices 26. Pat Helland (Amazon) : Life beyond distributed transactions 27. Stanford University: Rx https://www.youtube.com/watch?v=y9xudo3C1Cw 28. Princeton University: SAGAS (1987) Hector Garcia Molina / Kenneth Salem 29. Rx Observable : https://dzone.com/articles/using-rx-java-observable