Slide 1

Slide 1 text

Exploring NoSQL Polyglot Persistence with Java

Slide 2

Slide 2 text

Otávio Santana @otaviosantana
Welcome
Topics
Persistence introduction
○ Persistence challenges
○ Persistence Universe
Jakarta EE
○ JAX-RS
○ CDI
○ Bean Validation
NoSQL databases
○ Key-value (Redis)
○ Wide-Column (Apache Cassandra)
○ Document (MongoDB)
○ Graph (Neo4J) [Bonus]

Slide 3

Slide 3 text

Otávio Santana @otaviosantana Setup

Slide 4

Slide 4 text

Setup
Requirements
https://o-s-expert.github.io/polyglot-nosql/
✔ JDK 17+ installed
✔ Modern IDE (IntelliJ, VS Code, etc.)
✔ Git
✔ Docker
✔ Docker Compose

Slide 5

Slide 5 text

Otávio Santana @otaviosantana Welcome Introduce yourself ✔ What is your name? ✔ Where are you from? ✔ What is your Java and Database experience? ✔ Fun fact about you.

Slide 6

Slide 6 text

Otávio Santana @otaviosantana NoSQL Introduction

Slide 7

Slide 7 text

Otávio Santana @otaviosantana Endures knowledge Throughout humanity Temples, Caves as database Why do we need databases?

Slide 8

Slide 8 text

Otávio Santana @otaviosantana Database evolutions Why do we need them?

Slide 9

Slide 9 text

Otávio Santana @otaviosantana Why Do Modern Applications Need Data Storage? The new opportunities The current opportunity It's where the state is Business rules (database?)

Slide 10

Slide 10 text

Otávio Santana @otaviosantana The data cost Challenge has changed

Slide 11

Slide 11 text

Otávio Santana @otaviosantana The time is the new cost Nobody wants to wait

Slide 12

Slide 12 text

Otávio Santana @otaviosantana Joins vs. data volume Everything has trade-offs, including normalization https://medium.com/@benmorel/to-join-or-not-to-join-bba9c1377c10

Slide 13

Slide 13 text

Otávio Santana @otaviosantana key value key key key value value value Wide-Column Graph Document Key Value NoSQL Database SQL Database Database solutions Polyglot Persistence

Slide 14

Slide 14 text

Maturity Model Database flavors Paradigms Persistence landscape State of affairs https://survey.stackoverflow.co/2023/

Slide 15

Slide 15 text

Microservices vs Database Database agnostic Trade-offs Persistence landscape State of affairs https://survey.stackoverflow.co/2023/

Slide 16

Slide 16 text

Otávio Santana @otaviosantana Development lifecycle very often starts with immature data structures Changing a database type in existing applications is complex and expensive. Over time, maintenance of data and schema evolution gets challenging First thing we do Persistence landscape Change Maintenance State of affairs Evolutionary Data

Slide 17

Slide 17 text

Otávio Santana @otaviosantana Application (Object Oriented Language) Mismatch Database (Relational) Object Challenges in a database land Different paradigms: Apps x DBMS Tables

Slide 18

Slide 18 text

Challenges in a database land
Different paradigms: Apps x DBMS
Mapping Mismatch
Application: 1 * Addresses, Inheritance, Polymorphism, Encapsulation, Types
Database: Normalization, Denormalization, Structure
NoSQL types: Key-Value, Wide-Column, Document, Graph

Slide 19

Slide 19 text

Otávio Santana @otaviosantana Driver Data Mapper Database integration Data Mapping Handling Data Integration in Java Active Record Repository

Slide 20

Slide 20 text

Driver Database integration Data Mapping Database Application Data Mapper Active Record Repository DAO Data-oriented Programming Object Oriented Programming

Slide 21

Slide 21 text

Database-oriented Programming
4 Principles
Database: Data-oriented Programming
1. Separating code (behavior) from data.
2. Representing data with generic data structures.
3. Treating data as immutable.
4. Separating data schema from data representation.
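The four principles above can be sketched in plain Java. This is only an illustration under assumptions: the class name, the field names ("name", "city"), and the helper methods are hypothetical and do not come from the slides.

```java
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this only illustrates the four principles.
final class DataOrientedSketch {

    // Principles 2 and 3: data is a generic structure, and Map.of makes it immutable.
    static Map<String, Object> person(String name, String city) {
        return Map.of("name", name, "city", city);
    }

    // Principle 1: behavior lives in a function that is separate from the data.
    static String greeting(Map<String, Object> person) {
        return "Hello, " + person.get("name") + " from " + person.get("city");
    }

    // Principle 4: the schema is a separate check, not part of the representation.
    static boolean isValid(Map<String, Object> person) {
        return List.of("name", "city").stream().allMatch(person::containsKey);
    }

    // Principle 3 again: a "change" builds a new value and leaves the original untouched.
    static Map<String, Object> withCity(Map<String, Object> person, String city) {
        return Map.of("name", person.get("name"), "city", city);
    }

    public static void main(String[] args) {
        Map<String, Object> ada = person("Ada", "London");
        System.out.println(greeting(ada));
        System.out.println(greeting(withCity(ada, "Paris")));
        System.out.println(isValid(ada));
    }
}
```

The trade-off is the usual one: generic maps are flexible and close to what a database driver returns, but the compiler no longer checks field names for you.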

Slide 22

Slide 22 text

Object Oriented Programming
Principles
Application: Object Oriented Programming
1. Expose behavior
2. Hide data
3. Explore Abstraction
4. Use of layers and Modules

Slide 23

Slide 23 text

Driver
Data Mapping
Complex relation w/ business logic
Code flexibility

// Uses java.sql.Connection, DriverManager, Statement and ResultSet
try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery(QUERY)) {
    // Extract data from the result set
    while (rs.next()) {
        // Retrieve by column name
        System.out.print("ID: " + rs.getInt("id"));
        System.out.print(", name: " + rs.getString("name"));
        System.out.print(", birthday: " + rs.getString("birthday"));
        System.out.println(", city: " + rs.getString("city"));
        System.out.println(", street: " + rs.getString("street"));
        …
    }
}

Slide 24

Slide 24 text

Data Mapper
Data Mapping
impedance mismatch
centralize mapper responsibility

@Entity
public class Person {
    @Id
    @GeneratedValue(strategy = AUTO)
    Long id;
    String name;
    LocalDate birthday;
    // a collection-valued association is one-to-many
    @OneToMany
    List<Address> address;
    …
}

public class PersonRowMapper implements RowMapper<Person> {
    @Override
    public Person mapRow(ResultSet rs, int rowNum) throws SQLException {
        Person person = new Person();
        // getLong matches the Long id field
        person.setId(rs.getLong("ID"));
        return person;
    }
}

Slide 25

Slide 25 text

Data Access Object
Data Mapping
Centralize Data operations
impedance mismatch

public interface PersonDAO {
    List<Person> getAll();
    void update(Person person);
    void delete(Person person);
    void insert(Person person);
}
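To make the DAO idea concrete, here is a minimal in-memory sketch of the interface from the slide. The `Person` record and the `InMemoryPersonDAO` class are assumptions for illustration only; a real DAO would talk to a database through a driver instead of a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity, reduced to two fields for the sketch.
record Person(long id, String name) {}

interface PersonDAO {
    List<Person> getAll();
    void update(Person person);
    void delete(Person person);
    void insert(Person person);
}

// All data operations are centralized here; callers never see the storage.
final class InMemoryPersonDAO implements PersonDAO {
    private final Map<Long, Person> storage = new LinkedHashMap<>();

    @Override
    public List<Person> getAll() {
        return new ArrayList<>(storage.values());
    }

    @Override
    public void update(Person person) {
        // replace returns null when no row with that id exists
        if (storage.replace(person.id(), person) == null) {
            throw new IllegalStateException("No person with id " + person.id());
        }
    }

    @Override
    public void delete(Person person) {
        storage.remove(person.id());
    }

    @Override
    public void insert(Person person) {
        if (storage.putIfAbsent(person.id(), person) != null) {
            throw new IllegalStateException("Duplicate id " + person.id());
        }
    }
}

final class PersonDaoDemo {
    public static void main(String[] args) {
        PersonDAO dao = new InMemoryPersonDAO();
        dao.insert(new Person(1L, "Ada"));
        dao.update(new Person(1L, "Ada Lovelace"));
        System.out.println(dao.getAll());
    }
}
```

Because the interface speaks in database verbs (insert, update, delete), it stays close to the storage model; that is the impedance mismatch the slide points at.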

Slide 26

Slide 26 text

Active Record
Data Mapping
SOLID breaking
Higher domain's responsibility

@Entity
public class Person extends PanacheEntity {
    public String name;
    public LocalDate birthday;
    public List<Address> addresses;
}

Person person = ...;
// persist it
person.persist();
List<Person> people = Person.listAll();
// finding a specific person by ID
person = Person.findById(personId);

Slide 27

Slide 27 text

Repository
Data Mapping
Far from database
Domain oriented

@Entity
public class Person {
    private @Id Long id;
    private @Column String name;
    private @Column LocalDate birthday;
    private @OneToMany List<Address> addresses;
}

public interface PersonRepository extends {}

Person person = ...;
// persist it
repository.save(person);
List<Person> people = repository.findAll();
// finding a specific person by ID
person = repository.findById(personId);

Slide 28

Slide 28 text

Otávio Santana @otaviosantana Database integration Data Mapping Client Database Database Client Mapper DAO Repository Data mapping and conversion

Slide 29

Slide 29 text

Otávio Santana @otaviosantana Flexibility vs Complexity, Use with Caution Database Layers

Slide 30

Slide 30 text

Otávio Santana @otaviosantana DTO Entity Resource DTO Flexibility vs Complexity, Use with Caution Database

Slide 31

Slide 31 text

Otávio Santana @otaviosantana Application NoSQL database Java Framework Boilerplate Communications There is no standard Particular behavior matters Challenges of Persistence In the Java Landscape

Slide 32

Slide 32 text

Otávio Santana @otaviosantana Persistence Frameworks There is no standard! Driver Proximity Types & Trade-offs in Frameworks Usability Mapper Agnostic Specific Executability Legibility Declarative Imperative Reflectionless Reflection

Slide 33

Slide 33 text

Otávio Santana @otaviosantana Goals Paradigms Business Isolation Performance

Slide 34

Slide 34 text

Otávio Santana @otaviosantana The rules of Software Architecture Everything in Software Architecture is a trade-off Why is more important than How

Slide 35

Slide 35 text

Otávio Santana @otaviosantana key value key key key value value value Wide-Column Graph Document Key Value NoSQL types Defined by structure

Slide 36

Slide 36 text

Otávio Santana @otaviosantana Database metrics Biz. transactions Oversized / downsized Invalid/stale connections Apps on await state Fixed cache size No cache usage Consistency impacts w/ distributed cache Complex mapping Auto-generated schemas On-prem x Cloud Many NoSQL types SQL x NoSQL x NewSQL Eager x Lazy loading N+1 Problem Hard to change db types Persistence Config. Data storage Data Manipulation Cache Conn. Pool Framework

Slide 37

Slide 37 text

Key-value
Structure
Key → Value
Apollo → Sun
Ares → War
Aphrodite → Love, Beauty

Slide 38

Slide 38 text

Key-value
Key-value vs. Relational database
Criteria | Key-value database | Relational database
Data structure | Key-Value Pairs | Tables with rows/columns
Query flexibility | Limited (Lookup by Key) | Complex (SQL queries)
Scalability | Excellent | Good
Schema flexibility | Limited | Highly flexible
Relationships | Minimal | Richly defined
ACID compliance | Varied (Depends on DB) | Strong

Slide 39

Slide 39 text

Otávio Santana @otaviosantana Apollo Aphrodite Ares Kratos Duty Dead Gods Love, happy Sun War 13 Color Sword Row-key Columns Wide-Column Structure Duty Duty weapon

Slide 40

Slide 40 text

Wide-Column
Wide-Column vs. Relational database
Criteria | Wide-column database | Relational database
Data Structure | Columns within Rows | Tables with Rows/Columns
Query Flexibility | Flexible | Complex (SQL Queries)
Scalability | Excellent | Good
Schema Flexibility | High | Highly Flexible
Relationships | Limited | Richly Defined
ACID Compliance | Varies (Depends on DB) | Strong

Slide 41

Slide 41 text

Document
Structure

{
  "name": "Diana",
  "duty": [
    "Hunt",
    "Moon",
    "Nature"
  ],
  "siblings": {
    "Apollo": "brother"
  }
}

Slide 42

Slide 42 text

Document
Document vs. Relational database
Criteria | Document database | Relational database
Data structure | Structured documents | Tables with rows/columns
Query flexibility | High (Document-level) | Complex (SQL queries)
Scalability | Excellent | Good
Schema flexibility | Highly flexible | Not flexible
Relationships | Limited (Embedded) | Richly defined
ACID compliance | Strong | Strong

Slide 43

Slide 43 text

Graph
Structure
Kratos -[killed]-> Apollo
Kratos -[killed]-> Ares
Apollo -[was killed by]-> Kratos
Ares -[was killed by]-> Kratos
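A graph stores relationships as first-class data, so a query becomes a traversal over labeled edges. The sketch below assumes the figure means that Kratos killed Apollo and Ares; `GraphSketch`, `Edge`, `relate`, and `outgoing` are hypothetical names and not a Neo4J API.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal labeled, directed graph; an illustration only, not a graph database.
final class GraphSketch {
    record Edge(String from, String label, String to) {}

    private final List<Edge> edges = new ArrayList<>();

    // Store a relationship such as (Kratos)-[killed]->(Ares).
    void relate(String from, String label, String to) {
        edges.add(new Edge(from, label, to));
    }

    // Follow every edge with the given label that starts at the given node.
    List<String> outgoing(String from, String label) {
        return edges.stream()
                .filter(e -> e.from().equals(from) && e.label().equals(label))
                .map(Edge::to)
                .toList();
    }

    public static void main(String[] args) {
        GraphSketch graph = new GraphSketch();
        graph.relate("Kratos", "killed", "Apollo");
        graph.relate("Kratos", "killed", "Ares");
        graph.relate("Ares", "was killed by", "Kratos");
        System.out.println(graph.outgoing("Kratos", "killed"));
    }
}
```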

Slide 44

Slide 44 text

Graph
Graph vs. Relational database
Criteria | Graph database | Relational database
Data structure | Nodes and relationships | Tables with rows/columns
Query flexibility | Excellent | Complex (SQL queries)
Scalability | Good | Good
Schema flexibility | Moderate | Highly flexible
Relationships | Core strength | Core strength
ACID compliance | Strong | Strong

Slide 45

Slide 45 text

Otávio Santana @otaviosantana Database architecture Developer perspective Flexibility vs Scalability Database Replication Partitioning Schemaless vs schema Normalization vs Denormalization

Slide 46

Slide 46 text

Otávio Santana @otaviosantana Scalability vs Flexibility Query and speed Scalability Flexibility key-value Wide-Column Document Graph Time-series

Slide 47

Slide 47 text

Otávio Santana @otaviosantana Masterless Database replication

Slide 48

Slide 48 text

Master-slave (leader-follower) Database replication

Slide 49

Slide 49 text

Partitioning
Modeling impact
Partitioning Type | Characteristics | Benefits | Considerations
Hash-based Partitioning | Data distributed based on hash function | Even data distribution | Limited range queries
Range-based Partitioning | Data divided by predefined value ranges | Suitable for time-based data | Data skew in uneven ranges
Directory-based Partitioning | Central directory maps data to partitions | Control over data distribution | Dependency on directory performance
Composite Partitioning | Combination of multiple partitioning strategies | Adaptable to complex data models | Complexity in design and management
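The first two strategies in the table can be sketched as plain functions. This is a simplified illustration with hypothetical names (`hashPartition`, `rangePartition`); real databases use their own hash functions and range metadata.

```java
import java.util.List;

final class PartitioningSketch {

    // Hash-based: the partition index is derived from the key's hash.
    // floorMod keeps the result in [0, partitions) even for negative hash codes.
    static int hashPartition(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Range-based: upperBounds must be sorted ascending. A value goes to the
    // first partition whose upper bound is >= the value; anything larger goes
    // to one extra overflow partition at the end.
    static int rangePartition(int value, List<Integer> upperBounds) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (value <= upperBounds.get(i)) {
                return i;
            }
        }
        return upperBounds.size();
    }

    public static void main(String[] args) {
        System.out.println(hashPartition("user-42", 4));
        System.out.println(rangePartition(15, List.of(10, 20))); // 1
    }
}
```

The hash version spreads neighboring keys across partitions, which is why range queries become expensive; the range version keeps neighbors together, which is why uneven ranges produce skew.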

Slide 50

Slide 50 text

Schema vs. Schemaless
Data structure
Aspect | Schema Approach | Schemaless Approach
Structure Flexibility | Rigid structure prescribed for data consistency | Dynamic structure, adaptable to changing data needs
Data Evolution | It may require schema changes for evolving data | Easily accommodates evolving and diverse data
Read Performance | Optimized for complex queries with joins | Maximized read efficiency due to denormalization
Write Performance | Affected by complex joins and constraints | Improved write efficiency due to simplified joins
Use Case Examples | Financial systems, ERP applications | Social media platforms, content-sharing platforms

Slide 51

Slide 51 text

Normalization vs. Denormalization
Data structure
Aspect | Normalization | Denormalization
Goal | Reduce data redundancy and ensure data integrity | Enhance read performance and simplify data retrieval
Data structure | Split data into related tables | Combine data into fewer tables
Joins | Often requires complex joins for data retrieval | Minimizes or eliminates joins for faster reads
Storage efficiency | This may lead to better storage optimization | This can result in higher storage consumption
Insert, update anomalies | Minimized due to distributed data | Insert and update anomalies may arise
Query performance | Might suffer due to frequent joins | Generally faster query performance due to denormalization
Write performance | Writes can be faster due to fewer tables | Might be affected due to denormalization
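The difference in the "Data structure" and "Joins" rows can be shown with two small models. All record and method names here are hypothetical; the lists stand in for tables and documents.

```java
import java.util.List;

final class NormalizationSketch {
    // Normalized: two "tables" linked by personId.
    record PersonRow(long id, String name) {}
    record AddressRow(long personId, String city) {}

    // Denormalized: one "document" that embeds the cities.
    record PersonDocument(long id, String name, List<String> cities) {}

    // Normalized read: an application-side join over the address rows.
    static List<String> citiesOf(long personId, List<AddressRow> addresses) {
        return addresses.stream()
                .filter(a -> a.personId() == personId)
                .map(AddressRow::city)
                .toList();
    }

    // Denormalizing pays the join once, at write time, and duplicates the data.
    static PersonDocument denormalize(PersonRow person, List<AddressRow> addresses) {
        return new PersonDocument(person.id(), person.name(), citiesOf(person.id(), addresses));
    }

    public static void main(String[] args) {
        PersonRow ada = new PersonRow(1L, "Ada");
        List<AddressRow> addresses = List.of(
                new AddressRow(1L, "London"),
                new AddressRow(2L, "Paris"),
                new AddressRow(1L, "Turin"));
        // Reading the document later needs no join at all.
        System.out.println(denormalize(ada, addresses));
    }
}
```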

Slide 52

Slide 52 text

Otávio Santana @otaviosantana Quiz Time

Slide 53

Slide 53 text

Otávio Santana @otaviosantana Jakarta EE Overview

Slide 54

Slide 54 text

Otávio Santana @otaviosantana Jakarta EE

Slide 55

Slide 55 text

Otávio Santana @otaviosantana Microprofile

Slide 56

Slide 56 text

Otávio Santana @otaviosantana Bean Validation Entity Conversor to/from Entity Jakarta Validation Framework integration Is valid? Database Exception

Slide 57

Slide 57 text

Otávio Santana @otaviosantana Lab Time Bean Validation

Slide 58

Slide 58 text

Otávio Santana @otaviosantana CDI Request Request Request Request Request Request Request Request conversation conversation session session Application

Slide 59

Slide 59 text

CDI Features
CDI Resource | Description and Use in a NoSQL Application
Injection | Automatically injects dependencies into managed beans.
Qualifier | Differentiates between different NoSQL implementations.
Produces | Controls the creation of NoSQL-related bean instances.
Disposes | Manages the disposal of NoSQL-related resources.
Event | Enables communication of NoSQL data changes across components.
Decorator | Extends or modifies the behavior of NoSQL-related methods.
Interceptor | Intercepts and modifies NoSQL interaction method calls.

Slide 60

Slide 60 text

Otávio Santana @otaviosantana Lab Time CDI

Slide 61

Slide 61 text

Otávio Santana @otaviosantana JAX-RS get post put delete Response storage Application server

Slide 62

Slide 62 text

Otávio Santana @otaviosantana JAX-RS Several communications

Slide 63

Slide 63 text

Otávio Santana @otaviosantana Lab Time JAX-RS

Slide 64

Slide 64 text

Jakarta NoSQL
Main motivation

// ArangoDB
BaseDocument baseDocument = new BaseDocument();
baseDocument.addAttribute(name, value);

// MongoDB
Document document = new Document();
document.append(name, value);

// Couchbase
JsonObject jsonObject = JsonObject.create();
jsonObject.put(name, value);

// OrientDB
ODocument document = new ODocument("collection");
document.field(name, value);

Slide 65

Slide 65 text

Jakarta NoSQL
Main motivation

@Entity
record Book(@Id String id, @Column("name") String name) {
}

Slide 66

Slide 66 text

Jakarta NoSQL
Template interface

@Inject
Template template;

template.insert(book);

Slide 67

Slide 67 text

Otávio Santana @otaviosantana Specializations Particular behavior matters

Slide 68

Slide 68 text

Otávio Santana @otaviosantana Jakarta Data Motivation

Slide 69

Slide 69 text

Jakarta Data
Repository

@Repository
public interface CarRepository extends CrudRepository<Car, Long> {
    List<Car> findByType(CarType type);
    Optional<Car> findByName(String name);
}

Slide 70

Slide 70 text

Jakarta Data
Sample code

@Inject
CarRepository repository;
...
Car ferrari = Car.id(10L).name("Ferrari").type(CarType.SPORT);
repository.save(ferrari);

Slide 71

Slide 71 text

PageableRepository
Pagination

@Repository
public interface ProductRepository extends PageableRepository {
    Page findByTypeOrderByName(CarType type, Pageable pageable);
}

Slide 72

Slide 72 text

Otávio Santana @otaviosantana Repository Built-in repository

Slide 73

Slide 73 text

Otávio Santana @otaviosantana NoSQL databases

Slide 74

Slide 74 text

Redis
Use cases
Use Case | Description
Caching | Speeds up data retrieval by storing frequently accessed data in memory, reducing the need to fetch it from the backend.
Session Management | Stores and manages user sessions, enhancing application performance and scalability.
Real-time Analytics | Powers real-time dashboards and analytics by swiftly processing and aggregating data.
Pub/Sub Messaging | Facilitates real-time communication between components through publish-subscribe messaging patterns.
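The caching use case usually follows a cache-aside flow: look in the cache first, fall back to the backend on a miss, then store the result with an expiry. The sketch below uses a `HashMap` as a stand-in for Redis, so every name in it (`CacheAsideSketch`, `backendCalls`, the injected clock) is an assumption for illustration and not a Redis client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Cache-aside with a time-to-live; the map plays the role of the key-value store.
final class CacheAsideSketch {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> backend; // slow source of truth
    private final LongSupplier clock;               // injected so expiry is testable
    private final long ttlMillis;
    int backendCalls = 0;

    CacheAsideSketch(Function<String, String> backend, LongSupplier clock, long ttlMillis) {
        this.backend = backend;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    String get(String key) {
        long now = clock.getAsLong();
        Entry entry = cache.get(key);
        if (entry != null && entry.expiresAtMillis() > now) {
            return entry.value(); // hit: the backend is not touched
        }
        backendCalls++;           // miss or expired: load and remember
        String value = backend.apply(key);
        cache.put(key, new Entry(value, now + ttlMillis));
        return value;
    }

    public static void main(String[] args) {
        CacheAsideSketch cache =
                new CacheAsideSketch(key -> "value-of-" + key, System::currentTimeMillis, 60_000);
        System.out.println(cache.get("user:1"));
        System.out.println(cache.get("user:1"));
        System.out.println("backend calls: " + cache.backendCalls);
    }
}
```

Note the key `user:1`: a contextual prefix like this is one way to keep key patterns identifiable in a flat key space.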

Slide 75

Slide 75 text

Redis
Dos and Don'ts
Do
○ Simplify Schema
○ Identify Key Patterns
○ Use Contextual Keys
○ Batching Operations
○ Cache-friendly Design
Don't
○ Avoid Complex Joins
○ Beware of Over-Normalization
○ Avoid Bloated Values
○ Limit Indexes

Slide 76

Slide 76 text

Otávio Santana @otaviosantana Redis Clusters

Slide 77

Slide 77 text

Otávio Santana @otaviosantana Lab Time

Slide 78

Slide 78 text

Apache Cassandra
Use Cases
Use Case | Description
Online Retail | Managing e-commerce platforms, handling product catalogs, user profiles, and transaction records.
Social Media Analytics | Monitoring and analyzing social media interactions, tracking trends, and user engagement.
Real-Time Analytics | Enabling quick analysis of data streams for instant insights, critical in financial and logistics sectors.
Logging and Monitoring | Centralized storage and analysis of logs and monitoring data from applications and servers.

Slide 79

Slide 79 text

Apache Cassandra
Modeling tips
○ Denormalize When Necessary
○ Design for Query Patterns
○ Choose Optimal Column Families
○ Utilize Secondary Indexes
○ Compression and Bloom Filters

○ Avoid Over-Denormalization
○ Limit Column Count
○ Avoid Overusing Secondary Indexes

Slide 80

Slide 80 text

Otávio Santana @otaviosantana 1 2 3 4 5 6 7 8 9 11 12 10 Client R1 R2 R3 1 2 3 4 5 6 7 8 9 11 12 10 R4 R5 R6 DC1 DC2 Apache Cassandra Cluster

Slide 81

Slide 81 text

Otávio Santana @otaviosantana Jim Car: Camaro Age: 32 Carol Color: Pink Work: Hobby Suzy Team: Bahia Country: USA B[4-8] Jim 1 Carol 13 Suzy 15 A[0-3] D[14-18] C[9-13] Apache Cassandra Partitioning
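The partitioning figure can be read as a token lookup: each node owns a token range, and a row lands on the node whose range contains the row's token. The sketch below assumes the numbers next to Jim, Carol, and Suzy are their tokens and that A, B, C, D own the ranges shown; `TokenRingSketch` is a hypothetical class, and a real Cassandra cluster computes far larger tokens with its own partitioner.

```java
import java.util.Map;
import java.util.TreeMap;

// Maps a token to the node that owns its range, using the range start as the key.
final class TokenRingSketch {
    private final TreeMap<Integer, String> rangeStarts = new TreeMap<>();
    private final int maxToken;

    TokenRingSketch(int maxToken) {
        this.maxToken = maxToken;
    }

    void addRange(int startToken, String node) {
        rangeStarts.put(startToken, node);
    }

    String nodeFor(int token) {
        if (token < 0 || token > maxToken) {
            throw new IllegalArgumentException("Token out of range: " + token);
        }
        // floorEntry finds the closest range start that is <= token
        Map.Entry<Integer, String> owner = rangeStarts.floorEntry(token);
        if (owner == null) {
            throw new IllegalStateException("No range covers token " + token);
        }
        return owner.getValue();
    }

    // The ring from the slide: A[0-3], B[4-8], C[9-13], D[14-18].
    static TokenRingSketch slideRing() {
        TokenRingSketch ring = new TokenRingSketch(18);
        ring.addRange(0, "A");
        ring.addRange(4, "B");
        ring.addRange(9, "C");
        ring.addRange(14, "D");
        return ring;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = slideRing();
        System.out.println("Jim (1) -> " + ring.nodeFor(1));      // A
        System.out.println("Carol (13) -> " + ring.nodeFor(13));  // C
        System.out.println("Suzy (15) -> " + ring.nodeFor(15));   // D
    }
}
```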

Slide 82

Slide 82 text

Otávio Santana @otaviosantana Lab Time

Slide 83

Slide 83 text

MongoDB
Use Cases
Use Case | Description
Content Management | Store and manage diverse content types like articles, images, and videos with schema flexibility.
Catalog Management | Efficiently organize and categorize products or items with changing attributes and metadata.
Mobile Applications | Provide offline capabilities and sync data seamlessly once online, enhancing user experience.
Social Media Platforms | Facilitate rapid storage and retrieval of user-generated content, profiles, and social interactions.

Slide 84

Slide 84 text

MongoDB
Modeling tips
○ Leverage Embedded Documents
○ Design Around Use Cases
○ Employ Indexes Judiciously
○ Normalize When Logical
○ Stay Flexible with Arrays

○ Avoid Over-Embedding
○ Steer Clear of Monolithic Documents
○ Don’t Over-Index
○ Beware of the One-Size-Fits-All Approach

Slide 85

Slide 85 text

Otávio Santana @otaviosantana MongoDB Cluster

Slide 86

Slide 86 text

Otávio Santana @otaviosantana MongoDB Cluster

Slide 87

Slide 87 text

Otávio Santana @otaviosantana Lab Time

Slide 88

Slide 88 text

Neo4J
Use Cases
Use Case | Description
Social Networks | Modeling user profiles, friendships, and interactions for effective social networking platforms.
Recommendation Engines | Powering personalized recommendations by analyzing connections and preferences.
Knowledge Graphs | Organizing and querying complex relationships in fields like healthcare, finance, and research.
Fraud Detection | Uncovering hidden patterns and connections indicative of fraudulent activities.
Network Analysis | Analyzing intricate relationships in data networks, such as transportation and communication.

Slide 89

Slide 89 text

Neo4J
Modeling tips
○ Avoid Over-Reliance on Joins
○ Steer Clear of Over-Connecting Nodes
○ Don’t Overcomplicate Graph Design
○ Beware of Property Index Overuse

○ Embrace Relationship-Driven Design
○ Craft Efficient Traversal Paths
○ Use Labels and Types Wisely
○ Leverage Indexing for Nodes
○ Utilize Property Indexes

Slide 90

Slide 90 text

Otávio Santana @otaviosantana Leader Follower Follower Read Replica Read Replica Neo4J Modeling tips

Slide 91

Slide 91 text

Otávio Santana @otaviosantana Lab Time

Slide 92

Slide 92 text

Otávio Santana @otaviosantana Conclusions Final considerations

Slide 93

Slide 93 text

Book recommendations
NoSQL introduction
https://bpbonline.com/products/java-persistence-with-nosql
Coupon code: Otavio
Discount: 20%

Slide 94

Slide 94 text

Otávio Santana @otaviosantana Books recommendations NoSQL introduction

Slide 95

Slide 95 text

Otávio Santana @otaviosantana Books recommendations NoSQL introduction

Slide 96

Slide 96 text

Otavio Santana Software Engineer & Architect @otaviojava Java Champion, Oracle ACE JCP-EC-EG-EGL Apache and Eclipse Committer Jakarta EE and MicroProfile Duke Choice Award JCP Award Book and blog writer Who am I?

Slide 97

Slide 97 text

Elias Nogueira Principal Software Engineer in Test @eliasnogueira Java Champion, Oracle ACE JCP-EC-EG-EGL Apache and Eclipse Committer Jakarta EE and MicroProfile Duke Choice Award JCP Award Book and blog writer Who am I?

Slide 98

Slide 98 text

Thank you! Otávio Santana Software Engineer & Architect @otaviojava