Introduction to ArangoDB & Customer Data Platform

Transforming your Customer Data Platform with Open Source Big Data Technology

Why ArangoDB?
One database. One Query Language. Three data models. Endless Possibilities.

Which ArangoDB features support building a CDP?
Flexibility, scalability and advanced graph queries

Trieu Nguyen

October 12, 2019

Transcript

  1. Introduction to ArangoDB & Customer Data Platform

    Transforming your Customer Data Platform with Open Source Big Data Technology
    Presented by: Nguyễn Tấn Triều (Thomas), Founder of BigDataVietnam.org
    Email: [email protected]
    FB: https://facebook.com/tantrieuf31
    Twitter: https://twitter.com/tantrieuf31
  2. About myself

    • Started BigDataVietnam.org as a knowledge-sharing blog in 2014
    • Former Head of Platform at Blueseed Digital
    • Former Lead Software Engineer at FPT Telecom
    • Former Backend Engineer at Greengar
    • Former Backend Developer at FPT Online
    Details at https://www.linkedin.com/in/tantrieuf31/
  3. AGENDA

    1. Why ArangoDB? One database. One Query Language. Three data models. Endless Possibilities.
    2. Why Customer Data Platform (CDP)? Introduction to the USPA framework and the customer data platform
    3. Which ArangoDB features support building a CDP? Flexibility, scalability and advanced graph queries
    4. DEMO with case studies
  4. The 21st century is the age of "Big Data", and Big Data calls for using "the right tool(s) for the job"
  6. What is "polyglot data persistence"?

    Philosophy: "use the right tool(s) for the job"
    Assumption: specialized products are better suited than generic products
    Examples:
    • A traditional RDBMS with a relational data model for financial data (such as checkouts, invoices and refunds) and reporting
    • MongoDB with a flexible document data model for the product catalog
    • Cassandra for high-volume use cases such as real-time analytics (using Apache Spark) and user activity logs
    • Riak key-value store for managing shopping carts
    • Redis for managing user sessions and as an in-memory cache for low-latency reads
    • Neo4j graph database for storing recommendations
  7. Issues with polyglot data persistence

    • Requires learning, administering and maintaining multiple technologies
    • Needs custom scripts and/or application logic for shipping data from one system to another, or for keeping systems in sync
    • Potential atomicity and consistency issues across the different database systems (i.e. no transactions spanning them)
  8. ArangoDB is a native multi-model database, with support for key-values, documents, graphs and (recently added) search functionality
  9. Databases, collections, documents

    On the highest level, data in ArangoDB is organized in "databases" and "collections" (think "schemas" and "tables")
    • Collections are used to store "documents" of similar types
    • "Documents" are just JSON objects with arbitrary attributes, with optional nesting (sub-objects, sub-arrays)
    • There is no fixed schema for documents (any JSON object is valid)
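To make the "no fixed schema, optional nesting" point concrete, here is a minimal sketch in plain Python. It does not use ArangoDB at all; it only parses JSON objects like those a collection could hold. The two sample documents, their attribute names and the `get_path` helper are hypothetical and chosen for illustration only.

```python
import json

# Hypothetical documents for one collection; attribute names are illustrative only.
raw_documents = [
    '{"_key": "c1", "name": "Alice", "address": {"city": "Hanoi"}, "tags": ["vip", "newsletter"]}',
    '{"_key": "c2", "name": "Bob", "phone": "555-0100"}',
]
documents = [json.loads(raw) for raw in raw_documents]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'address.city' through nested objects."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# The second document has no "address" and no "tags"; both reads still work.
cities = [get_path(d, "address.city") for d in documents]
tag_counts = [len(d.get("tags", [])) for d in documents]

print(cities)      # ['Hanoi', None]
print(tag_counts)  # [2, 0]
```

Because any JSON object is valid, code that reads documents should treat every attribute beyond `_key` as optional, as the helper above does.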
  10. Homogeneous documents

    In the simplest case, documents in a collection are homogeneous (i.e. same attributes and types)
    Example use case: product categories
    { "_key" : "books", "title" : "Books" }
    { "_key" : "cam", "title" : "Camera products" }
    { "_key" : "kitchen", "title" : "Kitchen Appliances" }
    { "_key" : "toys", "title" : "Toys & Games" }
    Processing such data in AQL queries is as straightforward as with an SQL query on a relational, fixed-schema table
  11. AQL queries – hello world examples

    // SELECT c.* FROM categories c WHERE c._key IN ...
    FOR c IN categories
      FILTER c._key IN [ 'books', 'kitchen' ]
      RETURN c

    // SELECT c._key, c.title FROM categories c ORDER BY c.title
    FOR c IN categories
      SORT c.title
      RETURN { _key: c._key, title: c.title }
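For readers who want to check what the two queries above return, the following sketch reproduces their logic in plain Python over an in-memory copy of the "categories" documents from the previous slide. It is an illustration of the query semantics, not ArangoDB client code; the variable names `selected` and `by_title` are made up here, and Python's string ordering may differ from the server's collation for non-ASCII titles.

```python
# In-memory stand-in for the "categories" collection from the slides.
categories = [
    {"_key": "books", "title": "Books"},
    {"_key": "cam", "title": "Camera products"},
    {"_key": "kitchen", "title": "Kitchen Appliances"},
    {"_key": "toys", "title": "Toys & Games"},
]

# FOR c IN categories FILTER c._key IN ['books', 'kitchen'] RETURN c
selected = [c for c in categories if c["_key"] in ("books", "kitchen")]

# FOR c IN categories SORT c.title RETURN { _key: c._key, title: c.title }
by_title = [
    {"_key": c["_key"], "title": c["title"]}
    for c in sorted(categories, key=lambda c: c["title"])
]

print([c["_key"] for c in selected])  # ['books', 'kitchen']
print([c["_key"] for c in by_title])  # ['books', 'cam', 'kitchen', 'toys']
```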
  12. Heterogeneous documents example

    {
      "_key" : "A053720452",
      "category" : "books",
      "name" : "Harry Potter and the Cursed Child",
      "author" : "Joanne K. Rowling",
      "isbn" : "978-0-7515-6535-5",
      "published" : 2016
    }
    {
      "_key" : "ZB4061305X34",
      "category" : "toys",
      "name" : "Nerf N-Strike Elite Mega CycloneShock Blaster",
      "upc" : "630509278862",
      "colors" : [ "black", "red" ]
    }
  13. The graph data model

    • ArangoDB also supports the graph data model
    • Graph queries can reveal which documents are directly or indirectly connected to which other documents, and via what connections
    • Graphs are often used for data exploration, and to understand connections in the data
  14. Edges

    • In graphs, connections between documents are called "edges"
    • In ArangoDB edges are stored in "edge collections"
    • Edges have "_from" and "_to" attributes, which reference the connected vertices
    • Edges are always directed (_from -> _to), but can also be queried in the opposite direction
  15. Edge collection example

    Let's assume there are some "employees" documents like this:
    { "_key" : "sofia", "_id" : "employees/sofia" }
    { "_key" : "adam", "_id" : "employees/adam" }
    { "_key" : "sarah", "_id" : "employees/sarah" }
    { "_key" : "jon", "_id" : "employees/jon" }
    And there is an "isManagerOf" edge collection connecting them:
    { "_from" : "employees/sofia", "_to" : "employees/adam" }
    { "_from" : "employees/sofia", "_to" : "employees/sarah" }
    { "_from" : "employees/sarah", "_to" : "employees/jon" }
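The edge list above can be walked in either direction. The sketch below is a plain-Python breadth-first traversal over an in-memory copy of the "isManagerOf" edges; the `traverse` function and its parameters are hypothetical helpers written for this illustration, not part of ArangoDB. In AQL the same question would be asked with a graph traversal such as `FOR v IN 1..2 OUTBOUND 'employees/sofia' isManagerOf RETURN v`, whose visiting order and uniqueness rules may differ from this simplified version.

```python
from collections import deque

# In-memory copy of the "isManagerOf" edge collection from the slide.
is_manager_of = [
    {"_from": "employees/sofia", "_to": "employees/adam"},
    {"_from": "employees/sofia", "_to": "employees/sarah"},
    {"_from": "employees/sarah", "_to": "employees/jon"},
]


def traverse(start, edges, direction="outbound", max_depth=2):
    """Breadth-first traversal returning (vertex_id, depth) pairs, excluding the start."""
    # Outbound follows _from -> _to; inbound follows the same edges backwards.
    src, dst = ("_from", "_to") if direction == "outbound" else ("_to", "_from")
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue
        for edge in edges:
            if edge[src] == vertex and edge[dst] not in seen:
                seen.add(edge[dst])
                result.append((edge[dst], depth + 1))
                queue.append((edge[dst], depth + 1))
    return result


# Everyone below sofia, up to two levels down.
reports = traverse("employees/sofia", is_manager_of, "outbound")
# Everyone above jon: the same edges, queried in the opposite direction.
managers = traverse("employees/jon", is_manager_of, "inbound")

print(reports)   # [('employees/adam', 1), ('employees/sarah', 1), ('employees/jon', 2)]
print(managers)  # [('employees/sarah', 1), ('employees/sofia', 2)]
```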
  17. Customer Journey Analysis, Customer Need Analysis & Digital Marketing Analytics are the top requirements from business

    Source: Gartner. Investment in customer analytics (Q04, Q05), excluding unsure, n=142
  18. Practical use cases of CDP for E-Commerce

    • Augmented/predictive analytics: Increase analytics productivity to focus on better Customer Experience insights.
    • Customer segmentation: Identify, reach and communicate with specific groups of like-minded customers.
    • Recommendation engines: Increase retention and conversion by offering highly individualized digital experiences based on historical activity and preferences.
    • Customer journey analytics & orchestration: Reach the right customers at the right time and on the right channel to offer the optimal experience and maximize conversion.
  20. In fact, a CDP is also a data lake platform that stores anything about customer profiles
  21. Implicit data is gathered by predictive analytics. It also includes loyalty, length of relationship, purchasing history, and prior responses to marketing campaigns.

    Explicit data is easier to gather and analyze because it is usually provided to the company directly by the customer.
  22. A CDP needs a native multi-model database

    1. The graph data model can be used for customer journey management
    2. The homogeneous document model can be used for customer profiles
    3. The flexibility of ArangoDB is key to scaling the system more easily
    4. The scalability of ArangoDB can help business analytics grow faster
    Diagram labels: Graph database; Document + Key-Value
  24. User Story

    1. You need to do social analytics to find key trends
    2. You develop a social media crawler to crawl public data from the Facebook API
    3. After crawling, you have 988 records in the collection "fb_feeds"
    The key task: classify all feeds into 3 segments
    1. High value: top trending feeds that have more than 1000 likes
    2. At risk: top feeds with users that have any ANGRY reaction
    3. Sell opportunity: top feeds with users that have a LOVE reaction
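The three segment rules above can be sketched as a small classifier. This is plain Python over a few made-up feed records, not the demo from the talk: the deck does not show the structure of the "fb_feeds" documents, so the attribute names `likes` and `reactions`, the sample values and the segment labels are all assumptions. A feed may match more than one rule, and feeds are ranked by likes as one possible reading of "top".

```python
def feed_segments(feed):
    """Return every segment whose rule the feed satisfies (possibly none)."""
    segments = []
    reactions = feed.get("reactions", {})
    if feed.get("likes", 0) > 1000:        # High value: more than 1000 likes
        segments.append("high_value")
    if reactions.get("ANGRY", 0) > 0:      # At risk: any ANGRY reaction
        segments.append("at_risk")
    if reactions.get("LOVE", 0) > 0:       # Sell opportunity: any LOVE reaction
        segments.append("sell_opportunity")
    return segments


# Hypothetical sample records; the real collection holds 988 crawled feeds.
sample_feeds = [
    {"_key": "f1", "likes": 1500, "reactions": {"LOVE": 20}},
    {"_key": "f2", "likes": 40, "reactions": {"ANGRY": 3}},
    {"_key": "f3", "likes": 10, "reactions": {}},
]

# Visit feeds from most to least liked so each segment lists its "top" feeds first.
by_segment = {}
for feed in sorted(sample_feeds, key=lambda f: f["likes"], reverse=True):
    for segment in feed_segments(feed):
        by_segment.setdefault(segment, []).append(feed["_key"])

print(by_segment)
```

In a database-backed version the same conditions would typically become filter expressions in the query, so that only matching feeds leave the server.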
  25. User Story

    1. An e-commerce website needs to track all data points in the customer journey
    2. A web developer puts a CDP JavaScript tag into the website to collect data
    3. After a week, they have a data collection for customer journey analytics
    The key task: classify all customer profiles into 3 segments
    1. High value: spending more than 100 USD in a week
    2. At risk: browsing the website for more than 5 minutes, then logging out without placing an order
    3. Cross-sell opportunity: is a student and places an order for things like "textbook"
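One way to read this user story is as a two-step process: aggregate one week of tracked events per profile, then apply the three rules. The sketch below does that in plain Python. Everything about the data shape is an assumption made for illustration: the event types (`order`, `browse`, `logout`), the attribute names, the `students` set and the sample events are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical events collected by the tracking tag during one week.
events = [
    {"profile": "p1", "type": "order", "amount_usd": 120.0, "item": "headphones"},
    {"profile": "p2", "type": "browse", "seconds": 400},
    {"profile": "p2", "type": "logout"},
    {"profile": "p3", "type": "order", "amount_usd": 35.0, "item": "textbook"},
]
students = {"p3"}  # assumed to be known from explicit profile data


def summarize(events):
    """Aggregate events into per-profile totals."""
    stats = defaultdict(
        lambda: {"spend": 0.0, "seconds": 0, "items": [], "logged_out": False}
    )
    for event in events:
        s = stats[event["profile"]]
        if event["type"] == "order":
            s["spend"] += event["amount_usd"]
            s["items"].append(event["item"])
        elif event["type"] == "browse":
            s["seconds"] += event["seconds"]
        elif event["type"] == "logout":
            s["logged_out"] = True
    return stats


def profile_segments(profile_id, s, students):
    """Apply the three segment rules from the user story."""
    segments = []
    if s["spend"] > 100:
        segments.append("high_value")
    if s["seconds"] > 5 * 60 and not s["items"] and s["logged_out"]:
        segments.append("at_risk")
    if profile_id in students and "textbook" in s["items"]:
        segments.append("cross_sell_opportunity")
    return segments


result = {
    pid: profile_segments(pid, s, students) for pid, s in summarize(events).items()
}
print(result)
```

Keeping aggregation and rule evaluation separate makes it easier to change a threshold (for example the 100 USD limit) without touching the event handling.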
  26. ArangoDB: One engine. One query language. Multiple models.

    For more information: https://www.arangodb.com/arangodb-training-center/first-day/