Slide 1

Slide 1 text

Azure Saturday 2018
Cosmos DB, Graph and Azure Search, building a compelling cloud solution
Steef-Jan Wiggers | Codit

Slide 2

Slide 2 text

Thank you, sponsors!

Slide 3

Slide 3 text

[email protected] +31 653 12 29 57 @SteefJan nl.linkedin.com/in/steefjan

Slide 4

Slide 4 text

Not my first visit to Munich!

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

The next race …

Slide 7

Slide 7 text

Who we are Customers Entities 2000 Belgium 2004 France 2013 Portugal 2016 Switzerland 2016 UK 2016 The Netherlands 2017 Malta 180 worldwide

Slide 8

Slide 8 text

Codit & Microsoft

Slide 9

Slide 9 text

DATA & AI What the future will bring

Slide 10

Slide 10 text

What you can expect in this session
• Scenario
• Cosmos DB
• Graph model
• Azure Search
• Demo

Slide 11

Slide 11 text

Scenario

Slide 12

Slide 12 text

Scenario
• Old knowledge base implementation with SOLR struggled with related content
• Dependency on hosting and service partner
• High cohesion and tight coupling
[Diagram: Knowledge Platform, Tax Returns]

Slide 13

Slide 13 text

A new future-proof knowledge base
Business:
• Increase quality of content
• Better user experience
• New business models & revenue streams
• Independent
Technology:
• Completely PaaS
• Azure (pay as you go)
• No IT management support, only DevOps
• Independent

Slide 14

Slide 14 text

High Level Architecture Editing CMS Content Creation Knowledge Platform Integrate Cleanse Match & Merge Connectors Ingestion Standardize Validation Meta data Enrichment Value decay XRef Data quality, workflows & monitoring Content Constitution Content collection Relation Store Index Search Models Content API Knowledge base

Slide 15

Slide 15 text

Solution Building Blocks Content collection Azure Search Document DB Graph Integrate Match & Merge Content API Importer/.NET Web App/.Net Search Index Relations Store

Slide 16

Slide 16 text

Solution Architecture CMS Output FRONT-END Content Related Content

Slide 17

Slide 17 text

Business challenges
• What are the costs (ROI, TCO)?
• Will it work for content and related content (quality)?
• Does it meet business requirements?
• New revenue streams, business models

Slide 18

Slide 18 text

Technology challenges
• Architectural Fit – POA
• Microsoft Support
• Training
• Compare with other search solutions, and graph solutions
• Performance
• Scale
• Complexity

Slide 19

Slide 19 text

Cosmos DB

Slide 20

Slide 20 text

Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Comprehensive SLAs
• Five well-defined consistency models
Data models: Key-value, Column-family, Document, Graph
APIs: Table API, MongoDB API, Cassandra API

Slide 21

Slide 21 text

Global Distribution
• Turnkey global distribution
• Multiple datacenters
• Auto replication
• 99.99% availability
• All resources are horizontally partitioned and vertically distributed
• Replication topology is dynamic based on consistency level and network conditions

Slide 22

Slide 22 text

Multi-model + multi-API
• Different models:
  • Graph
  • Key-Value
  • Document DB
• No schema or index management
• Automatic indexing
• API support:
  • SQL
  • JavaScript
  • Gremlin
  • MongoDB
  • Azure Table Storage
  • Cassandra

Slide 23

Slide 23 text

Scale
• Pay as you go for storage and throughput
• Elastic scale across regions
• Partitions

Slide 24

Slide 24 text

Consistency
• Five levels of consistency
• Can be changed programmatically at any time
• Can be overridden on a per-request basis
• Writing correct distributed applications is hard
• Global distribution forces you to deal with the CAP theorem
• Intuitive and practical with clear PACELC tradeoffs

Slide 25

Slide 25 text

Latency
• Simultaneous read/write

Slide 26

Slide 26 text

SLA
• Fully managed service
• 99.99% SLA for latency
• Guaranteed throughput, consistency and high availability

Slide 27

Slide 27 text

Request Units
• A Request Unit (RU) is a rate-based currency
• Abstracts physical resources for performing requests
• 1 RU = 1 read of a 1 KB document
• Each request consumes a fixed number of RUs
• Provisioned in terms of RU/sec and RU/min
• Rate limiting based on provisioned throughput
• Can be increased and decreased instantly
• Metered hourly
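The rate-limiting bullet can be illustrated with a small simulation. This is a minimal sketch and not how the service is implemented: it assumes a per-second RU budget that simply resets every second, and the class name `RuBudget` and the request costs are hypothetical. Requests that would exceed the budget are rejected, which loosely mirrors the HTTP 429 response Cosmos DB returns when the provisioned throughput is exhausted.

```python
from dataclasses import dataclass


@dataclass
class RuBudget:
    """Toy model of a provisioned RU/sec budget (hypothetical, illustration only)."""

    provisioned_ru_per_sec: float
    used_in_current_second: float = 0.0
    current_second: int = 0

    def try_consume(self, now_second: int, cost_ru: float) -> bool:
        # A new second starts with a fresh budget.
        if now_second != self.current_second:
            self.current_second = now_second
            self.used_in_current_second = 0.0
        # Reject the request if it would exceed what was provisioned.
        if self.used_in_current_second + cost_ru > self.provisioned_ru_per_sec:
            return False  # the real service would answer with HTTP 429
        self.used_in_current_second += cost_ru
        return True


budget = RuBudget(provisioned_ru_per_sec=400)
# 1 RU per 1 KB read (from the slide); 500 reads arrive within the same second.
accepted = sum(budget.try_consume(now_second=0, cost_ru=1.0) for _ in range(500))
print(f"accepted {accepted} of 500 reads")
```

Raising `provisioned_ru_per_sec` in the sketch corresponds to scaling the throughput up; the rejected requests are the ones a client would have to retry later.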

Slide 28

Slide 28 text

Capacity Planner

Slide 29

Slide 29 text

Integrations
• Azure Search
• Apache Spark Connector
• Azure Functions

Slide 30

Slide 30 text

Graph Model

Slide 31

Slide 31 text

Graph
• TinkerPop is a developer group creating an open-source stack for graphs (http://tinkerpop.apache.org/)
• Graph database and analytics systems

Slide 32

Slide 32 text

Data is your model, model is your data

Slide 33

Slide 33 text

Graph implementation examples
• LinkedIn (Business)
• Facebook (Social Media)
• Walmart (Recommendation)
• Google (Search)
• Airbnb (Search)
• Cisco (Master Data Management)

Slide 34

Slide 34 text

Graph API
• Gremlin is a graph traversal language
• Vertices, Edges, and Properties

Slide 35

Slide 35 text

Add Vertices (V)

Slide 36

Slide 36 text

Add Edge (E)

Slide 37

Slide 37 text

Execute Gremlin
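The code shown on the three slides above did not survive in this transcript. As a stand-in, here is a minimal sketch that only builds Gremlin traversal strings for adding vertices, adding an edge, and reading related vertices. The helper names, the labels `article` and `relatedTo`, and the ids are hypothetical; the Gremlin steps used (`addV`, `property`, `addE`, `to`, `out`, `values`) are standard TinkerPop steps. Sending the strings to a graph endpoint is left out on purpose.

```python
def quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and embedded quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def add_vertex(label: str, vertex_id: str, **props: str) -> str:
    """Build a traversal that adds a vertex with an id and optional properties."""
    query = f"g.addV({quote(label)}).property('id', {quote(vertex_id)})"
    for key, value in props.items():
        query += f".property({quote(key)}, {quote(value)})"
    return query


def add_edge(from_id: str, label: str, to_id: str) -> str:
    """Build a traversal that adds a directed edge between two existing vertices."""
    return f"g.V({quote(from_id)}).addE({quote(label)}).to(g.V({quote(to_id)}))"


def related_values(vertex_id: str, edge_label: str, prop: str) -> str:
    """Build a traversal that follows outgoing edges and returns one property."""
    return f"g.V({quote(vertex_id)}).out({quote(edge_label)}).values({quote(prop)})"


queries = [
    add_vertex("article", "a1", title="Dividend tax"),
    add_vertex("article", "a2", title="Tax returns"),
    add_edge("a1", "relatedTo", "a2"),
    related_values("a1", "relatedTo", "title"),
]
for q in queries:
    print(q)
```

In the related-content scenario, the last traversal is the interesting one: starting from one piece of content, it follows the `relatedTo` edges and returns the titles of the connected content.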

Slide 38

Slide 38 text

Demo Graph Explorer

Slide 39

Slide 39 text

Azure Search

Slide 40

Slide 40 text

Azure Search

Slide 41

Slide 41 text

Provisioning
“Search service”
▪ Scope for capacity
▪ Bound to a region
▪ Has keys, indexes, indexers, data sources
Provisioning
▪ Azure Portal
▪ Azure resource management API
Elastic scale
▪ Capacity can be changed dynamically
▪ Replicas ~ more QPS, HA
▪ Partitions ~ more documents, write throughput

Slide 42

Slide 42 text

Index
“Index”
▪ Container for data, think “table”
▪ Has schema, CORS options, search options
▪ Create in portal or during app initialization
Typical schema
▪ Fields definition: name, type, key
Search specifics
▪ Field attributes – searchable, facetable, etc.
▪ Linguistics and analysis
▪ Suggesters for auto-complete
▪ Scoring profiles for ranking tuning
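To make the schema bullets concrete, here is a minimal sketch of an index definition as it could be sent to the service as JSON. The index name, field names, and suggester name are hypothetical; the attribute names (`key`, `searchable`, `filterable`, `facetable`, `sortable`, `analyzer`, `suggesters`) follow the Azure Search index definition format.

```python
import json

index_definition = {
    "name": "knowledge-articles",  # hypothetical index name
    "fields": [
        # Exactly one field acts as the document key.
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        # Full-text searchable fields; the analyzer handles the linguistics.
        {"name": "title", "type": "Edm.String", "searchable": True, "analyzer": "en.microsoft"},
        {"name": "body", "type": "Edm.String", "searchable": True},
        # Facetable and filterable field for navigation by category.
        {"name": "category", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "modified", "type": "Edm.DateTimeOffset", "filterable": True, "sortable": True},
    ],
    # Suggester that powers auto-complete on the title.
    "suggesters": [
        {"name": "sg", "searchMode": "analyzingInfixMatching", "sourceFields": ["title"]}
    ],
}


def key_fields(definition: dict) -> list:
    """Return the names of all fields marked as key (there should be exactly one)."""
    return [field["name"] for field in definition["fields"] if field.get("key")]


print(key_fields(index_definition))
print(json.dumps(index_definition, indent=2)[:80])
```

Checking for a single key field before creating the index is a cheap sanity check during app initialization.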

Slide 43

Slide 43 text

Index data
Push - using indexing API
▪ POST to /indexes/[index name]/docs/index
▪ Up to 1000 actions per batch
▪ Actions can be upload, merge, delete, etc.
▪ WebJobs are great for regular execution
Pull - using indexers
▪ Azure SQL DB and Document DB
▪ Change detection, deletion markers
▪ Point it at the data source, define policy, done
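The push model can be sketched as follows. This is a minimal example that only prepares the request bodies: it splits documents into batches of at most 1000 actions, the limit named on the slide, and wraps each document in the `{"value": [...]}` envelope with an `@search.action` per document. The function name and the sample documents are hypothetical, and the actual HTTP POST is omitted.

```python
from typing import Dict, Iterable, Iterator, List

MAX_ACTIONS_PER_BATCH = 1000  # limit stated on the slide
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def to_batches(
    documents: Iterable[Dict],
    action: str = "upload",
    batch_size: int = MAX_ACTIONS_PER_BATCH,
) -> Iterator[Dict[str, List[Dict]]]:
    """Yield request bodies for the indexing API, one per batch."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    batch: List[Dict] = []
    for doc in documents:
        batch.append({"@search.action": action, **doc})
        if len(batch) == batch_size:
            yield {"value": batch}
            batch = []
    if batch:  # remaining documents that did not fill a whole batch
        yield {"value": batch}


docs = [{"id": str(i), "title": f"Article {i}"} for i in range(2500)]
batches = list(to_batches(docs))
print([len(b["value"]) for b in batches])
```

Each yielded dictionary would be serialized to JSON and posted to the index, for example from a scheduled WebJob.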

Slide 44

Slide 44 text

Search
Search + typical data operations
▪ Simple search options, + - * () “”
▪ Filter, sort, project, page over results
▪ Options work with search and suggest
Search from client or server
▪ Use query keys when searching from clients
▪ CORS allows direct calls from browsers
Render from search results
▪ Include necessary non-searchable data
▪ E.g. URLs for pictures, keys to main content
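The filter, sort, project, and page options map onto query parameters of a search request. The sketch below only builds the request URL; the service name, index name, field names, and the `api-version` value are assumptions chosen for illustration, while `search`, `$filter`, `$orderby`, `$select`, `$top`, and `$skip` are parameters of the Azure Search query API.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_search_url(
    service: str,
    index: str,
    search_text: str,
    *,
    filter_expr: Optional[str] = None,
    order_by: Optional[str] = None,
    select: Optional[List[str]] = None,
    top: int = 10,
    skip: int = 0,
    api_version: str = "2017-11-11",  # assumed version, adjust to your service
) -> str:
    """Build a GET URL for a search query with filter, sort, projection, paging."""
    params = {
        "api-version": api_version,
        "search": search_text,
        "$top": top,    # page size
        "$skip": skip,  # offset of the page
    }
    if filter_expr:
        params["$filter"] = filter_expr
    if order_by:
        params["$orderby"] = order_by
    if select:
        params["$select"] = ",".join(select)  # project only the fields to render
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"


url = build_search_url(
    "myservice",           # hypothetical service name
    "knowledge-articles",  # hypothetical index name
    "dividend +tax",
    filter_expr="category eq 'Tax'",
    order_by="modified desc",
    select=["id", "title"],
    top=5,
)
print(url)
```

When the request is sent from a client, the query key travels in the `api-key` request header, so only read access is exposed.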

Slide 45

Slide 45 text

Customization
Scoring profiles
▪ Field weights
▪ Scoring functions
  ▪ magnitude, freshness, distance, tags
3 main patterns
▪ Known data directly available in the index
▪ Personalization using tag boosting
▪ Analytics, compute externally and push to the index

Slide 46

Slide 46 text

Integrate with Azure Search

Slide 47

Slide 47 text

Integrate with Azure Search

Slide 48

Slide 48 text

Demo Search

Slide 49

Slide 49 text

Scenario Revisited

Slide 50

Slide 50 text

Development Content collection Azure Search Document DB Graph Integrate Match & Merge Content API Importer/.NET Web App/.Net Search Index Relations Store

Slide 51

Slide 51 text

Solution Architecture CMS Output FRONT-END Content Related Content

Slide 52

Slide 52 text

Search Content FRONT-END MENU BAR Content Dividend tax SEARCH Related Content Related Content Content B A A

Slide 53

Slide 53 text

Demo API

Slide 54

Slide 54 text

Summary
• Cosmos DB Graph fit for related content purpose
• Cosmos DB Document fit for content
• Cosmos DB + Search good combination
• Meets business requirements
• Complete Architecture on PaaS
• Cool eh!

Slide 55

Slide 55 text

Resources
• CosmosDB
• CosmosDB Graph Explorer
• Azure Search
• Azure Search Resources

Slide 56

Slide 56 text

We appreciate your feedback! https://form.responster.com/mWJ1VI

Slide 57

Slide 57 text

Thank you!