Slide 1

Slide 1 text

1 Is Data Warehousing Dead? UKOUG Technology Conference & Exhibition 2017

Slide 2

Slide 2 text

2 Google's pizza
• Hello! Gordon's pizza?
o No sir, it's Google's pizza.
• So it's a wrong number? Sorry.
o No sir, Google bought it.
• OK. Take my order please.
o Well sir, you want the usual?
• The usual? You know me?
o According to our caller ID data sheet, the last 12 times you ordered pizza with cheese, sausage and thick crust.
• OK! This is it ...
o May I suggest this time ricotta and arugula with dried tomato?
• What? I hate vegetables.

Slide 3

Slide 3 text

3 Google's pizza
o Your cholesterol is not good, sir.
• How do you know?
o We cross-referenced your fixed-line number ☎ with your name through the subscriber directory. We have the results of your blood tests for the last 7 years.
• Okay, but I do not want this pizza! I already take medicine ...
o Excuse me, but you have not taken the medicine regularly. According to our commercial database, 4 months ago you purchased only one box of 30 cholesterol tablets at Drugsale Network.
• I bought more from another drugstore.
o It's not showing on your credit card statement.
• I paid in cash.

Slide 4

Slide 4 text

4 Google's pizza
o But you did not withdraw that much cash according to your bank statement.
• I have other sources of cash.
o This is not showing on your last tax form, unless you bought them with an undeclared source of income.
• WHAT THE HELL?
o I'm sorry, sir, we use such information only with the intention of helping you.
• Enough! I'm sick of Google, Facebook, Twitter, WhatsApp. I'm going to an island without internet or cable TV, where there is no cell phone line and no one to watch me or spy on me.
o I understand, sir, but you need to renew your passport first, as it expired 5 weeks ago.

Slide 5

Slide 5 text

5 Introduction Quistor: Your Business Analytics Partner of Choice Customers Worldwide 150+ Analytics & Big Data 12 Years In Business Value Propositions 4 Delivery Centers 170 Employees 10 European Offices 35y Average Age Oracle Platinum Partner Managed Services JD Edwards Digital 24 7 Cloud ExaHotel

Slide 6

Slide 6 text

6 Who am I?
Daan Bakboord
• Oracle Big Data Analytics Consultant @ Quistor
  – Oracle BI EE (OBIEE)
  – Oracle Analytics Cloud (OAC, BICS)
  – Oracle Data Visualization
  – Oracle Big Data
  – Oracle BI Applications (OBIA)
• Information Architecture
  – TOGAF
  – ArchiMate
http://www.daanalytics.nl
http://blog.daanalytics.nl
https://twitter.com/daanbakboord
https://nl.linkedin.com/in/daanbakboord
#obihackers
nl.OUG BIWA SIG Lead

Slide 7

Slide 7 text

7 Bloodhound: The 1,000 Mph Car, Powered by Data ”Oracle is helping Bloodhound smash the land speed record and reach 1,000 mph” http://bit.ly/oracle_bloodhound • Artificial Intelligence • Augmented Reality • Geo Spatial Technology • Data Visualisation • Virtual Reality • Real Time Education

Slide 8

Slide 8 text

8 Data Explosion Mobile Social Internet of Things

Slide 9

Slide 9 text

9
• Challenge
  – Massive amounts of storage are needed to index the entire web
  – Processing large amounts of data requires a new approach
• Solution
  – GFS, the Google File System
    o Described in a paper released in 2003
  – Distributed MapReduce
    o Described in a paper released in 2004
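To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the task. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and nothing here is distributed the way the real framework is.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or block) would be mapped on a different node.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "data lake"])
print(counts)
```

Because each call to `map_phase` and each call to `reduce_phase` is independent of the others, the work can be spread over many machines, which is the point of the approach.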

Slide 10

Slide 10 text

10 What’s Hadoop? Hadoop is a Software Framework for Storing, Processing and Analyzing Big Data • Distributed • Scalable • Fault-tolerant • Open Source

Slide 11

Slide 11 text

11 Core Hadoop
• Distributed File System HDFS – Stores data
• Hadoop MapReduce – Processes data
• Hadoop YARN – Schedules work
Hadoop Ecosystem

Slide 12

Slide 12 text

12 ”Data Warehousing is Dead?” Data Management Reservoir Factory Warehouse

Slide 13

Slide 13 text

13 The Data Warehouse
• Definition (Bill Inmon)
  – "A (Data) Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process." William H. Inmon (1990)
A Data Warehouse is NOT the ultimate goal; it serves to support the ‘decision-making process’. This process should be clear before you make a definitive decision about the development of the Data Warehouse.

Slide 14

Slide 14 text

14 Challenges
Business
• What do we want with Analytics?
• Uniform Data Definitions
• Ownership
Technical
• Complexity of the Source
  – Lack of documentation
• Combining Data Sources
• Keeping history
• Data Quality
  – Shit in – Shit out
• Performance

Slide 15

Slide 15 text

15 Data Lake
• Collect & organize large volumes of diverse data for later use
  – raw / original / native / as-is
• Preparation & transformation based on the use case
  – Schema on Read
• Benefits
  – Lower costs
  – Greater flexibility
James Dixon, CTO of Pentaho in 2012

Slide 16

Slide 16 text

16 Data Swamp
”Storing all data does not automatically return Value”
• Context
• Governance
  – planning (e.g. ingestion still needed?)
  – rules
  – processes
  – health checks
• Security
• Ownership & Sponsorship
• Architecture & Technologies
Data Lakes don’t replace the Data Warehouse

Slide 17

Slide 17 text

17 Data Lake compared to the Data Warehouse
”What is a Data Lake capable of that a Data Warehouse is not?”
• Quickly and Cheaply Store & Process any type of data

DWH ”Schema-on-Write”      | Aspect           | Data Lake ”Schema-on-Read”
Create schema before load  | Schema (changes) | No schema, just copy the data
Explicit load operation    | Transformation   | SerDe to extract columns
Standards                  | Governance       | Loosely structured
Limited                    | Processing       | Coupled with the data
Structured                 | Data Types       | Un-/semi-structured
Scale up                   | Scalability      | Scale out
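The difference between the two columns can be sketched in a few lines. This is a toy illustration, not how a warehouse or a Hadoop SerDe is implemented: SQLite stands in for the warehouse, a list of JSON strings stands in for files in the lake, and `read_with_schema` plus the `orders` table are hypothetical names made up for the example.

```python
import json
import sqlite3

# Schema-on-write: the table definition must exist before any row is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (1, "Acme", 99.5))
warehouse_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Schema-on-read: raw records are stored as-is; no schema is enforced on load.
raw_lines = [
    '{"order_id": 2, "customer": "Globex", "amount": 20.0}',
    '{"order_id": 3, "amount": 5.5, "channel": "web"}',  # different shape, still accepted
]


def read_with_schema(lines, columns):
    """Apply a schema at query time: pick the requested columns, None if missing."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(column) for column in columns)


lake_rows = list(read_with_schema(raw_lines, ["order_id", "amount"]))
lake_total = sum(amount for _, amount in lake_rows)
print(warehouse_total, lake_rows, lake_total)
```

Loading into the lake is just an append, which is why writes are cheap; the cost of interpreting the data moves to every read.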

Slide 18

Slide 18 text

18 Use cases for Hadoop compared to the Data Warehouse
Data Lake is for...
• Data Discovery
• Processing & Storing Large (un-/semi-structured) data sets
Data Warehouse is for...
• Interactive OLAP-Analytics
• Complex ACID transactions

DWH ”Schema-on-Write”     | Data Lake ”Schema-on-Read”
Reads are Fast            | Writes are Fast
Governance and Structure  | Flexibility & Agility

Slide 19

Slide 19 text

19 Traditional Business Intelligence – Oracle Data Warehouse Deployment Choice
Data Management Reservoir Factory Warehouse
• Ideal Database Hardware
• Smart System Software
• Full-Stack Integration
Deployment choices
• On-Premises: Oracle Exadata, Customer Data Center, Purchased, Customer Managed
• Oracle Cloud: Oracle Exadata Cloud Service, Oracle Data Center, Subscription, Oracle Managed
• Customer Cloud: Oracle Exadata Cloud Service, Customer Data Center, Subscription, Oracle Managed
• Oracle Cloud: Autonomous Database Cloud Service, Oracle Data Center, Subscription, Oracle Managed

Slide 20

Slide 20 text

20 Oracle Database Platform
• Analytical Services: SQL, In-Memory, R, Advanced Analytics, OLAP
• Data Support: Relational, JSON, XML, Spatial, Graph, Text, Binary
• Platform Services: Cloud to On-Premise, Clustering, Security, High Availability, Zero Data Loss, Administration
• Development Services: Node.js, Python, .NET, Java, PHP, Ruby, PL/SQL, C, C++, Perl, ORDS, APEX, SODA

Slide 21

Slide 21 text

21 Traditional Business Intelligence – Data Warehousing
Data Management Reservoir Factory Warehouse
Oracle Database 12.2 – New Features
• Better In-Memory capabilities for DWH
  – Data Scans
  – Joins
  – Aggregation
• New SQL Features
  – Approximate Query processing
  – Faster JSON processing via in-memory
  – Analytic Views
    • Common business logic inside the database
• New highly-scalable Property Graph analytics

Slide 22

Slide 22 text

22 Oracle REST Data Services Relational DB Document Store NoSQL Oracle REST Data Services Standalone (Jetty) WebLogic, Tomcat and GlassFish REST

Slide 23

Slide 23 text

23
• Included in Oracle Database & Oracle SQL Developer installs
  o Mid-tier Java application
  o ORDS maps HTTP(S) verbs (GET, POST, PUT, DELETE, etc.) to database transactions
  o Returns any results formatted using JSON
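The verb-to-transaction mapping can be pictured with a small stand-in. This is not ORDS and does not use any ORDS API: it is a hypothetical `handle` function over an in-memory SQLite table named `employees`, written only to show how each HTTP verb can translate into one database transaction whose result comes back as JSON.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")


def handle(verb, emp_id=None, body=None):
    """Map one HTTP verb to one database transaction and return JSON text."""
    with conn:  # commits on success, rolls back if an exception is raised
        if verb == "POST":
            cur = conn.execute("INSERT INTO employees (name) VALUES (?)", (body["name"],))
            return json.dumps({"id": cur.lastrowid})
        if verb == "GET":
            rows = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
            return json.dumps({"items": [{"id": i, "name": n} for i, n in rows]})
        if verb == "PUT":
            conn.execute("UPDATE employees SET name = ? WHERE id = ?", (body["name"], emp_id))
            return json.dumps({"id": emp_id})
        if verb == "DELETE":
            conn.execute("DELETE FROM employees WHERE id = ?", (emp_id,))
            return json.dumps({"deleted": emp_id})
    raise ValueError(f"unsupported verb: {verb}")


created = handle("POST", body={"name": "Alice"})
handle("PUT", emp_id=1, body={"name": "Bob"})
listing = handle("GET")
print(created, listing)
```

In the real product the mapping is configured in the mid-tier rather than hand-coded like this; the sketch only mirrors the request-in, SQL-in-the-middle, JSON-out shape described on the slide.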

Slide 24

Slide 24 text

24 JSON
• What is JSON?
  – JavaScript Object Notation (JSON)
    • a lightweight data-interchange format
    • a syntax for storing and exchanging data
    • "self-describing" & easy to understand
    • language independent
• Why use JSON?
  – the JSON format is text only
    • easily sent to and from a server
    • usable as a data format by any programming language
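A short round trip shows what "text only" means in practice. The `order` record is made up for the example; the calls are Python's standard `json` module.

```python
import json

# A hypothetical record; any nested mix of dicts, lists, strings, numbers and booleans works.
order = {"customer": "Gordon", "items": ["cheese", "sausage"], "thick_crust": True}

payload = json.dumps(order)     # serialize to plain text, ready to send to a server
restored = json.loads(payload)  # parse the text back into native data structures

print(payload)
print(restored == order)
```

The same `payload` string could be parsed unchanged by JavaScript, Java or PL/SQL, which is what makes the format language independent.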

Slide 25

Slide 25 text

25 JSON support in Oracle Database
• Creating Tables to Hold JSON
• Querying JSON Data
  – IS JSON
  – JSON_EXISTS
  – JSON_VALUE
  – JSON_QUERY
  – JSON_TABLE
  – JSON_TEXTCONTAINS
• Identifying Columns Containing JSON
• Loading JSON Files Using External Tables

Slide 26

Slide 26 text

26 Analytic Views – a new type of view in the Oracle Database
• Business logic back into the database
  – 3 new database objects
  – aggregations, hierarchies, calculations
• Easily queried and designed with SQL
• No persistent storage
  – works on existing tables and views
• Built-in data visualization via APEX

Slide 27

Slide 27 text

27 3 New Database Objects
• Attribute Dimensions
  – Map to Dimension / Attribute data
• Hierarchies
  – Organize levels into aggregations and drill paths
• Analytic Views
  – Map to data objects with Fact / Measure data
  – Can be queried with MDX & SQL

Slide 28

Slide 28 text

28

Slide 29

Slide 29 text

29 Polyglot Persistence ”Polyglot Persistence is a fancy term to mean that when storing data, it is best to use multiple data storage technologies, chosen based upon the way data is being used by individual applications or components of a single application. Different kinds of data are best dealt with different data stores. In short, it means picking the right tool for the right use case. ” http://www.jamesserra.com/archive/2015/07/what-is-polyglot-persistence/

Slide 30

Slide 30 text

30 ”Different Data Storage technologies to support different Data Storage needs” Travel Booking System

Slide 31

Slide 31 text

31 Polyglot Persistence – Multi Model • Integrated access to all data in the different database objects – Relational – XML – JSON – Text – Graph & Spatial

Slide 32

Slide 32 text

32 Polyglot Persistence – Single Model • Support for multiple Single Model data stores • Integrated access via Oracle Big Data SQL

Slide 33

Slide 33 text

33 Oracle Big Data SQL

Slide 34

Slide 34 text

34 Oracle Big Data SQL

Slide 35

Slide 35 text

35 Illustration of Borchert's Model

Slide 36

Slide 36 text

36 Oracle Data Warehouse Evolution – “Transforming to Big Data”
”Data Warehousing is Dead?”
Data Management Reservoir Factory Warehouse
Combine the best of both worlds
• Extend Oracle DWH with Oracle Big Data
• Combining (new) Big Data with Enterprise Data
• Relational & Hadoop & NoSQL
  – On-Premises & Cloud
• Transactional & Social and Web & IoT
• Analytics & Data Mining & Machine Learning
”Data Warehousing is not Dead, it’s Evolving!”

Slide 37

Slide 37 text

37 Big Data Analytics in the Oracle Cloud

Slide 38

Slide 38 text

38

Slide 39

Slide 39 text

39 Let’s get SOCIAL