Is Data Warehousing dead?

Daan Bakboord
December 06, 2017

More and more people and devices are connected. Mobile, social and the Internet of Things are causing data volumes to grow. New technologies lead to new possibilities. These new technologies also lead to new ways of Data Management.
Traditional Data Warehousing might not be well suited to serve these new data needs. Think about structured versus unstructured data, or batch versus streaming data. And what about graph data? Do we need new technologies and architectures to keep up with ever-changing business demands?

This presentation covers Oracle Data Warehousing in the Big Data Age. Data Warehousing has been around for many years, and the Oracle Database plays a central role in Oracle Data Warehousing. However, the Oracle 12c Database is no longer just the relational database it used to be: it is equipped to meet the changing needs of Data Management. What does the Oracle 12c Database have to offer for these changing needs?

Transcript

  1. 1
    Is Data Warehousing Dead?
    UKOUG Technology Conference &
    Exhibition 2017

  2. 2
    Google's pizza
    • Hello! Gordon's pizza?
    o No sir it's Google's pizza.
    • So it's a wrong number? Sorry.
    o No sir, Google bought it.
    • OK. Take my order please
    o Well sir, you want the usual?
    • The usual? You know me?
    o According to our caller ID data sheet, the last 12 times you
    ordered pizza with cheese, sausage and thick crust.
    • OK! This is it ...
    o May I suggest ricotta, arugula and dried tomato this time?
    • What? I hate vegetables.

  3. 3
    Google's pizza
    o Your cholesterol is not good, sir.
    • How do you know?
    o We cross-referenced your landline number ☎ with your name
    in the subscriber directory. We have the results of your blood
    tests for the last 7 years.
    • Okay, but I do not want this pizza! I already take medicine ...
    o Excuse me, but you have not taken the medicine regularly. According to our
    commercial database, 4 months ago you purchased only one box of
    30 cholesterol tablets at Drugsale Network.
    • I bought more from another drugstore.
    o It's not showing on your credit card statement.
    • I paid in cash.

  4. 4
    Google's pizza
    o But you did not withdraw that much cash according to your bank
    statement
    • I have other sources of cash.
    o That is not showing on your last tax form, unless you bought them
    with income from an undeclared source.
    • WHAT THE HELL?
    o I'm sorry, sir, we use such information only with the intention of
    helping you.
    • Enough! I'm sick of Google, Facebook, Twitter, WhatsApp. I'm going to an
    island without internet or cable TV, where there is no cell phone signal and no
    one to watch me or spy on me!
    o I understand, sir, but you need to renew your passport first, as it
    expired 5 weeks ago.

  5. 5
    Introduction Quistor: Your Business Analytics Partner of Choice
    • 150+ Customers Worldwide
    • 12 Years In Business
    • 4 Delivery Centers
    • 170 Employees
    • 10 European Offices
    • 35y Average Age
    • Oracle Platinum Partner
    • Value Propositions: Analytics & Big Data, Managed Services, JD Edwards, Digital, 24/7, Cloud, ExaHotel

  6. 6
    Who am I?
    http://www.daanalytics.nl
    https://twitter.com/daanbakboord
    https://nl.linkedin.com/in/daanbakboord
    Daan Bakboord
    • Oracle Big Data Analytics Consultant @ Quistor
    – Oracle BI EE (OBIEE)
    – Oracle Analytics Cloud (OAC, BICS)
    – Oracle Data Visualization
    – Oracle Big Data
    – Oracle BI Applications (OBIA)
    • Information Architecture
    – TOGAF
    – Archimate
    http://blog.daanalytics.nl
    #obihackers
    nl.OUG BIWA SIG Lead

  7. 7
    Bloodhound: The 1,000 Mph Car, Powered by Data
    “Oracle is helping Bloodhound smash the land speed record and reach 1,000 mph”
    http://bit.ly/oracle_bloodhound
    • Artificial Intelligence
    • Augmented Reality
    • Geo Spatial Technology
    • Data Visualisation
    • Virtual Reality
    • Real Time Education

  8. 8
    Data Explosion
    Mobile
    Social
    Internet of Things

  9. 9
    • Challenge
    – Massive amounts of storage needed to index the entire web
    – Processing large amounts of data requires a new approach
    • Solution
    – GFS, the Google File System
    o Described in a paper released in 2003
    – Distributed MapReduce
    o Described in a paper released in 2004

  10. 10
    What’s Hadoop?
    Hadoop is a Software Framework for Storing, Processing and
    Analyzing Big Data
    • Distributed
    • Scalable
    • Fault-tolerant
    • Open Source

  11. 11
    Core Hadoop
    Core Hadoop
    • Hadoop Distributed File System (HDFS) – Stores data
    • Hadoop MapReduce – Processes data
    • Hadoop YARN – Schedules work
    Hadoop Eco System
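The MapReduce model that Hadoop implements is usually explained with the classic word-count example. The following is a minimal, single-process Python sketch of that model, not Hadoop's Java API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` are illustrative, and in a real cluster the framework runs the map and reduce tasks in parallel on the nodes holding the HDFS blocks and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data warehouse data lake"]))
```

Because each mapper only looks at its own input and each reducer only at its own key, both phases can be spread over many machines, which is what lets the model scale out.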

  12. 12
    “Data Warehousing is Dead?”
    Data Management
    Reservoir Factory Warehouse

  13. 13
    The Data Warehouse
    • Definition (Bill Inmon)
    – “A (Data) Warehouse is a subject-oriented, integrated, time-variant and
    non-volatile collection of data in support of management's decision making
    process.”
    William H. Inmon (1990)
    A Data Warehouse is NOT the ultimate goal; it serves to support the ‘decision-making process’.
    This process should be clear first, before you can make a definitive decision about the
    development of the Data Warehouse.

  14. 14
    Challenges
    Business
    • What do we want with Analytics?
    • Uniform Data Definitions
    • Ownership
    Technical
    • Complexity of the Source
    – Lack of documentation
    • Combining Data Sources
    • Keeping history
    • Data Quality
    – Shit in – Shit out
    • Performance

  15. 15
    Data Lake
    • Collect & organize large volumes of diverse data for later use
    – raw / original / native / as-is
    • Preparation & transformation based on the use case
    – Schema on Read
    • Benefits
    – Lower costs
    – Greater flexibility
    James Dixon, CTO of Pentaho in 2012

  16. 16
    Data Swamp
    “Storing all data does not automatically return Value”
    • Context
    • Governance
    – planning (e.g. ingestion still needed?)
    – rules
    – processes
    – health checks
    • Security
    • Ownership & Sponsorship
    • Architecture & Technologies
    Data Lakes don’t replace the Data Warehouse

  17. 17
    Data Lake compared to the Data Warehouse
    “What is a Data Lake capable of that a Data Warehouse is not?”
    • Quickly and cheaply store & process any type of data

    Aspect            | DWH (“Schema-on-Write”)    | Data Lake (“Schema-on-Read”)
    Schema (changes)  | Create schema before load  | No schema, just copy the data
    Transformation    | Explicit load operation    | SerDe to extract columns
    Governance        | Standards                  | Loosely structured
    Processing        | Limited                    | Coupled with the data
    Data Types        | Structured                 | Un-/semi-structured
    Scalability       | Scale up                   | Scale out
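The difference between the two columns can be sketched in a few lines of Python. This is an illustrative sketch only: the `SCHEMA` and the sample CSV rows are hypothetical, and CSV parsing stands in for a real load process or SerDe. Schema-on-write converts and validates every row at load time and rejects bad data up front; schema-on-read just stores the raw text and applies a schema only when the data is queried.

```python
import csv
from typing import Callable, Dict, List

# Hypothetical schema for illustration: column name -> type converter.
SCHEMA: Dict[str, Callable[[str], object]] = {"order_id": int, "customer": str, "amount": float}


def _apply_schema(row: List[str], schema: Dict[str, Callable[[str], object]]) -> dict:
    """Convert one parsed row to typed columns; raise ValueError if it does not fit."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} columns, got {len(row)}: {row}")
    return {col: cast(value) for (col, cast), value in zip(schema.items(), row)}


def load_schema_on_write(raw_lines: List[str]) -> List[dict]:
    """DWH style: every row must match the schema before anything is stored."""
    return [_apply_schema(row, SCHEMA) for row in csv.reader(raw_lines)]


def store_schema_on_read(raw_lines: List[str]) -> List[str]:
    """Data Lake style: keep the data as-is; no parsing or validation on write."""
    return list(raw_lines)


def query_schema_on_read(stored: List[str], schema: Dict[str, Callable[[str], object]]) -> List[dict]:
    """Apply a schema only at read time; rows that do not fit this schema are skipped."""
    result = []
    for row in csv.reader(stored):
        try:
            result.append(_apply_schema(row, schema))
        except ValueError:
            continue
    return result
```

With the rows `["1,alice,10.5", "2,bob,n/a", "3,carol,7"]`, `load_schema_on_write` raises `ValueError` on the second row, whereas `query_schema_on_read` returns the two rows that fit and leaves the raw data untouched, so a different schema can be applied to it later.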

  18. 18
    Use cases for Hadoop compared to the Data Warehouse
    Data Lake is for…..
    • Data Discovery
    • Processing & Storing Large (un-/semi-structured) data sets
    Data Warehouse is for….
    • Interactive OLAP-Analytics
    • Complex ACID transactions
    DWH (“Schema-on-Write”): Reads are Fast; Governance and Structure
    Data Lake (“Schema-on-Read”): Writes are Fast; Flexibility & Agility

  19. 19
    Traditional Business Intelligence – Oracle Data Warehouse Deployment Choice
    Data Management
    Reservoir Factory Warehouse
    • Ideal Database Hardware
    • Smart System Software
    • Full-Stack Integration
    • On-Premises: Oracle Exadata; Customer Data Center; Purchased; Customer Managed
    • Oracle Cloud: Oracle Exadata Cloud Service; Oracle Data Center; Subscription; Oracle Managed
    • Customer Cloud: Oracle Exadata Cloud Service; Customer Data Center; Subscription; Oracle Managed
    • Oracle Cloud: Autonomous Database Cloud Service; Oracle Data Center; Subscription; Oracle Managed

  20. 20
    Oracle Database Platform
    • Analytical Services: SQL, In-Memory, R, Advanced Analytics, OLAP
    • Data Support: Relational, JSON, XML, Spatial, Graph, Text, Binary
    • Platform Services: Cloud to On-Premise, Clustering, Security, High Availability,
    Zero Data Loss, Administration
    • Development Services: Node.js, Python, .NET, Java, PHP, Ruby, PL/SQL, C, C++,
    Perl, ORDS, APEX, SODA

  21. 21
    Traditional Business Intelligence – Data Warehousing
    Data Management
    Reservoir Factory Warehouse
    Oracle Database 12.2 – New Features
    • Better In-Memory capabilities for DWH
    – Data Scans
    – Joins
    – Aggregation
    • New SQL Features
    – Approximate Query processing
    – Faster JSON processing via in-memory
    – Analytic Views
    • Common business logic inside the database
    • New highly-scalable Property Graph analytics
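Approximate query processing trades a small, bounded error for far less memory and time; Oracle's `APPROX_COUNT_DISTINCT` SQL function is one example. The slide does not describe Oracle's internals, so the Python sketch below is a swapped-in illustration of the general technique, a HyperLogLog-style distinct-count estimator. The function name only mirrors the SQL function for readability, and the hash (SHA-1 truncated to 64 bits) and register count (`p = 12`) are arbitrary choices.

```python
import hashlib
import math
from typing import Iterable


def _hash64(value: str) -> int:
    """Deterministic 64-bit hash: the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


def approx_count_distinct(values: Iterable[object], p: int = 12) -> float:
    """Estimate the number of distinct values using 2**p registers (assumes p >= 7)."""
    m = 1 << p
    width = 64 - p                                  # bits left after choosing a register
    registers = [0] * m
    for value in values:
        h = _hash64(str(value))
        index = h >> width                          # first p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # position of the leftmost 1-bit (1-based)
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)                # bias correction, valid for m >= 128
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)

    empty = registers.count(0)
    if estimate <= 2.5 * m and empty:               # small-range correction (linear counting)
        estimate = m * math.log(m / empty)
    return estimate
```

With `p = 12` the sketch keeps 4,096 small counters no matter how many rows are scanned, and its expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. Feeding the same values twice does not change the estimate, because each register only keeps a maximum.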

  22. 22
    Oracle REST Data Services
    Relational DB
    Document Store
    NoSQL
    Oracle REST Data
    Services
    Standalone (Jetty)
    WebLogic, Tomcat and GlassFish
    REST

  23. 23
    • Included in Oracle Database & Oracle SQL Developer installs
    o Mid-tier Java application
    o ORDS maps HTTP(S) verbs (GET, POST, PUT, DELETE, etc.) to database transactions
    o Returns any results formatted as JSON
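From a client's point of view, a table exposed through ORDS can be read and written with plain HTTP. The Python sketch below uses only the standard library; the endpoint URL is hypothetical, and it assumes that a collection response wraps its rows in an `items` array, as ORDS AutoREST responses typically do. Check the response format of your own ORDS version before relying on it.

```python
import json
import urllib.request
from typing import List, Optional

# Hypothetical ORDS endpoint for an AutoREST-enabled table; adjust host, schema alias and object.
BASE_URL = "http://localhost:8080/ords/hr/employees/"

# The idea on the slide: each HTTP verb corresponds to a database operation.
VERB_TO_SQL = {"GET": "SELECT", "POST": "INSERT", "PUT": "UPDATE", "DELETE": "DELETE"}


def build_request(verb: str, url: str = BASE_URL, row: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request; a row, if given, is sent as a JSON body."""
    if verb not in VERB_TO_SQL:
        raise ValueError(f"unsupported verb: {verb}")
    data = json.dumps(row).encode("utf-8") if row is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


def parse_collection(body: str) -> List[dict]:
    """Return the rows of a collection response, assuming they sit in an 'items' array."""
    return json.loads(body).get("items", [])


def fetch_rows(url: str = BASE_URL) -> List[dict]:
    """GET the collection (a SELECT on the database side) and return its rows."""
    with urllib.request.urlopen(build_request("GET", url)) as response:
        return parse_collection(response.read().decode("utf-8"))
```

`fetch_rows()` needs a running ORDS instance; `build_request` and `parse_collection` can be tried without one, which keeps the HTTP-to-SQL mapping separate from the network call.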

  24. 24
    JSON
    • What is JSON?
    – JavaScript Object Notation (JSON)
    • a lightweight data-interchange format
    • a syntax for storing and exchanging data
    • "self-describing" & easy to understand
    • language independent
    • Why use JSON?
    – the JSON format is text only, so it can be
    • easily sent to and from a server
    • used as a data format by any programming language

  25. 25
    JSON support in Oracle Database
    • Creating Tables to Hold JSON
    • Querying JSON Data
    – IS JSON
    – JSON_EXISTS
    – JSON_VALUE
    – JSON_QUERY
    – JSON_TABLE
    – JSON_TEXTCONTAINS
    • Identifying Columns Containing JSON
    • Loading JSON Files Using External Tables
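To make the semantics of `JSON_EXISTS` and `JSON_VALUE` concrete, here is a rough Python analogue. It is not Oracle's implementation: it supports only a tiny, assumed subset of path syntax (`$`, `.key` and `[index]`), and it mimics the default behaviour of returning NULL (here `None`) when the path does not select a scalar. The helper names `json_exists`, `json_value` and `_walk` are hypothetical.

```python
import json
import re
from typing import Any, Tuple

# One path step: either ".key" or "[index]".
_STEP = re.compile(r"\.(\w+)|\[(\d+)\]")


def _walk(doc: Any, path: str) -> Tuple[bool, Any]:
    """Follow a simplified path such as '$.address.city' or '$.phones[0].type'.
    Returns (True, value) if every step exists, otherwise (False, None)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    pos, node = 1, doc
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = m.group(1), m.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return False, None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return False, None
            node = node[i]
        pos = m.end()
    return True, node


def json_exists(text: str, path: str) -> bool:
    """Rough analogue of JSON_EXISTS: does the path select anything?"""
    return _walk(json.loads(text), path)[0]


def json_value(text: str, path: str) -> Any:
    """Rough analogue of JSON_VALUE: a scalar, or None (SQL NULL) for objects, arrays or misses."""
    found, node = _walk(json.loads(text), path)
    return node if found and not isinstance(node, (dict, list)) else None
```

For example, on `{"address": {"city": "Utrecht"}}` the path `$.address.city` yields `"Utrecht"`, while `$.address` yields `None` because it selects an object rather than a scalar.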

  26. 26
    Analytic Views – a new type of view in the Oracle Database
    • Business logic back into the database
    – 3 new database objects
    – aggregations, hierarchies, calculations
    • Easily queried and designed with SQL
    • No persistent storage
    – works on existing tables and views
    • Built-in data visualization via APEX

  27. 27
    3 New Database Objects
    • Attribute Dimensions – map to Dimension / Attribute data
    • Hierarchies – organize levels into aggregations and drill paths
    • Analytic Views – map to data objects with Fact / Measure data;
    can be queried with MDX & SQL
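How these three objects work together can be mimicked in plain Python: an attribute dimension maps keys to level attributes, a hierarchy defines the drill path, and the analytic view aggregates a fact measure to whichever level is queried, computing it on the fly rather than storing it. All names and data below are hypothetical and only illustrate the idea; in the database the objects are defined with SQL DDL (`CREATE ATTRIBUTE DIMENSION`, `CREATE HIERARCHY`, `CREATE ANALYTIC VIEW`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical attribute dimension: each day key mapped to its higher-level attributes.
TIME_DIM: Dict[str, Dict[str, str]] = {
    "2017-11-30": {"MONTH": "2017-11", "YEAR": "2017"},
    "2017-12-01": {"MONTH": "2017-12", "YEAR": "2017"},
    "2017-12-06": {"MONTH": "2017-12", "YEAR": "2017"},
}

# Hypothetical hierarchy: the drill path from the lowest to the highest level.
TIME_HIER: List[str] = ["DAY", "MONTH", "YEAR"]

# Hypothetical fact rows: (day key, sales measure).
SALES: List[Tuple[str, float]] = [("2017-11-30", 100.0), ("2017-12-01", 40.0), ("2017-12-06", 60.0)]


def member_of(day: str, level: str) -> str:
    """Find the member at `level` that a day key rolls up to."""
    if level not in TIME_HIER:
        raise ValueError(f"unknown level: {level}")
    return day if level == "DAY" else TIME_DIM[day][level]


def rollup(facts: List[Tuple[str, float]], level: str) -> Dict[str, float]:
    """Aggregate the measure to one hierarchy level at query time (nothing is stored)."""
    totals: Dict[str, float] = defaultdict(float)
    for day, amount in facts:
        totals[member_of(day, level)] += amount
    return dict(totals)


def share_of_parent(facts: List[Tuple[str, float]], level: str, parent_level: str) -> Dict[str, float]:
    """A calculated measure: each member's total as a fraction of its parent's total."""
    child_totals = rollup(facts, level)
    parent_totals = rollup(facts, parent_level)
    parent_of = {member_of(day, level): member_of(day, parent_level) for day, _ in facts}
    return {member: total / parent_totals[parent_of[member]] for member, total in child_totals.items()}
```

Querying `rollup(SALES, "MONTH")` gives one total per month, and `share_of_parent(SALES, "MONTH", "YEAR")` adds a calculation on top of the same hierarchy, which is the kind of shared business logic the slide proposes to keep inside the database.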

  28. 29
    Polyglot Persistence
    “Polyglot Persistence is a fancy term to mean that when storing data, it is best to use multiple
    data storage technologies, chosen based upon the way data is being used by individual
    applications or components of a single application. Different kinds of data are best dealt with
    different data stores. In short, it means picking the right tool for the right use case.”
    http://www.jamesserra.com/archive/2015/07/what-is-polyglot-persistence/

  29. 30
    “Different Data Storage technologies to support different Data Storage needs”
    Travel Booking System
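A travel booking system is a typical case: different parts of the application have different access patterns. The Python sketch below is a toy with hypothetical in-memory stand-ins for three kinds of store: a key-value store for sessions, a document store for bookings, and a graph for routes between airports. In practice each would be a separate product, or a separate data model inside one multi-model database.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class KeyValueStore:
    """Stand-in for a key-value store: fast lookups by key, e.g. user sessions."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)


class DocumentStore:
    """Stand-in for a document store: schemaless records, e.g. bookings."""

    def __init__(self) -> None:
        self._docs: List[dict] = []

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self, **criteria: object) -> List[dict]:
        return [d for d in self._docs if all(d.get(k) == v for k, v in criteria.items())]


class GraphStore:
    """Stand-in for a graph store: connections between airports, e.g. route search."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = {}

    def connect(self, a: str, b: str) -> None:
        self._edges.setdefault(a, set()).add(b)
        self._edges.setdefault(b, set()).add(a)

    def shortest_route(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search: returns a route with the fewest hops, or None."""
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                route: List[str] = []
                current: Optional[str] = node
                while current is not None:
                    route.append(current)
                    current = previous[current]
                return route[::-1]
            for neighbour in sorted(self._edges.get(node, set())):
                if neighbour not in previous:
                    previous[neighbour] = node
                    queue.append(neighbour)
        return None
```

The point is the routing decision rather than the stores themselves: each kind of data goes to the store whose access pattern suits it.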

  30. 31
    Polyglot Persistence – Multi Model
    • Integrated access to all data in the different database
    objects
    – Relational
    – XML
    – JSON
    – Text
    – Graph & Spatial

  31. 32
    Polyglot Persistence – Single Model
    • Support for multiple Single Model data
    stores
    • Integrated access via Oracle Big Data
    SQL
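Oracle Big Data SQL itself cannot be reproduced in a few lines, but its underlying idea, one SQL statement over both enterprise tables and raw data from a lake, can be sketched with Python's built-in `sqlite3`. Everything here is a stand-in: the customer rows and clickstream lines are hypothetical, and parsing the JSON lines into a temporary table at query time only imitates how external tables apply a schema on read.

```python
import json
import sqlite3
from contextlib import closing
from typing import List, Tuple

# Hypothetical enterprise data (the "warehouse" side): id, name, country.
CUSTOMERS = [(1, "Alice", "NL"), (2, "Bob", "UK")]

# Hypothetical raw clickstream lines as they might be stored in a data lake.
CLICKS_RAW = [
    '{"customer_id": 1, "page": "/pricing"}',
    '{"customer_id": 1, "page": "/docs"}',
    '{"customer_id": 2, "page": "/pricing"}',
]


def clicks_per_country() -> List[Tuple[str, int]]:
    """Join relational customers with raw JSON clicks in a single SQL query."""
    with closing(sqlite3.connect(":memory:")) as con:
        con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
        con.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
        # Schema-on-read: the raw lines are parsed only now, when the query needs them.
        con.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, page TEXT)")
        parsed = (json.loads(line) for line in CLICKS_RAW)
        con.executemany("INSERT INTO clicks VALUES (?, ?)", ((d["customer_id"], d["page"]) for d in parsed))
        return con.execute(
            "SELECT c.country, COUNT(*) FROM clicks AS k "
            "JOIN customers AS c ON c.id = k.customer_id "
            "GROUP BY c.country ORDER BY c.country"
        ).fetchall()
```

The analyst writes one familiar SQL statement; where each side of the join physically lives is hidden behind the query layer, which is the experience the slide attributes to Big Data SQL.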

  32. 33
    Oracle Big Data SQL

  33. 34
    Oracle Big Data SQL

  34. 35
    Illustration of Borchert's Model

  35. 36
    Oracle Data Warehouse Evolution – “Transforming to Big Data”
    “Data Warehousing is Dead?”
    Data Management
    Reservoir Factory Warehouse
    Combine the best of both worlds
    • Extend Oracle DWH with Oracle Big Data
    • Combining (new) Big Data with Enterprise Data
    • Relational & Hadoop & NoSQL
    – On-Premises & Cloud
    • Transactional & Social and Web & IoT
    • Analytics & Data Mining & Machine Learning
    “Data Warehousing is not Dead, it’s Evolving!”

  36. 37
    Big Data Analytics in the Oracle Cloud

  37. 39
    Let’s get SOCIAL
