Unifying Analytics Across Data Sources with Red Hat JBoss Data Virtualization

How can a business intelligence team evaluate and unify data that lives in many different types of data sources? For Red Hat IT, the answer is our own JBoss Data Virtualization product, which unifies access to these sources behind a single SQL interface that integrates easily with existing reporting software.
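
To make the idea of one SQL interface over several sources concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in. It is an analogy only, not JBoss Data Virtualization's API: two separate SQLite databases play the role of two data sources, and the table, column, and view names (`tickets`, `accounts`, `tickets_by_account`) are hypothetical.

```python
import sqlite3


def build_unified_connection() -> sqlite3.Connection:
    """Return one connection that exposes two separate databases."""
    conn = sqlite3.connect(":memory:")                  # first "source"
    conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, separate "source"

    # Hypothetical sample data living in two different databases.
    conn.execute("CREATE TABLE main.tickets (id INTEGER, account_id INTEGER, summary TEXT)")
    conn.execute("CREATE TABLE crm.accounts (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.tickets VALUES (?, ?, ?)",
        [(1, 10, "Login fails"), (2, 11, "Slow report"), (3, 10, "Export error")],
    )
    conn.executemany(
        "INSERT INTO crm.accounts VALUES (?, ?)",
        [(10, "Acme"), (11, "Globex")],
    )

    # In SQLite, a TEMP view may reference tables in any attached database,
    # so clients query one name and never see where the data lives.
    conn.execute(
        """
        CREATE TEMP VIEW tickets_by_account AS
        SELECT a.name AS account, COUNT(*) AS ticket_count
        FROM main.tickets AS t
        JOIN crm.accounts AS a ON a.id = t.account_id
        GROUP BY a.name
        """
    )
    return conn


def tickets_by_account(conn: sqlite3.Connection):
    """Query the unified view as a reporting tool would."""
    return conn.execute(
        "SELECT account, ticket_count FROM tickets_by_account ORDER BY account"
    ).fetchall()
```

A reporting client only needs the view name; swapping either underlying database would not change the query it issues.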

In this session you will learn:
* The problems data virtualization solves for Red Hat Business Intelligence, such as relating data between databases, web services, spreadsheets and flat files.
* How Red Hat IT enabled analytics across numerous data sources using JBoss Data Virtualization:
** Cloud-friendly architecture
** Automated installation and configuration using Puppet on AWS
** Caching and materialized views
** Supporting multiple clients with different data needs
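
The trade-off behind caching and materialized views can be sketched in a few lines of Python. This is a generic illustration, not the product's materialization mechanism: the `MaterializedView` class, its `ttl_seconds` parameter, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[object, ...]


class MaterializedView:
    """Caches the result of an expensive query and refreshes it after a TTL."""

    def __init__(
        self,
        load: Callable[[], List[Row]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load            # runs the real query against the source
        self._ttl = ttl_seconds      # how long cached rows are considered fresh
        self._clock = clock          # injectable so tests can control time
        self._rows: Optional[List[Row]] = None
        self._loaded_at = 0.0

    def rows(self) -> List[Row]:
        """Return cached rows, reloading from the source only when stale."""
        now = self._clock()
        stale = self._rows is None or now - self._loaded_at >= self._ttl
        if stale:
            self._rows = list(self._load())
            self._loaded_at = now
        return self._rows
```

Clients that need real-time data would call the loader directly; clients that can tolerate slightly old data read through the cached view and spare the source system.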

Naveen Malik

June 29, 2016

Transcript

  1. UNIFYING ANALYTICS ACROSS DATA SOURCES WITH RED HAT JBOSS DATA VIRTUALIZATION

    Naveen Malik - Principal Software Engineer; Burak Serdar - Principal Software Engineer; Ian Firman - Business Intelligence Architect. June 29, 2016. #redhat #rhsummit
  2. RED HAT IT JOURNEY

    Opportunity, Research, Test, Plan, Execute, Maintain
  3. OUR JOURNEY TODAY

    Opportunity, Research, Test, Plan, Execute, Maintain. Ian Firman, Burak Serdar, Naveen Malik
  4. WHAT IS JBOSS DATA VIRTUALIZATION?

    JBoss Data Virtualization is a data abstraction solution that sits in front of multiple data sources and allows them to be treated as a single source, accessed by various data consumers and/or applications. Heterogeneous sources. Heterogeneous clients.
  5. OPPORTUNITY

    Diverse data sources. Data is coming in fast. Diverse data consumers.
  6. OPPORTUNITY

    Diverse data sources. Data is coming in fast. Diverse data consumers. Combining real-time and historical data.
  7. RED HAT USE CASES: TRANSACTIONAL

    LOW COMPLEXITY (ONE END POINT). Report from Bugzilla. Simplify queries. Filter sensitive data. Extract to Warehouse.
  8. RED HAT USE CASES: OPERATIONAL

    MEDIUM COMPLEXITY (MANY END POINTS). "Where's my message?" Choice of reporting tools.
  9. RED HAT USE CASES: ANALYTICAL

    HIGH COMPLEXITY (MANY END POINTS, DYNAMIC). Data Science - Discovery, Mining, Advanced Analytics. Real-time and Historical.
  10. "All problems in computer science can be solved by another level of abstraction" (paraphrased from David Wheeler)
  11. LOGICAL ARCHITECTURE

    Data Sources: Data Warehouse(s), Files. Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL.
  12. LOGICAL ARCHITECTURE

    Data Sources: Data Warehouse(s), Files. Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL.
  13. LOGICAL ARCHITECTURE

    JBOSS Data Virtualization: integrated and abstracted sources, multiple protocol access. Data Sources: Data Warehouse(s), Files. Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL.
  14. LOGICAL ARCHITECTURE

    JBOSS Data Virtualization: business logic and data formatting, central security, integrated and abstracted sources, multiple protocol access. Data Sources: Data Warehouse(s), Files. Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL.
  15. LOGICAL ARCHITECTURE

    JBOSS Data Virtualization: business logic and data formatting, central security, integrated and abstracted sources, multiple protocol access. Data Sources: Data Warehouse(s), Files. Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL.
  16. VIRTUAL DATABASES: DEFINITION

    A virtual database (or VDB) is a container for components used to integrate data from multiple data sources, so that they can be accessed in an integrated manner through a single, uniform API.
  17. VDB STRATEGY: BASE VDB

    Abstracts the physical source.
  18. VDB STRATEGY: VIRTUAL DATA MART

    Combine Base VDBs for analysis/reporting. Business logic/formatting applied. Data security applied at this layer. Abstracts the physical source.
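
The two-layer strategy on slides 17 and 18 can be approximated with plain SQL views. The sketch below uses Python's built-in `sqlite3` rather than JDV, and every table, column, and view name in it (`bz_t`, `base_bugs`, `mart_public_bugs`) is hypothetical; it only illustrates where each concern sits.

```python
import sqlite3


def build_layers() -> sqlite3.Connection:
    """Create a physical table, a base view, and a data-mart view on top."""
    conn = sqlite3.connect(":memory:")

    # Physical source with terse, source-specific column names (hypothetical).
    conn.execute("CREATE TABLE bz_t (b_id INTEGER, b_sum TEXT, b_sev INTEGER, b_priv INTEGER)")
    conn.executemany(
        "INSERT INTO bz_t VALUES (?, ?, ?, ?)",
        [(1, "crash on start", 1, 0), (2, "security hole", 1, 1), (3, "typo in docs", 3, 0)],
    )

    # Base layer: abstracts the physical source behind stable, readable names.
    conn.execute(
        """
        CREATE VIEW base_bugs AS
        SELECT b_id AS bug_id, b_sum AS summary, b_sev AS severity, b_priv AS is_private
        FROM bz_t
        """
    )

    # Mart layer: applies formatting (severity labels) and security
    # (private rows are filtered out) on top of the base layer only.
    conn.execute(
        """
        CREATE VIEW mart_public_bugs AS
        SELECT bug_id,
               summary,
               CASE severity WHEN 1 THEN 'urgent' WHEN 2 THEN 'high' ELSE 'normal' END AS severity
        FROM base_bugs
        WHERE is_private = 0
        """
    )
    return conn


def public_bugs(conn: sqlite3.Connection):
    """Read the mart view the way a reporting client would."""
    return conn.execute(
        "SELECT bug_id, summary, severity FROM mart_public_bugs ORDER BY bug_id"
    ).fetchall()
```

Because the mart view depends only on `base_bugs`, a change to the physical table would require updating one base view rather than every report.
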
  19. THINGS TO CONSIDER: FLEXIBILITY

    WHERE TO DEPLOY? Cloud and on premise. Real-time data vs. materialized views. Locality to clients.
  20. THINGS TO CONSIDER: SCALABILITY

    HOW EASY TO SCALE. Initial starting point. Easily scale by adding and removing nodes. Clients isolated from infrastructure changes.
  21. SIZING: JDV SIZING TOOL

    INPUT: REQUIREMENTS. How much data? How is data being accessed? OUTPUT: RECOMMENDATION. CPU, Storage, Memory, JVM, Architecture.
  22. THINGS TO CONSIDER: SECURITY

    IN TRANSIT, AT REST, AUTH? Transport Layer Security (TLS). Disk encryption. Authentication. Authorization.
  23. SECURITY: ROLE BASED ACCESS CONTROL

    JBOSS EAP ROLE MANAGEMENT: SAML, LDAP, Basic Auth, Custom, more..
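
As a rough illustration of role-based access control at the data layer, the following Python sketch maps roles to the views they may read. It does not use JBoss EAP or JDV APIs; the role names, resource names, and the `can_read` helper are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical grants: which views each role is allowed to read.
ROLE_GRANTS: Dict[str, Set[str]] = {
    "marketing_analyst": {"marketing.campaigns", "marketing.leads"},
    "data_scientist": {"marketing.campaigns", "warehouse.events"},
}


def can_read(roles: Iterable[str], resource: str) -> bool:
    """Return True if any of the caller's roles grants read access to the resource."""
    return any(resource in ROLE_GRANTS.get(role, set()) for role in roles)
```

The authentication mechanism (SAML, LDAP, Basic Auth, or a custom module) only has to produce the caller's roles; the access decision itself stays in one place.
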
  24. PHYSICAL ARCHITECTURE

    Amazon Web Services and AWS are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
  25. PUPPET & JCLIFF

    jcliff. Differences. Configuration Snippets.
  26. DATA SOURCES: REDSHIFT RESOURCE

    jcliff::datasource { 'redshift_ds':
      jndi_name   => 'java:/redshift',
      url         => hiera('jdvbi::redshift::url'),
      driver_name => 'RedshiftJDBC4-1.1.1.0001.jar',
      username    => hiera('jdvbi::redshift::username'),
    }
  27. DATA SOURCES: REDSHIFT RESOURCE

    { "datasource" => {
        "redshift_ds" => {
          "jndi-name" => "java:/redshift",
          "driver-name" => "RedshiftJDBC4-1.1.1.0001.jar",
          "enabled" => "true",

    jcliff::datasource { 'redshift_ds':
      jndi_name   => 'java:/redshift',
      url         => hiera('jdvbi::redshift::url'),
      driver_name => 'RedshiftJDBC4-1.1.1.0001.jar',
      username    => hiera('jdvbi::redshift::username'),
    }
  28. SALESFORCE: DEFINE RESOURCE ADAPTER

    jcliff::teiid_salesforce_ra { 'sfcom':
      jndi_name => 'java:/sf_ds',
      url       => hiera('jdvbi::salesforce::url'),
      username  => hiera('jdvbi:salesforce:username'),
  29. SALESFORCE: DEFINE RESOURCE ADAPTER

    { "resource-adapter" => {
        "sf_ra" => {
          "module" => "org.jboss.teiid.resource-adapter.salesforce:main",
          "transaction-support" => "NoTransaction",
          "connection-definitions" => {
            "sf_ra" => {
              "enabled" => true,
              "jndi-name" => "java:/sf_ds",
              "config-properties" => {

    jcliff::teiid_salesforce_ra { 'sfcom':
      jndi_name => 'java:/sf_ds',
      url       => hiera('jdvbi::salesforce::url'),
      username  => hiera('jdvbi:salesforce:username'),
  30. CONNECTIONS: SECURE DATABASE

    # Add a teiid JDBC transport with TLS
    jcliff::configfile { 'ssl-jdbc.conf':
      content => template('jbossdvbi/ssl-jdbc.conf.erb')
  31. CONNECTIONS: SECURE DATABASE

    {"teiid" => {
        "transport" => {
          "jdbc" => {
            "keystore-key-alias" => "<%=@keystore_alias%>",
            "keystore-key-password" => "<%=@keystore_password%>",
            "keystore-password" => "<%=@keystore_password%>",
            "keystore-type" => "JKS",
            "socket-binding" => "teiid-jdbc",
            "ssl-authentication-mode" => "1-way",

    # Add a teiid JDBC transport with TLS
    jcliff::configfile { 'ssl-jdbc.conf':
      content => template('jbossdvbi/ssl-jdbc.conf.erb')
  32. CONNECTIONS: SECURE CLIENTS

    # Add a socket binding for TLS JDBC
    jcliff::socket_binding { 'teiid-jdbc':
  33. CONNECTIONS: SECURE CLIENTS

    { "standard-sockets" => {
        "socket-binding" => {
          "teiid-jdbc"

    # Add a socket binding for TLS JDBC
    jcliff::socket_binding { 'teiid-jdbc':
  34. VIRTUAL DATABASES: DEPLOY VDB

    # Retrieve VDB from staging area
    exec { 'get-vdb':
      command => "wget ..."
    }
    # deploy VDB
  35. VIRTUAL DATABASES: DEPLOY VDB

    { "deployments" => {
        "my.vdb" => {

    # Retrieve VDB from staging area
    exec { 'get-vdb':
      command => "wget ..."
    }
    # deploy VDB
  36. VALUE: IS IT WORKING OUT?

    CURRENT DEPLOYMENTS: Marketing VDB, Data Scientists VDB. MORE IN THE FUTURE!
  37. USEFUL NEW FEATURES: NEW THINGS FROM THE PRODUCT!

    NEW AND EXCITING: Dynamic VDB, Unified RBAC, Redshift Translator.
  38. CONCLUSION: PARTING THOUGHTS

    Centralize business logic. Centralize security. Flexible architecture. Fast integration. Heterogeneous sources. Decoupling. Execute.
  39. CONCLUSION: PARTING THOUGHTS

    Centralize business logic. Centralize security. Flexible architecture. Fast integration. Heterogeneous sources. Decoupling. Repeatable. Automated. Supported.
  40. RESOURCES: PRODUCTS

    Red Hat JBoss Data Virtualization: https://www.redhat.com/en/technologies/jboss-middleware/data-virtualization
    Red Hat CloudForms: https://www.redhat.com/en/technologies/cloud-computing/cloudforms
    Ansible: https://www.ansible.com/
  41. RESOURCES: TOOLS

    JBoss Data Virtualization Sizing Architecture Tool: https://access.redhat.com/labs/jbossdvsat/
    jcliff: https://github.com/bserdar/jcliff
    puppet-jcliff: https://github.com/bserdar/puppet-jcliff