Unifying Analytics Across Data Sources with Red Hat JBoss Data Virtualization

How can a business intelligence team evaluate and unify data that lives in many different types of data sources? For Red Hat IT, the answer is our own JBoss Data Virtualization product, which unifies access to these sources behind a single SQL interface that integrates easily with existing reporting software.

In this session you will learn:
* The problems data virtualization solves for Red Hat Business Intelligence, such as relating data between databases, web services, spreadsheets and flat files.
* How Red Hat IT enabled analytics across numerous data sources using JBoss Data Virtualization:
** Cloud-friendly architecture
** Automated installation and configuration using Puppet on AWS
** Caching and materialized views
** Supporting multiple clients with different data needs

Naveen Malik

June 29, 2016

Transcript

  1. UNIFYING ANALYTICS ACROSS DATA SOURCES WITH RED HAT JBOSS DATA VIRTUALIZATION
    #redhat #rhsummit
    Naveen Malik - Principal Software Engineer
    Burak Serdar - Principal Software Engineer
    Ian Firman - Business Intelligence Architect
    June 29, 2016

  2. GET THIS DECK NOW...

  3. RED HAT IT JOURNEY
    Opportunity, Research, Test, Plan, Execute, Maintain

  4. OUR JOURNEY TODAY
    Opportunity, Research, Test, Plan, Execute, Maintain
    Ian Firman
    Burak Serdar
    Naveen Malik

  5. WHAT IS JBOSS DATA VIRTUALIZATION?
    JBoss Data Virtualization is a data abstraction solution that sits in front of multiple data sources and allows them to be treated as a single source, accessed by various data consumers and/or applications.
    Heterogeneous Sources
    Heterogeneous Clients
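
The idea of treating several sources as one can be illustrated with a toy example. The sketch below is not JBoss Data Virtualization; it uses Python's built-in sqlite3 module and attaches two separate in-memory databases so that one SQL statement can join across them. The database names (bugs, crm), tables, and rows are invented for illustration.

```python
import sqlite3

# One connection acts as the single access point for two separate databases.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS bugs")  # hypothetical source 1
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # hypothetical source 2

conn.execute("CREATE TABLE bugs.bug (id INTEGER, account_id INTEGER, status TEXT)")
conn.execute("CREATE TABLE crm.account (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO bugs.bug VALUES (?, ?, ?)",
    [(1, 10, "NEW"), (2, 10, "CLOSED"), (3, 20, "NEW")],
)
conn.executemany(
    "INSERT INTO crm.account VALUES (?, ?)",
    [(10, "Acme"), (20, "Globex")],
)


def open_bugs_by_account(connection):
    """Join rows that live in two different databases with one query."""
    sql = """
        SELECT a.name, COUNT(*) AS open_bugs
        FROM bugs.bug AS b
        JOIN crm.account AS a ON a.id = b.account_id
        WHERE b.status <> 'CLOSED'
        GROUP BY a.name
        ORDER BY a.name
    """
    return connection.execute(sql).fetchall()


rows = open_bugs_by_account(conn)
print(rows)
```

The client only sees one connection and one SQL dialect; where each table physically lives is hidden behind the schema prefix.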

  6. OPPORTUNITY
    BUSINESS CASE

  7. OPPORTUNITY
    Data is coming in fast

  8. OPPORTUNITY
    Data is coming in fast
    Diverse data consumers

  9. OPPORTUNITY
    Diverse data sources
    Data is coming in fast
    Diverse data consumers

  10. OPPORTUNITY
    Diverse data sources
    Data is coming in fast
    Diverse data consumers
    Combining real-time and historical data

  11. RED HAT USE CASES
    TRANSACTIONAL
    LOW COMPLEXITY (ONE END POINT)
    Report from Bugzilla
    Simplify queries
    Filter sensitive data
    Extract to Warehouse

  12. RED HAT USE CASES
    OPERATIONAL
    MEDIUM COMPLEXITY (MANY END POINTS)
    "Where's my message?"
    Choice of reporting tools

  13. RED HAT USE CASES
    ANALYTICAL
    HIGH COMPLEXITY (MANY END POINTS, DYNAMIC)
    Data Science - Discovery, Mining, Advanced Analytics
    Real-time and Historical

  14. "All problems in computer science can be solved by another level of abstraction"
    paraphrased from David Wheeler

  15. LOGICAL ARCHITECTURE
    Data Sources: Data Warehouse(s), Files
    Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL

  16. LOGICAL ARCHITECTURE
    Data Sources: Data Warehouse(s), Files
    Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL

  17. LOGICAL ARCHITECTURE
    JBoss Data Virtualization
    Integrated and abstracted sources
    Data Sources: Data Warehouse(s), Files
    Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL
    Multiple protocol access

  18. LOGICAL ARCHITECTURE
    Business logic and data formatting
    Central Security
    JBoss Data Virtualization
    Integrated and abstracted sources
    Data Sources: Data Warehouse(s), Files
    Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL
    Multiple protocol access

  19. LOGICAL ARCHITECTURE
    Business logic and data formatting
    Central Security
    JBoss Data Virtualization
    Integrated and abstracted sources
    Data Sources: Data Warehouse(s), Files
    Data Consumers: BI Reports & Analytics, Mobile, ESB & ETL
    Multiple protocol access

  20. VIRTUAL DATABASES
    DEFINITION
    A virtual database (or VDB) is a container for components used to integrate data from multiple data sources, so that they can be accessed in an integrated manner through a single, uniform API.

  21. VDB STRATEGY
    SOURCES

  22. VDB STRATEGY
    BASE VDB
    Abstracts the physical source

  23. VDB STRATEGY
    VIRTUAL DATA MART
    Combine Base VDBs for analysis/reporting
    Business logic/formatting applied
    Data security applied at this layer
    Abstracts the physical source
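
The two-layer strategy (base VDBs that abstract the physical source, and a virtual data mart that combines them and applies business logic) can be sketched with plain SQL views. This is an analogy in SQLite rather than an actual VDB definition, and every table, view, and column name below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical sources with source-specific naming (hypothetical).
    CREATE TABLE t_ord (ord_id INTEGER, cust_no INTEGER, amt_cents INTEGER);
    CREATE TABLE t_cust (cust_no INTEGER, cust_nm TEXT);
    INSERT INTO t_ord VALUES (1, 10, 1250), (2, 10, 750), (3, 20, 500);
    INSERT INTO t_cust VALUES (10, 'acme'), (20, 'globex');

    -- Base layer: one view per source, hiding the physical names.
    CREATE VIEW base_orders AS
        SELECT ord_id AS order_id, cust_no AS customer_id, amt_cents FROM t_ord;
    CREATE VIEW base_customers AS
        SELECT cust_no AS customer_id, cust_nm AS name FROM t_cust;

    -- Data mart layer: combines base views and applies formatting.
    CREATE VIEW mart_revenue AS
        SELECT UPPER(c.name) AS customer, SUM(o.amt_cents) / 100.0 AS revenue
        FROM base_orders AS o
        JOIN base_customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.customer_id, c.name;
""")

mart_rows = conn.execute(
    "SELECT customer, revenue FROM mart_revenue ORDER BY customer"
).fetchall()
print(mart_rows)
```

Because reports query only mart_revenue, a physical table can be renamed or moved by changing a single base view, without touching the mart or its consumers.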

  24. VDB LIFECYCLE
    DEVELOP
    Virtual Database

  25. VDB LIFECYCLE
    DEPLOY
    Virtual Database

  26. VDB LIFECYCLE
    USE!
    Virtual Database

  27. PLAN
    ARCHITECTURE
    Opportunity, Plan

  28. THINGS TO CONSIDER
    DESIGN OPPORTUNITIES

  29. THINGS TO CONSIDER
    FLEXIBILITY
    WHERE TO DEPLOY?
    Cloud and on premises
    Real-time data vs. materialized views
    Locality to clients
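
The trade-off between real-time data and materialized views can be shown with a small sketch: a view is recomputed on every query, while a materialized copy is a snapshot that stays stale until it is refreshed. The example emulates materialization in SQLite by copying a view into a table; it is not how JBoss Data Virtualization implements caching, and the table and view names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (kind TEXT);
    INSERT INTO events VALUES ('login'), ('login'), ('logout');
    CREATE VIEW event_counts AS
        SELECT kind, COUNT(*) AS n FROM events GROUP BY kind;
""")


def refresh_materialized(connection):
    """Rebuild the cached copy of event_counts from the live view."""
    connection.execute("DROP TABLE IF EXISTS event_counts_mat")
    connection.execute("CREATE TABLE event_counts_mat AS SELECT * FROM event_counts")


def query(connection, table):
    """Return {kind: count} from either the live view or the cached table."""
    return dict(connection.execute(f"SELECT kind, n FROM {table}").fetchall())


refresh_materialized(conn)
conn.execute("INSERT INTO events VALUES ('login')")

live = query(conn, "event_counts")        # recomputed, sees the new row
cached = query(conn, "event_counts_mat")  # snapshot, does not see it yet
refresh_materialized(conn)
refreshed = query(conn, "event_counts_mat")
print(live, cached, refreshed)
```

The cached copy answers without touching the source, at the cost of staleness between refreshes, which is the consideration the slide raises.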

  30. OPEN HYBRID CLOUD
    Heterogeneous Sources
    Heterogeneous Clients

  31. THINGS TO CONSIDER
    SCALABILITY
    HOW EASY TO SCALE
    Initial starting point
    Easily scale by adding and removing nodes
    Clients isolated from infrastructure changes

  32. SIZING
    JDV SIZING TOOL
    INPUT: REQUIREMENTS
    How much data?
    How is data being accessed?
    OUTPUT: RECOMMENDATION
    CPU
    Storage
    Memory
    JVM Architecture

  33. THINGS TO CONSIDER
    SECURITY
    IN TRANSIT, AT REST, AUTH?
    Transport Layer Security (TLS)
    Disk encryption
    Authentication
    Authorization

  34. SECURITY
    ROLE BASED ACCESS CONTROL
    JBOSS EAP ROLE MANAGEMENT
    SAML
    LDAP
    Basic Auth
    Custom
    more...

  35. SECURITY
    DEFINED AT...
    Object

  36. SECURITY
    DEFINED AT...
    Object
    Row

  37. SECURITY
    DEFINED AT...
    Object
    Row
    Field
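
The three levels at which security can be defined (object, row, field) can be illustrated with a small role-based filter. This is a conceptual sketch in plain Python, not the product's policy syntax; the roles, the policy structure, and the sample rows are all invented.

```python
ROWS = [
    {"account": "Acme", "region": "EMEA", "revenue": 120, "contact_email": "a@example.com"},
    {"account": "Globex", "region": "NA", "revenue": 300, "contact_email": "g@example.com"},
]

# Hypothetical per-role policy:
#   can_read   -> object level (may the role read the table at all?)
#   row_filter -> row level (which rows are visible?)
#   masked     -> field level (which columns are hidden?)
POLICIES = {
    "emea_analyst": {
        "can_read": True,
        "row_filter": lambda row: row["region"] == "EMEA",
        "masked": {"contact_email"},
    },
    "finance": {"can_read": True, "row_filter": lambda row: True, "masked": set()},
    "guest": {"can_read": False, "row_filter": lambda row: False, "masked": set()},
}


def read_accounts(role):
    """Apply object, row, and field rules for the given role, in that order."""
    policy = POLICIES[role]
    if not policy["can_read"]:
        raise PermissionError(f"role {role!r} may not read accounts")
    visible = [row for row in ROWS if policy["row_filter"](row)]
    return [
        {key: (None if key in policy["masked"] else value) for key, value in row.items()}
        for row in visible
    ]


print(read_accounts("emea_analyst"))
```

Placing these rules in the virtualization layer means every client sees the same restrictions, whichever reporting tool it uses.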

  38. PHYSICAL ARCHITECTURE
    Amazon Web Services and AWS are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.

  39. EXECUTE
    IMPLEMENTATION
    Opportunity, Plan, Execute

  40. MAKING IT HAPPEN!
    TOOLS
    Virtual Database

  41. MAKING IT HAPPEN!
    TOOLS
    Virtual Database
    jcliff

  42. PUPPET & JCLIFF
    Configuration Snippets
    ...
    jcliff

  43. PUPPET & JCLIFF
    Configuration Snippets
    ...
    jcliff
    Differences

  44. PUPPET & JCLIFF
    Configuration Snippets
    ...
    jcliff
    Differences
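
The slides above show configuration snippets going into jcliff and differences coming out. The general idea of comparing a desired configuration with the current one and acting only on what differs can be sketched as below. This is an illustration of that idea, not jcliff's actual algorithm or file format; the function name and the dictionary layout are assumptions.

```python
def diff_config(desired, current, path=()):
    """Yield (path, old, new) for every desired leaf that differs from current."""
    for key, want in desired.items():
        have = current.get(key) if isinstance(current, dict) else None
        here = path + (key,)
        if isinstance(want, dict):
            # Recurse into nested sections; a missing section counts as empty.
            yield from diff_config(want, have if isinstance(have, dict) else {}, here)
        elif have != want:
            yield here, have, want


# Hypothetical desired snippet and current server state.
desired = {"datasource": {"redshift_ds": {"jndi-name": "java:/redshift", "enabled": "true"}}}
current = {"datasource": {"redshift_ds": {"jndi-name": "java:/redshift", "enabled": "false"}}}

changes = list(diff_config(desired, current))
print(changes)

# Once the server matches the desired state, a second run finds nothing to do.
no_changes = list(diff_config(desired, desired))
```

Applying only the differences keeps repeated runs idempotent, which suits a tool like Puppet that re-applies the same manifest on a schedule.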

  45. DATA SOURCES
    REDSHIFT RESOURCE
    jcliff::datasource { 'redshift_ds':
      jndi_name   => 'java:/redshift',
      url         => hiera('jdvbi::redshift::url'),
      driver_name => 'RedshiftJDBC4-1.1.1.0001.jar',
      username    => hiera('jdvbi::redshift::username'),
      # ... (remaining parameters are cut off on the slide)
    }

  46. DATA SOURCES
    REDSHIFT RESOURCE
    { "datasource" => {
      "redshift_ds" => {
        "jndi-name" => "java:/redshift",
        "driver-name" => "RedshiftJDBC4-1.1.1.0001.jar",
        "enabled" => "true",
        ... (truncated on the slide)
    jcliff::datasource { 'redshift_ds':
      jndi_name   => 'java:/redshift',
      url         => hiera('jdvbi::redshift::url'),
      driver_name => 'RedshiftJDBC4-1.1.1.0001.jar',
      username    => hiera('jdvbi::redshift::username'),
      # ... (remaining parameters are cut off on the slide)
    }

  47. SALESFORCE
    DEFINE RESOURCE ADAPTER
    jcliff::teiid_salesforce_ra { 'sfcom':
      jndi_name => 'java:/sf_ds',
      url       => hiera('jdvbi::salesforce::url'),
      username  => hiera('jdvbi::salesforce::username'),
      # ... (remaining parameters are cut off on the slide)
    }

  48. SALESFORCE
    DEFINE RESOURCE ADAPTER
    { "resource-adapter" => {
      "sf_ra" => {
        "module" => "org.jboss.teiid.resource-adapter.salesforce:main",
        "transaction-support" => "NoTransaction",
        "connection-definitions" => { "sf_ra" => {
          "enabled" => true,
          "jndi-name" => "java:/sf_ds",
          "config-properties" => {
          ... (truncated on the slide)
    jcliff::teiid_salesforce_ra { 'sfcom':
      jndi_name => 'java:/sf_ds',
      url       => hiera('jdvbi::salesforce::url'),
      username  => hiera('jdvbi::salesforce::username'),
      # ... (remaining parameters are cut off on the slide)
    }

  49. CONNECTIONS
    SECURE DATABASE
    # Add a teiid JDBC transport with TLS
    jcliff::configfile { 'ssl-jdbc.conf':
      content => template('jbossdvbi/ssl-jdbc.conf.erb'),
    }

  50. CONNECTIONS
    SECURE DATABASE
    {"teiid" => {
      "transport" => {
        "jdbc" => {
          "keystore-key-alias" => "<%=@keystore_alias%>",
          "keystore-key-password" => "<%=@keystore_password%>",
          "keystore-password" => "<%=@keystore_password%>",
          "keystore-type" => "JKS",
          "socket-binding" => "teiid-jdbc",
          "ssl-authentication-mode" => "1-way",
          ... (truncated on the slide)
    # Add a teiid JDBC transport with TLS
    jcliff::configfile { 'ssl-jdbc.conf':
      content => template('jbossdvbi/ssl-jdbc.conf.erb'),
    }
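
The "1-way" authentication mode in the transport configuration means that only the server proves its identity with a certificate; the client verifies it but presents none of its own. The sketch below shows what that looks like from a client's point of view, using Python's standard ssl module rather than a Teiid JDBC driver. The function name and the optional ca_file parameter are assumptions made for the example.

```python
import ssl


def one_way_tls_context(ca_file=None):
    """Build a client-side TLS context for one-way (server-only) authentication.

    The server certificate is verified against the given CA bundle, or the
    system defaults when ca_file is None. No client certificate is loaded
    (there is no load_cert_chain call), so the client stays anonymous at the
    TLS layer and authenticates separately, for example by user name and
    password.
    """
    return ssl.create_default_context(cafile=ca_file)


ctx = one_way_tls_context()
print(ctx.verify_mode, ctx.check_hostname)
```

With two-way TLS the client would additionally load its own certificate and key, and the server would be configured to require them.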

  51. CONNECTIONS
    SECURE CLIENTS
    # Add a socket binding for TLS JDBC
    jcliff::socket_binding { 'teiid-jdbc':
      # ... (parameters are cut off on the slide)
    }

  52. CONNECTIONS
    SECURE CLIENTS
    { "standard-sockets" => {
      "socket-binding" => {
        ... "teiid-jdbc" ... (garbled and truncated on the slide)
    # Add a socket binding for TLS JDBC
    jcliff::socket_binding { 'teiid-jdbc':
      # ... (parameters are cut off on the slide)
    }

  53. VIRTUAL DATABASES
    DEPLOY VDB
    # Retrieve VDB from staging area
    exec { 'get-vdb':
      command => "wget ..."
    }
    # deploy VDB
    ... (truncated on the slide)

  54. VIRTUAL DATABASES
    DEPLOY VDB
    { "deployments" => {
      "my.vdb" => {
      ... (truncated on the slide)
    # Retrieve VDB from staging area
    exec { 'get-vdb':
      command => "wget ..."
    }
    # deploy VDB
    ... (truncated on the slide)

  55. CONCLUSION

  56. VALUE
    IS IT WORKING OUT?
    CURRENT DEPLOYMENTS
    MORE IN THE FUTURE!
    Marketing VDB
    Data Scientists VDB

  57. USEFUL NEW FEATURES
    NEW THINGS FROM THE PRODUCT!
    NEW AND EXCITING
    Dynamic VDB
    Unified RBAC
    Redshift Translator

  58. CONCLUSION
    PARTING THOUGHTS
    Opportunity, Plan, Execute

  59. CONCLUSION
    PARTING THOUGHTS
    Fast integration.
    Heterogeneous sources.
    Decoupling.
    Plan, Execute

  60. CONCLUSION
    PARTING THOUGHTS
    Centralize business logic.
    Centralize security.
    Flexible architecture.
    Fast integration.
    Heterogeneous sources.
    Decoupling.
    Execute

  61. CONCLUSION
    PARTING THOUGHTS
    Repeatable.
    Automated.
    Supported.
    Centralize business logic.
    Centralize security.
    Flexible architecture.
    Fast integration.
    Heterogeneous sources.
    Decoupling.

  62. QUESTIONS?

  63. LEARN. NETWORK. EXPERIENCE OPEN SOURCE.

  64. APPENDIX
    ADDITIONAL MATERIAL

  65. RESOURCES
    PRODUCTS
    Red Hat JBoss Data Virtualization
    https://www.redhat.com/en/technologies/jboss-middleware/data-virtualization
    Red Hat CloudForms
    https://www.redhat.com/en/technologies/cloud-computing/cloudforms
    Ansible
    https://www.ansible.com/

  66. RESOURCES
    TOOLS
    JBoss Data Virtualization Sizing Architecture Tool
    https://access.redhat.com/labs/jbossdvsat/
    jcliff
    https://github.com/bserdar/jcliff
    puppet-jcliff
    https://github.com/bserdar/puppet-jcliff