Building a heterogeneous Hadoop OLAP system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa

Big Data Spain

November 16, 2012

Transcript

  1. WHO… AM I?
    • SQL/BI Team Lead at Plain Concepts
    • e-mail: [email protected]
    • Blog: http://geek.ms/blogs/palvarez
    • Twitter: @PabloDoval
    WHO… ARE YOU?
    • Quick Poll in the Room
  2. SHARP Overview
    SCADA Historical Analysis and Reporting Platform (SHARP)
    Demonstrate the feasibility of a custom end-to-end global architecture:
    • SCADA: Local, Mobile and Central
    • Historical Data: High speed and High volume
    • Reporting
    • Analysis
  3. SHARP MAGUS (diagram)
    • Production Centers (A and B): MAGUS with MongoDB capped collections holding 2 months of 1s data and 1 year of 10m data; MAGUS Local Operation; Mobile Operation
    • Central: MAGUS Central with MongoDB capped collections holding, for each Production Center, 2 months of 1s data and 1 year of 10m data; MAGUS Remote Operation
    • Other diagram labels: DAT Files, Mongo Export
  4. SHARP Historical Data (diagram)
    Diagram labels: Production Centers, Central, MAGUS, MAGUS Central, DAT files, Mongo Export, Source 1 to Source 7 Loaders, Hadoop, DWH
  5. SHARP Analysis and Reporting (diagram)
    Reporting Services:
    • Dynamic reports
    • Scheduled reports
    • Automatic Distribution
    • Multiformat (PDF, XLS, etc.)
    Other diagram labels: Production Centers, Central, DWH, OLAP Tabular, Microsoft Office, Power View, Power Pivot, StreamInsight, Events, Future: Cloud?
  6. CURRENT SHARP IMPLEMENTATION
    Diagram labels: DWH, Hadoop (HDFS, Hive, MapReduce), SSIS, Load Service, Azure Storage, SSRS, Power View
  7. COMPLEX EVENT PROCESSING: StreamInsight (diagram)
    Reporting Services:
    • Dynamic reports
    • Scheduled reports
    • Automatic Distribution
    • Multiformat (PDF, XLS, etc.)
    Other diagram labels: Production Centers, Central, DWH, OLAP Tabular, Microsoft Office, Power View, Power Pivot, StreamInsight, Events, Future: Cloud?
  8. IMPROV. TO HIGHER RESOLUTION DATA: The Goal
    Ability to work with data in DW and Hive seamlessly and in a performant way.
    Diagram label: Export
  9. IMPROV. TO HIGHER RESOLUTION DATA: Sqoop with PDW…
    Diagram labels: Sqoop, Map/Reduce Job, SQL Server, SQL Server, SQL Server, …, SQL Server
  10. IMPROV. TO HIGHER RESOLUTION DATA: Sqoop refresher…
    Diagram labels: SQL Server, SQL Server, SQL Server, …, SQL Server, Sqoop, Hadoop Cluster
  11. IMPROV. TO HIGHER RESOLUTION DATA: The Goal: Polybase!
    Ability to work with data in DW and Hive seamlessly and in a performant way.
    Diagram labels: SQL, HDF, SQL Server (PDW), T-SQL Queries
  12. IMPROV. TO HIGHER RESOLUTION DATA: Polybase parallelism via DMS
    Diagram labels: SQL Server, SQL Server, SQL Server, …, SQL Server, Hadoop Cluster
  13. IMPROV. TO HIGHER RESOLUTION DATA: That’s just the beginning…
    • Uses the same T-SQL syntax to query both worlds at the same time.
    • The query optimizer (QO) can decide which data to push to which environment so that it is processed optimally.
  14. STORIES WE COULD TELL: What went right…
    • Cloud Environment
    • Tabular Model for OLAP
    • SSIS for ETL via ODBC Hive Driver
  15. STORIES WE COULD TELL: What was not so good…
    • Mappers and Reducers in C# via Hadoop Streaming
  16. CALL TO ACTION
    LEARN MORE
    1. Microsoft Big Data Solution: www.microsoft.com/bigdata
    2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data
    TRY NOW
    1. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com
    2. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata
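
Slide 15 mentions mappers and reducers written in C# and run through Hadoop Streaming. As a rough illustration of the Streaming contract (read lines from stdin, write tab-separated key/value lines to stdout, with the reducer receiving its input sorted by key), here is a minimal sketch in Python rather than C#. It rolls 1-second readings up into 10-minute averages, loosely echoing the 1s and 10m resolutions on slide 3. The input layout `signal,epoch_seconds,value`, the function names, and the reading of "10m" as ten minutes are all assumptions made for illustration; none of them come from the talk.

```python
"""Hadoop Streaming style mapper and reducer (illustrative sketch only)."""
import sys
from itertools import groupby

BUCKET_SECONDS = 600  # assumption: "10m" on the slides means ten minutes


def map_lines(lines):
    """Turn 'signal,epoch_seconds,value' lines into 'signal:bucket<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        signal, epoch, value = line.split(",")
        # Round the timestamp down to the start of its 10-minute bucket.
        bucket = int(epoch) - int(epoch) % BUCKET_SECONDS
        yield f"{signal}:{bucket}\t{float(value)}"


def reduce_lines(lines):
    """Average the values of each key. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"


# Run as `script.py map` or `script.py reduce`; Streaming pipes data via stdin/stdout.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Locally, the same pipeline can be tried by piping a file through the map step, `sort`, and the reduce step. On a cluster, the two steps would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.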