Building a heterogeneous Hadoop Olap system with Microsoft BI stack. PABLO DOVAL & IBON LANDA at Big Data Spain 2012

Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/building-a-heterogeneous-hadoop-olap-system-with-microsoft-bi-stack/pablo-doval-and-ibon-landa

Big Data Spain

November 16, 2012
Transcript

  1. BUILDING A HETEROGENEOUS HADOOP/OLAP SYSTEM WITH MICROSOFT'S BI STACK

  2. WHO… … AM I? • SQL/BI Team Lead at Plain Concepts • e-mail: pablod@plainconcepts.com • Blog: http://geek.ms/blogs/palvarez • Twitter: @PabloDoval … ARE YOU? • Quick Poll in the Room
  3. WHAT… … ARE WE GOING TO SEE? … I’M NOT GOING TO SHOW?
  4. None
  5. SOME PICS…

  6. SHARP Overview: SCADA Historical Analysis and Reporting Platform. Demonstrate the feasibility of a custom end-to-end global architecture: • SCADA: Local, Mobile and Central • Historical Data: High speed and High volume • Reporting • Analysis
  7. Production Centers Central MAGUS Central MongoDB Capped collections For each Production Center 2 months of 1s data 1 year of 10m data MAGUS MongoDB Capped collections 2 months of 1s data 1 year of 10m data MAGUS Local Operation Mobile Operation MAGUS Remote Operation DAT Files Mongo Export Production Center A Production Center B MAGUS MongoDB Capped collections 2 months of 1s data 1 year of 10m data MAGUS Local Operation Mobile Operation SHARP MAGUS
  8. DAT DAT DAT DAT DAT DAT Mongo Export Hadoop DWH MAGUS Central Source 1 Loader Source 2 Loader Source 3 Loader Source 4 Loader Source 5 Loader MAGUS Source 6 Loader DAT Source 7 Loader DAT Production Centers Central SHARP Historical Data
  9. DWH Microsoft Office Reporting Services • Dynamic reports • Scheduled reports • Automatic Distribution • Multiformat (PDF, XLS, etc.) OLAP Tabular OLAP Tabular Power View Power Pivot Future Cloud? StreamInsight Events Production Centers Central SHARP Analysis and Reporting
  10. INITIAL ASSESSMENT Proof of Concept Microsoft Ecosystem On Premise Infrastructure

  11. PowerPivot Power View TOOLS OF THE TRADE

  12. None
  13. SO… WHAT DOES IT LOOK LIKE?

  14. CURRENT SHARP IMPLEMENTATION DWH Hadoop HDFS HIVE Map Reduce SSIS Load Service Azure Storage SSRS PowerView
  15. LET’S TAKE A DEEPER LOOK…

  16. FUTURE IMPROVEMENTS • New Analytical Processes • CEP Integration with StreamInsight • Improvements on the Higher Resolution data
  17. DWH Microsoft Office Reporting Services • Dynamic reports • Scheduled reports • Automatic Distribution • Multiformat (PDF, XLS, etc.) OLAP Tabular OLAP Tabular Power View Power Pivot Future Cloud? StreamInsight Events Production Centers Central COMPLEX EVENT PROCESSING StreamInsight
  18. StreamInsight Events Production Centers Central COMPLEX EVENT PROCESSING StreamInsight

  19. IMPROV. TO HIGHER RESOLUTION DATA The Goal Ability to work with data in DW and Hive seamlessly and in a performant way. Export
  20. IMPROV. TO HIGHER RESOLUTION DATA Sqoop Refresher

  21. IMPROV. TO HIGHER RESOLUTION DATA Sqoop with PDW… Sqoop Map/Reduce Job SQL Server SQL Server SQL Server … SQL Server
  22. IMPROV. TO HIGHER RESOLUTION DATA Sqoop refresher… SQL Server SQL Server SQL Server … SQL Server Hadoop Cluster Sqoop
  23. IMPROV. TO HIGHER RESOLUTION DATA The Goal – Polybase! Ability to work with data in DW and Hive seamlessly and in a performant way. SQL HDF SQL Server (PDW) T-SQL Queries
  24. IMPROV. TO HIGHER RESOLUTION DATA Polybase parallelism via DMS SQL Server SQL Server SQL Server … SQL Server Hadoop Cluster
  25. IMPROV. TO HIGHER RESOLUTION DATA Parallelism

  26. IMPROV. TO HIGHER RESOLUTION DATA That’s just the beginning… • Uses the same T-SQL syntax to query both worlds at the same time. • The query optimizer (QO) can decide which data to push to which environment so that each part of the query is processed where it runs best.
  27. STORIES WE COULD TELL What went right… • Cloud Environment • Tabular Model for OLAP • SSIS for ETL via ODBC Hive Driver
  28. STORIES WE COULD TELL What was not so good… • Mappers and Reducers in C# via Hadoop Streaming
  29. CALL TO ACTION LEARN MORE 1. Microsoft Big Data Solution: www.microsoft.com/bigdata 2. Windows Azure: www.windowsazure.com/en-us/home/scenarios/big-data TRY NOW 1. Preview of the Windows Azure HDInsight Service: https://www.hadooponazure.com 2. Developer CTP of Microsoft HDInsight Server for Windows Server: http://www.microsoft.com/bigdata
  30. None
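
Slides 7 and 28 mention two things that fit together naturally: 1-second data being kept alongside 10-minute data, and mappers and reducers run through Hadoop Streaming. The sketch below illustrates how such a roll-up could look as a map and reduce pair. It is written in Python rather than the C# used in the talk, and the record layout (`sensor_id,epoch_seconds,value`), the use of a mean as the aggregate, and all function names are assumptions made for illustration; the slides do not describe the actual jobs. In a real Hadoop Streaming job the mapper and reducer would be separate executables that read lines from stdin and write tab-separated key/value lines to stdout, with the framework sorting by key in between; here that sort is simulated in-process so the logic can be run locally.

```python
from itertools import groupby

BUCKET_SECONDS = 600  # 10-minute buckets (assumed aggregation window)


def map_line(line):
    """Mapper step: turn 'sensor_id,epoch_seconds,value' into (key, value).

    The key combines the sensor id with the start of its 10-minute bucket.
    The input layout is hypothetical.
    """
    sensor_id, ts, value = line.strip().split(",")
    bucket = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
    return f"{sensor_id}:{bucket}", float(value)


def reduce_sorted(pairs):
    """Reducer step: pairs arrive sorted by key; yield the mean per key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [v for _, v in group]
        yield key, sum(values) / len(values)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    mapped = sorted(map_line(line) for line in lines if line.strip())
    return dict(reduce_sorted(mapped))


if __name__ == "__main__":
    sample = ["s1,0,1.0", "s1,599,3.0", "s1,600,5.0", "s2,10,2.0"]
    for key, mean in run_local(sample).items():
        # Hadoop Streaming expects tab-separated key/value output lines.
        print(f"{key}\t{mean}")
```

Keeping the per-record and per-key logic in plain functions makes it possible to test the job without a cluster; only thin stdin/stdout wrappers would differ between local runs and a streaming job.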