
An introduction to Databricks

An introduction to Databricks: what is it, how does it work, and what can it do?

Mike Frampton

June 16, 2015

Transcript

  1. Databricks
    • What is Databricks?
    • Cloud services used
    • Functionality
    • Languages
    • Spark Usage
    • 3rd Party Apps
    • Architecture
    • Books
    www.semtech-solutions.co.nz [email protected]
  2. Databricks – What is it?
    • A cloud-based Apache Spark cluster service
    • Offers scalable Spark clusters based on AWS
    • Developed by the same people who created Spark
    • Multiple cluster management
    • Job scheduling and library import
    • Offers access to all Spark modules
  3. Databricks – Cloud Services
    • Currently uses Amazon AWS
    • Uses EC2 and has access to S3 buckets
    • Uses a minimum of 2 EC2 instances
    • Attempts to optimise EC2 usage
    • Plans to extend to other cloud providers
  4. Databricks – Functionality
    • Architecture based on Notebooks and folders
    • Has a cluster manager for
      – Defined (min 54 GB) clusters
      – Spot clusters
      – On Demand clusters
    • Has a job manager and scheduler
    • Has user management
    • Has full Spark functionality
    • Has strong data visualisation capability
    • Can export reports and dashboards
  5. Databricks – Languages
    • Can have Notebooks in
      – Scala
      – Python
      – SQL
    • SQL can be executed in non-SQL Notebooks
    • Markdown comments can be placed in Notebooks
    • Notebooks can be shared by multiple sessions
    • Libraries can be imported and called in Notebooks
  6. Databricks – Spark Usage
    • Latest Spark version available
      – e.g. DB 1.3.4 uses Spark 1.3.1 as of June 2015
    • All Spark modules available
      – SQL, GraphX, MLlib, Streaming
    • Strong integration between modules and visualisation
    • Extensive use of tables to import data
    • Tables available via SQL
  7. Databricks – 3rd Party Apps
    • Currently available, with more to come
      – Pentaho
      – Qlik
      – Tableau
      – TIBCO Jaspersoft
      – PanTera
      – ZoomData
  8. Available Books
    • See our Hadoop book from Apress / Springer
      – “Big Data Made Easy”
    • Look out for our Apache Spark based book
      – from Packt in 2015
  9. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems
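
Slides 5 and 6 say that data is imported into tables, that tables are available via SQL, and that SQL can be run from a non-SQL Notebook. The sketch below illustrates only that pattern (load rows into a table, then query it with SQL from Python). It is not Databricks or Spark code: it uses Python's built-in `sqlite3` module as a local stand-in so that it runs anywhere, and the table name `notebook_runs`, its columns, and the sample rows are all hypothetical. On a Spark 1.3 cluster the query string would instead be passed to Spark SQL, for example through `sqlContext.sql(...)`.

```python
import sqlite3

# Hypothetical sample data; the deck does not show any dataset.
SAMPLE_ROWS = [("scala", 12), ("python", 30), ("sql", 18)]


def build_table(conn, rows):
    """Create a table and load the sample rows into it."""
    conn.execute("CREATE TABLE notebook_runs (language TEXT, runs INTEGER)")
    conn.executemany("INSERT INTO notebook_runs VALUES (?, ?)", rows)
    conn.commit()


def runs_at_least(conn, threshold):
    """Issue a SQL query from Python and return matching rows, largest first."""
    cursor = conn.execute(
        "SELECT language, runs FROM notebook_runs "
        "WHERE runs >= ? ORDER BY runs DESC",
        (threshold,),
    )
    return cursor.fetchall()


# In-memory database standing in for a table registered on a cluster.
conn = sqlite3.connect(":memory:")
build_table(conn, SAMPLE_ROWS)
result = runs_at_least(conn, 15)
print(result)
```

The point of the sketch is the split of roles: the host language (Python here) prepares the data and handles the results, while the filtering and ordering are expressed in SQL against a named table.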