• Apache Spark cluster service
• Offers scalable Spark clusters based on AWS
• Developed by the same people who created Spark
• Multiple cluster management
• Job scheduling and library import
• Offers access to all Spark modules
www.semtech-solutions.co.nz [email protected]
• Uses EC2 and has access to S3 buckets
• Uses a minimum of 2 EC2 instances
• Attempts to optimise EC2 usage
• Plans to extend to other cloud providers
• Has a cluster manager for
  – Defined (min 54 GB) clusters
  – Spot clusters
  – On Demand clusters
• Has a job manager and scheduler
• Has user management
• Has full Spark functionality
• Has strong data visualisation capability
• Can export reports and dashboards
  – Python
  – SQL
• SQL can be executed in non-SQL Notebooks
• Markdown comments can be placed in Notebooks
• Notebooks can be shared by multiple sessions
• Libraries can be imported and called in Notebooks
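
The point about running SQL from a non-SQL Notebook can be illustrated with a minimal Python sketch. It assumes the Notebook exposes a Spark SQL entry point (named `sqlContext` in Spark 1.x) whose `sql(query)` method returns a result with a `collect()` method. The helper name `count_rows` and the table name `people` are hypothetical and used only for illustration.

```python
import re

# Assumption: `sql_context` is a Spark SQL entry point (e.g. the `sqlContext`
# object of Spark 1.x) offering .sql(query) -> result with .collect().
# `count_rows` is a hypothetical helper, not part of any Spark API.

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def count_rows(sql_context, table_name):
    """Return the row count of a registered table by issuing a SQL query."""
    # Only accept plain identifiers so the name is safe to place in the query.
    if not _IDENTIFIER.match(table_name):
        raise ValueError("invalid table name: %r" % (table_name,))
    query = "SELECT COUNT(*) AS n FROM {}".format(table_name)
    rows = sql_context.sql(query).collect()
    # The query yields a single row with a single column holding the count.
    return rows[0][0]
```

In a Python Notebook this might be called as `count_rows(sqlContext, "people")`, assuming a table named `people` has already been created. Passing the context in as a parameter keeps the helper independent of how the Notebook names its entry point.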
i.e. DB 1.3.4 uses Spark 1.3.1 as of June 2015
• All Spark modules available
  – SQL, GraphX, MLlib, Streaming
• Strong integration between modules and visualisation
• Extensive use of tables to import data
• Tables available via SQL
www.semtech-solutions.co.nz – [email protected]
• We offer IT project consultancy
• We are happy to hear about your problems
• You can pay for just the hours you need to solve your problems