Slide 1

Slide 1 text

Confidential. Copyright © Idiro Analytics, all rights reserved.
AN INTRODUCTION TO
Igor Souza
January 2015

Slide 2

Slide 2 text

© Red Sqirl Analytics
Diversity is Power
All these tools have their use cases, and they can and should be combined.

Slide 3

Slide 3 text

What is Red Sqirl?
• A drag-and-drop analytics framework for Hadoop
• Makes advanced analytics workflows easy to run, modify, save and share
• Highly skilled data scientists can create models and workflows; analysts can then modify and execute them

Slide 4

Slide 4 text

What we want to solve

Slide 5

Slide 5 text

We want to make what is complex…

    public void map(LongWritable key, Text value, OutputCollector outputCollector,
                    Reporter reporter) throws IOException {
        String dataRow = value.toString();

        // Since these are tab-separated files, tokenize on tab.
        StringTokenizer dataTokenizer = new StringTokenizer(dataRow, "\t");
        String articleName = dataTokenizer.nextToken();
        String pointType = dataTokenizer.nextToken();
        String geoPoint = dataTokenizer.nextToken();

        // Only rows of the GEO RSS point type are processed.
        if (GEO_RSS_URI.equals(pointType)) {
            // The point is "<latitude> <longitude>", separated by a space.
            StringTokenizer st = new StringTokenizer(geoPoint, " ");
            String strLat = st.nextToken();
            String strLong = st.nextToken();
            double lat = Double.parseDouble(strLat);
            double lng = Double.parseDouble(strLong);

            // Round to whole degrees so nearby articles share one key.
            long roundedLat = Math.round(lat);
            long roundedLong = Math.round(lng);
            String locationKey = "(" + String.valueOf(roundedLat) + ","
                    + String.valueOf(roundedLong) + ")";

            // Turn the URL-encoded article name into a readable location name.
            String locationName = URLDecoder.decode(articleName, "UTF-8");
            locationName = locationName.replace("_", " ");

            geoLocationKey.set(locationKey);
            geoLocationName.set(locationName);
            outputCollector.collect(geoLocationKey, geoLocationName);
        }
    }

An example of Hadoop MapReduce
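The key-building part of the mapper above can be tried without a Hadoop cluster. The following is only a minimal sketch: the class `GeoKeySketch`, its method names and the sample inputs are hypothetical and do not come from the slide or from Hadoop. It repeats the same two steps as the mapper, rounding the coordinates into a "(lat,long)" key and decoding the article name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration; not part of Hadoop or Red Sqirl.
class GeoKeySketch {

    // Builds the "(lat,long)" key from a point such as "53.35 -6.26".
    static String locationKey(String geoPoint) {
        StringTokenizer st = new StringTokenizer(geoPoint, " ");
        long roundedLat = Math.round(Double.parseDouble(st.nextToken()));
        long roundedLong = Math.round(Double.parseDouble(st.nextToken()));
        return "(" + roundedLat + "," + roundedLong + ")";
    }

    // URL-decodes an article name and replaces underscores with spaces.
    static String locationName(String articleName) {
        try {
            return URLDecoder.decode(articleName, "UTF-8").replace("_", " ");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample values are made up for this sketch.
        System.out.println(locationKey("53.35 -6.26") + " -> " + locationName("Dublin_City"));
    }
}
```

Rounding to whole degrees means every article within the same one-degree cell emits the same key, which is what lets the reduce phase group them.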

Slide 6

Slide 6 text

...easy
• This represents a data file, e.g. a list of customers.
• This represents an action, e.g. exclude customers with a spend < €20.
• This represents a K-Means clustering algorithm. This is the type of algorithm one might use to segment a customer base.
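For comparison, the three steps on this slide (data file, filter action, K-Means) can be sketched in plain Java. This is not how Red Sqirl implements them; the class `SegmentSketch`, the spend figures, the starting centroids and the one-dimensional K-Means are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch; names and data are illustrative only.
class SegmentSketch {

    // The "action" step: exclude customers with a spend below minSpend.
    static double[] excludeLowSpend(double[] spend, double minSpend) {
        return Arrays.stream(spend).filter(s -> s >= minSpend).toArray();
    }

    // The "K-Means" step on one dimension: returns the final centroids.
    static double[] kMeans(double[] values, double[] initialCentroids, int iterations) {
        double[] centroids = initialCentroids.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            // Assignment: each value joins its nearest centroid.
            for (double v : values) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(v - centroids[c]) < Math.abs(v - centroids[nearest])) {
                        nearest = c;
                    }
                }
                sum[nearest] += v;
                count[nearest]++;
            }
            // Update: move each non-empty centroid to the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // The "data file": made-up customer spend in euro.
        double[] spend = {5, 12, 25, 30, 35, 200, 220, 240};
        double[] kept = excludeLowSpend(spend, 20);
        double[] start = {kept[0], kept[kept.length - 1]};
        System.out.println(Arrays.toString(kMeans(kept, start, 10)));
    }
}
```

With this sample data the two centroids settle at 30.0 and 220.0, i.e. a low-spend and a high-spend segment.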

Slide 7

Slide 7 text

Demo in Action

Slide 8

Slide 8 text

...even for the most complex tasks
Red Sqirl also allows for complex predictive modelling, encapsulating complexity in super-actions.

Slide 9

Slide 9 text

Sharing via the Analytics Store
• Private or public sharing of Red Sqirl models or Red Sqirl packages
• Dependency management and version control

Slide 10

Slide 10 text

Red Sqirl Road Map
1. Parallelization of a workflow
2. Online repository
3. ETL using standard technology (Pig, Hive)
4. Basic modeling functionalities
5. SQL database import/export
6. Visualisation
7. Scheduling
8. Near-real-time queries
9. Streaming

Slide 11

Slide 11 text

Future Versions
● Fast Data
  o Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or neutral
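To make the definition of sentiment analysis concrete, here is a deliberately simple word-counting sketch. It is an assumption-laden toy, not a planned Red Sqirl feature: the class `SentimentSketch` and both word lists are hypothetical, and real systems typically use trained models rather than fixed lists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the word lists are illustrative assumptions.
class SentimentSketch {
    static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("good", "great", "love", "easy"));
    static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("bad", "poor", "hate", "slow"));

    // Returns "positive", "negative" or "neutral" from a simple word count.
    static String classify(String text) {
        int score = 0;
        // Lower-case the text and split on anything that is not a letter a-z.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(word)) {
                score++;
            }
            if (NEGATIVE.contains(word)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("I love this, it is great"));
        System.out.println(classify("The service was slow and bad."));
        System.out.println(classify("The report is on the desk"));
    }
}
```

A text with no listed words, or with equal counts of each, is reported as neutral.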

Slide 12

Slide 12 text

Red Sqirl Stage
1. MVP Development
2. Sanity tests
3. Trials
4. Stable Release

Slide 13

Slide 13 text

Who we are: Idiro Analytics
Pioneers in Big Data
Since 2004 we have analysed the data of over 12% of the world’s population.

Slide 14

Slide 14 text

Thank you
Igor Souza
igor.souza@idiro.com
+353 087 216 1413
@igfasouza