An Introduction to Red Sqirl

Presentation given by Igor Souza, Software Architect at Red Sqirl, at the 5th Belo Horizonte Data Science Meetup (http://www.meetup.com/Belo-Horizonte-Data-Science-Meetup).

Red Sqirl is a web-based big data application that simplifies the analysis of large data sets. With Red Sqirl, you can quickly and cost-effectively access the power of the Hadoop ecosystem, enhancing the productivity of data scientists and analysts. It is aimed at anyone who wants to analyse large data sets efficiently.

Transcript

  1. © Red Sqirl Analytics. Diversity is Power

    All these tools have their use cases and can/should be combined.
  2. Confidential. What is Red Sqirl?

    • A drag-and-drop analytics framework for Hadoop
    • Makes advanced analytics workflows easy to run, modify, save and share
    • Highly skilled data scientists can create models and workflows, and analysts can modify and execute them
  3. We want to make what is complex…

    An example of Hadoop MapReduce:

        // GEO_RSS_URI, geoLocationKey and geoLocationName are members of the
        // enclosing mapper class, which the slide does not show.
        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> outputCollector, Reporter reporter)
                throws IOException {
            String dataRow = value.toString();

            // since these are tab-separated files, let's tokenize on tab
            StringTokenizer dataTokenizer = new StringTokenizer(dataRow, "\t");
            String articleName = dataTokenizer.nextToken();
            String pointType = dataTokenizer.nextToken();
            String geoPoint = dataTokenizer.nextToken();

            // only process rows that are GEO RSS type points
            if (GEO_RSS_URI.equals(pointType)) {
                // the point is "<latitude> <longitude>", separated by a space
                StringTokenizer st = new StringTokenizer(geoPoint, " ");
                String strLat = st.nextToken();
                String strLong = st.nextToken();
                double lat = Double.parseDouble(strLat);
                double lng = Double.parseDouble(strLong);
                long roundedLat = Math.round(lat);
                long roundedLong = Math.round(lng);

                // key: rounded coordinates, value: decoded article name
                String locationKey = "(" + String.valueOf(roundedLat) + ","
                        + String.valueOf(roundedLong) + ")";
                String locationName = URLDecoder.decode(articleName, "UTF-8");
                locationName = locationName.replace("_", " ");
                geoLocationKey.set(locationKey);
                geoLocationName.set(locationName);
                outputCollector.collect(geoLocationKey, geoLocationName);
            }
        }
  4. ...easy

    • This represents a data file, e.g. a list of customers.
    • This represents an action, e.g. exclude customers with a spend < €20.
    • This represents a K-Means clustering algorithm. This is the type of algorithm one might use to segment their customer base.
  5. ...even for the most complex tasks

    Red Sqirl also allows for complex predictive modelling, encapsulating complexity in super-actions.
  6. Sharing via the Analytics Store

    • Private or public sharing of Red Sqirl models or Red Sqirl packages
    • Dependency management and version control
  7. Red Sqirl Road Map

    1. Parallelization of a workflow
    2. Online repository
    3. ETL using standard technology (Pig, Hive)
    4. Basic modeling functionalities
    5. SQL Database Import/Export
    6. Visualisation
    7. Scheduling
    8. Near real time queries
    9. Streaming
  8. Future Versions

    • Fast Data
      o Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or neutral
  9. Red Sqirl Stage

    1. MVP Development
    2. Sanity tests
    3. Trials
    4. Stable Release
  10. Who we are: Idiro Analytics

    • Since 2004 we have analysed the data of over 12% of the world’s population
    • Pioneers in Big Data
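
As a rough illustration of what the three boxes on slide 4 stand for (a customer data file, an action that excludes customers with a spend below €20, and a K-Means step), the sketch below does the same thing in plain Java. It is not Red Sqirl code and does not use Hadoop. The class name `CustomerSegmentation`, the sample data, the two features (spend and number of visits), k = 2, and seeding the centroids with the first k points are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java, no Hadoop, not Red Sqirl code.
// Each customer is a hypothetical {spend in EUR, number of visits} pair.
class CustomerSegmentation {

    // Hypothetical sample standing in for the "list of customers" data file.
    static List<double[]> sampleCustomers() {
        return Arrays.asList(
                new double[] {5, 1}, new double[] {15, 2}, new double[] {25, 2},
                new double[] {30, 3}, new double[] {200, 20}, new double[] {220, 25},
                new double[] {28, 2});
    }

    // The "action" box: exclude customers whose spend (column 0) is below minSpend.
    static List<double[]> filterBySpend(List<double[]> customers, double minSpend) {
        List<double[]> kept = new ArrayList<>();
        for (double[] customer : customers) {
            if (customer[0] >= minSpend) {
                kept.add(customer);
            }
        }
        return kept;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // The "K-Means" box: Lloyd's algorithm, seeded with the first k points
    // (an assumption chosen to keep the example deterministic).
    // Requires points.size() >= k. Returns one cluster label per point.
    static int[] kMeans(List<double[]> points, int k, int maxIterations) {
        int dims = points.get(0).length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points.get(c).clone();
        }
        int[] labels = new int[points.size()];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.size(); i++) {
                int label = nearest(points.get(i), centroids);
                if (label != labels[i]) {
                    labels[i] = label;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.size(); i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[labels[i]][d] += points.get(i)[d];
                }
            }
            for (int c = 0; c < k; c++) {
                for (int d = 0; d < dims && counts[c] > 0; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<double[]> kept = filterBySpend(sampleCustomers(), 20.0);
        int[] segments = kMeans(kept, 2, 100);
        for (int i = 0; i < kept.size(); i++) {
            System.out.println(Arrays.toString(kept.get(i)) + " -> segment " + segments[i]);
        }
    }
}
```

The features are left unscaled here, so spend dominates the distance. A real segmentation would likely normalise the columns first and pick k and the seeding more carefully.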