The Most Intriguing Data Scientist in the Universe.

The most intriguing data scientist in the universe just endorsed this Kickstarter project: a complete, practical, hands-on course comprising step-by-step video tutorials, a book, practice exercises and downloadable code samples. It covers everything you need to know about aggregating, processing, searching and visualizing log data generated at high volume and high velocity, using only open source software.

Course Contents

# Installing Java and Configuring Environment Variables
# Introduction to Apache Flume Architecture and Inner Workings
# Introduction to Logstash Architecture and Inner Workings
# Downloading, Installing and Configuring Apache Flume
# Downloading, Installing and Configuring Logstash
# Overview of Different Forms of Capturing Application Log Data
# Parsing Raw Data to Extract Metadata and Useful Information
# Strategies and Techniques for Buffering Log Events before Storage
# Using Elasticsearch and HDFS as Centralized Datastores
# Installing and Configuring ElasticSearch
# Setting up Hadoop and Configuring HDFS
# Installing Kibana as a User Interface to the ElasticSearch Indices
# Moving Log Data into ElasticSearch and HDFS
# Introduction to Hadoop, HDFS and MapReduce
# Using Hadoop to Process the Log Data in HDFS with MapReduce Jobs
# Introduction to Lucene Query Syntax
# Querying ElasticSearch to Retrieve Information
# Using D3.js to Visualize ElasticSearch Query Results
# Using D3.js to Visualize Results from MapReduce Jobs
# Setting up Dashboards in Kibana to Visualize Log Events in Realtime
# Bringing It All Together with Sample Projects
# Design Patterns, Best Practices, Tips and Strategies for Scaling Log Data Aggregation, Processing, Searching and Visualization

http://www.kickstarter.com/projects/1368497725/massive-log-data-aggregation-processing-and-visual

Israel Ekpo

July 23, 2013

Transcript

  1. He is the first man ever to sort 1024 yottabytes of data in just under 60 seconds.
  2. He is known to successfully process exabytes of data just by recalling the MAC address of any device that has ever come in contact with the data.
  3. He is so intriguing and captivating that even technical recruiters plead with him to recruit them.
  4. When he sneezes, cell towers appear, dead zones disappear and network signal strength improves by at least 128% within a 64-mile radius of his presence.
  5. He is so electrifying that if he were to press the power button on any device, it would come on with or without any visible source of electricity.
  6. His skin texture is so intriguing that even massage therapists aspire to massage him just to relieve their own stress.
  7. His goatee has encountered more data molecules and caffeine from energy drinks than any other data scientist would experience in their entire career.
  8. When he gives you instructions on how to implement an algorithm …
  9. You never get confused; in fact, you will proceed to create software that is bug-free and even deliver it at least 4 days ahead of schedule.
  10. Bugs are so intimidated by his presence that when he reviews your code, they disappear.
  11. He does not need any input device such as a keyboard or mouse to use his computing devices.
  12. He sends raw data directly from his mind to the CPU and the computer understands him.
  13. His personal trainers and doctors tell him that his body weight and heart rate increase or decrease only by powers of 10.
  14. He eats extra spicy Pad Thai for breakfast and usually drinks Tabasco sauce with wasabi to cleanse his palate.
  15. He rarely drives himself, but when he does, he only drives on roads where the speed limits are prime numbers.
  16. When he travels, he only uses ATMs that dispense dollar bills in multiples of Infinity.
  17. My friends, there are elements of truth in everything you have just read about me.
  18. The amount of data generated during the proof will make the total combined disk space at all the data centers used by Facebook, Twitter, Amazon and Google look like mere floppy diskettes.
  19. However, in my quest to analyze a subset of active crowd-funded projects in the universe …
  20. So, I invite you to join me in backing the most intriguing Kickstarter project in the universe.
  21. Just in case you are wondering who the most intriguing data scientist in the universe is.
  22. A Documentary by Israel Ekpo About The Most Intriguing Data Scientist in the Universe