Analytics to the masses by JOSE LUIS LÓPEZ at Big Data Spain 2014

BIG DATA ANALYTICS TO THE MASSES
JOSE LUIS LÓPEZ PINO, DATA ENGINEER, GETYOURGUIDE

Big Data Analytics to the masses: why it has failed and how we can fix it
Jose Luis López Pino

Who am I?
- BI Consultant
- Large-Scale & Distributed
- Founding Data Engineer

Big Data is like Tourism
- It seems easy to do.
- But if you aren’t an expert, you can’t make the most of it.

Struggle to analyze Big Data
- Harlan Harris, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media, Inc., 2013.
- Also: Sean Kandel, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. Enterprise data analysis and visualization: An interview study. Visualization and Computer Graphics, IEEE Transactions

Tools
Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014.

Tools (October 2014)
Original: Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014.

Deep analytics

Libraries! We need libraries...
- Query languages
- Write your own MR/RDD/Transformations

… comprehensive ones!

Say it with memes!
- When you do deep analytics in small data using R and CRAN packages
- When you do deep analytics in BIG data using R and CRAN packages

- When you try to program it using MapReduce
- When you try to program it using Apache Spark / Apache Flink
- When you try to use a library scalable to large data sets

Can’t we do it better?
- Make it similar to normal R programs.
- Hide complexity.
- Make file manipulation easier.
- Do part of the computing in the cluster and part of the computing in the client.
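The last point, splitting work between cluster and client, can be illustrated with a minimal sketch. It is not the system presented in the talk: the partitions are plain Python lists standing in for data held by cluster workers, and the names `summarize`, `merge`, and `mean_and_variance` are hypothetical. The idea is that each partition is reduced to a tiny summary where the data lives, and only those summaries travel to the client, which finishes the calculation locally.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical stand-ins: a partition is the slice of data one worker holds,
# and a summary is the small result that worker sends back to the client.
Partition = List[float]
Summary = Tuple[int, float, float]  # (count, sum, sum of squares)


def summarize(partition: Partition) -> Summary:
    """'Cluster' side: reduce one partition to a three-number summary."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(partitions: Iterable[Partition]) -> Tuple[float, float]:
    """'Client' side: merge the summaries and finish the computation locally.

    Assumes at least one data point overall (otherwise the count is zero).
    """
    n, total, total_sq = reduce(merge, map(summarize, partitions), (0, 0.0, 0.0))
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance


partitions = [[1.0, 2.0], [3.0, 4.0, 5.0]]
mean, variance = mean_and_variance(partitions)
print(mean, variance)
```

In a real deployment the `map(summarize, ...)` step would run on the cluster engine, and only three numbers per partition would cross the network, which is one way the amount of data moved to the client can stay small.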

Our approach

Behind the scenes: Before

Behind the scenes: After

Without writing significantly different code

Competitive with, or even faster than, native R code on small data

And it scales

Some relevant findings
- Transmission time was not significant.
- Stratosphere/Flink was competitive in highly iterative programs.
- We were not able to do it keeping the code 100% the same.
- Ensemble scenarios are the most exciting ones.

4 Takeaways from this talk
- We still need to bring Big Data to the right people in the right place.
- We need comprehensive libraries.
- We need to move data back and forth.
- Use a syntax that the users are familiar with.

That’s all!
- Have you found this talk interesting?
- Follow me: @jllopezpino
- Interested in a job as SEM Data Analyst (Berlin)?
- Ask me for the details:
- Are you interested in Data + Energy?
- Keep in touch:

17TH ~ 18th NOV 2014 MADRID (SPAIN)
