How to store your data?
• Files in a distributed file system
• Rows in a NoSQL table
• Index in a search engine
Slide 25
Process Data
Slide 26
Data Processing
• Transform the data
• Enrich the data
• Examples:
• Store data in multiple formats
• Aggregate data
• Build Recommendations
• …
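The transform, enrich, and aggregate steps above can be sketched in a few lines of plain Python. This is only an illustration: the rating events, the `movies` lookup table, and all function names are invented for the example and do not come from the deck, which would run these steps on a cluster rather than in one process.

```python
from collections import defaultdict

# Hypothetical raw events and reference data, invented for illustration.
raw_events = [
    {"user": "u1", "movie_id": 1, "rating": "4"},
    {"user": "u2", "movie_id": 1, "rating": "5"},
    {"user": "u1", "movie_id": 2, "rating": "3"},
]
movies = {
    1: {"title": "Alien", "genre": "sci-fi"},
    2: {"title": "Heat", "genre": "crime"},
}

def transform(event):
    """Normalize types: ratings arrive as strings and become floats."""
    return {**event, "rating": float(event["rating"])}

def enrich(event):
    """Join each event with reference data (title and genre)."""
    return {**event, **movies[event["movie_id"]]}

def aggregate(events):
    """Compute the average rating per genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [sum, count]
    for e in events:
        totals[e["genre"]][0] += e["rating"]
        totals[e["genre"]][1] += 1
    return {genre: s / n for genre, (s, n) in totals.items()}

processed = [enrich(transform(e)) for e in raw_events]
avg_by_genre = aggregate(processed)
```

Keeping each step as a small pure function makes it easy to store the intermediate `processed` records in one format and the aggregated result in another.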
Slide 27
MapReduce Processing Model
• Define mappers
• Shuffling is automatic
• Define reducers
• For complex work, chain jobs together
– Use a higher-level language or DSL that does this for you
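The model above can be illustrated with a single-process sketch: two mappers, two reducers, a `shuffle` step that stands in for what the framework does automatically, and two jobs chained together. This is not Hadoop code; `run_job`, `shuffle`, and the mapper and reducer names are hypothetical helpers written for this example.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group values by key; a real framework performs this step for you."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass over the records."""
    mapped = (pair for record in records for pair in mapper(record))
    return [reducer(key, values) for key, values in shuffle(mapped).items()]

# Job 1: word count.
def count_mapper(line):
    for word in line.lower().split():
        yield word, 1

def count_reducer(word, ones):
    return word, sum(ones)

# Job 2: group words by how often they occur.
def freq_mapper(pair):
    word, count = pair
    yield count, word

def freq_reducer(count, words):
    return count, sorted(words)

lines = ["big data is big", "data is everywhere"]
word_counts = run_job(lines, count_mapper, count_reducer)
# Chaining: the output of job 1 is the input of job 2.
by_frequency = run_job(word_counts, freq_mapper, freq_reducer)
```

Writing the chain by hand is manageable for two jobs; for longer pipelines this wiring is what a higher-level language or DSL generates for you.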
Slide 28
Apache Spark: Fast Big Data
– Rich APIs in Java, Scala, Python
– Interactive shell
• Fast to Run
– General execution graphs
– In-memory storage
[Diagram labels: Files, HBase, Hive]
SQL on Hadoop
• SQL Shell
• JDBC / ODBC
• BI Tools
• Reporting
Slide 36
Elasticsearch
Slide 37
Kibana as a frontend
Slide 38
Example: Recommendation Platform
Slide 39
[Architecture diagram labels: Machine Learning; MapR Cluster (HBase, MapR DB, MapR-FS); Capture Ratings; Add recommendations to movies; Movie Database; Movies & Recommendations]
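The flow in this diagram (capture ratings, compute recommendations, write them back to the movie records) can be sketched as follows. The deck does not say which algorithm the platform uses, so the simple "liked together" co-occurrence count below is an assumption chosen for brevity, and the sample ratings, `build_recommendations`, and the in-memory `movie_db` dictionary are all hypothetical stand-ins for the real cluster components.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical captured ratings: (user, movie, stars).
ratings = [
    ("ann", "Alien", 5), ("ann", "Blade Runner", 4),
    ("bob", "Alien", 4), ("bob", "Blade Runner", 5), ("bob", "Heat", 4),
    ("cat", "Heat", 5), ("cat", "Alien", 2),
]

def build_recommendations(ratings, min_stars=4, top_n=2):
    """For each movie, list the movies most often liked by the same users."""
    liked = defaultdict(set)  # user -> movies rated at least min_stars
    for user, movie, stars in ratings:
        if stars >= min_stars:
            liked[user].add(movie)

    co_liked = defaultdict(Counter)  # movie -> Counter of co-liked movies
    for user_movies in liked.values():
        for a, b in permutations(sorted(user_movies), 2):
            co_liked[a][b] += 1

    return {
        movie: [other for other, _ in counts.most_common(top_n)]
        for movie, counts in co_liked.items()
    }

# Stand-in for the movie database: one record per title.
movie_db = {title: {"title": title} for _, title, _ in ratings}

# "Add recommendations to movies": write the results back to each record.
for title, recs in build_recommendations(ratings).items():
    movie_db[title]["recommendations"] = recs
```

Storing the recommendations on the movie record itself means the application can serve a movie and its recommendations with a single lookup.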
Slide 40
Conclusion
• If possible, use streams: Kafka, Logstash
• Advanced data processing and machine learning: Spark
• Expose your data using SQL for your “BI folks”: Drill
• Aggregation and full-text search: Elasticsearch
• Data visualisation: Kibana