Slide 98
Hadoop Technologies
Copyright © William El Kaim 2016
NoSQL Databases: Cassandra, Ceph, DynamoDB, HBase, Hive, Impala, Ring, OpenStack Swift, etc.
Data Lake
Data Preparation
Data Sourcing
Data Science: Dataiku, Datameer, Tamr, R, SAS, Python, RapidMiner, etc.
Data Ingestion
Data Sources
BI Tools & Platforms
App. Services
Cascading, Crunch, HFactory, Hunk, Spring for Hadoop, D3.js, Leaflet
Feature Preparation
Ingestion Technologies: Apex, Flink, Flume, Kafka, Amazon Kinesis, NiFi, Samza, Spark, Sqoop, Scribe, Storm, NFS Gateway, etc.
Distributed File System: GlusterFS, HDFS, Amazon S3, MapR-FS, Elasticsearch
Batch
Streaming
Encoding Format: JSON, RCFile, Parquet, ORCFile
MapReduce
Event Stream & Micro Batch
Open Data
Operational Systems (ODS, IoT)
Existing Sources of Data (Databases, DW, DataMart)
Distributions: Cloudera, Hortonworks, MapR, Syncfusion, Amazon EMR, Azure HDInsight, Altiscale, Pachyderm, Qubole, etc.
Data Warehouse
Lakeshore & Analytics
Qlik, Tableau, TIBCO, Jethro, Looker, IBM, SAP, BIME, etc.
Cassandra, Druid, DynamoDB, MongoDB, Redshift, Google BigQuery, etc.
Machine Learning: BigML, Mahout, PredicSis, Azure ML, TensorFlow, H2O, etc.
Analytics App and Services