What is Spark? fast and general engine for large-scale data processing Write applications quickly in Java, Scala, Python, R Combine SQL, streaming, and complex analytics. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
What is Kafka? distributed publish-subscribe system. developed by the Apache So ware Foundation. originally developed by LinkedIn. written in Scala. low-latency platform for handling real-time data feeds
Apache Flink Apache Top Level Project. open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.
Apache Flink API 1. DataStream API for unbounded streams embedded in Java and Scala, and 2. DataSet API for static data embedded in Java, Scala, and Python, 3. Table API with a SQL-like expression language embedded in Java and Scala.
SQL-on-Hadoop Control on Hadoop by SQL SQL-on-Hadoop has become a “must have” in the Hadoop toolkit and has entered the mainstream. Within the big data landscape there are multiple approaches to accessing, analyzing, and manipulating data in Hadoop. ANSI SQL completeness (and the ability to tolerate machine-generated SQL), developer and analyst skillsets, and architecture tradeoffs
What is Data Lake? large-scale storage repository. data lake provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs Include Unstructured data and structured data.
Data Lake level Data Swamps Raw Data Can’t find or use data Data Puddles Low variety of data and low adoption Strong technical skill set requirement Data Lake Right Interface
Artificial Intelligence That Imitate the human feelings and perceptions machine to let you learn the data audio and image and text to guess automatically. It is introduced with Deep Learning.
Microso Tay is funny Designed to Imitate the informal youth conversation. On Twitter reiterated an incendiary and racist remarks.. Sprinkle vomit rant like drug taking, getting in trouble. http://gigazine.net/news/20160329-reason-microso -tay- crazy/
What is Workflow Engine? so ware application that manages business processes. manages and monitors the state of activities in a workflow. can execute any arbitrary sequence of steps.
summary Data Lake are spoken well in the United States. SQL-on-Hadoop has become a must have. AI and Deep Learning going well. Kafka, Spark, Flink going well.