Elasticsearch, Hadoop, and Friends: Spark, Storm, and More

Costin Leau, @costinl

{ } CC-BY-ND 4.0

How to count words: at scale, in real time!

Hadoop

Hadoop 0.20.x/1.x
Compute: Map / Reduce Framework
Storage: Hadoop Distributed File System (HDFS)
Machine, Machine, Machine, Machine, Machine

Map / Reduce overview
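The overview slide carries only its title, so here is a minimal sketch of the deck's running example, counting words, written in the Map / Reduce style. It is plain Java with no Hadoop dependency; the class name WordCountSketch and its methods are hypothetical and only illustrate the two phases, not how a real Hadoop job is wired together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Word count in the Map / Reduce style, single process, no Hadoop dependency (illustrative only). */
final class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle and reduce phases: group the pairs by word and sum their counts. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs both phases over all lines, standing in for a whole job. */
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or not to be", "to count words")));
    }
}
```

In a real cluster the map calls run in parallel on the machines that hold the input blocks, and the framework performs the grouping between the two phases; the sketch collapses all of that into one process.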

Hadoop 0.20.x/1.x
Hadoop Distributed File System (HDFS)
Map / Reduce Framework

Hadoop 2.x / NextGen
Compute: Map / Reduce Framework, Other
Resource Mgmt.: Yet Another Resource Negotiator (YARN)
Storage: Hadoop Distributed File System (HDFS)
Machine, Machine, Machine, Machine, Machine

Hadoop 2.x / NextGen
Hadoop Distributed File System (HDFS)
YARN
Map / Reduce, Other

Elasticsearch Hadoop

Elasticsearch for Apache Hadoop™

Certified to work

Compute

Partition-to-partition architecture
(P = primary shard, R = replica shard)
Node1: 2P, 1R
Node2: 1P, 3R
Node3: 2R, 3P
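One way to read the partition-to-partition idea is that each Elasticsearch shard becomes one unit of parallel work on the Hadoop side, with the nodes holding a copy of the shard recorded as location hints. The sketch below models that reading in plain Java. ShardSplitSketch, Split and plan are hypothetical names, and the logic is an assumption made for illustration, not the connector's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of "one split per shard"; not the connector's real classes. */
final class ShardSplitSketch {

    /** One unit of parallel work: a single shard plus the nodes that hold a copy of it. */
    static final class Split {
        final int shard;
        final List<String> locations;

        Split(int shard, List<String> locations) {
            this.shard = shard;
            this.locations = locations;
        }
    }

    /**
     * Plans one split per shard.
     * The map key is the shard number; the value lists the nodes holding a copy, primary first.
     */
    static List<Split> plan(Map<Integer, List<String>> shardCopies) {
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : shardCopies.entrySet()) {
            splits.add(new Split(entry.getKey(), List.copyOf(entry.getValue())));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Layout from the slide: Node1 holds 2P and 1R, Node2 holds 1P and 3R, Node3 holds 3P and 2R.
        Map<Integer, List<String>> copies = new LinkedHashMap<>();
        copies.put(1, List.of("Node2", "Node1"));
        copies.put(2, List.of("Node1", "Node3"));
        copies.put(3, List.of("Node3", "Node2"));
        for (Split split : plan(copies)) {
            System.out.println("split for shard " + split.shard + " -> " + split.locations);
        }
    }
}
```

With three shards the plan contains three splits, so up to three tasks can read in parallel, each talking to a single shard.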

Dynamic runtime matching

Failure handling

Co-location
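The slides on runtime matching, failure handling and co-location survive only as titles, so the following is a sketch of one plausible reading: a task prefers a shard copy on its own machine and otherwise falls back to any copy that is still reachable. CopySelectionSketch and pick are hypothetical names, and the selection rule is an assumption made for illustration rather than the connector's documented behavior.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative copy selection for one shard; an assumed rule, not the connector's real logic. */
final class CopySelectionSketch {

    /**
     * Picks the node a task should talk to for one shard.
     *
     * @param copies    nodes holding a copy of the shard, primary first
     * @param localNode node the task is running on
     * @param liveNodes nodes that are currently reachable
     */
    static Optional<String> pick(List<String> copies, String localNode, Set<String> liveNodes) {
        // Co-location: prefer a copy on the machine the task runs on.
        if (copies.contains(localNode) && liveNodes.contains(localNode)) {
            return Optional.of(localNode);
        }
        // Failure handling: otherwise use the first copy that is still reachable.
        for (String node : copies) {
            if (liveNodes.contains(node)) {
                return Optional.of(node);
            }
        }
        // No copy of the shard is reachable.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Shard 2 from the earlier layout: primary on Node1, replica on Node3.
        List<String> shard2 = List.of("Node1", "Node3");
        // Task runs next to the replica, so it reads locally.
        System.out.println(pick(shard2, "Node3", Set.of("Node1", "Node2", "Node3")));
        // Node1 is down and the task runs elsewhere, so it falls back to the replica.
        System.out.println(pick(shard2, "Node2", Set.of("Node2", "Node3")));
    }
}
```

Because the choice is made when the task starts rather than when the job is planned, the same rule covers both a relocated shard and a failed node.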

Native integration - Map / Reduce

    // Read: use Elasticsearch as the job input
    JobConf conf = new JobConf();
    conf.setInputFormat(EsInputFormat.class);
    conf.set("es.resource", "radio/artists");
    conf.set("es.query", "?q=me*");
    JobClient.runJob(conf);

    // Write: use Elasticsearch as the job output
    JobConf conf = new JobConf();
    conf.setOutputFormat(EsOutputFormat.class);
    conf.set("es.resource", "radio/artists");
    JobClient.runJob(conf);

Native integration - Cascading

    // Read: Elasticsearch as the source tap, printed to standard output
    Tap in = new EsTap("radio/artists", "?q=me*");
    Tap out = new StdOut(new TextLine());
    new LocalFlowConnector().connect(in, out, new Pipe("pipe")).complete();

    // Write: a delimited local file as the source, Elasticsearch as the sink tap
    Tap in = new Lfs(new TextDelimited(
        new Fields("id", "name", "url", "picture")), "artists.dat");
    Tap out = new EsTap("radio/artists", new Fields("name", "url", "picture"));
    new HadoopFlowConnector().connect(in, out, new Pipe("pipe")).complete();

Native integration - Apache Pig

    -- Read: load the documents matching the query
    A = LOAD 'radio/artists'
        USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
    DUMP A;

    -- Write: index a delimited file
    A = LOAD 'src/artists.dat' USING PigStorage()
        AS (id:long, name, url:chararray, picture:chararray);
    B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
    STORE B INTO 'radio/artists'
        USING org.elasticsearch.hadoop.pig.EsStorage();

Native integration - Apache Hive

    -- Read: expose the matching documents as an external table
    CREATE EXTERNAL TABLE artists (
        id BIGINT, name STRING,
        links STRUCT<url:STRING, picture:STRING>)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource'='radio/artists', 'es.query'='?q=me*');
    SELECT * FROM artists;

    -- Write: insert into the Elasticsearch-backed table
    CREATE EXTERNAL TABLE artists (
        id BIGINT, name STRING,
        links STRUCT<url:STRING, picture:STRING>)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource'='radio/artists');
    INSERT OVERWRITE TABLE artists
        SELECT s.name, named_struct('url', s.url, 'picture', s.pic)
        FROM source s;

Native integration - Apache Spark

    // Read: load the matching documents as an RDD
    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    val sc = new SparkContext(new SparkConf())
    val rdd = sc.esRDD("radio/artists", "?q=me*")

    // Write: index case classes and Maps
    import org.elasticsearch.spark._

    case class Artist(name: String, albums: Int)
    val u2 = Artist("U2", 12)
    val bh = Map("name" -> "Buckethead", "albums" -> 95, "age" -> 45)
    sc.makeRDD(Seq(u2, bh)).saveToEs("radio/artists")

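Native integration - Apache Spark (through the Map / Reduce formats)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{MapWritable, Text}
    import org.apache.hadoop.mapred.JobConf
    import org.elasticsearch.hadoop.mr._

    val conf = new Configuration()
    conf.set("es.resource", "radio/artists")
    conf.set("es.query", "?q=me*")

    // New (org.apache.hadoop.mapreduce) API
    val mrNewApiRDD = sc.newAPIHadoopRDD(conf,
      classOf[EsInputFormat[Text, MapWritable]],
      classOf[Text], classOf[MapWritable])

    // Old (org.apache.hadoop.mapred) API; hadoopRDD expects a JobConf
    val mrOldApiRDD = sc.hadoopRDD(new JobConf(conf),
      classOf[EsInputFormat[Text, MapWritable]],
      classOf[Text], classOf[MapWritable])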

Native integration - Spark SQL

    // Load an index as a DataFrame
    val sql = new SQLContext...
    val df = sql.load("radio/artists", "org.elasticsearch.spark.sql")
    df.filter(df("age") > 40)

    // Declare the index as a temporary table and query it with SQL
    val sql = new SQLContext...
    val table = sql.sql("CREATE TEMPORARY TABLE artists " +
      "USING org.elasticsearch.spark.sql " +
      "OPTIONS (resource 'radio/artists')")
    val names = sql.sql("SELECT name FROM artists")

Native integration - Apache Storm

    // Write: index incoming tuples with EsBolt
    TopologyBuilder builder = new TopologyBuilder();
    builder.setBolt("esBolt", new EsBolt("twitter/tweets"));

    // Read: stream the query results with EsSpout (parallelism hint of 5)
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("esSpout", new EsSpout("twitter/tweets", "?q=nfl*"), 5);
    builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("esSpout");

Resource Management

YARN support – In Beta
Run Elasticsearch on YARN*
* YARN doesn’t support long-lived services:
•  No provisioning
•  No IP/network guarantees
•  Data/node affinity
Next YARN releases plan to address this

Storage

HDFS integration
Use HDFS as a shared storage
•  Backup and recover data
•  Works great with snapshot immutable data
•  Snapshot / Restore
HDFS as a File-System – not recommended / tread carefully
•  Incomplete FS semantics (last-delete-on-close, fsync)
•  NFSv3 (metadata issues)
•  See Elasticsearch issue #9072

What’s next
Beta 1 - Apache Spark Java/Scala DSL
Beta 2 - Apache Storm
Beta 3 - YARN and SSL/TLS
Beta 4 - Client-node routing, Spark Sources + Data Frame
2.1 – in development
2.2
Marvel integration
Machine Learning – MLlib

Thank you! @costinl
