Slide 1

Slide 1 text

Analyzing Data with Zeppelin (powered by Apache Spark)
2014-11-05, Korea Spark User Group (스사모)
https://www.facebook.com/groups/sparkkoreauser/
Sangwoo Kim, VCNC (Between)
kevin@between.us

Slide 2

Slide 2 text

Apache Spark?
• Can run jobs similar to MapReduce
• Extensibility (Spark SQL, Spark Streaming, MLlib, GraphX)
• A much simpler interface than MapReduce, and easy to learn (Scala, REPL)
• 5x to 50x faster than MapReduce, depending on the type of job (in-memory data)
• Compatible with Hadoop storage (HDFS, HBase, S3, ..)

Slide 3

Slide 3 text

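Why is it needed?
• MapReduce, Hive (the existing dominant technologies)
• Very powerful, but increasingly inefficient as jobs get more complex (intermediate results are repeatedly written to HDFS)
• The API is complex, and a job built by chaining several MR jobs is hard to maintain.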

Slide 4

Slide 4 text

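Spark Key Concept
• RDD (Resilient Distributed Datasets)
  ‣ A list shared across the whole cluster and held in memory (spills to disk when memory runs short)
  ‣ Supports a variety of operations such as map, reduce, count, filter, join
  ‣ Several operations are set up in advance and computed lazily when a result is requested
• Scala
  ‣ A very good language for data analysis
  ‣ Powerful expressions, compatibility with Java
  ‣ Interactive shell (REPL)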
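The lazy behaviour described for RDDs can be tried without a cluster. The sketch below is only an analogy in plain Scala: it uses a standard collection view instead of an RDD, and the names `LazyPipelineSketch`, `evaluated`, `pipeline` and `firstThree` are made up for illustration. It assumes nothing about Spark beyond the idea that transformations are recorded first and run only when a result is requested.

```scala
// Plain-Scala analogy (no Spark required): a collection view defers work
// in a way similar to RDD transformations, which run only when an action
// asks for a result.
object LazyPipelineSketch {
  // Counts how many elements the map step has actually processed.
  var evaluated = 0

  // "Transformations": building the pipeline computes nothing yet.
  val pipeline = (1 to 1000000).view
    .map { n => evaluated += 1; n * 2 }
    .filter(_ % 3 == 0)

  // "Action": forces only as much work as the requested result needs.
  def firstThree: List[Int] = pipeline.take(3).toList
}
```

Reading `LazyPipelineSketch.evaluated` before calling `firstThree` gives 0, and afterwards it is far below one million, because only the elements needed for three results were processed.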

Slide 5

Slide 5 text

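Spark is good
• Large jobs that needed a Hadoop cluster of dozens of machines can be handled by a cluster of 10 machines or fewer
• Jobs that had to run on a cluster can run on 1 or 2 machines
• Jobs that took tens of minutes finish in about a minute
• The complicated cycle of writing MR job code, packaging it, and submitting it is replaced by typing one line of code in the shell
• Easy to learn, even for people seeing it for the first time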

Slide 6

Slide 6 text

Code Examples (1): Word Count

Slide 7

Slide 7 text

Word Count

    val file = spark.textFile("hdfs://...")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")
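To see what each step of the word count produces, the same pipeline can be imitated on a local collection. This is a sketch under assumptions, not the Spark job itself: the input lines are made up, the object name `WordCountSketch` is hypothetical, and `reduceByKey(_ + _)` is approximated with `groupBy` followed by a sum, since plain Scala collections have no `reduceByKey`.

```scala
// Local imitation of the word count on an ordinary Seq (no Spark required).
object WordCountSketch {
  // Made-up sample lines standing in for the HDFS text file.
  val lines = Seq("to be or not to be", "to see or not to see")

  val counts: Map[String, Int] = lines
    .flatMap(line => line.split(" "))  // one element per word
    .map(word => (word, 1))            // pair each word with a count of 1
    .groupBy(_._1)                     // gather the pairs that share a word
    .map { case (word, pairs) =>       // sum the 1s, like reduceByKey(_ + _)
      (word, pairs.map(_._2).sum)
    }
}
```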

Slide 8

Slide 8 text

Code Examples (2): Getting Between PC Ver. Download

Slide 9

Slide 9 text

Getting Download Data

    // One row per download request, built from a CloudFront log line
    case class CloudFrontPcVerChart(val date: String, val country: String, val ip: String,
                                    val http_method: String, val ua: String)

    val cloudFrontPcVerLogs = "s3n://assets-between-pc-logs/*2014-10-*"
    // Keep only requests for /downloads/setup.exe and split each line on tabs
    val cloudFrontPcVerDownloadLogs = sc.textFile(cloudFrontPcVerLogs)
      .filter(_ contains "/downloads/setup.exe")
      .map(x => x.split("\t"))
    cloudFrontPcVerDownloadLogs.first
    // Field 4 is the client IP; IP2C.get fills the country column from it
    val cloudFrontPcVerDownloadChart = cloudFrontPcVerDownloadLogs.map(arr =>
      CloudFrontPcVerChart(arr(0), IP2C.get(arr(4)), arr(4), arr(5), arr(10)))
    cloudFrontPcVerDownloadChart.registerAsTable("pc_ver_download")
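The per-line parsing step can be tested locally before running it against S3. The following is a minimal sketch with several assumptions: the sample log line is invented (only the field positions 0, 4, 5 and 10 come from the slide), `countryOf` is a hypothetical stand-in for the `IP2C` helper, and `Download`, `parse` and `LogParseSketch` are illustrative names. Unlike the slide code, it also guards against lines with too few fields.

```scala
// Local sketch of the filter + split + map step on a single log line.
object LogParseSketch {
  case class Download(date: String, country: String, ip: String,
                      httpMethod: String, ua: String)

  // Hypothetical stand-in for the IP2C lookup used on the slide.
  def countryOf(ip: String): String =
    if (ip.startsWith("203.")) "KR" else "??"

  // Made-up line with 11 tab-separated fields; only the positions of
  // fields 0, 4, 5 and 10 follow the slide code.
  val sample: String = Seq(
    "2014-10-01", "00:00:01", "ICN1", "1024", "203.0.113.7", "GET",
    "host", "/downloads/setup.exe", "200", "-", "Mozilla/5.0"
  ).mkString("\t")

  // Returns None for lines that are not downloads or are too short.
  def parse(line: String): Option[Download] = {
    val arr = line.split("\t")
    if (line.contains("/downloads/setup.exe") && arr.length > 10)
      Some(Download(arr(0), countryOf(arr(4)), arr(4), arr(5), arr(10)))
    else None
  }
}
```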

Slide 10

Slide 10 text

Querying Data

    select country, count(1) value
    from pc_ver_download
    group by country
    order by value desc
    limit 10

Simple enough!
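For readers more familiar with collections than SQL, the query corresponds roughly to a group, count, sort and take. The sketch below runs on an invented list of country codes rather than the `pc_ver_download` table, and `TopCountriesSketch` is a hypothetical name; it shows the shape of the computation, not how Spark SQL executes it.

```scala
// Collection equivalent of the top-10-countries query on made-up data.
object TopCountriesSketch {
  // Invented country codes standing in for the rows of pc_ver_download.
  val countries = Seq("KR", "JP", "KR", "US", "KR", "JP")

  val top10: Seq[(String, Int)] = countries
    .groupBy(identity)                                     // group by country
    .map { case (country, rows) => (country, rows.size) }  // count(1) value
    .toSeq
    .sortBy { case (_, value) => -value }                  // order by value desc
    .take(10)                                              // limit 10
}
```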

Slide 11

Slide 11 text

Result * Visualization powered by Zeppelin

Slide 12

Slide 12 text

Extension projects
• Spark SQL
• Spark Streaming
• MLlib
• GraphX
• SparkR (planned)
• Zeppelin

Slide 13

Slide 13 text

Zeppelin
• A web-based notebook for Apache Spark (http://zeppelin-project.org)
• Open source (https://github.com/NFLabs/zeppelin)

Slide 14

Slide 14 text

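Zeppelin
• An early-stage project (50 GitHub stars)
• A project that will become hugely popular within the next 1 to 2 years
• A friendly project that adds you as a contributor even for a 10-line commit
• Easy to install; when started, it launches Spark internally (connecting to an external cluster is also possible)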

Slide 15

Slide 15 text

Zeppelin
Implementing a dashboard in Zeppelin with a few lines of code and a few queries

Slide 16

Slide 16 text

Zeppelin Spark & Zeppelin Live Demo

Slide 17

Slide 17 text

The live demo was hard to embed in Keynote, so screenshots are shown instead.
Everything from ETL to analysis to visualisation is handled in a single tool

Slide 18

Slide 18 text

Everything from ETL to analysis to visualisation is handled in a single tool

Slide 19

Slide 19 text

Interactive! Enter code or a query and the result appears almost instantly

Slide 20

Slide 20 text

Combined with Spark SQL, it also has great potential as a visualisation tool

Slide 21

Slide 21 text

Combined with Spark SQL, it also has great potential as a visualisation tool

Slide 22

Slide 22 text

Combined with Spark SQL, it also has great potential as a visualisation tool
A dashboard can be built in moments with simple SQL queries

Slide 23

Slide 23 text

Combined with Spark SQL, it also has great potential as a visualisation tool
Position, width, and so on can be adjusted

Slide 24

Slide 24 text

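Zeppelin
• Recommended for people who want a simple way to get started with data analysis
• Recommended for people who want to explore and analyze all kinds of data in an agile way
• Recommended for people who want to build a dashboard quickly
• Recommended for people who want to take part in a hot open source project
• If you are using Spark for the first time, trying the Spark Shell first is recommended (until the auto-completion feature of the Zeppelin code editor is improved)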

Slide 25

Slide 25 text

Thank you