Slide 1

Slide 1 text

Scala for Data Analysis
2014-12-03, Korea Spark User Group meetup
Sangwoo Kim, VCNC (Between)

Slide 2

Slide 2 text

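Before we begin
1. Scala is a general-purpose programming language that is not tied to any particular field. This deck focuses on data analysis and introduces a part of Scala for people who are new to it. If you want to learn Scala in more depth, I recommend the Korean user group "La Scala Coding Dan".
2. The data analysis covered here is mainly distributed processing and analysis of large-scale data, rather than advanced analysis with tools such as R or Matlab.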

Slide 3

Slide 3 text

Word count in MapReduce (Java)

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // Mapper: emits (word, 1) for every token of every input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Slide 4

Slide 4 text

Word count in MapReduce (Java): the same WordCount class as on Slide 3, shown next to the Spark version for comparison.

Word count in Spark (Scala)

val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

Slide 5

Slide 5 text

Index
• Scala overview
• Why Scala?
• A taste of Scala basics
• Digging a little deeper

Slide 6

Slide 6 text

Scala overview

Slide 7

Slide 7 text

Scalable Language!
• A language for building larger programs through concise expression and powerful features
• Many of Scala's characteristics are a good fit for data analysis

Slide 8

Slide 8 text

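Scala
• Very concise syntax (like Python)
• Supports both OOP and Functional Programming styles
• Runs on the JVM, interoperable with Java
• Good performance (== Java)
• Statically typed (!= Python, == Java)
• REPL (Shell), Scripting
* Scala has many other good features; only those relevant to data analysis are listed here.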

Slide 9

Slide 9 text

Concise syntax (compared with Java)

Person.java
public class Person {
  private String name;
  private String work;

  public void setName(String name) { this.name = name; }
  public String getName() { return name; }
  public void setWork(String work) { this.work = work; }
  public String getWork() { return work; }
}

Job.java
public class Job {
  public static void main(String[] args) {
    Person kevin = new Person();
    kevin.setName("Kevin");
    kevin.setWork("Between");
  }
}

Java: running out of room on the slide..

job.scala
class Person(val name: String, val work: String)
val kevin = new Person("Kevin", "Between")

Scala: GOOD

Slide 10

Slide 10 text

OOP & Functional Programming
• Misconception: OOP and Functional Programming are opposites? (X)
• Scala is pure OOP

class Person(val name: String, val work: String)
val kevin = new Person("Kevin", "Between")

• Scala also supports Functional Programming

val list = List(1, 2, 3)
def aMultiplyFunction(x: Int) = { x * 2 }
val result = list.map(aMultiplyFunction)

Functions are 1st-class citizens! A function can be treated as data, for example by passing it as an argument.

Slide 11

Slide 11 text

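Runs on the JVM, interoperable with Java
• Compiling Scala code produces .class files, just like Java
• Runs on the JVM with almost the same runtime performance as Java
• Java classes can be imported and used
• Java files and Scala files can be compiled together in one project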

Slide 12

Slide 12 text

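Statically typed language
• Static typing vs dynamic typing?
  • Each has clear pros and cons
  • Advantages of static typing: type checking at compile time, good performance
  • Advantages of dynamic typing: quick to write, clean code
• Scala is a statically typed language
  • Compile-time type checking, type safety, good performance
  • Relatively clean type annotations: the compiler infers types (type inference) and fills them in
  • Mechanisms such as implicit conversion help keep code simple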
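To make the type inference point concrete, here is a minimal sketch. All names (`count`, `title`, `pairs`, `twice`) are made up for illustration and are not from the talk.

```scala
// No type annotations below, yet every value is statically typed.
val count = 50                        // inferred as Int
val title = "spark-techtalk"          // inferred as String
val pairs = List(("a", 1), ("b", 2))  // inferred as List[(String, Int)]

// The return type (Int) is inferred from the body.
def twice(x: Int) = x * 2

// The element type of the result is inferred as (String, Int).
val doubled = pairs.map { case (key, value) => (key, twice(value)) }

// A mismatch is rejected at compile time, before anything runs:
// val wrong: Int = title   // does not compile: type mismatch
```

The code reads almost like a dynamically typed script, but the commented-out line would still be caught by the compiler.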

Slide 13

Slide 13 text

Why Scala?

Slide 14

Slide 14 text

Why Scala?
• Concise syntax and powerful expressions
• Functional Programming
• Java interoperability (= Hadoop interoperability!)
• REPL, Scripting
• Apache Spark
• Collection library, Pattern matching, and other great tools

Slide 15

Slide 15 text

Concise syntax, powerful expressiveness
• (Unsurprisingly) concise syntax is a good thing.

val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

• if-else branches, try-catch, and so on are all expressions

// if statement is an expression!
println(if (a == "A") "It's A!" else "It's not A")

// try catch is an expression!
val value = try {
  doSomeDangerousOperation
} catch {
  case _: Exception => "some value"
}

Slide 16

Slide 16 text

Concise syntax, powerful expressiveness
• Consistent operators

// Java
"A".equals("B")
// Scala
"A" == "B"

• Sensible class equality

case class Person(name: String, work: String)
val kevin = Person("Kevin", "Between")   // creating a case class instance does not need new
val anotherKevin = Person("Kevin", "Between")
kevin == anotherKevin // true

Slide 17

Slide 17 text

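Functional Programming
• Think of functions in the mathematical sense, not functions as in conventional programs!
• y = sin(x): no side effects. In any situation, putting in x gives the y that corresponds to it
• tan(x) = sin(x) / cos(x): functions can be treated like data, passed as parameters, and combined
• y = sin(x): once x is fixed, y does not change. No "variables"!
  • Make values immutable!
* The examples and part of the explanation are borrowed from the book Programming Scala.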
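The sin/cos/tan idea above can be sketched in Scala as follows. The names `sine`, `cosine`, `divide`, and `tangent` are illustrative choices, not taken from the talk or from the book it cites.

```scala
// Pure functions held as immutable values: same input, same output, no side effects.
val sine: Double => Double = x => math.sin(x)
val cosine: Double => Double = x => math.cos(x)

// A function that takes two functions as data and returns a new function.
def divide(f: Double => Double, g: Double => Double): Double => Double =
  x => f(x) / g(x)

// tan(x) = sin(x) / cos(x), built by combining existing functions.
val tangent: Double => Double = divide(sine, cosine)

// Immutable data: map returns a new list and leaves the input untouched.
val angles = List(0.0, 0.5, 1.0)
val tangents = angles.map(tangent)
```

Because nothing here is reassigned after definition, `tangent(0.5)` gives the same answer no matter when or where it is called.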

Slide 18

Slide 18 text

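Why are these properties of FP good?
• Fewer bugs (no falling into unexpected behavior caused by mutable variables)
• A function, once written, can be trusted (no side effects!)
• Immutable values simplify problems (robust for data sharing and parallelism)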

Slide 19

Slide 19 text

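Java interoperability
• Runs on the JVM -> good performance when processing large amounts of data!
• Java libraries can be used as they are
  • Java code from the Hadoop ecosystem can be used as it is!
  • Existing code can be converted and reused with little effort
• Can be compiled together with Java code
  • src/java/…, src/scala/…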
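As a small illustration of calling Java libraries directly, the sketch below uses only classes from the JDK (`java.time` requires Java 8 or later, which is an assumption about the runtime). The value names are made up for the example.

```scala
import java.time.LocalDate
import java.util.{ArrayList, Arrays}

// Java classes are imported and instantiated exactly like Scala classes.
val talkDate = LocalDate.parse("2014-12-03")
val nextDay = talkDate.plusDays(1)
val weekday = talkDate.getDayOfWeek.toString

// A plain java.util.ArrayList, built and queried through its Java API.
val javaList = new ArrayList[String](Arrays.asList("spark", "scala"))
val shout = javaList.get(0).toUpperCase  // a java.lang.String method
```

The same mechanism is what lets Scala code call into Java libraries such as those of the Hadoop ecosystem without any wrapper layer.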

Slide 20

Slide 20 text

REPL
• Read–Eval–Print Loop (aka Shell)
• Learn and try out a new language quickly!
• When digging into data, being able to work step by step is a big plus
  • Errors show up immediately
  • Working with data becomes interactive!

Slide 21

Slide 21 text

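Apache Spark
• In-memory, high-performance distributed data processing system (10 to 100 times faster than existing systems)
• Written in Scala, with an interface similar to Scala's collection library
• Provides the Spark shell, which adds Spark features to the Scala shell
• Various related projects exist for general-purpose use
  • SQL, Machine Learning, Graph Analysis, and so on
• Still being developed rapidly and attracting a lot of interest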

Slide 22

Slide 22 text

And also..
• Collection library
• Pattern matching
• Elegant tools such as implicits
• Covered in more detail later in this deck

Slide 23

Slide 23 text

A taste of Scala basics *only the data-related parts*

Slide 24

Slide 24 text

Data structures
• Collections such as List, Map, Set
  • List(1, 2, 3), Map(1 -> "a", 2 -> "b"), Set(1, 2)
• Tuple
  • val sparkTechTalk = ("2014-12-03", 50)
  • sparkTechTalk._1
  • case (key, value) => println(key)
• Option
  • Use it instead of null when a value may be missing! (more convenient, safer programming)
  • a = 1, a = null (conventional)
    a = Some(1), a = None (with Option)
  • a.nonEmpty, a.getOrElse(0)
• Range
  • for (i <- 0 to 10) println(i)
  • (0 to 10).foreach(println)
  • (0 until 10) (0 to 10) (0 to -10 by -1)
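The Option and Range items can be tried out with a short sketch like the one below. The map contents and the names `attendance`, `found`, `missing`, and `describe` are invented for illustration.

```scala
// A lookup that may fail returns Option instead of null.
val attendance = Map("2014-12-03" -> 50)
val found: Option[Int] = attendance.get("2014-12-03")    // Some(50)
val missing: Option[Int] = attendance.get("2014-12-04")  // None

// getOrElse supplies a default; map only runs when a value is present.
val foundOrZero = found.getOrElse(0)
val missingOrZero = missing.getOrElse(0)
val missingDoubled = missing.map(_ * 2)

// Pattern matching forces both cases to be handled.
def describe(count: Option[Int]): String = count match {
  case Some(n) => s"$n attendees"
  case None    => "no data"
}

// Ranges: `to` includes the end, `until` excludes it, `by` sets the step.
val inclusive = (0 to 3).toList
val exclusive = (0 until 3).toList
val countdown = (0 to -3 by -1).toList
```

Nothing here can throw a NullPointerException: the absence of a value is part of the type, so the compiler keeps it visible.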

Slide 25

Slide 25 text

Collections

Slide 26

Slide 26 text

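Working with collections
• (n), head, tail, last, contains, distinct, drop, …
• Functional Combinators
  • map: applies a function to each element, transforming it into another form
  • filter: applies a true/false predicate to each element and keeps only those that return true
  • foreach: similar to map, but only iterates without transforming into another form
  • foldLeft (foldRight, reduce): starting from the leftmost element, combines everything into one value
• Usage examples follow later in the deck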
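A minimal sketch of the combinators listed above, using an arbitrary list of numbers chosen for illustration:

```scala
val counts = List(3, 1, 4, 1, 5)

// Basic accessors
val first = counts.head        // first element
val rest = counts.tail         // everything after the first element
val unique = counts.distinct   // duplicates removed, order kept

// map: transform each element
val doubled = counts.map(_ * 2)

// filter: keep only the elements for which the predicate is true
val atLeastThree = counts.filter(_ >= 3)

// foldLeft: combine from the left, starting with an initial value
val total = counts.foldLeft(0)(_ + _)

// reduce: like a fold, but the first element is the starting value
val largest = counts.reduce((a, b) => if (a > b) a else b)

// foreach: iterate for side effects only; the result is Unit
def printAll(): Unit = counts.foreach(println)
```

Each call returns a new collection or value and leaves `counts` unchanged, which is what makes chaining these calls safe.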

Slide 27

Slide 27 text

Function Literal
Finding the values below 3 in a sequence

val list = List(1, 2, 3, 4)
list.filter((x: Int) => x < 3)

val testNumber1 = (x: Int) => x < 3 // function as a 1st-class object!
list.filter(testNumber1)

list.filter((x) => x < 3) // target typing
list.filter(x => x < 3)
list.filter(_ < 3) // placeholder

def testNumber2(x: Int) = x < 3 // function
list.filter(x => testNumber2(x))
list.filter(testNumber2(_))
list.filter(testNumber2 _)
list.filter(testNumber2)

Much of this can be abbreviated!

Slide 28

Slide 28 text

Pattern Matching & Case Class
• Similar to Java's switch ~ case, but a far more powerful tool
  • Values of different types can be matched in the same expression
  • Even more convenient when combined with case or case classes
  • case class: handy for structuring data

val input1 = "three"
case class Chart(date: String, count: Int)
val input2 = Chart("2014-12-02", 50)
val input3 = ("spark-techtalk", 100)

def matchTest(x: Any): Any = {
  x match {
    case 1 => "one"
    case "two" => 2
    case (key, value) => s"key: $key, value: $value"
    case Chart(date, count) => s"date: $date, count: $count"
    case _ => "others"
  }
}

matchTest(input1)
res0: Any = others
matchTest(input2)
res1: Any = date: 2014-12-02, count: 50
matchTest(input3)
res2: Any = key: spark-techtalk, value: 100

Slide 29

Slide 29 text

For basic syntax and more detailed theory, please refer to a book.
Recommended reading: Programming in Scala (a Korean edition is available)

Slide 30

Slide 30 text

Example: computing simple metrics from a log

// load log file
val logFile = new java.io.File(path + "example_log.txt")
val log = scala.io.Source.fromFile(logFile).getLines().toList

// parse log and get sign up numbers
case class LogEntry(dateTime: String, action: String, id: String)
val logEntries = log.map(csv => csv.split(",")).map(arr => LogEntry(arr(0), arr(1), arr(2)))

// get sign up
val logEntriesToday = logEntries.filter(_.dateTime.contains("2014-12-04"))
val signUp = logEntriesToday.filter(_.action == "SIGN_UP").size

// active user
val userIds = logEntriesToday.map(_.id)
val activeUser = userIds.distinct.size

Slide 31

Slide 31 text

Bonus: Spark Version

// load log file
val log = sc.textFile("file:///example_log.txt")

// parse log and get sign up numbers
case class LogEntry(dateTime: String, action: String, id: String)
val logEntries = log.map(csv => csv.split(",")).map(arr => LogEntry(arr(0), arr(1), arr(2)))

// get sign up
val logEntriesToday = logEntries.filter(_.dateTime.contains("2014-12-04"))
val signUp = logEntriesToday.filter(_.action == "SIGN_UP").count()

// active user
val userIds = logEntriesToday.map(_.id)
val activeUser = userIds.distinct().count()

Almost exactly the same as the Scala collection API!

Slide 32

Slide 32 text

Digging a little deeper: Implicits

Slide 33

Slide 33 text

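Implicit Conversion
• For when you want to extend functionality conveniently
• Define a function that converts to the expected type, and it is applied automatically

import scala.language.implicitConversions

implicit def stringToInt(number: String): Int = {
  number match {
    case "one" => 1
    case "two" => 2
  }
}

def printNumber(n: Int) = println(n)
printNumber("one")

Normally this would be a compile error. Because an implicit conversion is declared, the String => Int conversion happens automatically.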

Slide 34

Slide 34 text

Using Implicit Conversion

DateParser.parse("2014-12-03") // java style
"2014-12-03".toDateTime // better solution using implicit conversion

object DateParser {
  def parse(dateString: String) = new java.util.Date // stub: ignores dateString
}
DateParser.parse("2014-12-03")

class DateConverter(val s: String) {
  def toDateTime = DateParser.parse(s)
}
implicit def string2DateConverter(s: String) = new DateConverter(s)
"2014-12-03".toDateTime

This lets you write more intuitive and consistent code!

Slide 35

Slide 35 text

Implicit Parameter
• For when you want to simplify a parameter that is passed over and over

val date = "2014-12-03"
calculateSignUp(date)
calculateActiveUser(date)
calculateActionCount(date)

def calculateSignUp(implicit date: String) = ...
implicit val date = "2014-12-03"
calculateSignUp
calculateActiveUser
calculateActionCount(date)

• However, overusing implicits makes code hard to work with!
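Since the slide elides the function bodies, here is a self-contained sketch of the same idea. The `LogEntry` fields, the sample `entries`, and the bodies of `calculateSignUp` and `calculateActiveUser` are assumptions made up for this example, not the talk's actual implementation.

```scala
// Hypothetical in-memory log, standing in for real data.
case class LogEntry(date: String, action: String, id: String)

val entries = List(
  LogEntry("2014-12-03", "SIGN_UP", "u1"),
  LogEntry("2014-12-03", "LOGIN", "u1"),
  LogEntry("2014-12-03", "SIGN_UP", "u2"),
  LogEntry("2014-12-04", "SIGN_UP", "u3")
)

// The date is declared as an implicit parameter...
def calculateSignUp(implicit date: String): Int =
  entries.count(e => e.date == date && e.action == "SIGN_UP")

def calculateActiveUser(implicit date: String): Int =
  entries.filter(_.date == date).map(_.id).distinct.size

// ...so one implicit value in scope is passed to every call automatically.
implicit val date: String = "2014-12-03"

val signUps = calculateSignUp        // uses the implicit date
val activeUsers = calculateActiveUser

// An explicit argument still overrides the implicit one.
val signUpsNextDay = calculateSignUp("2014-12-04")
```

An implicit `String` is used here only to mirror the slide; because any `String` in scope would match, wrapping the date in a dedicated type is usually the safer choice, which fits the slide's warning about overuse.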

Slide 36

Slide 36 text

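Summary
• Scala is a good language for data analysis (and good for other purposes too)
• Concise expression, good performance, Functional Programming
• REPL and scripting support
• Lets you implement the concepts you want in an elegant way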

Slide 37

Slide 37 text

Thank you

Slide 38

Slide 38 text

Useful resources
• Learn Scala in 5 minutes http://learnxinyminutes.com/docs/scala/
• Coursera Scala course https://www.coursera.org/course/progfun
• Learning Scala (blog) http://joelabrahamsson.com/learning-scala/
• Scala School (Twitter) http://twitter.github.io/scala_school/ko/
• Programming in Scala (Korean edition): written by Martin Odersky, the creator of Scala; available at bookstores nationwide