Slide 1

Slide 1 text

Kotlin ❤ Data Science?*
Preslav Rachev @ KI labs // 28.01.2019
* Data science and data engineering
@preslavrachev / (https://preslav.me), 2019

Slide 2

Slide 2 text

Who am I?
- A software engineer working at KI labs.
- Passionate about Kotlin and data.
- A genuinely curious individual who loves writing. ✏
Also, an inventor of funny faces.
✏ https://preslav.me

Slide 4

Slide 4 text

KotlinConf 2018

Slide 5

Slide 5 text

The IT Reality of 2019
- "AI" has become a favourite topic among business managers and software engineers when discussing company innovation strategies.
- Tech media is only making it worse.
- Data is everywhere, but extracting useful knowledge from it is very different from what management and engineering imagine.

Slide 6

Slide 6 text

AI, ML, DS?!?

Slide 7

Slide 7 text

AI, ML, DS?!?
- AI is what brings the VC money in.
- ML (a.k.a. sophisticated brute force) is what gets the job done.
- ML models are tightly bound to a given domain.
- DS is the craft of finding which ML model works for a particular case, and which doesn't.

Slide 8

Slide 8 text

Data Science Definition
Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.
Source: Wikipedia

Slide 9

Slide 9 text

Data Science Workflow
1. Form a hypothesis
2. Load data from various sources
3. Clean, transform and unify
4. Extract features
5. Use the features to run a model
6. Visualize and report findings
7. Support or refute the hypothesis
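
The workflow above can be sketched in plain Kotlin on a toy data set. Everything in this sketch is an assumption made for illustration: the `Measurement` class, the hard-coded rows and the "Sofia is warmer than Munich" hypothesis are hypothetical, and a trivial comparison stands in for a real model.

```kotlin
// Hypothetical record type for the raw data (assumption for illustration).
data class Measurement(val city: String, val temperature: Double?)

// Step 2: load data (hard-coded rows stand in for an external source).
fun load(): List<Measurement> = listOf(
    Measurement("Munich", 21.5),
    Measurement("Munich", null),   // a missing value
    Measurement("Sofia", 27.0),
    Measurement("Sofia", 29.0),
    Measurement("Munich", 19.5)
)

// Step 3: clean by dropping rows without a temperature.
fun clean(rows: List<Measurement>): List<Pair<String, Double>> =
    rows.mapNotNull { row -> row.temperature?.let { row.city to it } }

// Step 4: extract a feature, here the mean temperature per city.
fun meanByCity(rows: List<Pair<String, Double>>): Map<String, Double> =
    rows.groupBy({ it.first }, { it.second }).mapValues { it.value.average() }

fun main() {
    // Step 1: hypothesis "Sofia is warmer than Munich on average".
    val features = meanByCity(clean(load()))
    // Step 5: a trivial comparison stands in for a real model.
    val supported = features.getValue("Sofia") > features.getValue("Munich")
    // Steps 6 and 7: report the findings and conclude.
    features.forEach { (city, mean) -> println("$city: $mean") }
    println(if (supported) "Hypothesis supported" else "Hypothesis refuted")
}
```

Each step is an ordinary function, so the same pipeline can be unit-tested by software engineers and rerun step by step by data scientists.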

Slide 10

Slide 10 text

Motivation
There is a mismatch between software engineering and data science practices:
- Software engineering works best when building well-defined systems.
- Requirements rarely change entirely, but evolve over time.
- Data science deals with supporting and refuting hypotheses.
- Lots of uncertainty, which requires seamless exploration and visualisation.

Slide 11

Slide 11 text

Motivation
Due to this mismatch, systems often end up becoming complex tech-stack mash-ups, where each side treats the other as some sort of black box.
- Difficult to maintain, and requires lots of different skills and practices
The question arises: could there be a single tech stack that allows software engineers and data scientists to work in peace, but also to contribute directly to a single codebase?

Slide 13

Slide 13 text

Kotlin == the missing link?
- A multi-paradigm programming language with a fluent syntax
- Strong community and enterprise backing
- Access to the entire universe of JVM knowledge and libraries
But there are a few important pieces which are not quite there yet:
- Fully integrated scripting capabilities
- Playground environments (e.g. notebooks)
- Data wrangling and visualization libraries that take advantage of the above

Slide 14

Slide 14 text

Kotlin Top 10 Features
Kotlin is a multi-paradigm programming language, equally easy for Java and Python programmers to learn. My personal Top 10:
1. Static typing
2. Immutability and null-safety
3. Higher-order functions
4. Chain-able sequences
5. Data classes
6. Extension methods
7. Sealed classes
8. Coroutines
9. Default and named arguments
10. Multi-platform support
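
Default and named arguments (item 9) are easiest to appreciate in a small example. This is a minimal sketch; the `describe` function and its parameters are hypothetical and exist only for illustration.

```kotlin
import java.util.Locale

// Hypothetical helper that summarises a list of numbers.
// `precision` and `label` have default values, so callers may omit them.
fun describe(values: List<Double>, precision: Int = 2, label: String = "mean"): String {
    val mean = values.average()
    // Locale.ROOT keeps the decimal separator a dot on every machine.
    return "$label=" + "%.${precision}f".format(Locale.ROOT, mean)
}

fun main() {
    val data = listOf(1.0, 2.0, 4.0)
    println(describe(data))                              // both defaults used
    println(describe(data, label = "avg"))               // named argument skips `precision`
    println(describe(data, precision = 0, label = "m"))  // everything spelled out
}
```

Python programmers will recognise the pattern from keyword arguments; in Java the same flexibility usually takes several overloads or a builder.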

Slide 15

Slide 15 text

Static Typing

    List<Integer> nums = new ArrayList<>(Arrays.asList(1, 2, 3)); // Java
    val nums = listOf(1, 2, 3) // Kotlin

Immutability and Null-Safety

    // Every variable in Kotlin must be assigned a value,
    // unless explicitly declared with `lateinit`
    val x = 100 // cannot be reassigned, ever!
    var y = 200 // this one can
    lateinit var z: String // a value will be provided later (lateinit needs an explicit, non-primitive type)
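
Null-safety is named on this slide but not shown in the snippet. A minimal sketch of how it looks in practice, assuming two hypothetical helpers, `parseAge` and `ageOrDefault`:

```kotlin
// Hypothetical helper: returns null when the input is missing or not a number.
fun parseAge(raw: String?): Int? = raw?.trim()?.toIntOrNull()

// The Elvis operator (?:) supplies a fallback when the left side is null.
fun ageOrDefault(raw: String?, default: Int = -1): Int = parseAge(raw) ?: default

fun main() {
    val name: String = "Kotlin"  // non-nullable: assigning null would not compile
    val nick: String? = null     // nullable: the type says so explicitly
    println(name.length)         // no null check needed
    println(nick?.length)        // safe call: evaluates to null instead of throwing
    println(ageOrDefault(" 42 "))
    println(ageOrDefault("n/a"))
}
```

Because nullability is part of the type, missing values in a data set have to be handled where they enter the program instead of surfacing later as a `NullPointerException`.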

Slide 16

Slide 16 text

Higher-Order Functions and DSL support
A higher-order function is a function that takes functions as parameters, or returns a function.

    route("/portal") {
        route("articles") { … }
        route("admin") {
            intercept(ApplicationCallPipeline.Features) { … } // verify admin privileges
            route("article/{id}") { … } // manage article with {id}
            route("profile/{id}") { … } // manage profile with {id}
        }
    }
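
How a nested DSL like the one above can be built from higher-order functions is perhaps clearer with a small, self-contained sketch. The `Route` class and `route` builder below are hypothetical stand-ins written for this example; they are not the API of any real web framework.

```kotlin
// Hypothetical tree node; not the API of any real web framework.
class Route(val path: String) {
    val children = mutableListOf<Route>()

    // Higher-order function: takes a lambda with `Route` as its receiver,
    // which is what lets calls nest without naming the parent.
    fun route(path: String, build: Route.() -> Unit = {}): Route {
        val child = Route(path).apply(build)
        children += child
        return child
    }

    // Flattens the tree into full paths, parents before children.
    fun paths(prefix: String = ""): List<String> {
        val full = prefix.trimEnd('/') + "/" + path.trim('/')
        return listOf(full) + children.flatMap { it.paths(full) }
    }
}

// Top-level entry point of the DSL.
fun route(path: String, build: Route.() -> Unit): Route = Route(path).apply(build)

fun main() {
    val root = route("/portal") {
        route("articles")
        route("admin") {
            route("article/{id}")
            route("profile/{id}")
        }
    }
    root.paths().forEach(::println)
}
```

The key ingredient is the function type `Route.() -> Unit`: inside the braces, `this` is the route being built, so nested `route(...)` calls attach themselves to the right parent.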

Slide 17

Slide 17 text

Data classes and Chain-able Sequences

    data class Person(val name: String, val age: Int)

    val people = listOf(
        Person("Chris Martin", 31),
        Person("Will Champion", 32),
        Person("Jonny Buckland", 33),
        Person("Guy Berryman", 34),
        Person("Mhris Cartin", 30))

    println(people
        .asSequence() // convert to sequence
        .filter { it.age > 30 } // lazy eval (intermediate op)
        .map { it.name.split(" ").map { it[0] }.joinToString("") } // lazy eval (intermediate op)
        .map { it.uppercase() } // lazy eval (intermediate op); replaces the deprecated toUpperCase()
        .toList()) // terminal operation

Tip: Combine these with coroutines to construct declarative data pipelines.
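
The "lazy eval" comments above can be made observable by counting how many elements actually pass through `map`. This is a sketch for illustration only; the function names and the counter are assumptions, not part of the talk.

```kotlin
// Both functions return the first squared value above `limit`,
// paired with the number of elements that went through `map`.
fun eagerFirstSquareAbove(limit: Int, input: List<Int>): Pair<Int?, Int> {
    var mapped = 0
    val result = input
        .map { mapped++; it * it }     // eager: the whole list is mapped first
        .firstOrNull { it > limit }
    return result to mapped
}

fun lazyFirstSquareAbove(limit: Int, input: List<Int>): Pair<Int?, Int> {
    var mapped = 0
    val result = input.asSequence()
        .map { mapped++; it * it }     // lazy: runs only as elements are requested
        .firstOrNull { it > limit }    // terminal operation, stops at the first match
    return result to mapped
}

fun main() {
    val numbers = (1..1000).toList()
    println(eagerFirstSquareAbove(50, numbers))
    println(lazyFirstSquareAbove(50, numbers))
}
```

Both calls find the same value, but the list version maps all 1000 elements while the sequence version stops after 8, which matters once the input is a large data set.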

Slide 18

Slide 18 text

Sealed Classes

    sealed class ArithmeticOperation
    class Add(var a: Int, var b: Int): ArithmeticOperation()
    class Subtract(var a: Int, var b: Int): ArithmeticOperation()
    class Multiply(var a: Int, var b: Int): ArithmeticOperation()
    class Divide(var a: Int, var b: Int): ArithmeticOperation()

    fun execute(op: ArithmeticOperation) = when (op) {
        is Add -> op.a + op.b
        is Subtract -> op.a - op.b
        is Multiply -> op.a * op.b
        is Divide -> op.a / op.b
    }

Slide 19

Slide 19 text

Extension Methods

    fun String.underscore(): String {
        return this.replace(" ", "_")
    }

    print("hello world".underscore()) // hello_world

Infix support

    infix fun Number.toPowerOf(exponent: Number): Double {
        return Math.pow(this.toDouble(), exponent.toDouble())
    }

    3 toPowerOf 2 // 9.0
    9 toPowerOf 0.5 // 3.0

Slide 20

Slide 20 text

The Ecosystem
No programming language will do the job without a rich library ecosystem to pick and choose from.
- The Kotlin Standard Library will be your first choice, but by far not the only one.
- Kotlin is standing on the shoulders of giants (e.g. the JVM).
- The prospects of integrating low-level libraries through Kotlin Native are even more promising.

Slide 21

Slide 21 text

What the JVM has to offer...

Library               Functionality
Apache Hadoop         Batch processing
Apache Spark          Data streaming
ND4J                  Scientific computing (similar to NumPy)
Apache Commons Math   Math and computing utils
Weka                  ML/NLP (similar to SciPy)
Tablesaw              Visualization (similar to Matplotlib and Plot.ly)
TensorFlow for Java   Deep ML
Deeplearning4j        Deep ML

And many, many more...

Slide 22

Slide 22 text

...besides, a young ecosystem of libs targeting Kotlin's unique features:

Library             Functionality
Krangl              Data wrangling (similar to Pandas)
Kravis              Visualisation (similar to Matplotlib and Plot.ly)
Koma                Scientific computing (similar to SciPy)
kotlin-statistics   Scientific computing and statistics
komputation         Neural network for the JVM written in Kotlin and CUDA C

Still, no real Pandas yet.

Slide 23

Slide 23 text

Many of the above libraries use standard formats for exchanging input data (e.g. CSV) or results (e.g. trained ML models, reports, aggregated data sets, visualisations, etc.).
- At the very least, this means that one can create an environment in which data scientists keep using their favourite tools and communicate their findings to the software engineers using those standards.
- Kotlin can become common ground for understanding each other's code.
OK, but can we go one step further from there?
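
A sketch of the "standards as interface" idea: a data scientist exports results as CSV, and Kotlin code reads them into typed records. The `Prediction` class, the column names and the naive comma-splitting parser are all assumptions made for illustration; real files with quoted fields or embedded commas call for a proper CSV library.

```kotlin
// Hypothetical record for one row of an exported result file.
data class Prediction(val id: Int, val label: String, val score: Double)

// Naive parser: assumes a header row, comma separators and no quoted fields.
fun parsePredictions(csv: String): List<Prediction> =
    csv.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .drop(1)                                   // skip the header row
        .map { line ->
            val (id, label, score) = line.split(",").map { it.trim() }
            Prediction(id.toInt(), label, score.toDouble())
        }
        .toList()

fun main() {
    val exported = """
        id,label,score
        1,spam,0.93
        2,ham,0.12
    """.trimIndent()
    val predictions = parsePredictions(exported)
    // Downstream code now works with typed values instead of raw strings.
    println(predictions.filter { it.score > 0.5 }.map { it.id })
}
```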

Slide 25

Slide 25 text

The ultimate data scientist peace requires three more things:
1. Better Kotlin scripting support
2. A solid REPL (Read-Eval-Print Loop) console
3. Tools that encourage experimentation and interactive programming

Slide 26

Slide 26 text

Scripting Support
A large portion of the data team's work involves writing and deploying executable scripts. This is one area where Python excels.
Kotlin Script is unfinished, slow and painful to work with.
KEEP-75

Slide 27

Slide 27 text

KScript
An open-source project that tries to improve the performance of Kotlin scripts and reduce the friction of working with third-party libs:

    #!/usr/bin/env kscript
    @file:DependsOn("de.mpicbg.scicomp:kutils:0.4")

    import de.mpicbg.scicomp.bioinfo.openFasta

    if (args.size != 1) {
        System.err.println("Usage: CountRecords ")
        kotlin.system.exitProcess(-1)
    }

    val records = openFasta(java.io.File(args[0]))
    println(records.count())

Slide 28

Slide 28 text

REPL
Kotlin has a REPL (Read-Eval-Print Loop), but it is a tough beast.
IntelliJ extends the Kotlin REPL and makes it a bit nicer to work with.
Check out KShell as an alternative.

Slide 29

Slide 29 text

Interactive Programming
Tools like Jupyter, also known as notebooks or playgrounds, allow for a unique mix of narrative and code.
- They let programmers play around with data and libs in a visual, REPL-like environment
- Great for sharing and explaining difficult concepts
Kotlin Jupyter
Kotlin Playground

Slide 30

Slide 30 text

What did we learn?
- Kotlin is a great language with a mature library ecosystem.
- It lacks some of the tooling that data scientists need.
- The community and JetBrains are working hard to fill the gaps.
- We wouldn't have come this far if it weren't for these folks: @ligee, @thomasnield9727, @holgerbrandl and many more from the #datascience community on Slack.

Slide 31

Slide 31 text

Thank You!
Short Demo
Questions?

Slide 32

Slide 32 text

Links
- The Connection Between Data Science, Machine Learning and Artificial Intelligence
- Awesome Kotlin - a curated list of libraries and resources

Slide 33

Slide 33 text

Why Kotlin and not Scala?