Slide 1

State of Scala API in Apache Flink
Alexey Novakov, Solution Architect @ Ververica
Flink Forward 2023 ©

Slide 2

Contents
1. Why use Scala
2. Usage of Scala in Apache Flink
3. Apache Flink Scala API
4. Scala tools for Flink jobs

Slide 3

Why use Scala
○ Expressive and concise syntax; support for scripting
○ Unique language features, supporting both FP and OOP
○ Compiles to JVM, JavaScript, and native code
○ Spark, Flink, Akka, Kafka: all use Scala

Scala is a more than 15-year-old programming language with a mature ecosystem of tools, libraries, and many books.

  @main def hello() = println("Hello, World!")

Slide 4

Scala Tools & Libraries
1. Editors: VSCode with the Metals plugin, IntelliJ IDEA with the Scala plugin
2. REPL: console, Ammonite
3. CLI: scala-cli
4. Build tools: Mill
5. Libraries/Frameworks: scalatest, ZIO, Cats, Akka HTTP, Spark, Play, fs2, Slick, and more
6. Library Registry: https://index.scala-lang.org/

Slide 5

Scala Versions
- Scala 2.12 released on Oct 28, 2016
- Scala 2.13 released on Jun 7, 2019
- Scala 3.0 released on May 21, 2021 (binary compatible with 2.13)

The Flink Scala API is still on 2.12.

Slide 6

Dependency Tree: before Flink 1.15
- Apache Flink modules (Flink modules in Java/Scala, the DataStream Scala/Java API) have a compile-time dependency on the Scala 2.11/2.12 standard library
- This implies that user app modules (a Flink job in Scala) are also bound to the Scala 2.11/2.12 standard library
- Scala is coupled: switching the user app to a newer Scala standard library (2.13, 3.x) is not possible

Slide 7

Dependency Tree: since Flink 1.15
- Apache Flink modules carry a shaded Scala 2.12 standard library; the DataStream Java API has no compile-time Scala dependency for user code
- User app modules (a Flink job in Scala) can depend on the Scala 2.13 or 3.x standard library
- Scala is no longer tightly coupled: switching to a newer Scala is possible
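In build-tool terms, this decoupling means the job's build can pin its own Scala version. A minimal sbt sketch (the project name and version numbers here are illustrative, not taken from the slides):

```scala
// build.sbt -- hypothetical project; versions are examples only
lazy val root = (project in file("."))
  .settings(
    name         := "my-flink-job",  // illustrative name
    scalaVersion := "3.3.0",         // user code chooses its own Scala
    libraryDependencies ++= Seq(
      // Java API only: note the single '%' (no Scala-version suffix),
      // so there is no compile-time coupling to Flink's shaded Scala 2.12
      "org.apache.flink" % "flink-streaming-java" % "1.17.1" % "provided"
    )
  )
```

Because `flink-streaming-java` is a plain Java artifact, sbt resolves it identically whether the job itself is built with Scala 2.13 or 3.x.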

Slide 8

Since Flink 1.15
● Flink's Scala version is "shaded" and does not clash with the user's Scala
● To use Scala 2.13 or 3.x, remove the flink-scala JAR from the Flink distribution:

  $ rm flink-dist/lib/flink-scala*

● Then use the Java API from your Scala code:

  @main def job =
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env
      .fromElements(1, 2, 3, 4, 5, 6)
      .filter(_ % 2 == 1)
      .map(i => i * i)
      .print()
    env.execute()

However, users have to provide Scala serializers themselves. See the solution further below.

Slide 9

Flink PMC Decision
Background: the attempt to add Scala 2.13 support failed (see FLINK-13414 in Jira).
1. Users should continue developing in Scala via the Java API
   - Pros: freedom to choose any Scala version
   - Cons: requires defining your own serializers
2. All Flink Scala APIs are deprecated and will be removed in future Flink versions
3. Flink-internal Scala modules will be kept or rewritten in Java (where possible)
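To make the "cons" concrete: Flink's Java-side type extraction does not understand Scala case classes, so a job calling the Java API has to supply type information itself. A hedged sketch (the `Sensor` class and its values are invented for illustration; the generic `TypeInformation.of` route shown here works but serializes via the slow Kryo fallback):

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

case class Sensor(id: String, temperature: Double)

@main def bringYourOwnSerializer() =
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  // Generic fallback: Flink treats Sensor as an opaque type and uses Kryo
  val sensorInfo: TypeInformation[Sensor] = TypeInformation.of(classOf[Sensor])
  env
    .fromElements(Sensor("a", 1.0), Sensor("b", 2.0))
    .returns(sensorInfo) // pin the type explicitly on the Java API
    .print()
  env.execute()
```

Writing an efficient custom TypeSerializer instead of the Kryo fallback is exactly the boilerplate that the wrapper libraries on the following slides derive automatically.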

Slide 10

Apache Flink Scala API

Slide 11

Official Scala API Extension
Add a special import for the DataStream API:

  import org.apache.flink.api.scala._

  object Main extends App {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val text = env.fromElements(
      "To be, or not to be,--that is the question:--",
      "Whether 'tis nobler in the mind to suffer",
      "The slings and arrows of outrageous fortune",
      "Or to take arms against a sea of troubles,")
    val counts = text
      .flatMap(value => value.split("\\s+"))
      .map(value => (value, 1))
      .groupBy(0)
      .sum(1)
    counts.writeAsCsv("output.txt", "\n", " ")
    env.execute("Scala WordCount Example")
  }

Published only for Scala 2.12:
https://index.scala-lang.org/apache/flink/artifacts/flink-streaming-scala/1.17.1?binary-version=_2.12

Slide 12

Ways to use new Scala with Flink
1. flink-extended/flink-scala-api: a fork of the Flink Scala bindings originally created by Findify (a great effort by Roman Grebennikov)
2. ariskk/flink4s: a Scala 3.x wrapper for Apache Flink
3. Direct* usage of the Flink Java API:

  "org.apache.flink" % "flink-streaming-java" % "x.y.z"

*Caution: you need to bring your own type serializers

Slide 13

Migration to flink-scala-api
Replace the original API import:

  import org.apache.flink.streaming.api.scala.*

with the flink-scala-api imports:

  import org.apache.flinkx.api.*
  import org.apache.flinkx.api.serializers.*

In build.sbt, choose the version matching your Flink release:

  libraryDependencies += "org.flinkextended" %% "flink-scala-api" % "1.17.1_1.1.0"
  // other available versions: "1.16.2_1.1.0", "1.15.4_1.1.0"

Slide 14

Example Job (flink-extended/flink-scala-api)

  import org.apache.flinkx.api.*
  import org.apache.flinkx.api.serializers.*

  @main def socketWordCount(hostName: String, port: Int) =
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env
      .socketTextStream(hostName, port)
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .map((_, 1))
      .keyBy(_._1)
      .sum(1)
      .print()
    env.execute("Scala socketWordCount Example")

Input (in another terminal):

  % nc -lk 9999
  hello flink scala api

Output:

  Connecting to server socket localhost:9999
  [info] 3> (hello,1)
  [info] 8> (flink,1)
  [info] 1> (scala,1)
  [info] 1> (api,1)

Slide 15

Serializer Derivation

  import org.apache.flink.api.common.typeinfo.TypeInformation
  import org.apache.flinkx.api.serializers.*

  case class Foo(x: Int) {
    def inc(a: Int) = copy(x = x + a)
  }

  // Defined explicitly for caching purposes at compile time.
  // If not defined, it is derived automatically.
  implicit lazy val fooTypeInfo: TypeInformation[Foo] =
    deriveTypeInformation[Foo]

  env
    .fromElements(Foo(1), Foo(2), Foo(3))
    .map(x => x.inc(1)) // taken as an implicit
    .map(x => x.inc(2)) // again, no re-derivation

Slide 16

Main Features (flink-extended/flink-scala-api)
- Automatic compile-time derivation of Flink serializers for simple Scala and algebraic data types
- Zero runtime reflection
- No silent fallback to Kryo serialization (compile error instead)
- Extendable with custom serializers for deeply-nested types
- Easy to migrate: mimics the old Scala API
- Scala 3 support
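The "no silent fallback to Kryo" point can be sketched as follows (the `Event` ADT and the `Bad` type are invented for illustration; the behavior is as the slide describes, assuming flink-scala-api's derivation):

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flinkx.api.serializers.*

// An ADT: serializer support is derived at compile time, no reflection
sealed trait Event
case class Click(url: String)           extends Event
case class Purchase(amount: BigDecimal) extends Event

implicit val eventInfo: TypeInformation[Event] = deriveTypeInformation[Event]

// A type with an unsupported member does NOT silently fall back to Kryo;
// the derivation fails at compile time instead:
// case class Bad(file: java.io.File)
// deriveTypeInformation[Bad] // <- compile error, not a runtime Kryo fallback
```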

Slide 17

Scala Tools for Flink Jobs Development

Slide 18

sbt-assembly plugin
To build a fat JAR:

  // project/plugins.sbt
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")

  // build.sbt
  lazy val root = (project in file("."))
    .settings(
      // optionally define a main class in case there are multiple
      assembly / mainClass := Some("org.example.MyMainClass"),
      …
    )

  > sbt assembly
  > ls target/scala-3*/*.jar
  target/scala-3.3.0/my-flink-project-0.1.jar

Slide 19

scala-cli
It can compile, run, package, and more. Just one file and a single command package a UDF into a JAR.

multisetToString.scala:

  //> using scala "3"
  //> using dep "org.apache.flink:flink-table-api-java:1.15.4"

  import org.apache.flink.table.functions.ScalarFunction
  import org.apache.flink.table.annotation.DataTypeHint
  import java.util.{Map => JMap}

  class MultisetToString extends ScalarFunction:
    def eval(@DataTypeHint("MULTISET") mset: JMap[Integer, String]) =
      mset.toString

Packaging:

  scala-cli package --jvm 11 \
    multisetToString.scala \
    -o udfs.jar \
    --library -f

Slide 20

Ammonite REPL
Add dependencies, run in local mode, and see the result. See more at https://ammonite.io

  @ import $ivy.`org.flinkextended::flink-scala-api:1.16.2_1.0.0`
  @ import $ivy.`org.apache.flink:flink-clients:1.16.2`
  @ import org.apache.flinkx.api.*
  @ import org.apache.flinkx.api.serializers.*

  @ val env = StreamExecutionEnvironment.getExecutionEnvironment
  env: StreamExecutionEnvironment = org.apache.flink.api.StreamExecutionEnvironment@1e226bcd

  @ env.fromElements(1, 2, 3, 4, 5, 6).filter(_ % 2 == 1).map(i => i * i).print()
  res5: org.apache.flink.streaming.api.datastream.DataStreamSink[Int] = org.apache.flink.streaming.api.datastream.DataStreamSink@71e2c6d8

  @ env.execute()
  4> 1
  8> 25
  6> 9
  res6: common.JobExecutionResult = Program execution finished
  Job with JobID 5a947a757f4e74c2a06dcfe80ba4fde8 has finished.
  Job Runtime: 345 ms

Slide 21

Jupyter Notebook with a Scala kernel
Jupyter + Almond provides a user experience similar to Apache Zeppelin.
Almond, a Scala kernel for Jupyter: https://almond.sh/
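Since Almond is built on the Ammonite interpreter, a notebook cell can use the same `$ivy` magic imports shown in the Ammonite session above. A hypothetical first cell (versions mirror that session):

```scala
// Hypothetical Almond notebook cell -- not from the slides
import $ivy.`org.flinkextended::flink-scala-api:1.16.2_1.0.0`
import $ivy.`org.apache.flink:flink-clients:1.16.2`

import org.apache.flinkx.api.*
import org.apache.flinkx.api.serializers.*

// Runs a local Flink mini-cluster inside the notebook kernel
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.fromElements(1, 2, 3).map(_ * 10).print()
env.execute("notebook job")
```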

Slide 22

Flink Job Template
Install SBT first, then run:

  > sbt new novakov-alexey/flink-scala-api.g8

The command above generates a "WordCount" Flink job in Scala 3:

  name [My Flink Scala Project]: new-flink-app
  flinkVersion [1.17.1]: // press enter to use 1.17.1
  Template applied in /Users/myhome/dev/git/./new-flink-app

  new-flink-app
  ├── build.sbt
  ├── project
  │   └── build.properties
  └── src
      └── main
          └── scala
              └── com
                  └── example
                      └── WordCount.scala

Slide 23

Summary
- You can use the latest Scala in your Flink jobs; there are two community wrappers available
- The Scala ecosystem provides better tools for Flink job development, debugging, and deployment: Coursier, Scala-CLI, Ammonite, SBT, Scastie
- Large code bases in Scala remain maintainable, unlike in Java
- The FP paradigm lets you compose your Flink jobs easily
- Try developing your next job with flink-scala-api

More information:
- https://www.scala-lang.org/
- https://flink.apache.org/2022/02/22/scala-free-in-one-fifteen/
- https://github.com/novakov-alexey/flink-sandbox

Slide 24

Thank you
Contact info
● https://novakov-alexey.github.io/
● alexey at ververica.com