◦ Concise syntax. Support of scripting
◦ Unique language features with support of FP and OOP
◦ Compiles to JVM, JavaScript and Native code
◦ Spark, Flink, Akka, Kafka: all are using Scala

Scala is a more than 15-year-old programming language with a mature eco-system of tools, libraries and many books.

@main def hello() = println("Hello, World!")
Before Flink 1.15, Scala is coupled: the Apache Flink modules (Flink Modules in Java/Scala, DataStream Scala/Java API) are built against the Scala 2.11/2.12 std. library. A user's Flink Job in Scala has a compile-time dependency on them, which implies the Scala 2.11/2.12 std. library in the user app modules as well. Switching to a newer Scala (2.13, 3.x std. library) is not possible.
Since Flink 1.15, Scala is no longer tightly coupled: the Apache Flink modules (Flink Modules in Java/Scala, DataStream Java API) carry a shaded Scala 2.12 std. library, and a user's Flink Job in Scala only has a compile-time dependency on the Java API. The user app modules can therefore use the Scala 2.13 or 3.x std. library: switching to newer Scala is possible.
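As an illustration of the decoupled setup, a minimal sbt sketch of a Scala 3 job that depends only on Flink's Java modules (the artifact names are real Flink modules; the Scala and Flink versions are taken from elsewhere in this deck and may need adjusting):

// build.sbt
ThisBuild / scalaVersion := "3.3.0"

lazy val root = (project in file("."))
  .settings(
    name := "my-flink-job",
    libraryDependencies ++= Seq(
      // plain Java artifacts (single %): no Scala suffix, no coupling to Flink's shaded Scala
      "org.apache.flink" % "flink-streaming-java" % "1.17.1",
      "org.apache.flink" % "flink-clients" % "1.17.1"
    )
  )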
Flink's internal Scala version is "shaded" and does not clash with the user's Scala.

• To use Scala 2.13 or 3.x, remove the flink-scala JAR from the Flink distribution:

$ rm flink-dist/lib/flink-scala*

• Then use the Java API from your Scala code:

@main def job =
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env
    .fromElements(1, 2, 3, 4, 5, 6)
    .filter(_ % 2 == 1)
    .map(i => i * i)
    .print()
  env.execute()

However, users have to provide Scala serializers. See the solution further below.
The attempt to add support for Scala 2.13 failed (see the FLINK-13414 Jira issue).

1. Users are expected to keep developing in Scala via the Java API
   - Pros: freedom to choose any Scala version
   - Cons: it requires you to define your own serializers
2. All Flink Scala APIs are deprecated and will be removed in future Flink versions
3. Flink internal Scala modules will be kept or rewritten in Java (if possible)
The "import" for the DataStream API in the old Scala bindings:

import org.apache.flink.api.scala._

object Main extends App {
  val env = ExecutionEnvironment.getExecutionEnvironment
  val text = env.fromElements(
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,")

  val counts = text
    .flatMap(value => value.split("\\s+"))
    .map(value => (value, 1))
    .groupBy(0)
    .sum(1)

  counts.writeAsCsv("output.txt", "\n", " ")
  env.execute("Scala WordCount Example")
}

https://index.scala-lang.org/apache/flink/artifacts/flink-streaming-scala/1.17.1?binary-version=_2.12
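The Scaladex link above shows that the Scala-suffixed Flink artifact is published for Scala 2.12 only, which is what pins such a build to that Scala version. A minimal sbt line for illustration (the version number is taken from the link and is only an example):

// build.sbt, old Scala API: cross-built artifact (%%), available for Scala 2.12 only
libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.17.1"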
Options to use Scala with Flink:

1. flink-extended/flink-scala-api: a fork of the Flink Scala bindings originally created by Findify (a great effort of Roman Grebennikov)
2. ariskk/flink4s: a Scala 3.x wrapper for Apache Flink
3. Direct* usage of the Flink Java API (see the sketch after this list):

   "org.apache.flink" % "flink-streaming-java" % "x.y.z"

*Caution: you need to bring your own type serializers
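For option 3, calling the Java DataStream API directly from Scala means supplying type information yourself. A minimal, hedged sketch (the function and job names are invented for illustration; Types.INT is Flink's built-in type info for boxed integers):

import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

@main def wordLengths(): Unit =
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env
    .fromElements("to", "be", "or", "not", "to", "be")
    // explicit MapFunction: without the Scala API there is no implicit TypeInformation derivation
    .map(new MapFunction[String, Integer] {
      override def map(word: String): Integer = Integer.valueOf(word.length)
    })
    // hand the output type to Flink explicitly, otherwise it falls back to reflection (and possibly Kryo)
    .returns(Types.INT)
    .print()
  env.execute("word-lengths")

For case classes you would pass a hand-written or derived TypeInformation in the same way, which is exactly what flink-extended/flink-scala-api automates (see below).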
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flinkx.api.*
import org.apache.flinkx.api.serializers.*

case class Foo(x: Int) {
  def inc(a: Int) = copy(x = x + a)
}

// Defined explicitly to cache the derived instance during compilation.
// If not defined, it is derived automatically.
implicit lazy val fooTypeInfo: TypeInformation[Foo] = deriveTypeInformation[Foo]

env
  .fromElements(Foo(1), Foo(2), Foo(3))
  .map(x => x.inc(1)) // taken as an implicit
  .map(x => x.inc(2)) // again, no re-derivation
flink-extended/flink-scala-api:
- Automatic compile-time derivation of Flink serializers for simple Scala and Algebraic Data Types
- Zero runtime reflection
- No silent fallback to Kryo serialization (compile error instead)
- Extendable with custom serializers for deeply-nested types
- Easy to migrate: mimics the old Scala API
- Scala 3 support
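To illustrate the extension point for deeply-nested types, a hedged sketch of putting a custom TypeInformation into implicit scope so that derivation can pick it up; the java.time.Instant field and the reuse of Flink's built-in Types.INSTANT are assumptions for illustration, not an excerpt from the library's documentation:

import java.time.Instant
import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}
import org.apache.flinkx.api.serializers.*

case class Event(id: Long, at: Instant)

// Explicit type info for the nested Instant field; the derivation of Event
// is expected to use this implicit instead of falling back to Kryo.
implicit val instantInfo: TypeInformation[Instant] = Types.INSTANT

implicit val eventInfo: TypeInformation[Event] = deriveTypeInformation[Event]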
Building a fat-jar:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")

// build.sbt
lazy val root = (project in file("."))
  .settings(
    // optionally define a main class in case there are multiple
    assembly / mainClass := Some("org.example.MyMainClass"),
    …
  )

> sbt assembly
> ls target/scala-3*/*.jar
target/scala-3.3.0/my-flink-project-0.1.jar
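A common companion to the assembly setup is keeping Flink's own modules out of the fat-jar, since the cluster already ships them. A hedged sbt sketch (the artifact names and version strings are taken from elsewhere in this deck; treat the exact versions as assumptions to adapt):

// build.sbt
val flinkVersion = "1.16.2"

libraryDependencies ++= Seq(
  // provided by the Flink distribution at runtime, so excluded from the assembly
  "org.apache.flink" % "flink-streaming-java" % flinkVersion % "provided",
  "org.apache.flink" % "flink-clients" % flinkVersion % "provided",
  // the Scala API wrapper is not part of the distribution, so it stays in the jar
  "org.flinkextended" %% "flink-scala-api" % s"${flinkVersion}_1.0.0"
)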
Scala-CLI, and more: just one file and a single command package a UDF into a JAR.

// multisetToString.scala
//> using scala "3"
//> using dep "org.apache.flink:flink-table-api-java:1.15.4"

import org.apache.flink.table.functions.ScalarFunction
import org.apache.flink.table.annotation.DataTypeHint
import java.util.{Map => JMap}

class MultisetToString extends ScalarFunction:
  def eval(
      // a MULTISET<INT> is passed in as a map from element to its multiplicity
      @DataTypeHint("MULTISET<INT>") mset: JMap[Integer, Integer]
  ) = mset.toString

scala-cli package --jvm 11 \
  multisetToString.scala \
  -o udfs.jar \
  --library -f
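For completeness, a hedged sketch of registering and calling the packaged UDF through the Table API in Scala, assuming the MultisetToString class above plus a table planner are on the classpath; the way the TableEnvironment is created and the VALUES-based query are illustrative assumptions:

import org.apache.flink.table.api.*

@main def tryUdf(): Unit =
  // local batch TableEnvironment just to exercise the function
  val tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode())
  tEnv.createTemporarySystemFunction("multiset_to_string", classOf[MultisetToString])
  // COLLECT(...) produces a MULTISET<INT>, matching the UDF's DataTypeHint
  tEnv
    .executeSql("SELECT multiset_to_string(COLLECT(v)) FROM (VALUES (1), (1), (2)) AS t(v)")
    .print()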
Ammonite REPL: dependencies, local mode, result. See more at https://ammonite.io

@ import $ivy.`org.flinkextended::flink-scala-api:1.16.2_1.0.0`
@ import $ivy.`org.apache.flink:flink-clients:1.16.2`
@ import org.apache.flinkx.api.*
@ import org.apache.flinkx.api.serializers.*

@ val env = StreamExecutionEnvironment.getExecutionEnvironment
env: StreamExecutionEnvironment = org.apache.flink.api.StreamExecutionEnvironment@1e226bcd

@ env.fromElements(1, 2, 3, 4, 5, 6).filter(_ % 2 == 1).map(i => i * i).print()
res5: org.apache.flink.streaming.api.datastream.DataStreamSink[Int] = org.apache.flink.streaming.api.datastream.DataStreamSink@71e2c6d8

@ env.execute()
4> 1
8> 25
6> 9
res6: common.JobExecutionResult = Program execution finished
Job with JobID 5a947a757f4e74c2a06dcfe80ba4fde8 has finished.
Job Runtime: 345 ms
To generate a new project from the Giter8 template, run:

> sbt new novakov-alexey/flink-scala-api.g8

The above command generates a "WordCount" Flink job in Scala 3:

name [My Flink Scala Project]: new-flink-app
flinkVersion [1.17.1]: // press enter to use 1.17.1

Template applied in /Users/myhome/dev/git/./new-flink-app

new-flink-app
├── build.sbt
├── project
│   └── build.properties
└── src
    └── main
        └── scala
            └── com
                └── example
                    └── WordCount.scala
You can keep using Scala in your Flink jobs: there are 2 community wrappers available.

The Scala eco-system provides better tools for Flink job development, debugging and deployment:
- Coursier, Scala-CLI, Ammonite, SBT, Scastie

Large codebases in Scala remain maintainable, unlike in Java.
The FP paradigm lets you compose your Flink jobs easily.
Try to develop your next job with flink-scala-api.

More information:
https://www.scala-lang.org/
https://flink.apache.org/2022/02/22/scala-free-in-one-fifteen/
https://github.com/novakov-alexey/flink-sandbox