Slide 1

Slide 1 text

@hpgrahsl — decodable.co | Current 2024 | Sept. 18th | Austin, TX

Slide 2

Slide 2 text

Talk's Origin
• my side project
• Kryptonite for Kafka
• bit.ly/k4k-repo

Slide 3

Slide 3 text

My Use Case
• cryptographic functions for Flink SQL / Table API
• encrypt / decrypt individual fields

    SELECT ssn_enc FROM (
      VALUES (ENCRYPT_UDF('123-45-6789'))
    ) as t(ssn_enc);

    tEnv.fromValues(/*...*/)
        .select(call("DECRYPT_UDF", $("ssn_enc"), "").as("ssn"));

• status: experimental

Slide 4

Slide 4 text

UDF Implementation Aspects

Slide 5

Slide 5 text

Java Versions
• JDK 8 (DON'T) - deprecated since Flink 1.15
• JDK 11 - since Flink 1.10
• JDK 17 - since Flink 1.18
• JDK 21 - beta with Flink 1.19

Slide 6

Slide 6 text

Java Versions
• UDF compiled for newer runtime (e.g. JDK 17)
• Flink cluster uses older runtime (e.g. JDK 11)
⚡ java.lang.UnsupportedClassVersionError

    Exception in thread "..." java.lang.UnsupportedClassVersionError:
    com.g.h.f.talk.HelloUdf has been compiled by a more recent version of
    the Java Runtime (class file version 61.0), this version of the Java
    Runtime only recognizes class file versions up to 55.0
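The version numbers in that error message (61.0 vs. 55.0) come straight from the header of the compiled class file. As a minimal sketch, the check below reads the major version from the first 8 bytes of a class file and maps it to a Java release; the class name ClassFileVersion and its helper methods are made up for illustration and are not part of Flink or the talk's repo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileVersion {

    /** Extracts the major class file version from the first 8 bytes of a .class file. */
    public static int majorVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);   // big-endian by default, like the class file format
        if (buf.getInt() != 0xCAFEBABE) {           // magic number of every valid class file
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                             // minor version, ignored here
        return buf.getShort() & 0xFFFF;             // major version, e.g. 55 or 61
    }

    /** Maps a major class file version to the Java release (holds for Java 5 and later). */
    public static int javaRelease(int major) {
        return major - 44;                          // 55 -> 11, 61 -> 17
    }

    public static void main(String[] args) throws IOException {
        // Inspects this class itself; a UDF class extracted from a JAR could be read the same way.
        try (InputStream in = ClassFileVersion.class.getResourceAsStream("ClassFileVersion.class")) {
            if (in == null) {
                System.out.println("class file not found on the classpath");
                return;
            }
            int major = majorVersion(in.readNBytes(8));
            System.out.println("compiled for Java " + javaRelease(major)
                + ", running on Java " + Runtime.version().feature());
        }
    }
}
```

If the "compiled for" release is higher than the release the cluster runs on, loading the UDF fails with the error shown above.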

Slide 7

Slide 7 text

Java Versions
• UDF works with JDK 11 ✅
• UDF fails with JDK 17 ❌
• JDK 17+ enforces strong encapsulation
• use --add-opens / --add-exports flags if applicable
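One common way to hit this difference is deep reflection into JDK internals, often from a library the UDF depends on. The sketch below probes whether a private field can be opened reflectively; the class EncapsulationCheck and its field are hypothetical and only serve the illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

public class EncapsulationCheck {

    // Private field in our own (unnamed) module: always accessible via reflection.
    private final String secret = "x";

    /** Returns true if the named private field can be made accessible on this runtime. */
    public static boolean canOpen(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true);              // the call that strong encapsulation can reject
            return true;
        } catch (InaccessibleObjectException e) {
            return false;                           // package is not opened to the caller's module
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("own field:    " + canOpen(EncapsulationCheck.class, "secret"));
        System.out.println("String.value: " + canOpen(String.class, "value"));
    }
}
```

On JDK 11 the second probe is expected to succeed (with an illegal-access warning), while on JDK 17 it is expected to fail unless the JVM is started with something like --add-opens java.base/java.lang=ALL-UNNAMED.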

Slide 8

Slide 8 text

Know your Java runtime(s)!

Slide 9

Slide 9 text

Type Inference
• Flink table ecosystem / SQL -> strongly typed
• type mapping for UDF params & return values

Slide 10

Slide 10 text

Type Inference - Reflection
• UDF input/output types automatically inferred

    public class EncryptUdf extends ScalarFunction {
        // -> input param type STRING
        // <- output type STRING
        public String eval(final String data) {
            // encryption and Base64 encoding of ciphertext here...
        }
    }
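The body of eval is elided on the slide. As a rough idea of what "encryption and Base64 encoding of ciphertext" could look like with nothing but the JDK, here is a minimal sketch using AES-GCM. It is not how Kryptonite for Kafka implements it; the class FieldCrypto, the choice of AES-GCM, the IV-prefix layout, and the in-memory key are all assumptions for illustration, and key management is left out entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldCrypto {
    private static final int IV_BYTES = 12;    // recommended IV size for GCM
    private static final int TAG_BITS = 128;   // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts a field value and returns Base64(IV || ciphertext). */
    public static String encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);              // fresh random IV for every call
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ciphertext.length)
                .put(iv).put(ciphertext).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt: decodes Base64, splits off the IV, and decrypts the rest. */
    public static String decrypt(SecretKey key, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    /** Generates a throwaway AES key; a real UDF would load its key from configuration. */
    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key generation failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String encrypted = encrypt(key, "123-45-6789");
        System.out.println(encrypted + " -> " + decrypt(key, encrypted));
    }
}
```

Because of the random IV, encrypting the same value twice yields different ciphertexts, which matters for the determinism discussion later in the deck.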

Slide 11

Slide 11 text

Type Inference - Annotations
• @DataTypeHint, @FunctionHint, @ArgumentHint

    public class EncryptUdf extends ScalarFunction {
        // -> input param type ANY
        // <- output type STRING
        public String eval(
            @DataTypeHint(inputGroup = InputGroup.ANY) final Object data) {
            // encryption and Base64 encoding of ciphertext here...
        }
    }

Slide 12

Slide 12 text

Type Inference - Programmatic
• define type inference in code
• supports custom logic to derive I/O types
• disables other inference mechanisms

Slide 13

Slide 13 text

Overloading
• UDFs must provide evaluation methods
• convention-based:
  • scalar function: 1+ public eval(...) method(s)
  • fails at runtime if violated -> ValidationException
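Since the contract is a naming convention rather than an interface, the compiler cannot catch a missing or non-public eval method. The sketch below is a rough, plain-Java approximation of such a convention check via reflection; it is not Flink's actual validation logic, and EvalConvention, Good, and Bad are hypothetical names.

```java
import java.util.Arrays;

public class EvalConvention {

    /** Counts public methods named "eval", including inherited ones. */
    public static long countEvalMethods(Class<?> udfClass) {
        return Arrays.stream(udfClass.getMethods())   // getMethods() returns public methods only
            .filter(m -> m.getName().equals("eval"))
            .count();
    }

    /** Follows the convention: two public overloads. */
    public static class Good {
        public String eval(String s) { return s; }
        public String eval(Integer i) { return String.valueOf(i); }
    }

    /** Violates the convention: eval exists but is not public. */
    public static class Bad {
        String eval(String s) { return s; }
    }

    public static void main(String[] args) {
        System.out.println("Good: " + countEvalMethods(Good.class)); // prints "Good: 2"
        System.out.println("Bad:  " + countEvalMethods(Bad.class));  // prints "Bad:  0"
    }
}
```

A check like this could run in a unit test to surface the problem before the job is submitted.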

Slide 14

Slide 14 text

Overloading Caveat
• be careful with "extensive overloading"
• a dozen+ overloadings can be problematic

    // just 14 basic overloadings caused a hang > 1 min for me
    public class MyOverloadedUdf extends ScalarFunction {
        public String eval(String myString) {/* ... */}
        //... 12 more overloadings
        public String eval(Double myDouble1, Double myDouble2) {/* ... */}
    }

Slide 15

Slide 15 text

Determinism
• UDFs signal whether or not to be deterministic
• "same UDF input -> same UDF output"
• determinism is assumed by default

Slide 16

Slide 16 text

Consequence of Determinism
Deterministic UDF with constant params?
-> UDF might only be called once during planning
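To make that consequence tangible, here is a toy model in plain Java. It does not use or mirror Flink's planner; the class ConstantFoldingDemo and its plan method are invented for this sketch and only mimic the idea that a call with constant arguments may be evaluated once up front when the function claims to be deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ConstantFoldingDemo {

    /**
     * Toy stand-in for a planner: if the function is declared deterministic,
     * the constant call is evaluated once and its result is reused.
     */
    public static <T> Supplier<T> plan(Supplier<T> constantCall, boolean deterministic) {
        if (!deterministic) {
            return constantCall;              // evaluated again for every record
        }
        T folded = constantCall.get();        // evaluated once, at "planning" time
        return () -> folded;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Integer> udfCall = calls::incrementAndGet;  // counts how often the "UDF" runs

        Supplier<Integer> planned = plan(udfCall, true);
        for (int record = 0; record < 3; record++) {
            planned.get();
        }
        System.out.println("calls: " + calls.get());         // prints "calls: 1"
    }
}
```

With the flag set to false, the same loop would invoke the function three times, which is what a UDF with random or time-dependent output needs.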

Slide 17

Slide 17 text

Non-Determinism?
• override isDeterministic method if necessary

    public class MyNonDeterministicUdf extends ScalarFunction {

        //NOTE: defaults to true
        @Override
        public boolean isDeterministic() {
            return false;
        }

        //some non-deterministic eval method(s) here...
    }

Slide 18

Slide 18 text

Configuration
More elaborate UDFs might need:
• initialization
  • override open(FunctionContext ctx) method
• configuration
  • Flink job params
  • environment variables
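One way to combine both configuration sources is a small lookup with a fixed precedence. The sketch below is plain Java and makes several assumptions: the class UdfSettings, the precedence order (job param, then environment variable, then default), and the key-to-variable naming scheme are all hypothetical. In a real UDF this logic would typically run inside open(FunctionContext ctx), where job params are available through ctx.getJobParameter(key, defaultValue); here a plain Map stands in for them.

```java
import java.util.Locale;
import java.util.Map;

public class UdfSettings {

    /**
     * Resolves a setting: job param first, then environment variable, then default.
     * A key like "hello.udf.lang" maps to the environment variable "HELLO_UDF_LANG".
     */
    public static String resolve(String key,
                                 Map<String, String> jobParams,
                                 Map<String, String> env,
                                 String defaultValue) {
        String fromJob = jobParams.get(key);
        if (fromJob != null) {
            return fromJob;
        }
        String envKey = key.toUpperCase(Locale.ROOT).replace('.', '_');
        String fromEnv = env.get(envKey);
        return fromEnv != null ? fromEnv : defaultValue;
    }

    public static void main(String[] args) {
        // No job params here, so the environment variable (if set) or the default wins.
        String lang = resolve("hello.udf.lang", Map.of(), System.getenv(), "EN");
        System.out.println("language: " + lang);
    }
}
```

Passing the maps in as parameters keeps the lookup easy to unit test without a running cluster.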

Slide 19

Slide 19 text

TEST, TEST, ... and TEST again!

Slide 20

Slide 20 text

Unit Testing UDFs

    @ParameterizedTest(name = /* ... */)
    @CsvSource({
        "developers, ES, Hola developers!",
        "Current'24, EN, Hello Current'24!"
    })
    @DisplayName(/* ... */)
    void testHelloUdfWithTwoParams(String who, String lang, String result) {
        var udf = new HelloUdf();
        assertEquals(result, udf.eval(who, lang));
    }

Slide 21

Slide 21 text

Integration Testing UDFs
• use mini cluster via JUnit5 Extension

    @RegisterExtension
    public static final MiniClusterExtension MINI_CLUSTER_RESOURCE =
        new MiniClusterExtension(
            new MiniClusterResourceConfiguration.Builder()
                .setNumberTaskManagers(1)
                .setNumberSlotsPerTaskManager(1)
                .build()
        );

Slide 22

Slide 22 text

Integration Testing UDFs
• prepare environment and register UDF

    @BeforeAll
    static void setUp(@InjectMiniCluster MiniCluster miniCluster) {
        ENV = StreamExecutionEnvironment.getExecutionEnvironment();
        T_ENV = /* ... */;
        T_ENV.createTemporaryFunction("HELLO_UDF", HelloUdf.class);
    }

Slide 23

Slide 23 text

Integration Testing UDFs
• create table data, execute query, and verify results

    @Test
    public void testHelloUdfWithValidInputs() throws Exception {
        T_ENV.createTemporaryView("input_table", T_ENV.fromValues(/* ... */));
        var outputTable = T_ENV.sqlQuery("""
            SELECT HELLO_UDF(who,lang) AS udf_output FROM input_table
            """);
        /* ... */
        assertThat(results, containsInAnyOrder(/* ... */));
    }

Slide 24

Slide 24 text

E2E Testing UDFs
• @Testcontainers + @Container with ComposeContainer

    @Testcontainers
    public class MyEndtoEndUdfsTest {

        @Container
        static ComposeContainer COMPOSE_CONTAINER =
            new ComposeContainer(new File("compose.yaml"))
                .withExposedService("jobmanager", 8081, Wait.forListeningPort());

        /* ... */
    }

Slide 25

Slide 25 text

Code Repo
• basic UDF examples for the discussed aspects

Slide 26

Slide 26 text

Let's chat @ the Decodable booth

Slide 27

Slide 27 text

Wanna learn more?

Slide 28

Slide 28 text
