Slide 1

Slide 1 text

M3: Semantic API Migrations
ASE 2020
Bruce Collie, Philip Ginsbach†, Jackson Woodruff, Ajitha Rajan & Michael O’Boyle
University of Edinburgh
†GitHub Software UK
baltoli.github.io

Slide 2

Slide 2 text

API Migration

f() { X::a(); X::b(); }   ⟶   f'() { Y::c(); }
USES LIBRARY X   ⟶   USES LIBRARY Y

CODE TRANSFORMATION PRESERVES BEHAVIOUR: [[ f ]] = [[ f' ]]

Slide 3

Slide 3 text

Pattern-Based

public class StringIsEmpty {
  @BeforeTemplate
  boolean equalsEmptyString(String string) {
    return string.equals("");
  }

  @BeforeTemplate
  boolean lengthEquals0(String string) {
    return string.length() == 0;
  }

  @AfterTemplate
  @AlsoNegation
  boolean optimizedMethod(String string) {
    return string.isEmpty();
  }
}

CODE: REFASTER (errorprone.info)

● Code as template
● Requires expert knowledge
● Migrations are trusted

Slide 4

Slide 4 text

Change-Based

GITHUB SEARCH (“replace library”)

● Changes as template
● Requires training data
● Migrations are approximate

Slide 5

Slide 5 text

Problem Statement

● Perform API migration
  ○ … without expert knowledge
  ○ … without large training datasets
  ○ … without the library source code

How? Use library semantics to identify migrations

Slide 6

Slide 6 text

Example

CWE-126 (BUFFER OVER-READ)

ERROR-PRONE API:
  strncpy(buf, arg, n);
  buf[n-1] = '\0';

MANUALLY WRITTEN:
  for(int i = 0; i < n; ++i) {
    buf[i] = arg[i];
  }
  buf[n-1] = '\0';

IMPROVED API:
  strlcpy(buf, arg, n);
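The migration on this slide can be sketched as a behavioural check in C. This is only an illustration under stated assumptions: `strlcpy` is not part of ISO C and is missing from some C libraries, so the hypothetical `sketch_strlcpy` below stands in for it, and the helper names `copy_with_strncpy`, `copy_with_strlcpy`, and `same_string_result` are invented for this example. The comparison is on the resulting C string, not the raw bytes, because `strncpy` zero-pads the rest of the buffer while the stand-in does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy (assumption: not taken from the slides).
 * Copies at most n-1 bytes of src into dst and always NUL-terminates
 * when n > 0. Returns strlen(src), as the BSD function does. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t n) {
    size_t len = strlen(src);
    if (n > 0) {
        size_t copy = len < n - 1 ? len : n - 1;
        memcpy(dst, src, copy);
        dst[copy] = '\0';
    }
    return len;
}

/* Error-prone idiom from the slide: strncpy leaves dst unterminated when
 * src has n or more characters, so the caller terminates it by hand. */
static void copy_with_strncpy(char *buf, const char *arg, size_t n) {
    assert(n > 0);
    strncpy(buf, arg, n);
    buf[n - 1] = '\0';
}

/* Migrated form: a single call to the improved API. */
static void copy_with_strlcpy(char *buf, const char *arg, size_t n) {
    assert(n > 0);
    sketch_strlcpy(buf, arg, n);
}

/* Returns 1 when both versions leave the same C string in the buffer. */
static int same_string_result(const char *arg, size_t n) {
    char a[64], b[64];
    assert(n > 0 && n <= sizeof a);
    copy_with_strncpy(a, arg, n);
    copy_with_strlcpy(b, arg, n);
    return strcmp(a, b) == 0;
}
```

Calling `same_string_result("hello", 4)` exercises the truncating case, where both versions leave `"hel"` in the buffer.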

Slide 7

Slide 7 text

Approach

M3: MODEL ⟶ MATCH ⟶ MIGRATE

Slide 8

Slide 8 text

M3: Model

ORIGINAL CODE:
  f() { X::a(); X::b(); }

RECOVERED CODE MODEL:
  X::a() { a1(); a2(); }
  X::b() { while(b1()) b2(); }

FULLY INLINED:
  f() { a1(); a2(); while(b1()) b2(); }

Slide 9

Slide 9 text

M3: Match

FULLY INLINED FUNCTION:
  f() { a1(); a2(); while(b1()) b2(); }

SYNTHESIZED IMPLEMENTATION:
  Y::c() { a2(); while(b1()) b2(); }

MATCH REGIONS

Slide 10

Slide 10 text

M3: Migrate

MIGRATED FUNCTION:
  f() { a1(); Y::c(); }

SYNTHESIZED IMPLEMENTATION:
  Y::c() { a2(); while(b1()) b2(); }

REPLACE BODY
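The three stages on the preceding slides can be mimicked with a small C sketch that records a trace of primitive operations. Everything here is an assumption made for illustration: the primitives `a1`, `a2`, `b1`, `b2` only append their names to a trace, `X_a`, `X_b`, and `Y_c` stand in for `X::a`, `X::b`, and `Y::c`, and `run` is a hypothetical driver. The real tool works on LLVM IR and recovers the library models by synthesis; here they are written by hand.

```c
#include <string.h>

/* Trace of primitive operations, used to compare observable behaviour. */
static char trace[256];
static int remaining; /* number of loop iterations b1() still allows */

/* Appends op to the trace, silently dropping it if the buffer is full. */
static void emit(const char *op) {
    if (strlen(trace) + strlen(op) < sizeof trace) strcat(trace, op);
}

/* Hypothetical primitives standing in for a1, a2, b1, b2 on the slides. */
static void a1(void) { emit("a1 "); }
static void a2(void) { emit("a2 "); }
static int  b1(void) { emit("b1 "); return remaining-- > 0; }
static void b2(void) { emit("b2 "); }

/* Model: recovered implementations of the functions in library X. */
static void X_a(void) { a1(); a2(); }
static void X_b(void) { while (b1()) b2(); }

/* Model: recovered implementation of the function in library Y. */
static void Y_c(void) { a2(); while (b1()) b2(); }

/* Original user code, its fully inlined form, and the migrated form. */
static void f_original(void) { X_a(); X_b(); }
static void f_inlined(void)  { a1(); a2(); while (b1()) b2(); }
static void f_migrated(void) { a1(); Y_c(); }

/* Runs fn with the given loop trip count and copies its trace to out. */
static void run(void (*fn)(void), int iterations, char *out) {
    trace[0] = '\0';
    remaining = iterations;
    fn();
    strcpy(out, trace);
}
```

Running all three functions with the same trip count and comparing the traces with `strcmp` is a small-scale version of the claim that the migration preserves behaviour.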

Slide 11

Slide 11 text

Model: I/O Examples

LIBRARY FUNCTION + SIGNATURE:
  float X::a(int p, float *q);

COLLECT EXAMPLES:
  X::a(i_0, p_0) ⟶ r_0
  X::a(i_1, p_1) ⟶ r_1
  ...
  X::a(i_n, p_n) ⟶ r_n

● I/O examples specify behaviour
● Allows for automated testing
● Only observed behaviour
  ○ Branch coverage experiments
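A minimal C sketch of collecting I/O examples from a black-box function and using them as a test oracle follows. It assumes the signature shown on the slide, `float a(int p, float *q)`. The function body of `X_a`, the two candidates, the deterministic input pattern, and the names `io_example`, `collect_examples`, and `satisfies_examples` are all hypothetical. A real collector would probably randomise its inputs; fixed inputs are used here so the outcome is reproducible.

```c
#include <stddef.h>

#define MAX_LEN 8

/* One recorded call: inputs, return value, and the array after the call. */
struct io_example {
    int   p;
    float q_in[MAX_LEN];
    float q_out[MAX_LEN];
    float ret;
};

typedef float (*lib_fn)(int p, float *q);

/* Hypothetical black-box library function: sums the first p elements. */
static float X_a(int p, float *q) {
    float sum = 0.0f;
    for (int i = 0; i < p; ++i) sum += q[i];
    return sum;
}

/* Candidate with the same observable behaviour, written differently. */
static float good_candidate(int p, float *q) {
    float sum = 0.0f;
    for (int i = p - 1; i >= 0; --i) sum += q[i];
    return sum;
}

/* Candidate that agrees with X_a only when p is 0 or 1. */
static float bad_candidate(int p, float *q) {
    return p > 0 ? q[0] : 0.0f;
}

/* Fills out[0..count) by calling fn on deterministic, positive inputs. */
static void collect_examples(lib_fn fn, struct io_example *out, size_t count) {
    for (size_t k = 0; k < count; ++k) {
        struct io_example *ex = &out[k];
        ex->p = (int)(k % (MAX_LEN + 1));
        for (int i = 0; i < MAX_LEN; ++i) {
            ex->q_in[i] = (float)(i + 1) + (float)k;
            ex->q_out[i] = ex->q_in[i];
        }
        ex->ret = fn(ex->p, ex->q_out);
    }
}

/* Returns 1 if candidate reproduces the return value and the final
 * array contents of every recorded example, 0 otherwise. */
static int satisfies_examples(lib_fn candidate,
                              const struct io_example *exs, size_t count) {
    for (size_t k = 0; k < count; ++k) {
        float q[MAX_LEN];
        for (int i = 0; i < MAX_LEN; ++i) q[i] = exs[k].q_in[i];
        if (candidate(exs[k].p, q) != exs[k].ret) return 0;
        for (int i = 0; i < MAX_LEN; ++i)
            if (q[i] != exs[k].q_out[i]) return 0;
    }
    return 1;
}
```

The check only covers the calls that were recorded, which is the "only observed behaviour" caveat on the slide: a candidate that differs from `X_a` solely on inputs outside the collected set would still pass.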

Slide 12

Slide 12 text

Model: Sketching

LIBRARY FUNCTION + SIGNATURE:
  float X::a(int p, float *q);

FRAGMENTS:
  Loop()  Index()  Seq()  If()  Instr()

COMPOSE SKETCH:
  Loop(p,q) {
    Index(q)
    Instr()
    Instr()
  }

Slide 13

Slide 13 text

Model: Synthesis

LIBRARY FUNCTION + SIGNATURE ⟶ SKETCH ⟶ PROGRAM
  float X::a(int p, float *q);

● Enumerative search for solutions
● Constraints on type propagation
● Test results using I/O examples

define void @gemv(i32 %M, i32 %N, float %alpha, fl
                  float %beta, float* %y) {
entry:
  br label %header
exit:
  ret void
header:
  br label %loop_check
body_pre:
  %0 = getelementptr float, float* %y, i32 %iter
  %1 = load float, float* %0
  %2 = fmul float %beta, %1
  %3 = fsub float %alpha, %1
  %4 = fmul float %3, %9
  br label %header1
body_post:
  %5 = getelementptr float, float* %y, i32 %iter
  %6 = fmul float %1, %1
  %7 = fsub float %1, %beta
  %8 = fmul float %3, %20
  store float %20, float* %5
  br label %loop_check
loop_exit:
  br label %exit
loop_check: %body_post
  %9 = phi float [ %alpha, %header ], [ %2, %body_
  %iter = phi i32 [ 0, %header ], [ %next_iter, %b
  %next_iter = add i32 %iter, 1
  %10 = icmp slt i32 %iter, %M
  br i1 %10, label %body_pre, label %loop_exit
header1:
  br label %loop_check5
body_pre2:
  %11 = getelementptr float, float* %x, i32 %iter6
  %12 = load float, float* %11
  %13 = mul i32 %iter, %N
  %14 = add i32 %iter6, %13
  %15 = getelementptr float, float* %A, i32 %14
  %16 = load float, float* %15
  %17 = fmul float %alpha, %12
  %18 = fmul float %17, %16
  %19 = fadd float %18, %20
  br label %body_post3
body_post3:
  br label %loop_check5
loop_exit4:
  br label %body_post
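The combination of a sketch, enumerative search, and I/O testing can be illustrated with a toy example in C. This is a sketch under strong assumptions and not the tool's algorithm: the only sketch considered is a single accumulation loop, `acc = init; for i < p: acc = acc <op> q[i]`, with two holes (`init` and `op`), and all names (`candidate`, `run_candidate`, `consistent`, `synthesize`) are hypothetical. The real system emits LLVM IR and uses type constraints to prune a far larger space.

```c
#include <stddef.h>

#define MAX_LEN 4

/* One observed call of the black-box function: inputs and return value. */
struct io_example { int p; float q[MAX_LEN]; float ret; };

/* Instructions that may fill the Instr() hole of the loop sketch. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_MAX, OP_COUNT };

/* A filled-in sketch: acc = init; for i < p: acc = acc <op> q[i]. */
struct candidate { float init; enum op op; };

static float apply(enum op op, float a, float b) {
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_MAX: return a > b ? a : b;
    default:     return a;
    }
}

/* Interprets a candidate on one input. */
static float run_candidate(struct candidate c, int p, const float *q) {
    float acc = c.init;
    for (int i = 0; i < p; ++i) acc = apply(c.op, acc, q[i]);
    return acc;
}

/* Returns 1 if the candidate reproduces every example's return value. */
static int consistent(struct candidate c,
                      const struct io_example *exs, size_t n) {
    for (size_t k = 0; k < n; ++k)
        if (run_candidate(c, exs[k].p, exs[k].q) != exs[k].ret) return 0;
    return 1;
}

/* Enumerates every (init, op) pair and stores the first one consistent
 * with the examples in *out. Returns 1 on success, 0 if none fits. */
static int synthesize(const struct io_example *exs, size_t n,
                      struct candidate *out) {
    static const float inits[] = { 0.0f, 1.0f };
    for (size_t i = 0; i < sizeof inits / sizeof inits[0]; ++i) {
        for (int op = 0; op < OP_COUNT; ++op) {
            struct candidate c = { inits[i], (enum op)op };
            if (consistent(c, exs, n)) { *out = c; return 1; }
        }
    }
    return 0;
}
```

Given examples recorded from a function that multiplies its first `p` elements, the search rejects every candidate with `init = 0` on an empty-input example and settles on `init = 1` with `OP_MUL`. With too few examples several candidates can fit, which is why the quality of the I/O examples matters.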

Slide 14

Slide 14 text

Match: CAnDL

EXAMPLES ⟶ GENERALISE ⟶ CANDL PATTERN

● Pattern-based search over LLVM code
● Existing toolchain
● SMT solver for efficient search

Slide 15

Slide 15 text

Match

MATCH RESULT:
  {
    "entry": "loop.ph",
    "loop" : {
      "iter": "%0",
      "exit": "%bb.0"
    },
    ...
  }

● Key-value mapping IDs to SSA IR values
● Function parameter / return values identified specially
● Specifies a region that can be migrated

Slide 16

Slide 16 text

Migrate

MATCH RESULT:
  {
    "entry": "loop.ph",
    "loop" : {
      "iter": "%0",
      "exit": "%bb.0"
    },
    ...
  }

“Insert call to function Y::c with arguments p (line 5), q (line 6)”

Automated replacement (using a compiler pass)

Slide 17

Slide 17 text

Research Questions

● (RQ1) FEASIBILITY OF SYNTHESIS
● (RQ2) CORRECTNESS OF SYNTHESIS
● (RQ3) ACCURACY OF MATCH
● (RQ4) USEFULNESS OF MIGRATE

Slide 18

Slide 18 text

Evaluation / Experiments

SYNTHESISE ⟶ INLINE FUNCTIONS ⟶ TEST MATCH ⟶ TEST MIGRATE

Slide 19

Slide 19 text

Evaluation Corpora

Library     Domain
string.h    String manipulation
StrSafe
GLM         Graphics / mathematical
MathFu
BLAS
Ti DSP      Signal Processing
ARM DSP

Application     Domain            Lines
ffmpeg          Video / media     1,061,655
TeXinfo         Typesetting       76,755
XRDP            Network           75,921
Coreutils       Utility           66,355
GraphicsGems    Graphics          46,619
Darknet         Deep Learning     21,299
Caffepresso     Deep Learning     14,602
Nanvix          Operating Sys.    11,226
ETR             Game              2,399
Android FS*     Operating Sys.    1,840

Slide 20

Slide 20 text

Results: Synthesis

● I/O examples adequately test synthesised code
● Synthesis performance improves on existing techniques

Slide 21

Slide 21 text

Results: Match

ORIGINAL CODE:
  f() { X::a(); X::b(); }

FULLY INLINED:
  f() { a1(); a2(); while(b1()) b2(); }

100%

● Match recovers inlined code accurately
● … as well as 178 instances of user code matching APIs

Slide 22

Slide 22 text

Results: Migrate

Category    Application Migrations    Library ⟶ L’    Code ⟶ L’    L + C ⟶ L’
N           2,247                     2,025           178          44

● Original and new libraries differ in all cases
● 200+ cases with contextual semantics

Slide 23

Slide 23 text

Results: Summary

● (RQ1) STATE-OF-THE-ART SYNTHESIS
● (RQ2) TESTING STRATEGY VALID
● (RQ3) INLINING PROCESS SOUND
● (RQ4) 2,000+ MIGRATIONS FOUND

Slide 24

Slide 24 text

Future work

● Extended type system & synthesis library
● User studies for tool usability
● Incomplete migrations as seeds
● Integration with other migration tools