M3: Semantic API Migrations

Bruce Collie
September 18, 2020

Transcript

  1. M3: Semantic API Migrations
    ASE 2020
    Bruce Collie, Philip Ginsbach†, Jackson Woodruff, Ajitha Rajan & Michael O’Boyle
    University of Edinburgh
    †GitHub Software UK
    [email protected]
    baltoli.github.io
  2. API Migration
    Uses library X:  f() { X::a(); X::b(); }
    Uses library Y:  f'() { Y::c(); }
    Code transformation: uses library X ⟶ uses library Y
    Preserves behaviour: [[ f ]] = [[ f' ]]
  3. Pattern-Based
    Code: Refaster (errorprone.info)

        public class StringIsEmpty {
          @BeforeTemplate
          boolean equalsEmptyString(String string) {
            return string.equals("");
          }

          @BeforeTemplate
          boolean lengthEquals0(String string) {
            return string.length() == 0;
          }

          @AfterTemplate
          @AlsoNegation
          boolean optimizedMethod(String string) {
            return string.isEmpty();
          }
        }

    • Code as template
    • Requires expert knowledge
    • Migrations are trusted
  4. Change-Based
    GitHub search (“replace library”)
    • Changes as template
    • Requires training data
    • Migrations are approximate
  5. Problem Statement
    • Perform API migration
      ◦ … without expert knowledge
      ◦ … without large training datasets
      ◦ … without the library source code
    How? Use library semantics to identify migrations
  6. Example
    CWE-126 (buffer over-read)
    Error-prone API:   strncpy(buf, arg, n); buf[n-1] = '\0';
    Manually written:  for(int i = 0; i < n; ++i) { buf[i] = arg[i]; } buf[n-1] = '\0';
    Improved API:      strlcpy(buf, arg, n);
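One way to read this example is that the error-prone call sequence and the improved API leave the same C string in `buf`, even though they are different functions. The sketch below models both in Python on a `bytearray`; the helper names (`c_string`, `strncpy_then_terminate`, `strlcpy_model`) are illustrative assumptions, not part of M3 or of any C library, and `n >= 1` is assumed.

```python
def c_string(buf):
    """Bytes up to (not including) the first NUL, as C code would read buf."""
    end = buf.find(0)
    return bytes(buf if end < 0 else buf[:end])


def strncpy_then_terminate(buf, arg, n):
    """Model of: strncpy(buf, arg, n); buf[n-1] = '\\0';  (assumes n >= 1)."""
    for i in range(n):
        # strncpy copies at most n bytes and pads with NULs once arg runs out.
        buf[i] = arg[i] if i < len(arg) else 0
    buf[n - 1] = 0


def strlcpy_model(buf, arg, n):
    """Model of: strlcpy(buf, arg, n);  (assumes n >= 1)."""
    k = min(len(arg), n - 1)   # copy at most n-1 bytes
    buf[:k] = arg[:k]
    buf[k] = 0                 # always NUL-terminate, no padding
    return len(arg)            # strlcpy returns the length of the source


if __name__ == "__main__":
    a = bytearray(b"\xff" * 8)
    b = bytearray(b"\xff" * 8)
    strncpy_then_terminate(a, b"hello world", 4)
    strlcpy_model(b, b"hello world", 4)
    print(c_string(a), c_string(b))
```

In this model the two variants agree as C strings but not byte for byte: the `strncpy` model pads the rest of the first `n` bytes with NULs, while the `strlcpy` model leaves bytes after the terminator untouched. Whether such differences matter depends on what the calling code observes.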
  7. M3: Model
    Original code:         f() { X::a(); X::b(); }
    Recovered code model:  X::a() { a₁(); a₂(); }
                           X::b() { while(b₁()) b₂(); }
    Fully inlined:         f() { a₁(); a₂(); while(b₁()) b₂(); }
  8. M3: Match
    Fully inlined function:      f() { a₁(); a₂(); while(b₁()) b₂(); }
    Synthesized implementation:  Y::c() { a₂(); while(b₁()) b₂(); }
    Match regions
  9. M3: Migrate
    Synthesized implementation:  Y::c() { a₂(); while(b₁()) b₂(); }
    Migrated function:           f() { a₁(); Y::c(); }
    Replace body
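The model, match and migrate steps on slides 7 to 9 can be sketched on a toy representation in which a function body is a list of instruction names. This is only an illustration under strong assumptions: the real tool works on LLVM IR and matches with CAnDL patterns rather than literal list comparison, and the names `LIB_X`, `LIB_Y`, `inline`, `find_match` and `migrate` are hypothetical.

```python
# Recovered models of the source library X and synthesized
# implementations of the target library Y, as instruction lists.
LIB_X = {"X::a": ["a1", "a2"], "X::b": ["while b1: b2"]}
LIB_Y = {"Y::c": ["a2", "while b1: b2"]}


def inline(body, models):
    """Model step: replace each known library call by its recovered body."""
    out = []
    for instr in body:
        out.extend(models.get(instr, [instr]))
    return out


def find_match(code, pattern):
    """Match step: index of the first contiguous occurrence of pattern, or None."""
    for start in range(len(code) - len(pattern) + 1):
        if code[start:start + len(pattern)] == pattern:
            return start
    return None


def migrate(body, source_models, target_models):
    """Migrate step: inline, then replace each matched region by a call."""
    code = inline(body, source_models)
    for name, impl in target_models.items():
        start = find_match(code, impl)
        if start is not None:
            code[start:start + len(impl)] = [name]
    return code


if __name__ == "__main__":
    print(migrate(["X::a", "X::b"], LIB_X, LIB_Y))
```

Because matching runs on the inlined code, the same sketch also rewrites user code that happens to contain the body of `Y::c` without calling library X at all.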
  10. Model: I/O Examples
    Library function + signature:  float X::a(int p, float *q);
    Collect examples:
      X::a(i₀, p₀) ⟶ r₀
      X::a(i₁, p₁) ⟶ r₁
      ...
      X::a(iₙ, pₙ) ⟶ rₙ
    • I/O examples specify behaviour
    • Allows for automated testing
    • Only observed behaviour
      ◦ Branch coverage experiments
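A minimal sketch of how I/O examples might be collected from a black-box library function and then reused as a test for a candidate implementation. The function `black_box_a` is a made-up stand-in for `float X::a(int p, float *q)`, and the input generator (list lengths, value ranges, seed) is an assumption chosen for illustration, not the strategy used by M3.

```python
import random


def black_box_a(p, q):
    """Hypothetical stand-in for the library function: sums the first p elements."""
    total = 0.0
    for i in range(p):
        total += q[i]
    return total


def collect_examples(library_fn, n_examples=20, seed=0):
    """Call the library on generated inputs and record (inputs, outputs) pairs."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        p = rng.randint(0, 8)
        q = [rng.uniform(-10.0, 10.0) for _ in range(p)]
        q_before = list(q)
        r = library_fn(p, q)
        # q is a pointer parameter, so its final contents count as output too.
        examples.append(((p, q_before), (r, list(q))))
    return examples


def passes(candidate, examples, tol=1e-9):
    """True if the candidate reproduces every recorded example."""
    for (p, q_in), (r, q_out) in examples:
        q = list(q_in)
        got = candidate(p, q)
        if abs(got - r) > tol or q != q_out:
            return False
    return True


if __name__ == "__main__":
    examples = collect_examples(black_box_a)
    print(passes(black_box_a, examples))
```

A candidate that passes is only known to agree on the observed inputs, which is the limitation the slide notes under "only observed behaviour".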
  11. Model: Sketching
    Library function + signature:  float X::a(int p, float *q);
    Fragments:  Loop()  Index()  Seq()  If()  Instr()
    Compose sketch:  Loop(p,q) { Index(q) Instr() Instr() }
  12. Model: Synthesis
    Library function + signature:  float X::a(int p, float *q);
    Sketch ⟶ program
    • Enumerative search for solutions
    • Constraints on type propagation
    • Test results using I/O examples

    Program listing (LLVM IR, cut off at the slide edges):

        define void @gemv(i32 %M, i32 %N, float %alpha, fl
                          float %beta, float* %y) {
        entry:
          br label %header
        exit:
          ret void
        header:
          br label %loop_check
        body_pre:
          %0 = getelementptr float, float* %y, i32 %iter
          %1 = load float, float* %0
          %2 = fmul float %beta, %1
          %3 = fsub float %alpha, %1
          %4 = fmul float %3, %9
          br label %header1
        body_post:
          %5 = getelementptr float, float* %y, i32 %iter
          %6 = fmul float %1, %1
          %7 = fsub float %1, %beta
          %8 = fmul float %3, %20
          store float %20, float* %5
          br label %loop_check
        loop_exit:
          br label %exit
        loop_check: %body_post
          %9 = phi float [ %alpha, %header ], [ %2, %body_
          %iter = phi i32 [ 0, %header ], [ %next_iter, %b
          %next_iter = add i32 %iter, 1
          %10 = icmp slt i32 %iter, %M
          br i1 %10, label %body_pre, label %loop_exit
        header1:
          br label %loop_check5
        body_pre2:
          %11 = getelementptr float, float* %x, i32 %iter6
          %12 = load float, float* %11
          %13 = mul i32 %iter, %N
          %14 = add i32 %iter6, %13
          %15 = getelementptr float, float* %A, i32 %14
          %16 = load float, float* %15
          %17 = fmul float %alpha, %12
          %18 = fmul float %17, %16
          %19 = fadd float %18, %20
          br label %body_post3
        body_post3:
          br label %loop_check5
        loop_exit4:
          br label %body_post
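The combination of a sketch with enumerative search and I/O testing can be illustrated with a toy synthesizer. The sketch here is fixed to a single loop over `q` with one instruction hole and one initial-value hole; the instruction set `INSTRS`, the initial values `INITS` and the function names are all assumptions for illustration, and the type-propagation constraints mentioned on the slide are not modelled.

```python
from itertools import product

# Candidate instructions for the Instr() hole: each maps (acc, x) to a new acc.
INSTRS = {
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,
    "max": lambda acc, x: max(acc, x),
}
# Candidate initial accumulator values.
INITS = [0.0, 1.0]


def run_sketch(init, op_name, p, q):
    """Execute the sketch Loop(p, q) { Index(q) Instr() } with the holes filled."""
    acc = init
    for i in range(p):                   # Loop(p, q)
        x = q[i]                         # Index(q)
        acc = INSTRS[op_name](acc, x)    # Instr()
    return acc


def synthesize(examples):
    """Enumerate hole assignments; return the first one consistent with all examples."""
    for init, op_name in product(INITS, INSTRS):
        if all(run_sketch(init, op_name, p, q) == r for (p, q), r in examples):
            return init, op_name
    return None


if __name__ == "__main__":
    product_examples = [((3, [1.0, 2.0, 3.0]), 6.0), ((2, [4.0, 5.0]), 20.0)]
    print(synthesize(product_examples))
```

With only the first example, where the sum and the product of the inputs coincide, the search accepts the additive program; the second example is what rules it out. That is one reason the quality of the I/O examples matters.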
  13. Match: CAnDL
    • Pattern-based search over LLVM code
    • Existing toolchain
    • SMT solver for efficient search
    Diagram labels: examples, generalise, CAnDL pattern
  14. Match
    Match result:  { "entry": "loop.ph", "loop" : { "iter": "%0", "exit": "%bb.0" }, ... }
    • Key-value mapping IDs to SSA IR values
    • Function parameter / return values identified specially
    • Specifies a region that can be migrated
  15. Migrate
    Match result:  { "entry": "loop.ph", "loop" : { "iter": "%0", "exit": "%bb.0" }, ... }
    “Insert call to function Y::c with arguments p (line 5), q (line 6)”
    Automated replacement (using a compiler pass)
  16. Research Questions
    • (RQ1) Feasibility of synthesis
    • (RQ2) Correctness of synthesis
    • (RQ3) Accuracy of match
    • (RQ4) Usefulness of migrate
  17. Evaluation Corpora
    Library / Domain:  string.h, String manipulation, StrSafe; GLM, Graphics / mathematical, MathFu; BLAS, Ti DSP, Signal Processing, ARM DSP

    Application    Domain           Lines
    ffmpeg         Video / media    1,061,655
    TeXinfo        Typesetting      76,755
    XRDP           Network          75,921
    Coreutils      Utility          66,355
    GraphicsGems   Graphics         46,619
    Darknet        Deep Learning    21,299
    Caffepresso    Deep Learning    14,602
    Nanvix         Operating Sys.   11,226
    ETR            Game             2,399
    Android FS*    Operating Sys.   1,840
  18. Results: Synthesis
    • I/O examples adequately test synthesised code
    • Synthesis performance improves on existing techniques
  19. Results: Match
    Original code:  f() { X::a(); X::b(); }
    Fully inlined:  f() { a₁(); a₂(); while(b₁()) b₂(); }
    100%
    • Match recovers inlined code accurately
    • … as well as 178 instances of user code matching APIs
  20. Results: Migrate
    Application migrations:  N = 2,247

    Category   Library ⟶ L’   Code ⟶ L’   L + C ⟶ L’
    N          2,025          178         44

    • Original and new libraries differ in all cases
    • 200+ cases with contextual semantics
  21. Results: Summary
    • (RQ1) State-of-the-art synthesis
    • (RQ2) Testing strategy valid
    • (RQ3) Inlining process sound
    • (RQ4) 2,000+ migrations found
  22. Future work
    • Extended type system & synthesis library
    • User studies for tool usability
    • Incomplete migrations as seeds
    • Integration with other migration tools