Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MEET DB2 (Declarative Code Analysis) @ VLDB 2010

Reynold Xin
September 01, 2010

MEET DB2 (Declarative Code Analysis) @ VLDB 2010

Meet DB2 is a standalone tool used to estimate the level of compatibility between the server side parts of an application (the DB schema) and DB2 for LUW. It has been widely used by IBM sales and partners.

Reynold Xin

September 01, 2010
Tweet

More Decks by Reynold Xin

Other Decks in Programming

Transcript

  1. MEET DB2 Automated Database Migration Evaluation presented by Peter J.

    Haas on behalf of Reynold S. Xin Patrick Dantressangle Sam Lightstone William McLaren Steve Schormann Maria Schwenger IBM
  2. Agenda • Problem Definition and Motivation • MEET DB2 Design

    Overview ◦ Design goals ◦ Components • Experiments • Conclusions and Future Work
  3. Migration Evaluation • "The problem with standards: there are so

    many of them" ◦ ANSI SQL and ANSI SQL/PSM ◦ Oracle SQL and PL/SQL ◦ Sybase and Microsoft TSQL • User may want to switch vendors ◦ Price, performance. features, partnership • Question: How hard would it be to migrate?
  4. • Painful and (semi-)manual • Long questionnaires • Subjective, approximate,

    incomplete • Inaccurate evaluations • Needs skills in source and target DBMS Traditional Process
  5. • MEET DB2 ◦ Migration Evaluation Estimation Tool • Input:

    ◦ DDL + procedural logic • Output ◦ Summary report ◦ Detailed report ▪ Tables ▪ Triggers ▪ Indexes ▪ Views ▪ Functions ▪ Procedures ▪ Packages ▪ ... ◦ For each issue: ▪ # occurrences ▪ Location ▪ Suggested fix
  6. Agenda • Problem Definition and Motivation • MEET DB2 Design

    Overview ◦ Design goals ◦ Components • Experiments • Conclusions and Future Work
  7. Design Overview • Design Goals ◦ Dramatic reduction in evaluation

    time ◦ Provide both summary statistics and detailed list of issues ◦ Usable by technical sales staff ◦ Easy extensibility • Components ◦ Preprocessor: Deals with irrelevant code and substitutions ◦ Parser: Generates "Abstract Syntax Tree" (AST) in XML format ◦ Knowledge base: Catalog of known issues in XQuery format ◦ Analyzer: Executes XQueries to find problem instances ◦ Report generator: Transforms XML report into HTML report Note: Actual source and target DBs not needed
  8. Preprocessor • Strips irrelevant DB commands ◦ E.g., connect to

    myDB • Handles basic substitutions: > sqlplus scott/f00bar @myScript.sql 'test parameter' myTable select '&1' from &2; exit; @myScript.sql: define tableNm = 'customers2008' select * from &tableNm; Or
  9. Parser • Generates abstract syntax tree (AST) in XML format

    • Simplified AST XML schema FUNCTION is_greater_than( left NUMBER (20); right NUMBER (20); ) BEGIN IF left > right THEN RETURN true; ELSE RETURN false; END IF; END; <node type="function"> <name>is_greater_than</name> <node type="list" name="arguments"> <node type="argument"> <field name="type">int</field> <field name="name">left</field> </node> ... </node> <node type="list" name="statements"> <node type="if"> <field name="expr">left > right</field> <node type="list" name="statements"> <node type="return"> <expression>true</expression> </node> </node> ... ...
  10. Why XML? • XML is inherently tree based; models AST

    (tree) well • AST can be queried using XQuery ◦ Well suited to tree structures ◦ Dynamic execution of new XQueries ▪ No MEET re-compilation needed when knowledge base expanded ◦ Can leverage existing high performance XQuery engines • Can easily format reports via XSL transformations ◦ Potential connection to migration tools
  11. Analyzer and Knowledge Base • Knowledge Base is a catalog

    of known migration issues, centrally maintained • Defines an issue as XQuery, with a small set of augmented APIs <problem name="function_in_java"> { reportNode($ast//node[@type="function" and @language="java"]) } </problem>
  12. Queries in Three Patterns • Basic: a node whose attribute

    matches ... • Child: a node whose children match ... <problem name="update_in_before_trigger"> { for $trigger in $ast//node[@type="trigger"] for $sql_statement in $trigger//node[@type="sql"] where $trigger/point/text()="before" and $sql_statement/keyword/text()= ("update", "insert", "delete") return reportNode($sql_statement) } </problem> • Aggregate:a node whose attributes/children aggregated matches ... <stat name="avg_loc"> { return avg( lineOfCode( ($ast//node[@type="function"]))) } </stat>
  13. Agenda • Problem Definition and Motivation • MEET DB2 Design

    Overview ◦ Design goals ◦ Components • Experiments • Conclusions and Future Work
  14. Experiments • Extensibility Evaluation ◦ Measure time to update knowledge

    base (by junior and senior DBA) • Performance Evaluation ◦ Measure evaluation time (MEET vs expert) on 18 test cases • Accuracy Evaluation ◦ Measure report accuracy (MEET vs expert) on 18 test cases
  15. Extensibility Evaluation • Three Steps ◦ Step 1: identify the

    pattern of the problem ◦ Step 2: implement the query to match the pattern ◦ Step 3: test and re-iterate • Four Tasks (1-yr Junior DBA vs 15-yr Senior DBA) ◦ Task 1: modifying an existing query ◦ Task 2: adding a query by expanding an existing query ◦ Task 3: designing a new query from scratch ◦ Task 4: removing a query
  16. Conclusions and Future Work • MEET DB2 brings benefits in

    multiple dimensions ◦ Dramatically reduced time for evaluation ◦ Improved accuracy over human analysis ◦ Reduced skill requirements for migration analysis ◦ Generates useful information for product planning • Extensibility ◦ Can be easily extended ◦ Currently covers only DDL and procedural logic ◦ Future work: extend to support application logic (Java, C++) • Cost or "effort" estimation ◦ Currently coarse grained ◦ Future work: explore alternative cost models ▪ E.g. qualitative vs quantitative