Testing Complex Applications for PHP7

Adam Baratz
February 17, 2017

Wayfair is one of the world's largest online destinations for the home. Our storefront is a very large PHP application -- 3.5M LoC interacting with a wide array of extensions -- that serves 2M daily visitors. So we were delighted when our upgrade to PHP7 went without a hitch. It worked so well because of a test plan that covered a wide range of strategies and tools. This case study will combine a walkthrough of this project with a practical tour of PHP testing tools, from PHPUnit to GDB.

Transcript

  1. Wayfair and PHP7 • Page execution time dropped by about

    50% • CPU utilization dropped by about 30% For the details on why: https://nikic.github.io/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html https://nikic.github.io/2015/06/19/Internal-value-representation-in-PHP-7-part-2.html
  2. Wayfair Before PHP7 • 3.5M LoC over 28K files, mostly

    our own code, but some third-party • Coding conventions spanning several versions of PHP • 66 PHP extensions • Many officially supported • Some third-party • Some modified third-party • Some totally custom
  3. What We’ll Cover • Risk as “what you don’t know”

    • Identifying risk in complex applications • Common risk for infrastructure changes • Testing as risk management • Manual and automated tools • AKA software engineering as information gathering • Product management strategies • Identifying goals • Kicking off and running big projects • Choosing the most important thing to work on in service of that goal • AKA delivering value to the business
  4. Common Risk for Infrastructure Changes • Focused more on the

    desired outcome than on the path there • Divided attention among collaborators • Yours may not be the only such project in flight • Your systems will have in-between states • Something doesn’t work, it works once • You’ll live and die by your ability to monitor • Might not be worth it • Disaster is more likely than success
  5. Managing Common Risk • Always communicate in terms of value

    to the business • “Performance saves money,” not “spaceship operators!” • Don’t promise too much, too soon • Be in touch with broader org to understand potential collisions • Work the room • Share successes • Reinforce idea that project is headed in the right direction • Identify allies at different levels of the org • Harvest byproducts, opportunistically
  6. Planning • Never expect to account for everything • Map

    out the moving parts and dependencies • Distinguish the knowns from the unknowns • Guide future thinking
  7. Planning Tools • phpinfo() / php -i • Documentation •

    https://github.com/php/php-src/blob/PHP-7.0.0/UPGRADING • http://php.net/manual/en/migration70.incompatible.php • https://wiki.php.net/rfc/remove_deprecated_functionality_in_php7 • https://github.com/gophp7/gophp7-ext/wiki/extensions-catalog
  8. Todo (30 October 2015) • Investigate apc extension. • Investigate

    cgi-fcgi extension. • Investigate whether bug with dba extension will affect us. • Remove dependency on ereg extension. Find repo references via /\b(ereg_replace|ereg|eregi_replace|eregi|split|spliti|sql_regcase)\(/. As of Oct 30, there were references to ereg() and split(). • Update ketama extension. • Investigate memcache extension. • Remove mhash from puppet config. The only places in the php repo which could use it favor using the hash extension. • Investigate status of update to mongo extension. • Remove dependency on mssql extension. • Remove dependency on mysql extension. • Investigate whether open bugs with openssl extension will affect us. • Merge our PDO patches into master. It's not clear how much pdo_dblib has been tested by core maintainers, so we should test closely. Alternatively, SQL Relay may have supplanted pdo_dblib by the time we're ready to move forward with PHP7. • Investigate Zend OPcache extension. • Approve and commit compatibility updates for wfstring.
  9. Don’t Prematurely Discard Risk • Start simple, start naïve •

    Don’t assume anything until you do the research • Start big, start placeholder • You can reduce “Investigate apc extension” later • Start by overexplaining • Link to sources in case anyone needs to work backwards
  10. Prioritizing: ereg vs. mssql ereg • 7 functions • One-to-one

    replacements essentially in place via PCRE and string functions • No more than a dozen references, each of which could be updated easily mssql • 30 functions • Website would be largely unusable without DB access • We already started transitioning to PDO • Some functionality wasn’t replicated • Some behaviors were subtly different
  11. Focus First On The Biggest Risk • Choose the big,

    hard, unknown thing • That’s where all the risk lives • i.e., everything that could derail your project • Get the clearest view of project scope, as soon as possible • Guide further future thinking
  12. Todo (30 October 2015) • Investigate apc extension. • Investigate

    cgi-fcgi extension. • Investigate whether bug with dba extension will affect us. • Remove dependency on ereg extension. Find repo references via /\b(ereg_replace|ereg|eregi_replace|eregi|split|spliti|sql_regcase)\(/. As of Oct 30, there were references to ereg() and split(). • Update ketama extension. • Investigate memcache extension. • Remove mhash from puppet config. The only places in the php repo which could use it favor using the hash extension. • Investigate status of update to mongo extension. • Remove dependency on mssql extension. • Remove dependency on mysql extension. • Investigate whether open bugs with openssl extension will affect us. • Merge our PDO patches into master. It's not clear how much pdo_dblib has been tested by core maintainers, so we should test closely. Alternatively, SQL Relay may have supplanted pdo_dblib by the time we're ready to move forward with PHP7. • Investigate Zend OPcache extension. • Approve and commit compatibility updates for wfstring.
  13. Engineering Time: None Execution Time: Short Actionability: Varies Followup Time:

    Varies, but can be highly parallelizable Planning Fact Sheet
  14. Think Diminishing Risk, Not Deadlines • Use dates to say

    when something should be started, not finished • Calendar dates are a Procrustean bed • Opportunity for team to feel they messed up • Opportunity for stakeholders to dangle a knife • Team should focus on solving problems and delivering value • Stakeholders should know your pace through regular conversation • You want them to be partners in the process, not adversaries • Talk in rough terms: days, weeks, months • Talk in ranges
  15. Always Be Planning Continuously… • Review task list (single source

    of truth) • Add and reorder items • Add information as it’s obtained • Be specific, cite sources • Test certainty of previously made assertions • Ensure format of “map” exposes context and risk to group • Facilitate active participation across group
  16. Changes to variable handling (PHP7) Indirect variable, property and method

    references are now interpreted with left-to-right semantics.
    $$foo['bar']['baz'] // interpreted as ($$foo)['bar']['baz']
    $foo->$bar['baz'] // interpreted as ($foo->$bar)['baz']
    $foo->$bar['baz']() // interpreted as ($foo->$bar)['baz']()
    Foo::$bar['baz']() // interpreted as (Foo::$bar)['baz']()
    To restore the previous behavior add explicit curly braces:
    ${$foo['bar']['baz']}
    $foo->{$bar['baz']}
    $foo->{$bar['baz']}()
    Foo::{$bar['baz']}()
  17. Static Analysis • grep (e.g., for deprecated functions) • php

    -l (linter) • php7mar (detect potential compatibility issues) • Phan (AST) For many more: https://github.com/exakat/php-static-analysis-tools
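
The grep approach can be sketched as a small shell function. This is a minimal sketch, not the command the team ran: it reuses the pattern from the earlier todo list with the opening parenthesis escaped, the function name `find_ereg_calls` is hypothetical, and it assumes a grep that supports `-r`, `--include`, and `\b` (GNU grep does).

```shell
#!/usr/bin/env bash
# List calls to the removed ereg-family functions in .php files under a directory.
# Output format is grep's usual file:line:text.
# Assumption: grep supports -r, --include and \b (true for GNU grep).
find_ereg_calls() {
  local root="${1:-.}"
  grep -rnE --include='*.php' \
    '\b(ereg_replace|ereg|eregi_replace|eregi|split|spliti|sql_regcase)[[:space:]]*\(' \
    "$root"
}
```

Running `find_ereg_calls path/to/repo` prints one line per match. The word boundary keeps `preg_split(` and `str_split(` out of the results, but method calls such as `$obj->split(` will still match, so the output needs a spot check.
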
  18. Sample php7mar output Scanning testcases.php Including file extensions: php Processed

    148 lines contained in 1 files.
    Processing took 0.048094034194946 seconds.
    # critical
    #### testcases.php
    * variableInterpolation
      * Line 2: `$$foo['bar']['baz']; //Interpreted as ($$foo)['bar']['baz']`
      * Line 3: `$foo->$bar['baz']; //Interpreted as ($foo->$bar)['baz']`
      * Line 4: `$foo->$bar['baz'](); //Interpreted as ($foo->$bar)['baz']()`
      * Line 5: `Foo::$bar['baz'](); //Interpreted as (Foo::$bar)['baz']()`
      * Line 6: `global $$foo->bar; //The global keyword now only accepts simple variables.`
    * duplicateFunctionParameter
      * Line 15: `function foo($a, $b, $unused, $unused) { /*...*/ }`
    * reservedNames
    ...
  19. Evaluating Static Analysis Tools • Start with “weakest” analysis •

    https://github.com/etsy/phan/wiki/Tutorial-for-Analyzing-a-Large-Sloppy-Code-Base • Spot check output • Consider… • Output volume • Output value • Ownership cost (ad hoc runs or someone’s ongoing responsibility?)
  20. Engineering Time: Short Execution Time: Short (consider as CI step)

    Actionability: Generates concrete punch list, but there could be false positives Followup Time: Varies, but can be highly parallelizable Static Analysis Fact Sheet
  21. Finding the Next Biggest Risk • Be alert to momentum

    • Momentum implies reduced risk • Reduced risk implies the project is winding down • Managing execution is different from managing the project • Even if there’s a lot of execution to manage • Don’t confuse tasks with objectives • Invert any underlying assumptions • In this case: when can we start running code? • Look for simplest opportunity to start sizing next biggest risk
  22. Prioritizing: when can we start running code? What are all

    the ways we can run code? • Web requests (php-fpm) • Batch processes (i.e., php -f) • Automated tests • Unit tests • Integration tests • Acceptance tests (browser tests)
  23. From: PHP7 Engineer Sent: Mar 16, 2016 To: PHP7 Team;

    Chief Architect Subject: Exciting PHP7 news Exciting proof of something everyone already assumes – Adam and I ran PHP5.6 and PHP7 head to head, and the results are that PHP7 is significantly faster than PHP5.6. Methodology We compared running times while unit testing feature_detect_test.php (for those who are unfamiliar, this is a regex-heavy set of tests that is notoriously long-running). PHP5.6 completed the tests in an average of 14.37 seconds, whereas PHP7 took an average of 2.9 seconds. Conclusions This is primarily a proof-of-concept while working in a vacuum, but it is promising that we are heading in the right direction. Other Observations PHP7 used up significantly more memory, as reported by PHPUnit (10MB vs 5.75MB). The initial iteration of PHP5.6 was quite the outlier – removing that reduces the average to about 13.44 seconds. Next Steps Our next project is to get PHP7 to render actual pages on our site – so far this has been throwing 502s while the server segfaults. Steps we had to take to get the unit tests to run: • Disable codeception + dependencies • Manually build DOM + ctype extensions • Update MPDF to latest development branch Below you’ll find a few charts outlining the runtimes of each individual iteration. […]
  24. Learning From Tests • Accept small victories • Mostly not

    working is okay • Identify components that are not working the most • Rearticulate next most important goal • AKA Always Be Planning
  25. Managing Common Risk • Always communicate in terms of value

    to the business • “Performance saves money,” not “spaceship operators!” • Don’t promise too much, too soon • Be in touch with broader org to understand potential collisions • Work the room • Share successes • Reinforce idea that project is headed in the right direction • Identify allies at different levels of the org • Harvest byproducts, opportunistically
  26. Automated Testing • Unit tests • Acceptance tests (browser tests)

    • Integration tests Most automated tests can be built on the same framework (PHPUnit).
  27. Sample PHPUnit output PHPUnit 6.0.0 by Sebastian Bergmann and contributors.

    ...F Time: 0 seconds, Memory: 5.75Mb There was 1 failure: 1) DataTest::testAdd with data set #3 (1, 1, 3) Failed asserting that 2 matches expected 3. /home/sb/DataTest.php:9 FAILURES! Tests: 4, Assertions: 4, Failures: 1.
  28. Engineering Time: Moderate – to create Execution Time: Short Actionability:

    Failures require some interpretation by engineers, but frameworks usually let you annotate them in test output Followup Time: Moderate to extensive – all engineers should monitor tests at build time and contribute to ongoing maintenance Automated Testing Fact Sheet
  29. Test for Evaluating and Managing Risk • Identify risk by...

    • Understanding what you know • Admitting what you don’t know • Creating a shared “map” that your team can build on • Discussing concerns early and often • Tests should… • Tell you as much as possible • As soon as possible • About the biggest sources of risk • And be updated to cover what you learn along the way
  30. Rollout • Project began on 30 October 2015 • First

    test runs in March 2016 • First production deploy planned for June 2016
  31. Identify Vague Hypotheses • Start with biggest components of system

    • Rank by least implausible • Identify means of validating hypotheses • Seek consistent reproducibility
  32. Extension Memory Management? • Type mismatches involving size_t • Side

    effect of incomplete changes for updated C APIs • Reread code to identify misses • Invalid read/writes that Valgrind would catch • Had previously tested extensions • Would suggest lacking test coverage • Shared memory corruption (very vague) • Had previously seen issues with PCRE JIT • Would have to reproduce error state with GDB
  33. Use Automated Tests To Find Memory Issues • Use php7dev

    vagrant box (or similar) for controlled environment • Use Gcov to evaluate test coverage • Create additional tests as needed • Run all tests with Valgrind
  34. Automated Testing For Extensions • Use make test or php

    run-tests.php • Failures recorded in test runner output • Failed tests will produce artifacts • Use .sh artifact to rerun test more easily • Or to run test with GDB • Tools documented at https://qa.php.net • Read run-tests.php to fill in any gaps
  35. Example .phpt Test

    --TEST--
    Mustache::parse() member function
    --SKIPIF--
    <?php if( !extension_loaded('mustache') ) die('skip '); ?>
    --FILE--
    <?php
    $m = new Mustache();
    $tmpl = $m->parse('{{test}}');
    var_dump(get_class($tmpl));
    ?>
    --EXPECT--
    string(11) "MustacheAST"
  36. Example Test Runner Output ===================================================================== PHP : /home/vagrant/php-src/sapi/cli/php PHP_SAPI :

    cli PHP_VERSION : 7.2.0-dev ZEND_VERSION: 3.2.0-dev PHP_OS : Linux - Linux php7dev 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 INI actual : /home/vagrant/php-src/tmp-php.ini More .INIs : --------------------------------------------------------------------- PHP : /home/vagrant/php-src/sapi/phpdbg/phpdbg PHP_SAPI : phpdbg PHP_VERSION : 7.2.0-dev ZEND_VERSION: 3.2.0-dev PHP_OS : Linux - Linux php7dev 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 INI actual : /home/vagrant/php-src/tmp-php.ini More .INIs : --------------------------------------------------------------------- CWD : /home/vagrant/php-src Extra dirs : VALGRIND : Not used ===================================================================== Running selected tests. PASS Mustache::parse() member function [001.phpt] ===================================================================== Number of tests : 1 1 Tests skipped : 0 ( 0.0%) -------- Tests warned : 0 ( 0.0%) ( 0.0%) Tests failed : 0 ( 0.0%) ( 0.0%) Expected fail : 0 ( 0.0%) ( 0.0%) Tests passed : 1 (100.0%) (100.0%) --------------------------------------------------------------------- Time taken : 0 seconds =====================================================================
  37. Example With Memory Leak [...] --------------------------------------------------------------------- CWD : /home/vagrant/php-src Extra

    dirs : VALGRIND : valgrind-3.10.0 ===================================================================== Running selected tests. LEAK Mustache::parse() member function [001.phpt] ===================================================================== Number of tests : 1 1 Tests skipped : 0 ( 0.0%) -------- Tests warned : 0 ( 0.0%) ( 0.0%) Tests failed : 0 ( 0.0%) ( 0.0%) Expected fail : 0 ( 0.0%) ( 0.0%) Tests leaked : 1 (100.0%) (100.0%) Tests passed : 0 ( 0.0%) ( 0.0%) --------------------------------------------------------------------- Time taken : 4 seconds ===================================================================== ===================================================================== LEAKED TEST SUMMARY --------------------------------------------------------------------- Mustache::parse() member function [001.phpt] =====================================================================
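
If the runner is wired into a build, its output can be checked mechanically. This is a minimal sketch, assuming the run-tests.php output has been saved to a file and that result lines start with the status word as in the samples above (`PASS`, `LEAK`). The function name `check_test_results` is hypothetical, and treating `FAIL` lines the same way as `LEAK` lines is an assumption about what a build should block on.

```shell
#!/usr/bin/env bash
# Print the result lines for leaked or failed tests from saved run-tests.php output.
# Returns non-zero when any such line is present, so a build step can stop on it.
check_test_results() {
  local log="$1"
  if grep -E '^(LEAK|FAIL) ' "$log"; then
    return 1
  fi
  return 0
}
```

A build step could call `check_test_results run-tests.log || exit 1` after the test run, where `run-tests.log` is whatever file the output was redirected to.
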
  38. Example Valgrind Output (001.mem) ==13733== Invalid write of size 4

    ==13733== at 0xF849E55: zim_Mustache_parse (mustache_mustache.cpp:850) ==13733== by 0x91ED13: ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER (zend_vm_execute.h:1097) ==13733== by 0x8C60DA: execute_ex (zend_vm_execute.h:429) ==13733== by 0x92115F: zend_execute (zend_vm_execute.h:474) ==13733== by 0x87F7E3: zend_execute_scripts (zend.c:1474) ==13733== by 0x81EE9F: php_execute_script (main.c:2533) ==13733== by 0x9232E9: do_cli (php_cli.c:990) ==13733== by 0x44E7BB: main (php_cli.c:1378)
  39. Use Gcov To Evaluate Test Coverage

    1. phpize
    2. ./configure CFLAGS="--coverage" CXXFLAGS="--coverage" LDFLAGS="--coverage" # may vary slightly for your extension
    3. make clean all
    4. lcov --directory . --zerocounters # cleanup previous run, if any
    5. # run tests
    6. lcov --directory . --capture --output-file coverage.info
    7. genhtml --output-directory lcov_html coverage.info
    For php-src, use existing build targets: https://wiki.php.net/doc/articles/writing-tests
  40. Capture Error State In GDB • Find simplest means to

    reproduce • Ideally script: gdb --args php -f [.php file] • Or core file: gdb php-fpm [core file] • But maybe HTTP request (and live process): gdb php-fpm [pid] • But first you’ll need a debug build • May behave differently! • Script GDB to replay debugging steps Excellent tutorial: http://www.unknownroad.com/rtfm/gdbtut/
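
Scripting GDB can be as simple as a command file passed with `-x`. This is a minimal sketch of that idea, not the team's actual script: the function name `write_gdb_script` and the file names in the usage note are hypothetical, and the commands chosen (run, backtrace, registers) are only a plausible starting point for a crash.

```shell
#!/usr/bin/env bash
# Write a GDB command file that runs the program to the crash,
# then prints a backtrace and the register state.
write_gdb_script() {
  cat > "$1" <<'EOF'
set pagination off
run
bt
info registers
EOF
}
```

With a debug build of PHP, `write_gdb_script replay.gdb` followed by `gdb -batch -x replay.gdb --args php -f crash.php` replays the same steps on every run, which makes it easier to compare results between builds or extension versions.
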
  41. The Culprit: APCu • Invalid write to memory region containing

    string size • Solved problem with latest extension version
  42. What We Could’ve Tried • More thorough review of known

    issues with third-party extensions • Better scrutiny of extensions that operated on shared memory • Had already disabled PCRE JIT • Assumed APCu extension was “safe” because we didn’t write it • Ensure we’d deployed latest version of all extensions • Strived earlier for better sense of “correctness”?
  43. Replay Testing • Sample access logs • Feed to cURL

    • Examine logs • Consider combining with stress testing (ApacheBench)
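
The replay steps above can be sketched in shell. This is a minimal sketch under several assumptions: the access log is in the common or combined log format (request line in the first quoted field), only GET requests are safe to replay, and the function names and the target host in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Print the request paths of GET requests from an access log whose
# request line is the first double-quoted field, e.g. "GET /path HTTP/1.1".
extract_get_paths() {
  awk -F'"' '{ split($2, req, " "); if (req[1] == "GET") print req[2] }' "$1"
}

# Read paths on stdin, request each one from the given base URL with cURL,
# and print the HTTP status code next to the path.
replay_paths() {
  local base_url="$1" path code
  while IFS= read -r path; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "${base_url}${path}")
    printf '%s %s\n' "$code" "$path"
  done
}
```

For example, `extract_get_paths access.log | replay_paths http://php7-test.example` prints one status line per sampled request. Unexpected 5xx codes point at pages to examine in the PHP and web server logs.
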
  44. Working Through Unknown Unknowns • Identify vague hypotheses • Break

    into smaller pieces • Overdocument, overcommunicate • Look for anything you recognize in the noise • Rotate through problem solving strategies • Pair problem solving • Keep multiple irons in the fire
  45. Managing Expectations Around Unknowns • Be sensitive to stress and

    frustration • Reframe around thrill of solving big problems • Call out indications that the problem is getting resolved • Be able to identify when you have more information than you used to • Keep working the room • Always be ready to pull the plug
  46. Fringe Benefits • Expand scope of what’s “known” • Develop

    new skills and tools • New tests? • New monitoring opportunities? • New things to automate? • Engineers like solving big problems!
  47. Rollout Considerations • Datacenter move was in-flight simultaneously • Testing

    focused on customer-facing website • No one wants to impact revenue
  48. Rollout • Schedule with lead time for hard dates around

    datacenter move • Be ready to pause project if the two will collide • Start with non-production environments • Then focus on customer-facing website • The most value would be in speeding up that experience • Use production load balancer to control traffic served by PHP7 • But first agree on monitoring plan • Make changes from “war room” • Let changes bake in • Overcommunicate plan to broader org • Not impacting others is different from not surprising them
  49. Your Systems Will Have In-Between States • Embrace simultaneous realities

    • Think in terms of backwards- and forwards-compatibility • Attachment to a single state is a form of risk • Detachment facilitates separating workflows Harold Pinter: “A thing is not necessarily either true or false; it can be both true and false.”
  50. Rollout, Continued • Confirm metrics • “Well, now my capacity

    planning is done for the year!” • Celebrate! • Continue slow roll to cover all website traffic • Begin slow roll on other services
  51. Rollout, Continued? • Project began on 30 October 2015 •

    First test runs in March 2016 • First production deploy planned for June 2016 • Last blocking issues resolved in July 2016 • Development and staging environments updated in July 2016 • First production deploy in August 2016 • All servers on PHP7 on 7 February 2017
  52. Test for Evaluating and Managing Risk • Identify risk by...

    • Understanding what you know • Admitting what you don’t know • Creating a shared “map” that your team can build on • Discussing concerns early and often • Tests should… • Tell you as much as possible • As soon as possible • About the biggest sources of risk • And be updated to cover what you learn along the way
  53. Test to Continuously Challenge Worldview • Developers shouldn't just think

    about whether something works • Something doesn't work, it works just once • We initially assumed metrics were wrong!
  54. What We Covered • Risk as “what you don’t know”

    • Identifying risk in complex applications • Common risk for infrastructure changes • Testing as risk management • Manual and automated tools • AKA software engineering as information gathering • Product management strategies • Identifying goals • Kicking off and running big projects • Choosing the most important thing to work on in service of that goal • AKA delivering value to the business