ASE 2017 Keynote: Software Engineering without Borders

Slides for my ASE 2017 keynote, November 2, 2017.

http://ase2017.org/keynotes

https://www.youtube.com/watch?v=BxizjBmHXdA&feature=youtu.be

DevOps approaches software engineering by advocating the removal of borders between development and operations. DevOps emphasizes operational resilience, continuous feedback from operations back to development, and rapid deployment of features developed. In this talk we look at selected (automation) aspects related to DevOps, based on our collaborations with various industrial partners. For example, we explore (automated) methods for analyzing log data to support deployments and monitor REST API integrations, and (search-based) test input generation for reproducing crashes and testing complex database queries. We close by looking at borders beyond those between development and operations, in order to see whether there are other borders we need to remove in order to strengthen the impact of software engineering research, such as sharing data and tools, collaborating with software engineering practice, opening up from our research silos, and adopting open access publication policies.

The keynote is based on joint work with former and current master and PhD students from Delft University of Technology, and co-workers in industry and academia.

Arie van Deursen

November 02, 2017

Transcript

  1. Software Engineering without Borders
    Arie van Deursen, Delft University of Technology, @avandeursen
    November 2, 2017, Urbana-Champaign, USA
    ASE 2017 keynote address
    Images: Wikipedia
  2. Jeroen Castelein, Peter Evers, Mozhan Soltani, Annibale Panichella, Maurício Aniche, Joop Aué, Maikel Lobbezoo, Rick Wieman, Sicco Verwer, Pouria Derakhshanfar, Xavier Devroey, Felienne Hermans, Andy Zaidman, Georgios Gousios, Alberto Bacchelli
  3. (Y)OUR SOFTWARE ENGINEERING CURIOSITY
    Image: Mohammad Abdullah, https://flic.kr/p/pz5X9
  4. DEV OPS
    xebia.com
  5. Context: Payments
    Payment Provider

  6. Payment Provider
    • # payment methods: 250+
    • # currencies: 150+
    • Revenue in 2016: $727 million
    • Revenue growth 2016: 99%
    • $$ processed in 2016: $90 billion
    • Volume growth 2016: 80%
    • # employees (end 2016): 500
    • Revenue per employee: 1.5 million
    Image: Commons.wikimedia, munttoren
    Source: https://www.adyen.com/press-and-media/press-releases/press-release-detail/2017/adyen-discloses-2016-revenues-of-727-million-growing-99-year-over-year
  7. Some of Adyen’s 4000+ Merchants
  8. Merchant’s Single Point of Failure?
    Payment Provider A
  9. Merchant’s Solution: Competitive A/B Deployment
    Payment Providers A and B
    KPI to optimize: Conversion Rate
  10. One Billion Log Lines a Day: Monitoring using the ELK Stack
    • Logstash: Unify different logging sources
    • Elasticsearch: Search and filter large log data
    • Kibana: Visual interactive dashboard
    Image credit: www.neteye-blog.com
  11. Poll: Java Exceptions in a Payment System
    Your payment system in production generates 1 billion log lines per day. How many errors / warnings with exceptions do you expect to see?
    A. None. “We have a zero exception policy.”
    B. 1 Thousand. “Some exceptions are unavoidable.”
    C. 1 Million. “Most exceptions are harmless.”
    D. 1 Billion. “We only log errors and exceptions.”
    Adyen, Nov 2016: ~1,000,000 per day
  12. Log Analysis in Research
    1. Abstraction: Seeing the bigger picture
    2. Detection: Finding errors and anomalies
    3. Enhancing: More effective logging practices
    4. Parsing: Extracting message templates
    5. Modeling: Message ordering and protocols
    6. Scaling: Dealing with many many logs
    7. Visualizing: Put the eyes to use
    Identified 73 core papers. Venues: SIGOPS SOSP, ACM TOCS, Usenix WASL, Usenix OSDI, IEEE ISSRE, ICSE
    Joop Aué, Maurício Aniche, Arie van Deursen. Log Analysis from A-Z: A Literature Survey. TU Delft, 2017, in preparation.
  13. Logness: Extract, Cluster, Tag
    • Extract features:
      • application name, class name, exception
    • Remove details:
      • literal numbers, (encryption) hashes
    • Cluster:
      • Same payment identifier in 15min window
      • Same features
      • longest common substring above threshold
    • Tag as severe, known (monitored, bug), and unknown
    1,000,000 error log lines --> 250 exception clusters
    Peter Evers, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo. Finding Relevant Errors in Massive Payment Log Data. TU Delft, 2017, in preparation.
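The extract, remove-details, and cluster steps on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Logness implementation: the record fields (`app`, `cls`, `exception`, `message`), the application name `pal`, and both regular expressions are hypothetical, and only the "same features" clustering criterion is shown (the 15-minute payment window and the longest-common-substring threshold are left out).

```python
import re
from collections import defaultdict

# Hypothetical masking rules (assumptions, not taken from the slides):
# a run of 16+ hex characters is treated as a hash, any other digit run as a number.
HASH_RE = re.compile(r"\b[0-9a-fA-F]{16,}\b")
NUM_RE = re.compile(r"\d+")


def mask_details(message):
    """Replace hashes and literal numbers with placeholders."""
    message = HASH_RE.sub("<HASH>", message)
    return NUM_RE.sub("<NUM>", message)


def cluster(records):
    """Group records sharing application, class, exception and masked message."""
    clusters = defaultdict(list)
    for rec in records:
        key = (rec["app"], rec["cls"], rec["exception"], mask_details(rec["message"]))
        clusters[key].append(rec)
    return clusters


# Made-up log records for illustration only.
records = [
    {"app": "pal", "cls": "CardParser", "exception": "NumberFormatException",
     "message": "Cannot parse amount 10001 for tx 4f9a8b7c6d5e4f3a2b1c"},
    {"app": "pal", "cls": "CardParser", "exception": "NumberFormatException",
     "message": "Cannot parse amount 250 for tx 0a1b2c3d4e5f60718293"},
    {"app": "pal", "cls": "Router", "exception": "IOException",
     "message": "Host 10 unreachable"},
]
clusters = cluster(records)
```

Masking before grouping is what lets the first two records, which differ only in amount and transaction hash, fall into the same cluster.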
  14. Logness Dashboard
  15. Zoom in to individual exception cluster

  16. Issues Found in Research Period
    • First credit cards starting with 95 and with 19 digits: long overflow!
    • Merchant configuration error. All payments stalled. Discovered before being noticed by merchant.
    • Firewall configuration problem: Server unreachable. Discovered before merchants were assigned to this server.
    • Server update incompatible with legacy point of sale terminals. Customer could buy, but merchant received no money. IOException triggered.
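The arithmetic behind the first issue can be checked in a few lines. The sketch assumes the card number was held in a signed 64-bit integer (a Java `long`), which the slide implies but does not spell out; the card number used is made up.

```python
# Largest signed 64-bit value, the same as Java's Long.MAX_VALUE.
LONG_MAX = 2**63 - 1


def fits_in_signed_64(card_number: str) -> bool:
    """True if the digit string can be stored in a signed 64-bit integer."""
    return int(card_number) <= LONG_MAX


# Smallest 19-digit number starting with 95 (made up, not a real card).
smallest_19_digit_95 = "95" + "0" * 17
```

`LONG_MAX` is itself a 19-digit number starting with 92, so every 18-digit number fits, while any 19-digit number starting with 95 does not.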
  17. Complex API Integration
    • Payment APIs are complex
    • Integration faults are easily made
    • Merchant needs assistance with API usage
    • Merchant may not notice mistakes
    • 2.5M http error responses per month
    • What can we learn from them?
  18. 2.5M Errors to 69 Fault Cases
    Parties: API consumer, End user, API Provider, Third party
    Example fault cases:
    • FC12 Contract not found. Replication latency.
    • FC24 iDEAL communication error
    • FC42 Invalid paRes from issuer
    • FC1 ApplePay token amount-mismatch
    • FC5 Billing address problem (Country 0)
    • FC62 Unable to decrypt data
    • FC14 Could not read XML stream.
    • FC15 Couldn’t parse expiry year
    Joop Aué, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo. An Exploratory Study on Faults in Web API Integration in a Large-Scale Payment Company. TU Delft, 2017. Submitted.
  19. 11 Common Causes for API Error Responses
  20. What impact did you experience for each error cause?
    [Stacked bar chart: percentage of responses (None, Low, Moderate, High) per error cause]
    Error causes: Invalid user input (n=18), Missing user input (n=15), Expired request data (n=14), Invalid request data (n=21), Missing request data (n=18), Insufficient permissions (n=19), Double processing (n=14), Configuration (n=21), Missing server data (n=18), Internal error (n=21), Third party error (n=14)
  21. API Integration Recommendations
    • API Consumer:
      • Actually handle all error codes returned by provider
    • API Producer:
      • Document which error codes can be returned under what circumstances
      • Offer easy-to-use test harness for integrations created by consumers
      • Make explicit which error codes are ‘retriable’
      • Enrich returned error codes with actionable info (for consumer or end user)
      • Offer Error Dashboard for API consumer offering live insight in error handling
    • API Researcher:
      • Rethink API usability in this context
  22. Payment Terminals
    Payment Provider
  23. Point of sale terminal variability
    • Card brands
    • Card entry modes (chip, swipe, contactless)
    • Currency conversion
    • Loyalty points
    • Validation type (pin, signature)
    • Issuer responses (declined, insufficient balance)
    • Cancellations (shopper, merchant)
  24. Passive learning
    Identifying system behavior from observations, and representing it in the smallest possible model.
    Example terminal log:
      20170101160001 Adyen version: ******
      20170101160002 Starting TX/amt=10001/currency=978
      20170101160003 Starting EMV
      20170101160004 EMV started
      20170101160005 Magswipe opened
      20170101160006 CTLS started
      20170101160007 Transaction initialised
      20170101160008 Run TX as EMV transaction
      20170101160009 Application selected app:******
      20170101160010 read_application_data succeeded
      20170101160011 data_authentication succeeded
      20170101160012 validate 0
      20170101160013 DCC rejected
      20170101160014 terminal_risk_management succeeded
      20170101160015 verify_card_holder succeeded
      20170101160016 generate_first_ac succeeded
      20170101160017 Authorizing online
      20170101160018 Data returned by the host succeeded
      20170101160019 Transaction authorized by card
      20170101160020 Approved receipt printed
      20170101160021 pos_result_code:APPROVED
      20170101160022 Final status: Approved
    Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen. An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017
    https://automatonlearning.net/
    DFASAT / FlexFringe: Heule & Verwer, ICGI 2010
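State-merging learners of the kind named on this slide commonly start from a prefix tree built out of the observed traces and then merge compatible states to shrink the model. The sketch below builds only that prefix tree from timestamped log lines; it is not the DFASAT / FlexFringe API, the merging step is omitted, and the "Declined" trace is a made-up example.

```python
def strip_timestamp(line):
    """Drop the leading timestamp token, keeping the event text."""
    return line.split(" ", 1)[1]


def build_prefix_tree(traces):
    """Build a prefix tree over event sequences.

    Returns (transitions, counts) where transitions[(state, event)] is the
    next state and counts[(state, event)] is how many traces took that edge.
    """
    transitions = {}
    counts = {}
    next_state = 1  # state 0 is the root
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            counts[key] = counts.get(key, 0) + 1
            state = transitions[key]
    return transitions, counts


approved_log = [
    "20170101160003 Starting EMV",
    "20170101160004 EMV started",
    "20170101160022 Final status: Approved",
]
# Hypothetical second transaction, invented for illustration.
declined_log = [
    "20170101170003 Starting EMV",
    "20170101170004 EMV started",
    "20170101170022 Final status: Declined",
]
traces = [[strip_timestamp(line) for line in log] for log in (approved_log, declined_log)]
transitions, counts = build_prefix_tree(traces)
```

The two traces share their first two events, so the tree branches only at the final status; the edge counts are the kind of evidence a merging heuristic would then work with.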
  25. Use Inferred Models to Analyze: Bugs in Test Phase
    • Terminal asked for PIN
    • AND asked for signature
    • Domain expert noted this unwanted behavior in inferred model.
    • Fixed before it went into production
  26. Use Inferred Models to Analyze: Differences Between Card Brands
    Twice as many chip errors. Informed merchant about issue.
  27. Use Inferred Models to Analyze: Time out problems
    Improved performance under network instability by adding targeted retry mechanism
    Timeout
  28. 28

  29. What to Do with a (Logged) Exception? The Anatomy of a Stack Trace
    java.lang.IllegalArgumentException:
      org.apache.commons.collections.map.AbstractHashedMap.<init> (AbstractHashedMap.java:142)
      org.apache.commons.collections.map.AbstractHashedMap.<init> (AbstractHashedMap.java:127)
      org.apache.commons.collections.map.AbstractLinkedMap.<init> (AbstractLinkedMap.java:95)
      org.apache.commons.collections.map.LinkedMap.<init> (LinkedMap.java:78)
      org.apache.commons.collections.map.TransformedMap.transformMap (TransformedMap.java:153)
      org.apache.commons.collections.map.TransformedMap.putAll (TransformedMap.java:190)
    Annotations: Exception name, Class name, Method name, Specific line, Potential Cause of the Exception
  30. EvoCrash: Search-Based Crash Reproduction
    • Leverage EvoSuite
    • Devise fitness function rewarding
      1. Target offending method call reached
      2. Same exception thrown
      3. Stack trace similarity
    • Produce initial random tests focused on stack trace
    • Mutation and cross-over operators taking stack traces into account
    Diagram labels: Evaluate Fitness, Selection, Guided Crossover, Initialize Population, Fit?, Guided Mutation, Create next generation, Re-insertion
    Mozhan Soltani, Annibale Panichella, Arie van Deursen: A guided genetic algorithm for automated crash reproduction. ICSE 2017: 209-220
    Search-Based Crash Reproduction and its Impact on Debugging. In preparation
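One way to read the three-part fitness function on this slide is as a single value to minimize, where each criterion only starts to count once the previous one is met. The sketch below is an illustration under assumptions, not EvoCrash's code: the weights 3, 2, 1, the positional frame comparison, and the assumption that `line_distance` is already normalized to [0, 1] are all choices made for this example; the ICSE 2017 paper has the actual definition.

```python
def trace_distance(target_frames, observed_frames):
    """Fraction of target frames not matched, position by position."""
    if not target_frames:
        return 0.0
    matched = sum(1 for t, o in zip(target_frames, observed_frames) if t == o)
    return 1.0 - matched / len(target_frames)


def crash_fitness(line_distance, exception_thrown, target_frames, observed_frames):
    """Lower is better; 0.0 means the target crash was reproduced.

    line_distance: assumed normalized to [0, 1], 0 when the target line is reached.
    exception_thrown: whether the target exception type was raised.
    """
    if line_distance > 0:
        # Target line not reached: the other two criteria count as fully unmet.
        return 3.0 * line_distance + 2.0 + 1.0
    if not exception_thrown:
        # Line reached but wrong or no exception: trace counts as unmet.
        return 2.0 + 1.0
    return trace_distance(target_frames, observed_frames)


# Two frames taken from the stack trace on the previous slide.
target = [
    ("AbstractHashedMap", "<init>", 142),
    ("TransformedMap", "putAll", 190),
]
```

For example, `crash_fitness(0.0, True, target, target)` evaluates to 0.0, while a test that never reaches the target line always scores above 3.0, so the search is pulled toward reaching the line first.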
  31. Application to 54 Open Source Crash Reports
    TABLE 1: The 54 real-world bugs used in our study.
    • ACC
      Bug IDs: 4, 28, 35, 48, 53, 68, 70, 77, 104, 331, 277, 411
      Versions: 2.0 - 4.0
      Exceptions: NullPointer (5), UnsupportedOperation (1), IndexOutOfBounds (1), IllegalArgument (1), ArrayIndexOutOfBounds (2), ConcurrentModification (1), IllegalState (1)
      Priority: Major (10), Minor (2)
      Ref.: [11], [63]
    • ANT
      Bug IDs: 28820, 33446, 34722, 34734, 36733, 38458, 38622, 41422, 42179, 43292, 44689, 44790, 46747, 47306, 48715, 49137, 49755, 49803, 50894, 51035, 53626
      Versions: 1.6.1 - 1.8.2
      Exceptions: ArrayIndexOutOfBounds (3), NullPointer (17), StringIndexOutOfBounds (1)
      Priority: Critical (2), Major (5), Medium (14)
      Ref.: [11], [41]
    • LOG
      Bug IDs: 29, 43, 509, 10528, 10706, 11570, 31003, 40212, 41186, 44032, 44899, 45335, 46144, 46271, 46404, 47547, 47912, 47957
      Versions: 1.0.2 - 1.2
      Exceptions: NullPointer (17), InInitializerError (1)
      Priority: Critical (1), Major (4), Medium (11), Enhanc. (1), Blocker (1)
      Ref.: [11], [41]
    • ActiveMQ
      Bug IDs: 5035
      Versions: 5.9
      Exceptions: ClassCastException (1)
      Priority: Major (1)
      Ref.: [41]
    • DnsJava
      Bug IDs: 38
      Versions: 2.1
      Exceptions: ClassCastException (1)
      Priority: N/A (1)
      Ref.: [41]
    • JFreeChart
      Bug IDs: 434
      Versions: 1.0
      Exceptions: NullPointerException (1)
      Priority: N/A (1)
      Ref.: [41]
    [11] = STAR, Chen & Kim, TSE 2015: Symbolic execution
    [63] = muCrash, Xuan et al, ESEC FSE 2015: Test suite mutation
    [41] = JCHARMING, Nayrolles et al, JSEP 2016: Model checking
  32. 32

  33. 33

  34. Search-Based Testing for … SQL Queries
    • Monitored applications often have persistent state
    • How can we support testing such data-intensive applications?
    • How do we find the right data, to test (complex) database queries?
    • Explore search-based techniques!
    Example query:
      SELECT *
      FROM `account`
      LEFT JOIN `user` AS `assignedUser` ON account.assigned_user_id = assigneduser.id
      LEFT JOIN `user` AS `modifiedBy` ON account.modified_by_id = modifiedby.id
      LEFT JOIN `user` AS `createdBy` ON account.created_by_id = createdby.id
      LEFT JOIN `entity_email_address` AS `emailAddressesMiddle`
        ON account.id = emailaddressesmiddle.entity_id
        AND emailaddressesmiddle.deleted = '0'
        AND emailaddressesmiddle.primary = '1'
        AND emailaddressesmiddle.entity_type = 'Account'
      LEFT JOIN `email_address` AS `emailAddresses`
        ON emailaddresses.id = emailaddressesmiddle.email_address_id
        AND emailaddresses.deleted = '0'
      LEFT JOIN `entity_phone_number` AS `phoneNumbersMiddle`
        ON account.id = phonenumbersmiddle.entity_id
        AND phonenumbersmiddle.deleted = '0'
        AND phonenumbersmiddle.primary = '1'
        AND phonenumbersmiddle.entity_type = 'Account'
      LEFT JOIN `phone_number` AS `phoneNumbers`
        ON phonenumbers.id = phonenumbersmiddle.phone_number_id
        AND phonenumbers.deleted = '0'
      WHERE ((
          account.name LIKE 'Besha%'
          OR account.id IN (
            SELECT entity_id
            FROM entity_email_address
            JOIN email_address ON email_address.id = entity_email_address.email_address_id
            WHERE entity_email_address.deleted = 0
              AND entity_email_address.entity_type = 'Account'
              AND email_address.deleted = 0
              AND email_address.name LIKE 'Besha%')
        ))
        AND account.deleted = '0'
  35. MC/DC Coverage on SQL Queries
    Find data that lets each condition independently set outcome
    • T1: items { id: 42 }, invoice { id: 42, amount: 1000, taxFree: true } ✅
    • T2: items { id: 42 }, invoice { id: 42, amount: 1001, taxFree: false } ✅
    • T3: items { id: 42 }, invoice { id: 42, amount: 1000, taxFree: false } ❌
    • T4: items { id: 42 }, invoice { id: 41, amount: 1000, taxFree: true } ❌
    Paper excerpt:
      [...]quired to fully test a SQL query grows together with the complexity of the query itself. Consider a SQL query that joins two tables and contains two predicates:
        SELECT items.*
        FROM invoice JOIN items ON invoice.id = items.invoiceid
        WHERE amount > 1000 OR taxFree = true
      This SQL query returns all items of invoices that either have amount greater than 1000 or that are tax free. To test this query rigorously, the tester may want to exercise all “branches” that can be executed in this SQL query. Thus, the tester needs to target 1) the JOIN relation, 2) the left predicate (amount > 1000) to be evaluated to true, 3) the right predicate (taxFree = true) to be evaluated to true. For that to happen, the two tables should contain the right [...]
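The four test cases on this slide can be replayed against an in-memory SQLite database to see which ones make the query return a row. This is a sketch under assumptions: the `42` shown for `items` is read as the item's `invoiceid`, `taxFree` is stored as 0/1 because SQLite has no boolean column type, and the helper name `returns_rows` is invented for this example.

```python
import sqlite3

# The query from the slide, with taxFree compared against 1 instead of true.
QUERY = """
SELECT items.* FROM invoice
JOIN items ON invoice.id = items.invoiceid
WHERE amount > 1000 OR taxFree = 1
"""


def returns_rows(item_invoice_id, invoice_id, amount, tax_free):
    """Populate a fresh database with one invoice and one item, then run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoice (id INTEGER, amount INTEGER, taxFree INTEGER)")
    conn.execute("CREATE TABLE items (id INTEGER, invoiceid INTEGER)")
    conn.execute("INSERT INTO invoice VALUES (?, ?, ?)",
                 (invoice_id, amount, int(tax_free)))
    conn.execute("INSERT INTO items VALUES (?, ?)", (1, item_invoice_id))
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return len(rows) > 0


results = {
    "T1": returns_rows(42, 42, 1000, True),   # only taxFree holds
    "T2": returns_rows(42, 42, 1001, False),  # only amount > 1000 holds
    "T3": returns_rows(42, 42, 1000, False),  # neither predicate holds
    "T4": returns_rows(42, 41, 1000, True),   # JOIN condition fails
}
```

T1 and T3 differ only in `taxFree`, and T2 and T3 only in `amount`, which is what lets each predicate independently flip the outcome; T4 isolates the JOIN condition.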
  36. Tuya et al, 2010: Full Predicate Coverage
    • MC/DC coverage on conditions used in SQL Queries.
    • Coverage target = simplified query that should yield at least one row
    • Establish coverage potential of given (hand-written) data sets
    • Still needed: Actual test data!
  37. EvoSQL: Test Data for SQL Queries
    • Fitness of populated database to yield one row for given target query
      • Step level: Number of steps still to be executed in query
      • Step distance: How close we are to particular condition
      • Specialized for comparison operators, string operators, SQL-specific operators (IS NULL, EXISTS)
    • Mutation: Delete, change, insert rows
    • Crossover: Swap values between rows
    • Seeding: All constants in query
    Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella and Arie van Deursen. Search-Based Test Data Generation for SQL Queries. Submitted, 2017
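The combination of step level and step distance resembles the approach-level plus branch-distance scheme that is standard in search-based testing. The sketch below follows that standard scheme for numeric comparisons only; the function names, the `+ 1` offsets, and the `d / (d + 1)` normalization are assumptions of this example, and EvoSQL's own definitions may differ.

```python
def step_distance(op, left, right):
    """How far a numeric comparison is from being true (0 means satisfied)."""
    if op == "=":
        return abs(left - right)
    if op == ">":
        return 0 if left > right else (right - left) + 1
    if op == "<":
        return 0 if left < right else (left - right) + 1
    raise ValueError(f"unsupported operator: {op}")


def normalize(distance):
    """Map a non-negative distance into [0, 1)."""
    return distance / (distance + 1)


def fitness(steps_remaining, distance):
    """Lower is better: remaining query steps dominate, distance breaks ties."""
    return steps_remaining + normalize(distance)


# Candidate rows for the condition `amount > 1000`, all at the last query step.
far = fitness(0, step_distance(">", 990, 1000))
near = fitness(0, step_distance(">", 1000, 1000))
hit = fitness(0, step_distance(">", 1001, 1000))
```

Because the normalized distance stays below 1, a candidate that has executed more of the query always ranks ahead of one that has executed less, and among candidates at the same step the one closer to satisfying the condition wins.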
  38. EvoSQL Evaluation Approach
    • Collected all queries from 4 systems (ERP, CRM, e-learning)
    • 2000 unique queries
    • State of the art: Use SAT solving
      ☹️ Can’t handle 85% of our queries (nested queries, string manipulation, JOINs)
      ☹️ No implementation available
    • Compared EvoSQL to pure random and biased (seeded) random
    • Implemented on top of HSQLDB and SQLFpc-as-a-service
  39. EvoSQL Evaluation Outcomes
    • 100% of targets covered for 98% of the queries
    • On average 86% covered for the remaining 2%
    • Usually within seconds
    • Outperforms biased and random alternatives:
      • Biased random can handle 90% of simple queries (< 10 rules)
      • Biased random often finds no solution for complex queries (10+ rules)
    Number of queries per number of coverage rules:
      Coverage rules:  1-2   3-4   5-6   7-8   9-10  11-15  16-20  21+
      # Queries:       656   382   408   346   114   107    51     71
  40. (Y)OUR SOFTWARE ENGINEERING CURIOSITY
    Image: Mohammad Abdullah, https://flic.kr/p/pz5X9
  41. Vision: Informed, data-driven, software development with continuous feedback between operations and development
    Mission: to develop and advance the theory and technology to make this happen
  42. Border I: Operations
    “Operations” is under-represented in software engineering research (and ops is all about automation!)
  43. Border II: Our Discipline
    To make true progress, we software engineering researchers must embrace other disciplines
    Image: Doc Searls, https://flic.kr/p/9o5AEY
  44. Border III: Practice
    True understanding in software engineering research comes from seeing it work in practice
    Image: Commons.wikimedia, full orchestra
  45. The Adyen – Delft Collaboration Model
    • Mutual trust
    • Long term (10+ years)
    • Mutual understanding of needs: win-win
    • Challenging and engaging environment for MSc / PhD thesis projects
    • Mutual education: devs to students, researchers to industry
    • Embrace openness
    • Academics willing to get their hands dirty
  46. Border IV: My Data, My Tools
    Demonstrable progress in software engineering research requires shared data and tools
    Image: https://en.wikipedia.org/wiki/File:3d10_fm_de_vilafranca.jpg
  47. The GHTorrent Success Story
    Scalable, query-able, offline mirror of data from GitHub REST API.
    • Data since 2012
    • 16 TB json in MongoDB
    • 5 billion rows in MySQL
    • 2 GB per hour collected
    • 350 users in > 200 institutions
    • 200+ papers
    • 3 data mining challenges
    • Confirmed commercial uses by Microsoft, Deloitte, Blackduck
    • Support gifts from Microsoft
    Georgios Gousios, Diomidis Spinellis, Martin Pinzger, Arie van Deursen, Andy Zaidman, Margaret-Anne Storey, Alberto Bacchelli. MSR 2012, MSR 2013, ICSE 2014, ICSE 2015, ICSE 2016. GHTorrent dataset, pull-based software development, integrator and developer perspectives.
  48. Border V: The Public at Large
    We must engage with the public at large to demonstrate our relevance
    Image: http://fightingillini.com/
  49. Exemplary Research Blogging: Adrian Colyer
  50. Border VI: Our Publishing Culture
    We cannot expect tax paying software engineers to enjoy our research if we make them pay twice for it. We need open access.
    Image: Tony Armstrong, https://flic.kr/p/XhvQiF
  51. Software Engineering without Borders
    Arie van Deursen, Delft University of Technology, @avandeursen
    November 2, 2017, Urbana-Champaign, USA
    ASE 2017 keynote address
    Images: Wikipedia