Slide 1

Slide 1 text

Maurício F. Aniche [email protected] @mauricioaniche “Testing in Production”: A gentle walk on software monitoring research

Slide 2

Slide 2 text

https://xkcd.com/1428/ Contemporary (tech) companies: • Move fast • Are highly complex

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Monitoring as a first-class citizen!

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

• 3,500 customers, including eBay, Uber, Spotify, and Facebook. • In 2015, Adyen achieved a valuation of $2.3 billion, making it the sixth largest European unicorn, and the largest Dutch unicorn. • In April 2017, the company was granted a European banking license. • 1000+ employees from 80+ countries. • Adyen has processed more than 159 billion euros in payments.

Slide 7

Slide 7 text

1 1 1 2 1 2 1 1 1 3 1 1 5 1 6 3 1 5 2 4 2 8 1 4 2 2 6 1 3 6 5 2 5 5 1 '17 '16 '09 '18 '12 '14 '03 '04 '13 '15 '11 '10 '07 '05 '08 '98 '00 '06 '92 Year Publication Type industry track paper journal research track paper

Slide 8

Slide 8 text

/RJ 3URFHVVRU /RJ,QIUDVWUXFWXUH 5XQQLQJ$SSOLFDWLRQ %XLOGDQG'HSOR\PHQW ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @O>CDIB?<« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿I<=G@OJA« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @NOJMDIBN« /RJGDWD ORJ JHQHUDWLRQ ([HFXWLRQ(QYLURQPHQW 6RXUFHFRGH ORJ WUDQVPLVVLRQ 6WRUDJH 23V(QJLQHHU /RJ(QJLQHHULQJ /RJ$QDO\VLV 'DVKERDUGV 'HYHORSHU @NKJIN@?

Slide 9

Slide 9 text

Contemporary Logging Framework Log Engineering Log Infrastructure Log Analysis Log Platforms Anti-patterns in logging code Implementation of log statements Empirical studies Parsing Storage Anomaly detection Security and privacy Root cause analysis Failure prediction Software testing Model inference and invariant mining Reliability and dependability End-to-end analysis tools

Slide 10

Slide 10 text

Log Engineering • Deals with the logging decisions from the development side. • What to log? • Where to log? • How severe should it be? • 3 – 6% of code are related to logs. • Those decisions matter a lot in practice! • Research focus has been: • Anti-patterns in logging code • Implementation of log statements • Empirical Studies in Log Engineering /RJ 3URFHVVRU /RJ,QIUDVWUXFWXUH 5XQQLQJ$SSOLFDWLRQ %XLOGDQG'HSOR\PHQW ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @O>CDIB?<« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿I<=G@OJA« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @NOJMDIBN« /RJGDWD ORJ JHQHUDWLRQ ([HFXWLRQ(QYLURQPHQW 6RXUFHFRGH ORJ WUDQVPLVVLRQ 6WRUDJH 23V(QJLQHHU /RJ(QJLQHHULQJ /RJ$QDO\VLV 'DVKERDUGV 'HYHORSHU @NKJIN@?

Slide 11

Slide 11 text

Common log engineering problems • Improper log message • Improper log levels • Runtime issues (e.g., logging a NULL variable) • Configuration errors • Missing log statements • Overwhelming volume of data Shang, W., Nagappan, M., Hassan, A. E., & Jiang, Z. M. (2014, September). Understanding log lines using development knowledge. In 2014 IEEE International Conference on Software Maintenance and Evolution (pp. 21-30). IEEE. Hassani, M., Shang, W., Shihab, E., & Tsantalis, N. (2018). Studying and detecting log-related issues. Empirical Software Engineering, 23(6), 3248-3280.

Slide 12

Slide 12 text

Can ML help? Can we learn ‘where’ to log? Zhu, J., He, P., Fu, Q., Zhang, H., Lyu, M. R., & Zhang, D. (2015, May). Learning to log: Helping developers make informed logging decisions. In Proceedings of the 37th International Conference on Software Engineering-Volume 1 (pp. 415-425). IEEE Press.

Slide 13

Slide 13 text

Can DL help? • Can we learn not only from the structure, but also from the semantics of code? • Like DeepBugs or Code2Vec. • First exploration shows that a “simple language models” won’t work. (Best we got was balanced accuracy=.71, recall=.48, but this was a first exploration by our MSc students; more to come)

Slide 14

Slide 14 text

Log Infrastructure • The tool support necessary to make the further analysis feasible. • Logs are highly unstructured and in massive quantities. • Research has been focusing on: • Log Parsing • Log Storage and Compression • We need more work here! /RJ 3URFHVVRU /RJ,QIUDVWUXFWXUH 5XQQLQJ$SSOLFDWLRQ %XLOGDQG'HSOR\PHQW ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @O>CDIB?<« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿I<=G@OJA« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @NOJMDIBN« /RJGDWD ORJ JHQHUDWLRQ ([HFXWLRQ(QYLURQPHQW 6RXUFHFRGH ORJ WUDQVPLVVLRQ 6WRUDJH 23V(QJLQHHU /RJ(QJLQHHULQJ /RJ$QDO\VLV 'DVKERDUGV 'HYHORSHU @NKJIN@?

Slide 15

Slide 15 text

log.info(“Customer “ + customer + “ paying ” + paymentValue); [2019-02-03 15:43:24] [MagicPayment.java][L456] Customer Maurício paying 235.67 Class name Line number Schipper, D., Aniche, M., & van Deursen, A. (2019). Tracing Back Log Data to its Log Statement: From Research to Practice. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR): Proceedings (pp. 545-549). [8816773] IEEE. Short paper.

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

In practice… • Large companies can’t really log the class and the line number that originates a log message. • Developers ‘grep’ the source code • Can we automatically detect where origin of the log lines? • W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 117–132. ACM, 2009.

Slide 18

Slide 18 text

log.info(“Customer “ + customer + “ ID “ + id); à Customer .* ID \d*

Slide 19

Slide 19 text

RQ1: Accuracy • We collect 100k messages from a week day. • These 100k messages pointed to 676 different locations in the source code. • 95% CL, 5% CI sample = 245 links. • Manual investigation = 97.6% accuracy (239 out of 245 correct links) • 99.8% accuracy in the original paper (two projects: HDFS and Darkstar, millions of log lines, # of log statements not specified).

Slide 20

Slide 20 text

RQ2: Performance • Related work: not really reported Building ASTs of the entire codebase is the most expensive part of the process.

Slide 21

Slide 21 text

When does it fail? • JSON-based logs • The developer logs a JSON (that is created in runtime) and thus the log statement is simply “log.info(json)”, making our template inaccurate. • Unknown logging method • Developers create their own logging methods, which our tool can't recognize. • Log strings created on-the-fly • Some log messages are too complex and developers create them by means of multiple line of code (e.g. String log = “content1”; log = log “content 2”, ...), which makes the analysis harder. • Related work: “Almost all of these (failed) messages contain long string variables.” => same for us.

Slide 22

Slide 22 text

Logs of Complex API Integrations • Payment APIs are complex • Integration faults are easily made • Merchant needs assistance with API usage • Merchant may not notice mistakes • 2.5M http error responses per month (back in 2016!) • What can we learn from them? 22 Aué, J., Aniche, M., Lobbezoo, M., & van Deursen, A. (2018). An Exploratory Study on Faults in Web API Integration in a Large-Scale Payment Company. In ICSE-SEIP '18: 40th International Conference on Software Engineering: Software Engineering in Practice Track (pp. 13-22). Association for Computing Machinery (ACM).

Slide 23

Slide 23 text

11 Common Causes for API Error Reponses

Slide 24

Slide 24 text

11 Common Causes for API Error Reponses Integrators are the main responsible for API integration problems!

Slide 25

Slide 25 text

11 Common Causes for API Error Reponses Maybe these are the exceptions one cares more about?

Slide 26

Slide 26 text

Log Analysis • Knowledge acquisition from log data for a specific purpose. • The most explored area in the topic: • Anomaly detection • Security and privacy • Root cause analysis • Failure prediction • Software testing • Model inference and invariant mining • Reliability and dependability /RJ 3URFHVVRU /RJ,QIUDVWUXFWXUH 5XQQLQJ$SSOLFDWLRQ %XLOGDQG'HSOR\PHQW ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @O>CDIB?<« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿I<=G@OJA« ½»¼ÄŠ»ÀŠ½¾¼Àˆ»¿ @NOJMDIBN« /RJGDWD ORJ JHQHUDWLRQ ([HFXWLRQ(QYLURQPHQW 6RXUFHFRGH ORJ WUDQVPLVVLRQ 6WRUDJH 23V(QJLQHHU /RJ(QJLQHHULQJ /RJ$QDO\VLV 'DVKERDUGV 'HYHORSHU @NKJIN@?

Slide 27

Slide 27 text

Open/data challenges • An avalanche of different techniques • Supervised and unsupervised ML, passive learning and state machines, graphs, ... • More recently, deep learning and NLP techniques. • There is no single technique that works everywhere… Gunter, D., Tierney, B. L., Brown, A., Swany, M., Bresnahan, J., & Schopf, J. M. (2007, September). Log summarization and anomaly detection for troubleshooting distributed systems. In 2007 8th IEEE/ACM International Conference on Grid Computing (pp. 226-234). IEEE. • Lack of more representative datasets • Lots of HPC logs • No industry/enterprise logs • Starting point: https://github.com/logpai/loghub • Building labeled datasets

Slide 28

Slide 28 text

Beschastnikh, Ivan, et al. "Inferring models of concurrent systems from logs of their behavior with CSight." Proceedings of the 36th International Conference on Software Engineering. ACM, 2014. Beschastnikh, I., Brun, Y., Ernst, M. D., Krishnamurthy, A., & Anderson, T. E. (2012). Mining temporal invariants from partially ordered logs. ACM SIGOPS Operating Systems Review, 45(3), 39-46. Model inference

Slide 29

Slide 29 text

Passive learning Identifying system behavior from observations, and representing it in the smallest possible model. 20170101160001 Adyen version: ****** 20170101160002 Starting TX/amt=10001/currency=978 20170101160003 Starting EMV 20170101160004 EMV started 20170101160005 Magswipe opened 20170101160006 CTLS started 20170101160007 Transaction initialised 20170101160008 Run TX as EMV transaction 20170101160009 Application selected app:****** 20170101160010 read_application_data succeeded 20170101160011 data_authentication succeeded 20170101160012 validate 0 20170101160013 DCC rejected 20170101160014 terminal_risk_management succeeded 20170101160015 verify_card_holder succeeded 20170101160016 generate_first_ac succeeded 20170101160017 Authorizing online 20170101160018 Data returned by the host succeeded 20170101160019 Transaction authorized by card 20170101160020 Approved receipt printed 20170101160021 pos_result_code:APPROVED 20170101160022 Final status: Approved 20170101160001 Adyen version: ****** 20170101160002 Starting TX/amt=10001/currency=978 20170101160003 Starting EMV 20170101160004 EMV started 20170101160005 Magswipe opened 20170101160006 CTLS started 20170101160007 Transaction initialised 20170101160008 Run TX as EMV transaction 20170101160009 Application selected app:****** 20170101160010 read_application_data succeeded 20170101160011 data_authentication succeeded 20170101160012 validate 0 20170101160013 DCC rejected 20170101160014 terminal_risk_management succeeded 20170101160015 verify_card_holder succeeded 20170101160016 generate_first_ac succeeded 20170101160017 Authorizing online 20170101160018 Data returned by the host succeeded 20170101160019 Transaction authorized by card 20170101160020 Approved receipt printed 20170101160021 pos_result_code:APPROVED 20170101160022 Final status: Approved 20170101160001 Adyen version: ****** 20170101160002 Starting TX/amt=10001/currency=978 20170101160003 Starting EMV 20170101160004 EMV started 20170101160005 Magswipe opened 20170101160006 CTLS started 20170101160007 Transaction initialised 20170101160008 Run TX as EMV transaction 20170101160009 Application selected app:****** 20170101160010 read_application_data succeeded 20170101160011 data_authentication succeeded 20170101160012 validate 0 20170101160013 DCC rejected 20170101160014 terminal_risk_management succeeded 20170101160015 verify_card_holder succeeded 20170101160016 generate_first_ac succeeded 20170101160017 Authorizing online 20170101160018 Data returned by the host succeeded 20170101160019 Transaction authorized by card 20170101160020 Approved receipt printed 20170101160021 pos_result_code:APPROVED 20170101160022 Final status: Approved 20170101160001 Adyen version: ****** 20170101160002 Starting TX/amt=10001/currency=978 20170101160003 Starting EMV 20170101160004 EMV started 20170101160005 Magswipe opened 20170101160006 CTLS started 20170101160007 Transaction initialised 20170101160008 Run TX as EMV transaction 20170101160009 Application selected app:****** 20170101160010 read_application_data succeeded 20170101160011 data_authentication succeeded 20170101160012 validate 0 20170101160013 DCC rejected 20170101160014 terminal_risk_management succeeded 20170101160015 verify_card_holder succeeded 20170101160016 generate_first_ac succeeded 20170101160017 Authorizing online 20170101160018 Data returned by the host succeeded 20170101160019 Transaction authorized by card 20170101160020 Approved receipt printed 20170101160021 pos_result_code:APPROVED 20170101160022 Final status: Approved 20170101160001 Adyen version: ****** 20170101160002 Starting TX/amt=10001/currency=978 20170101160003 Starting EMV 20170101160004 EMV started 20170101160005 Magswipe opened 20170101160006 CTLS started 20170101160007 Transaction initialised 20170101160008 Run TX as EMV transaction 20170101160009 Application selected app:****** 20170101160010 read_application_data succeeded 20170101160011 data_authentication succeeded 20170101160012 validate 0 20170101160013 DCC rejected 20170101160014 terminal_risk_management succeeded 20170101160015 verify_card_holder succeeded 20170101160016 generate_first_ac succeeded 20170101160017 Authorizing online 20170101160018 Data returned by the host succeeded 20170101160019 Transaction authorized by card 20170101160020 Approved receipt printed 20170101160021 pos_result_code:APPROVED 20170101160022 Final status: Approved https://automatonlearning.net/ DFASAT / FlexFringe Heule & Verwer, ICGI 2010 Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen. An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017

Slide 30

Slide 30 text

Use Inferred Models to Analyze: Bugs in Test Phase • Terminal asked for PIN • AND asked for signature • Domain expert noted this unwanted behavior in inferred model. • Fixed before it went into production

Slide 31

Slide 31 text

Use Inferred Models to Analyze: Differences Between Card Brands Twice as many chip errors Informed merchant about issue.

Slide 32

Slide 32 text

Use Inferred Models to Analyze: Time out problems Timeout Improved performance under network instability by adding targeted retry mechanism

Slide 33

Slide 33 text

Scalability is needed! Distributed system Log collector Message Broker Trace aggregator Prefix tree generator Prefix tree aggregator Prefix tree database Collects the logs generated by the many systems Aggregates traces in prefix trees Aggregates individual trees in a complete one Groups log events per trace Rick Proost’s MSc thesis. Under development. Next steps: • Implementation (almost CHECK) • Evaluation • Case study • Future work: State merging (tricky!!)

Slide 34

Slide 34 text

Domain-based Fuzzing with Equivalence Partitioning for Supervised Learning of Anomaly Detection in a Cyber Physical System. Herman Nuradhy Wijaya. MSc Thesis. SUTD, 2019. Short paper under development. • Hackers might get control of the sensors • Domain testing to determine attack models • Data: • Normal traces: 4.5h of normal execution • Abnormal traces: 3 hours of 30s attacks + 60s of rest • Feature vector = 90 sec • Supervised Machine Learning • Extra Trees, accuracy of 99.1% • Lots of future work to be done! SUTD’s testbed Cyber Physical Systems (CPS)

Slide 35

Slide 35 text

Zhang, X., Xu, Y., Lin, Q., Qiao, B., Zhang, H., Dang, Y., ... & Chen, J. (2019, August). Robust log-based anomaly detection on unstable log data. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 807-817). ACM. The state-of-the- art, IMHO. (I like that it tackles the reality that log lines evolve…)

Slide 36

Slide 36 text

Log Platforms • Focus on the integration of all the dimensions, in an end-to-end log analysis tool. • Not so many papers here. • Quite tool focused.

Slide 37

Slide 37 text

Monitoring-aware IDEs Winter, J., Aniche, M., Cito, J., & van Deursen, A. (2019). Monitoring-aware IDEs. In ESEC/FSE 2019 : Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 420-431). Association for Computing Machinery (ACM). How it is How it should be

Slide 38

Slide 38 text

A Monitoring-Aware IDE prototype • Timely Integrated Feedback • Activity of log statements in production • Traceability • Link from monitoring to source • Link from source to monitoring • Search Capability • Searchable monitoring information YRLGPHWKRG^ ORJHUURUHUURURFFXUUHGYDU LI^ ORJZDUQZDUQYDU ` WKURZQHZ5XQWLPH([FHSWLRQ ` N N N $MDYDNQHZH[FHSWLRQV %MDYDNQHZH[FHSWLRQV &MDYDNQHZH[FHSWLRQV 7IEVGLEFPI QSRMXSVMRK MRJSVQEXMSRSJXLI IRXMVIWSJX[EVI W]WXIQ 5YMGOP]MRHMGEXIW LS[SJXIREPSK WXEXIQIRXSV I\GITXMSRLETTIRW MRTVSHYGXMSR (IXEMPIHMRJSVQEXMSR EFSYXXLEXWXEXIQIRX 0EWXLSYVW ?+SXSXLIQSRMXSVMRKA

Slide 39

Slide 39 text

0 200 400 600 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 Participant Number of files with monitoring information Week 4 Week 3 Week 2 Week 1 Developers use it!

Slide 40

Slide 40 text

0 5 10 15 20 25 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 Participant Qty of actions impact no impact 2 (4%) 2 (4%) 1 (2%) 2 (4%) 3 (7%) 1 (2%) 2 (4%) 3 (7%) 5 (11%) 9 (20%) 15 (33%) L3: Removed log line L4: Added log line I1: Fixed a bug L1: Improved log message L2: Changed log severity O4: Identified issue in log code I2: Improve code quality O1: Identified a bug O2: Identified performance issue O6: Understood implementation stability O5: Understood business process 0 5 10 15 20 Quantity Actions

Slide 41

Slide 41 text

Developers’ Perceptions • Less context switching • Saves time • Reduce cognitive load

Slide 42

Slide 42 text

Opportunities for Future Work • Log Engineering • AI for a better log engineering

Slide 43

Slide 43 text

Opportunities for Future Work • Log Engineering • AI for a better log engineering • Log Infrastructure • Scalability and real-time analysis as first-class citizens (think of billions, trillions of logs a month!)

Slide 44

Slide 44 text

Opportunities for Future Work • Log Engineering • AI for a better log engineering • Log Infrastructure • Scalability and real-time analysis as first-class citizens (think of billions, trillions of logs a month!) • Log Analysis • Need for systematic comparisons among the different approaches • Embrace unsupervised learning and unlabeled data

Slide 45

Slide 45 text

Opportunities for Future Work • Log Engineering • AI for a better log engineering • Log Infrastructure • Scalability and real-time analysis as first-class citizens (think of billions, trillions of logs a month!) • Log Analysis • Need for systematic comparisons among the different approaches • Embrace unsupervised learning and unlabeled data • Log platforms • Closing the gap between research and industry by means of better standards

Slide 46

Slide 46 text

Thanks to all my colleagues! Jeanderson Barros Joop Aué Jos Winter Rick Wieman Rick Proost Daan Schipper Arie van Deursen Jürgen Cito Sicco Verwer Herman Wijaya Asterios Katsifodimos

Slide 47

Slide 47 text

Summary • Monitoring is a first-class citizen in contemporary software development. • Log research can be classified into log engineering, log analysis, log infrastructure, and log platforms. And we are moving fast! • Open challenges: unstable, unstructured, and unlabeled data, need for scalability, better standards. • There is lots of engineering work to be done, before good research can happen. • Maurício Aniche, [email protected], @mauricioaniche

Slide 48

Slide 48 text

License • You can use and share this material, but you should not use it for any commercial purposes. • You must give credits to the original author of this presentation. • You agree not to sell it or make profit in any way with this. • Papers, tools, and everything else I refer to have their own license.

Slide 49

Slide 49 text

Images in this presentation • Cover photos by Chris Liverani, Stephen Dawson, and Luke Chesser (Unsplash) • DevOps picture, by Temitope Oteyowo: https://medium.com/tech- tajawal/devops-in-a-scaling-environment-9d5416ecb928 • Dev productivity, by undraw.co. • Diagrams and charts made by the authors. • You should not reuse authors’ photos for your own purpose.