Slide 1

Performance Automation Tools in Orion Health
Viktoriia Kuznetcova / Senior Performance Engineer

Slide 2

Agenda
• Automation in Testing
• Specifics of Automation in Performance Testing
• Overview of automation tools used in Orion Health
• Dive into test data generation
• Dive into test data analysis
• Dive into SWAT

Slide 3

Test Automation
“In software testing, test automation is the use of special software (separate from the software being tested) to control the execution of tests and the comparison of actual outcomes with predicted outcomes.” (Wikipedia)
• Exploratory testing cannot be automated
• “Check” automation: checking that outcomes are as expected
• Works sometimes for unit testing, sometimes for regression testing

Slide 4

Automation in Testing
We automate anything that helps in testing and does not require human intelligence:
– Preparing and validating the test environment
– Generating test data
– Monitoring the system under test
– Gathering information about production: from underlying data to user workflows and any issues
– Analyzing raw test results, producing meaningful human-readable output
– …

Slide 5

Challenges in Performance Testing
• Complex, production-like, often disposable test environments
• Test data that is production-like in volume, complexity and variability
• Complex user workflows that need to be simulated with automation at high volumes
• Monitoring can be complicated: there are lots of nodes and metrics to gather
• Results analysis: too much information makes it easy to miss important details

Slide 6

What do we automate?
• Spinning up a test environment: AWS, Puppet, Ansible, bash
• Generating test data: Data Pot, other in-house tools, PL/SQL
• Generating user load: Apache JMeter, Gatling, WebPageTest, in-house tools
• Monitoring: AWS CloudWatch, sar, Capt. Morgan, Elasticsearch, perfmon, etc.
• Processing and analyzing test results: R, Scala, Splunk
• Automating simplified performance testing for nightly builds: Ansible, bash, AWS CLI

Slide 7

Test Environment Automation
• Infrastructure-level automation: AWS CloudFormation
• Hardware-level automation: AWS EC2 instances, RDS instances (a sketch of scripting this layer follows below)
• OS-level automation: AMIs come with the EC2 instances; Puppet and/or Ansible can install and configure the rest
• Application-level automation: the in-house tool Graviton glues together and drives the automation for deploying and configuring the specific applications that make up the system under test
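
The Graviton tool and the Puppet/Ansible manifests are internal, but the AWS layer can be scripted directly. A minimal sketch of launching a test node, assuming the AWS SDK for Java v1 is on the classpath; the AMI ID and instance type are placeholders:

```scala
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder
import com.amazonaws.services.ec2.model.RunInstancesRequest

object SpinUpTestNode {
  def main(args: Array[String]): Unit = {
    // Uses the default credential chain and region configuration.
    val ec2 = AmazonEC2ClientBuilder.defaultClient()

    // AMI ID and instance type are placeholders, not real Orion Health values.
    val request = new RunInstancesRequest()
      .withImageId("ami-0123456789abcdef0")
      .withInstanceType("m5.large")
      .withMinCount(1)
      .withMaxCount(1)

    val instanceId = ec2.runInstances(request)
      .getReservation.getInstances.get(0).getInstanceId
    println(s"Launched test node $instanceId")
  }
}
```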

Slide 8

Test Data – Problem Statement
• Clinical data is complex and rich
• One way to get good data is to take it from production. This is rarely applicable, because it is hard to anonymize and legally impossible to use as is
• Another way is to generate data resembling production data, with:
– Similar volumes for all relevant data types
– Similar variability for all relevant data types and fields
– Similar data distributions
– Realistic values, where the behavior of the system is data-driven

Slide 9

Test Data – Solution: Data Pot
• An in-house tool, but the principles can be applied in a wider context
• “Cooks” data in an internal format inside an Oracle database, using PL/SQL and reference tables
• The data is then transformed into the format the system expects via Orion Health Rhapsody
• The resulting dishes are fed to the system, which populates its internal databases as necessary
• Schema dumps are taken and reused
[Diagram: Oracle → Rhapsody → system under test]

Slide 10

Data Pot: Features
• Full control over major data distributions and volumes
• Randomized values for all of the relevant fields
• Easy to extend and customize data before and after each stage
• Layered data generation: allows for complex logic where there are inter-data dependencies (e.g. lab results depend on the type of lab tests)
• Data content is decoupled from data format
• Fast generation of huge data volumes: performance is mostly limited by the system under test; everything else can be scaled

Slide 11

Data Pot: Basic Principles
• Start with understanding the production data
• Design a data model that accounts for all the properties of the production data you want to cover
• Generate data in layers (see the sketch below)
• Start simple, add complexity as you go
• Keep the performance of the data generation itself in mind
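
As an illustration of generating data in layers, here is a minimal sketch. The test types, value ranges, volumes and distributions are invented for the example and are not Data Pot's real reference data:

```scala
import scala.util.Random

// Layered test-data generation: patients first, then lab tests per patient,
// then results whose values depend on the type of test (an inter-data dependency).
object LayeredDataGen {
  case class Patient(id: Int, yearOfBirth: Int)
  case class LabTest(patientId: Int, code: String)
  case class LabResult(patientId: Int, code: String, value: Double, unit: String)

  // Reference "table": plausible value ranges per test type (invented values).
  val ranges: Map[String, (Double, Double, String)] = Map(
    "HBA1C" -> (20.0, 120.0, "mmol/mol"),
    "NA"    -> (130.0, 150.0, "mmol/L")
  )

  val rng = new Random(42) // a fixed seed keeps generated data reproducible

  def main(args: Array[String]): Unit = {
    // Layer 1: patients with a controlled age distribution.
    val patients = (1 to 1000).map(i => Patient(i, 1930 + rng.nextInt(90)))

    // Layer 2: 0-5 lab tests per patient, type drawn from the reference table.
    val codes = ranges.keys.toVector
    val tests = patients.flatMap { p =>
      Vector.fill(rng.nextInt(6))(LabTest(p.id, codes(rng.nextInt(codes.size))))
    }

    // Layer 3: results whose value range depends on the test type.
    val results = tests.map { t =>
      val (lo, hi, unit) = ranges(t.code)
      LabResult(t.patientId, t.code, lo + rng.nextDouble() * (hi - lo), unit)
    }

    println(s"patients=${patients.size} tests=${tests.size} results=${results.size}")
  }
}
```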

Slide 12

Test Data: Additional Considerations
• Production data changes over time; the test environment should reflect that
• The test data actually used during a test run matters: it needs to represent various users and workflows to give good test coverage
• Use your understanding of production data to decide how to slice the test data
• Use SQL to find representative users and test data sets to actually use in the testing (a sketch follows this list)
• It doesn't matter whether it is performance testing or functional testing: the principles stand
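
A minimal sketch of the "use SQL" point, assuming an Oracle 12c database reachable over JDBC and a hypothetical PATIENT/LAB_RESULT schema; the table names, column names and thresholds are illustrative only:

```scala
import java.sql.DriverManager

// Pick patients whose record size sits in a "typical" band, so the test
// exercises representative records rather than empty or extreme ones.
object RepresentativePatients {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(args(0), args(1), args(2)) // url, user, password
    val sql =
      """SELECT p.patient_id, COUNT(r.result_id) AS result_count
        |FROM patient p JOIN lab_result r ON r.patient_id = p.patient_id
        |GROUP BY p.patient_id
        |HAVING COUNT(r.result_id) BETWEEN 50 AND 200
        |FETCH FIRST 100 ROWS ONLY""".stripMargin
    val rs = conn.createStatement().executeQuery(sql)
    while (rs.next()) println(s"${rs.getString("patient_id")}\t${rs.getInt("result_count")}")
    conn.close()
  }
}
```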

Slide 13

Test Load
• For web applications we use JMeter, Gatling and WebPageTest
• JMeter and Gatling generate server load at the protocol level; they do not emulate a browser
• WebPageTest uses real browsers, but does not scale well
• The testing itself is not automated; creating the load and measuring the results is (see the Gatling sketch below)
• To understand what to model, one can use production access logs
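
A minimal sketch of protocol-level load in Gatling's Scala DSL (Gatling 3 assumed); the base URL, endpoints, payload and injection profile are placeholders, not the real Orion Health workflow:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class PatientSummarySimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://perf-test.example.com") // placeholder environment URL
    .acceptHeader("application/json")

  // One simulated clinician: log in, search for a patient, open the summary view.
  val clinician = scenario("Clinician views patient summary")
    .exec(http("login").post("/api/session")
      .body(StringBody("""{"user":"perf1","pass":"secret"}""")).asJson)
    .pause(2)
    .exec(http("patient search").get("/api/patients?surname=SMITH"))
    .pause(3)
    .exec(http("patient summary").get("/api/patients/12345/summary"))

  setUp(
    clinician.inject(rampUsers(200).during(10.minutes))
  ).protocols(httpProtocol)
}
```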

Slide 14

Monitoring
• There are many tools for all levels of monitoring
• Collecting monitoring output in a tool like ELK, Prometheus or New Relic makes it easier to see patterns and to dig into metrics both during the test and retrospectively
• Real User Monitoring is very useful, but needs to be built into the code. An alternative is Captain Morgan
• Another alternative is monitoring the Apache access.log or something similar
• Building good logging into the application greatly improves testability (a sketch follows this list)
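
On the last point, a tiny sketch of the kind of machine-parseable timing line that makes later log analysis easy; SLF4J is assumed and the field names are arbitrary:

```scala
import org.slf4j.LoggerFactory

object TimedLogging {
  private val log = LoggerFactory.getLogger(getClass)

  // Run a block of code and emit one parseable line per operation:
  // op=<name> outcome=<ok|error> elapsed_ms=<duration>
  def timed[T](operation: String)(body: => T): T = {
    val start = System.nanoTime()
    var outcome = "ok"
    try body
    catch { case e: Throwable => outcome = "error"; throw e }
    finally {
      val elapsedMs = (System.nanoTime() - start) / 1000000
      log.info(s"op=$operation outcome=$outcome elapsed_ms=$elapsedMs")
    }
  }
}

// Usage (dao.loadSummary is a hypothetical call):
// TimedLogging.timed("patient-summary-query") { dao.loadSummary(patientId) }
```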

Slide 15

Processing Test Results
• Types of test results we see in performance testing:
– JMeter and Gatling logs with response times, response codes, server errors, etc.
– Application logs (we have 3 types of logs with various information)
– Access logs
– GC logs
– AWR reports
• Analysis we do: aggregation, finding items with high resource utilization, correlating events from different logs, and making sure the test load was as designed
• Tools: sometimes fast grep and awk, sometimes Excel, sometimes Splunk, but mostly R and in-house Scala/Java apps (see the sketch below)
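
A minimal sketch of the aggregation step for JMeter results, assuming a CSV JTL file whose header row contains "label" and "elapsed" columns; labels containing commas would need a real CSV parser:

```scala
import scala.io.Source

object JtlPercentiles {
  // Nearest-rank percentile over an already-sorted vector of response times (ms).
  def percentile(sorted: Vector[Long], p: Double): Long =
    sorted(math.max(math.ceil(p / 100.0 * sorted.size).toInt - 1, 0))

  def main(args: Array[String]): Unit = {
    val lines = Source.fromFile(args(0)).getLines().toVector
    val header = lines.head.split(",").map(_.trim)
    val labelIdx   = header.indexOf("label")
    val elapsedIdx = header.indexOf("elapsed")

    // Group response times by transaction label.
    val byLabel: Map[String, Vector[Long]] = lines.tail
      .map(_.split(","))
      .filter(cols => cols.length > math.max(labelIdx, elapsedIdx))
      .groupBy(cols => cols(labelIdx))
      .map { case (label, rows) => label -> rows.map(_(elapsedIdx).toLong).sorted }

    println(f"${"label"}%-40s ${"count"}%8s ${"p50"}%8s ${"p90"}%8s ${"p99"}%8s")
    byLabel.toSeq.sortBy(_._1).foreach { case (label, t) =>
      println(f"$label%-40s ${t.size}%8d ${percentile(t, 50)}%8d ${percentile(t, 90)}%8d ${percentile(t, 99)}%8d")
    }
  }
}
```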

Slide 16

Examples of Data Processing
• Cleaning up access logs (removing PHI, enabling aggregation; see the sketch below)
• Aggregating JMeter/Gatling results (percentiles)
• Analyzing application logs to find issues
• Analyzing HAR files to find issues with caching and gzip
• Analyzing application configuration to find issues (adherence to best practices)
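
For the first item, a minimal sketch of access-log clean-up. The URL patterns are invented examples of where patient identifiers might appear; real patterns depend on the application's URL scheme:

```scala
import scala.io.Source
import java.io.PrintWriter

object AccessLogScrubber {
  // Hypothetical patterns: numeric patient IDs in REST paths and id-style query parameters.
  private val PatientPath = raw"(/patients?/)\d+".r
  private val IdParam     = raw"((?:patientId|nhi)=)[^&\s]+".r

  // Replace identifiers with a fixed token so lines can still be grouped and counted.
  def scrub(line: String): String =
    IdParam.replaceAllIn(PatientPath.replaceAllIn(line, "$1{id}"), "$1{id}")

  def main(args: Array[String]): Unit = {
    val out = new PrintWriter(args(1))
    try Source.fromFile(args(0)).getLines().map(scrub).foreach(line => out.println(line))
    finally out.close()
  }
}
```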

Slide 17

Pritcel
Pritcel is our performance test results report. It highlights:
– Slow requests (from the JMeter/Gatling log) and slow events (from app logs)
– Slow pages (from WebPageTest)
– Slow SQL queries (from the AWR report)
– Long GC pauses (from GC logs)
– Internal resource contention: DB connection pools, caches (from app logs)
– Error rates (from the JMeter/Gatling log and from app logs)
– Concurrency levels (from JMeter logs)
– Response times, both aggregated and detailed throughout the test

Slide 18

SWAT – CI Performance Automation
• Meant to help developers measure performance for each new build and get quick feedback
• Runs the whole workflow in a simplified form, from creating the environment and preparing test data to running the tests and processing the results
• Version 1 uses bash as the glue; version 2 uses Ansible
• PEU owns the automation; developers own using it for their specific projects and monitoring the results

Slide 19

Contact Details
• https://testinglass.blogspot.com
• https://twitter.com/miss-hali
• [email protected]
• [email protected]
Questions?

Slide 20

www.orionhealth.com