Slide 1

Slide 1 text

KNOWING WHAT DATA TO EXPECT - Sandhya Sasidharan

Slide 2

Slide 2 text

We are part of building software applications When do call it a “SUCCESS” ?

Slide 3

Slide 3 text

WHEN … It is does not fail in production The intended business runs uninterrupted There is no loss incurred due to the application behaviour for the organisation using the application

Slide 4

Slide 4 text

Test early Validate requirements Do thorough regression Automate your tests Do various levels of testing like feature testing, system testing, integration testing and so on Get the actual users test Develop for the NFRS (Scalability, Performance, Security, Availability, Reliability) After all of these measures, we still see applications miserably fail in production. TESTING BEST PRACTICES SAYS

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

BECAUSE… The reality in production is different from what was anticipated

Slide 7

Slide 7 text

WHY DO WE NEED TO UNDERSTAND DATA? Many applications fail as the it is not prepared to to handle certain kind of data, that may occur in reality. Especially when the data is huge; the type of scenarios that would happen in production causing different data variations to flow in, becomes difficult to anticipate. It is primarily important to understand the business user flows; but it is always not not evident on how the users use the application and in-turn how the data gets manipulated and transferred In today’s world, the level of integration is really high due to micro services architecture and need for IOT to work together. Hence data being transferred from one system and being interpreted by another may lead to more issues

Slide 8

Slide 8 text

Some tips on how to dig in

Slide 9

Slide 9 text

UNDERSTAND THE DATA MODEL Familiarise where and how data resides Structured or unstructured When you test each requirement, see what are all the tables impacted in case of inserts/updates/deletes Dose it need a cascade action? Will the data integrity be maintained upon any action? What other information should be used to performa particular action

Slide 10

Slide 10 text

UNDERSTAND THE DATA MODEL (CONTD.) Example: Employee management application tbl_employees Any bulk operation for all employees, needs to done on employee Ids fetched from one table Assume a contract employee becomes permanent, the update of EmpType value is sufficient. Does not need any cascade action EmpID EmpName EmpType 1000 Aaa Permanent 1001 Xyz Contract 1002 Naa Permanent 1003 Bla Permanent

Slide 11

Slide 11 text

UNDERSTAND THE DATA MODEL (CONTD.) tbl_permanent_employees tbl_contract_employees Any bulk operation for all employees, needs to be done on employee Ids fetched from both tables Assume a contract employee becomes permanent, and gets inserted into the tbl_permanent_employees; then there is a cascade action required in the tbl_contract_employees as well. EmpID EmpName 1000 Aaa 1002 Naa 1003 Bla EmpID EmpName ContractAgency 1001 Aaa Agency1 1004 Xyz Agency2 1011 Bla Agency2

Slide 12

Slide 12 text

UNDERSTAND THE DATA MODEL (CONTD.) Denormalized data Example: tbl_emp_skillset Incase you for-see a requirement in her future to capture the proficiency level for each skill set, then there is some rework needed. EmpID SkillSet 1000 {‘Ruby’} 1002 {‘Java’, ‘Ruby’} 1003 {‘Scala’}

Slide 13

Slide 13 text

LOOK FOR DISTINCT VALUES IN COLUMNS THAT CATEGORISE DATA Your number of test scenarios directly increases by the number of distinct values in such columns Example: select distinct EmployeeType from tbl_employee; You may have to do PF calculation for salaried employees Interns receive stipend and not salary and hence this should not be applied to interns EmployeeType Intern Permenant Contractor

Slide 14

Slide 14 text

COMBINED MEANING OF DATA You may have to do tax exemption calculation for salaried joined before 31-Aug-2015 Flags matter - they carry a lot of business meaning most of the time; sometime used to handle exception scenarios EmpID EmpName EmpType EmpStatus IsOnLOP 1000 Aaa Permanent OnRoll 0 1001 Xyz Contract Exited 1002 Naa Permanent InNotice 0 1003 Bla Permanent OnRoll 1

Slide 15

Slide 15 text

NULL VALUES Should a column be not nullable? If a column can hold NULL as a valid case, then does the code handle when NULL is fetched Validate how NULL is interpreted and conveyed to the user. Example1: An customer is supposed to rate products between 0 to 10 tbl_product_rating ProductId Product Name Rate 12345 prod1 3 12346 prod6 8 12347 prod3 0 12348 prod9 Yet to be rated

Slide 16

Slide 16 text

NULL VALUES (CONTD.) Example2: To display price of an product tbl_product If the user is shown 0 price for both products in UI, it is still fine What does the upstream system which gives the data means and how it is interpreted by the downstream matters. ProductId Product Name Category Price IsComplimentary 12345 prod1 10 5.50 0 12346 prod2 11 1 12347 prod3 16 0.0 1 12348 prod4 16 30 0

Slide 17

Slide 17 text

NULL VALUES (CONTD.) Lack of a record vs NULL Example: To find articles with no price tbl_product tbl_price Article 1002 does not have a record in price table. Article 1003 has a record in price table, but null value for price column. In both cases, the articles should be considered as no price. ArticleId Price 1000 5.50 1001 1003 10.00 ProductId Product Name Category IsComplimentary 1000 prod1 10 0 1001 prod2 11 1 1002 prod3 16 1 1003 prod4 16 0

Slide 18

Slide 18 text

LOOK OUT FOR DB CONSTRAINTS Understand Business reasoning for constraints Example: tbl_stock Once the stock unit is marked as expired, it should not be allowed to be sold. Can duplicate data be entered for a column used as a primary key? Need for a composite key. Example: tbl_top_sales StockId IsExpiered IsSold Date 7123 1 0 21-04-2016 7124 0 1 21-04-2016 7125 0 0 21-04-2016 TopSellerProductId StoreId Period SalesUnits 1000 1 Jan-2016 87 1024 2 Jan-2016 50 1000 1 Feb-2016 82

Slide 19

Slide 19 text

FOUND ANOMALIES? WHAT NEXT? PATTERN ANALYSIS Frequency of occurrence: Often or Rare Timeframe of occurrence: Recent / Past, Happens post some action? Specific to some users or roles? Are the anomalies due to misuse of features Question how will the pattern be in near future? Based on these patterns decide on criticality and priority of handling the scenario

Slide 20

Slide 20 text

GETTING THE RIGHT DATA SET TO TEST To understand the real data, we need the right data set to test. Getting the production data onto test environments might be a challenge Confidentiality - Go for obfuscation, if needed Volume - Identify the right sample set Be it Manual testing or Automation Follow the 80:20 Pareto principle

Slide 21

Slide 21 text

Happy testing & Thank you! Q & A