AI&Software Testing - Using NLP to Detect Requirements Defects

Exactpro
November 17, 2020

Murad Mamedov
AI Researcher, Exactpro

AI&Software Testing seminar series
17.11.20

Murad Mamedov will present the research paper "Using NLP to Detect Requirements Defects: an Industrial Experience in the Railway Domain".
The authors are a group of Italian researchers from the University of Florence (DINFO, Florence), the Alessandro Faedo Institute of Information Science and Technologies (ISTI-CNR, Pisa), and Alstom Ferroviaria.

Video: https://youtu.be/H1c1Dgmiusc

---
Follow Exactpro on social media:

LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Instagram https://www.instagram.com/exactpro/

Subscribe to the Exactpro YouTube channel: http://www.youtube.com/c/ExactproVlog

Transcript

  1. Build Software to Test Software (exactpro.com)

    Research Seminar on AI in Test: Using NLP to Detect Requirements Defects
    10.11.2020
    Murad Mamedov, AI Researcher
  2. Using NLP to Detect Requirements Defects

    Topic: Using NLP to Detect Requirements Defects: an Industrial Experience in the Railway Domain
    Purpose: The authors want to verify requirements for completeness, clarity, preciseness, unequivocality, verifiability, testability, maintainability, and feasibility.
    Details:
    • The research consists of two parts: a preliminary study and a large-set analysis
    • The authors compared manual verification with NLP analysis
    Link: https://www.researchgate.net/publication/313867290_Using_NLP_to_Detect_Requirements_Defects_An_Industrial_Experience_in_the_Railway_Domain
  3. Workflow of the Research

    Defect Classes Detection
    Role: VE1. Subtask: to assess the relevance of the defects
    Subtask: to define patterns for each defect class using GATE
    Preliminary Research
    + Role: NLP-E. Subtask: to assess the feasibility of the defects
    Full Set Research
    Annotate Raw Data
    Metrics Assessment
    Role: VE3. Subtask: data markup
    Role: VE1, VE2. Subtask: markup review
    + Execution (two iterations)
  4. GATE Tool Overview

    • Tokenization: splits a document into separate tokens, e.g. words, numbers, spaces, punctuation
    • POS Tagging: assigns a part of speech to each token, e.g. noun (NN), verb (VB), adjective (JJ)
    • Shallow Parsing: identifies noun and verb phrases, e.g. in the sentence “Messages are received by the system”, {messages, the system} are NPs and {are received} is a VP
    • JAPE Rules: let you define rules similar to regular expressions
    • Gazetteer: searches the text for terms from a predetermined list
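The tokenization and gazetteer stages can be illustrated with a minimal Python sketch. This is not GATE code: the regular expression, the function names, and the three gazetteer entries are assumptions made for illustration, and POS tagging and shallow parsing are left out because they need a trained tagger.

```python
import re

# Hypothetical gazetteer entries; the list used in the study is not given in the slides.
VAGUE_TERMS = {"adequate", "appropriate", "sufficient"}

# One named alternative per token kind: words, numbers, spaces, punctuation.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z]+)|(?P<number>\d+(?:\.\d+)?)|(?P<space>\s+)|(?P<punctuation>[^\w\s])"
)


def tokenize(text):
    """Split text into (kind, string) tokens, keeping spaces as tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def gazetteer_lookup(tokens, terms):
    """Return the word tokens whose lowercase form is in the term list."""
    return [s for kind, s in tokens if kind == "word" and s.lower() in terms]


tokens = tokenize("The system shall respond in 5 s, with adequate accuracy.")
print(tokens[:4])
print(gazetteer_lookup(tokens, VAGUE_TERMS))
```

Keeping the token kind next to the string mirrors the idea that later rules match on token features rather than on raw text.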
  5. Patterns for Defect Prediction

    # | Pattern | Description
    1 | Anaphoric ambiguity | A pronoun refers back to an earlier part of the text where several referents are possible
    2 | Coordination ambiguity | Conjunctions lead to multiple interpretations
    3 | Vague terms | A term has no precise semantics
    4 | Modal adverb | Adverbs that have the suffix -ly
    5 | Passive voice | A passive verb that is not followed by the subject performing the action
    6 | Excessive length | Sentences longer than 60 tokens
    7 | Missing condition | Each "if" should have a matching "else"/"otherwise"
    8 | Missing unit of measurement | Each number must have an associated unit of measurement, unless the number represents a reference
    9 | Missing reference | A reference appears in the text of the requirements but not in the list of references
    10 | Undefined term | The company used camelCase notation for terms, and the researchers checked whether every such term appears in the glossary of the requirements document
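Two of the patterns above, the missing reference and the undefined term, amount to lookups against lists kept in the requirements document. A minimal Python sketch follows, assuming references are written as "Ref. N" and terms use camelCase. The function names, the sample sentence, and the glossary content are hypothetical.

```python
import re

# A mixed-case word: a lowercase run followed by one or more capitalized runs.
MIXED_CAPS_RE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")
# A reference mention written as "Ref. 12" (the space is optional).
REF_RE = re.compile(r"\bRef\.\s?(\d+)")


def undefined_terms(text, glossary):
    """Return camelCase terms used in the text that are absent from the glossary."""
    return sorted({t for t in MIXED_CAPS_RE.findall(text) if t not in glossary})


def missing_references(text, reference_ids):
    """Return cited reference numbers that are not in the list of references."""
    return sorted({int(n) for n in REF_RE.findall(text)} - set(reference_ids))


requirement = "When trainSpeed exceeds the limit, brakeCmd is sent (see Ref. 3 and Ref. 7)."
print(undefined_terms(requirement, glossary={"trainSpeed"}))
print(missing_references(requirement, reference_ids=[3]))
```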
  6. JAPE Rules for Patterns

    # | Defect Class | JAPE Rule
    1 | Anaphoric ambiguity | P_ANA = (NP)(NP)+ (Split)[0,1] (Token.POS == PP | Token.POS =~ PR*)
    2 | Coordination ambiguity | P_CO1 = ((Token)+ (Token.string == AND | OR))[2]
      |  | P_CO2 = (Token.POS == JJ)(Token.POS == NN | NNS)(Token.string == AND | OR)(Token.POS == NN | NNS)
    3 | Vague terms | P_VAG = (Token.string ∈ Vague)
    4 | Modal adverb | P_ADV = (Token.POS == RB | RBR), (Token.string =~ "[.]*ly$")
    5 | Passive voice | P_PV = (AUXVERB)(NOT)?(Token.POS == RB | RBR)?(Token.POS == VBN)
    6 | Excessive length | P_LEN = Sentence.len > 60
    7 | Missing condition | P_MC = (IF)(Token, !Token.kind == punctuation)*(Token.kind == punctuation)(!(ELSE | OTHERWISE))
    8 | Missing unit of measurement | P_MU1 = (NUMBER)((Token)[0,1](NUMBER))?(!MEASUREMENT)
      |  | P_MU2 = (NUMBER)((Token)[0,1](NUMBER))?(!PERCENT)
    9 | Missing reference | P_MR = (Token.string == "Ref")(Token.string == ".")(SpaceToken)?(NUMBER)
    10 | Undefined term | P_UT = (Token.kind == word, Token.orth == mixedCaps)
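To make the rule notation more concrete, here is a rough Python approximation of three of the rules: excessive length, modal adverb, and missing condition. It is a sketch, not a JAPE translation. It works on plain tokens without POS tags, so the modal-adverb check keeps only the -ly suffix test and will also flag non-adverbs such as "apply". The missing-condition check follows the rule as written on the slide, token by token. All function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokens_of(sentence):
    """Return words and punctuation marks; whitespace is dropped."""
    return TOKEN_RE.findall(sentence)


def is_punct(token):
    """True for a single punctuation token such as ',' or ';'."""
    return re.fullmatch(r"[^\w\s]", token) is not None


def excessive_length(sentence, limit=60):
    """Excessive length: the sentence has more than `limit` tokens."""
    return len(tokens_of(sentence)) > limit


def modal_adverbs(sentence):
    """Modal adverb without the POS test: any word ending in -ly."""
    return [t for t in tokens_of(sentence) if re.fullmatch(r"[A-Za-z]+ly", t)]


def missing_condition(sentence):
    """An 'if' clause closed by punctuation that is not followed by else/otherwise."""
    toks = [t.lower() for t in tokens_of(sentence)]
    for i, tok in enumerate(toks):
        if tok != "if":
            continue
        j = i + 1
        while j < len(toks) and not is_punct(toks[j]):
            j += 1  # skip the non-punctuation tokens of the if-clause
        if j < len(toks):  # toks[j] is the punctuation that closes the clause
            following = toks[j + 1] if j + 1 < len(toks) else None
            if following not in ("else", "otherwise"):
                return True
    return False


print(modal_adverbs("The driver shall be promptly notified."))
print(missing_condition("If the signal is red the train stops."))
```

Read literally, the missing-condition rule also fires when a comma ends the if-clause and the main clause follows, which hints at why such rules need tuning against real data.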
  7. Dataset Structure

    Area: Railway signalling software (consists of 4 components)
    Volume: The raw dataset has 1866 requirements; each requirement may have zero, one, or several defects
    Data Markup Approach: Manual (in order to compare with GATE’s output)
    Markup Stages:
    a. Whether the requirement was accepted or rejected
    b. If it was rejected, why: completeness, clarity, preciseness, unequivocality, verifiability, testability, maintainability, feasibility
    c. If it was rejected due to completeness, clarity, or unequivocality, what exactly was lacking in terms of the patterns
    Markup Output:
    • 1733 accepted reqs
    • 93 rejected reqs
    • The majority of the defects are due to passive voice
  8. Dataset Structure

    Dataset A
    ReqID | Accepted | ... | Completeness | Clarity | Unequivocality
    req_1 | 1 | ... | 0 | 0 | 0
    req_2 | 0 | ... | 0 | 0 | 0
    ...
    req_n | 0 | ... | 1 | 0 | 1

    req_1 = has no bugs and was accepted
    req_2 = has bugs, but they are not related to this research
    req_n = has 2 bugs, one on Completeness and one on Unequivocality

    Dataset B
    ReqID | FragID | PatternID
    req_1 | frag_1 | pattern_4
    req_1 | frag_2 | pattern_2
    ...
    req_m | frag_n | pattern_7

    frag_1 = modal adverb defect
    frag_2 = coordination ambiguity
    frag_n = missing condition
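The two datasets can be pictured as two small record collections linked by the requirement ID. The Python sketch below uses invented rows and field names (`req_id`, `frag_id`, `pattern_id`, and so on) that only mirror the column layout on the slide. It is not the authors' data format.

```python
from collections import defaultdict

# Dataset A: one row per requirement (hypothetical rows).
dataset_a = [
    {"req_id": "req_1", "accepted": 1, "completeness": 0, "clarity": 0, "unequivocality": 0},
    {"req_id": "req_2", "accepted": 0, "completeness": 0, "clarity": 0, "unequivocality": 0},
    {"req_id": "req_3", "accepted": 0, "completeness": 1, "clarity": 0, "unequivocality": 1},
]

# Dataset B: one row per defective fragment (hypothetical rows).
dataset_b = [
    {"req_id": "req_3", "frag_id": "frag_1", "pattern_id": "pattern_7"},
    {"req_id": "req_3", "frag_id": "frag_2", "pattern_id": "pattern_2"},
]


def in_scope(row):
    """Rejected because of completeness, clarity, or unequivocality."""
    criteria = ("completeness", "clarity", "unequivocality")
    return row["accepted"] == 0 and any(row[c] for c in criteria)


def patterns_by_requirement(rows):
    """Group the pattern IDs of Dataset B by requirement ID."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["req_id"]].append(row["pattern_id"])
    return dict(grouped)


print([r["req_id"] for r in dataset_a if in_scope(r)])
print(patterns_by_requirement(dataset_b))
```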
  9. Measurement Approach

    Evaluation Measures by Defect
    • Precision and recall are calculated over “defects”, where one defect is a fragment of a requirement that contains a flaw matching one of the patterns
    • tpD: number of requirement fragments labeled as defective and correctly identified by the pattern
    • fpD: number of requirement fragments wrongly identified as defective by the pattern
    • fnD: number of requirement fragments labeled as defective that are not discovered by the pattern
    Evaluation Measures by Requirement
    • Focus on the requirements themselves
    • Track the effectiveness of the patterns applied together
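The counts defined above plug into the standard formulas: precision = tp / (tp + fp) and recall = tp / (tp + fn). A minimal Python sketch follows. It assumes that a fragment is identified by a (requirement, fragment, pattern) triple, and that in the by-requirement view a requirement counts as detected when at least one pattern fires on it. Both are assumptions for illustration, and all data is invented.

```python
def precision(tp, fp):
    """Share of detections that are correct; None when nothing was detected."""
    return tp / (tp + fp) if tp + fp else None


def recall(tp, fn):
    """Share of labeled defects that were found; None when nothing was labeled."""
    return tp / (tp + fn) if tp + fn else None


def counts(labeled, detected):
    """Return (tp, fp, fn) for two sets of identifiers."""
    return len(labeled & detected), len(detected - labeled), len(labeled - detected)


# Hypothetical fragments: (requirement, fragment, pattern).
labeled = {("r1", "f1", "p4"), ("r1", "f2", "p2"), ("r2", "f1", "p7")}
detected = {("r1", "f1", "p4"), ("r2", "f1", "p7"), ("r3", "f1", "p5")}

# By defect: compare fragment by fragment.
tp_d, fp_d, fn_d = counts(labeled, detected)

# By requirement: collapse fragments to the requirements they belong to.
tp_r, fp_r, fn_r = counts({r for r, _, _ in labeled}, {r for r, _, _ in detected})

print(tp_d, fp_d, fn_d, precision(tp_d, fp_d), recall(tp_d, fn_d))
print(tp_r, fp_r, fn_r, precision(tp_r, fp_r), recall(tp_r, fn_r))
```

In this toy data the by-requirement recall is higher than the by-defect recall, because finding any one defect in a requirement is enough to flag it.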
  10. Patterns Tuning after Preliminary Research

    # | Pattern | Changes
    1 | Anaphoric ambiguity | -
    2 | Coordination ambiguity | -
    3 | Vague terms | Added new terms and stop-words (domain-specific words)
    4 | Modal adverb | -
    5 | Passive voice | -
    6 | Excessive length | Lists were also recognized as sentences of excessive length (won’t fix)
    7 | Missing condition | An if-else construction can be replaced with an if-if option in some cases (won’t fix)
    8 | Missing unit of measurement | Measurements are not applicable for ranges
    9 | Missing reference | -
    10 | Undefined term | -
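The vague-terms tuning can be sketched in Python under one reading of the slide: "stop-words" are taken to be domain-specific words or phrases that should not be reported even though they contain a vague word. The word lists below are hypothetical examples, not the lists used in the study.

```python
import re

VAGUE = {"adequate", "clear", "possible"}   # hypothetical vague-term list
DOMAIN_STOP_PHRASES = {"clear aspect"}       # hypothetical domain-specific phrase


def vague_terms(sentence, vague=VAGUE, stop_phrases=DOMAIN_STOP_PHRASES):
    """Return vague words in the sentence, ignoring those inside domain phrases."""
    text = sentence.lower()
    # Blank out domain phrases first so the vague words inside them are not reported.
    for phrase in stop_phrases:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return [w for w in re.findall(r"[a-z]+", text) if w in vague]


requirement = "The signal shall show a clear aspect within an adequate time."
print(vague_terms(requirement))
print(vague_terms(requirement, stop_phrases=set()))
```

Comparing the two calls shows the effect of the tuning: without the stop list, the domain phrase produces an extra report.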
  11. Evaluation of the Results

    False Negative
    • FN errors appeared because some requirements have defects that are not covered by the patterns, i.e. testability and feasibility defects
    False Positive
    • FP errors appeared due to a lack of standardization
    • VE1 and VE3 had a different understanding of how to annotate the data
    • VE3 tolerated some of the linguistic defects (e.g. vague terms) and marked up only “tech” defects
  12. Evaluation of the Results

    Defect Class | D | R | tpD | fpD | pD
    Anaphoric ambiguity | 387 | 327 | 258 | 129 | 66.6%
    Coordination ambiguity | 263 | 213 | 190 | 73 | 72.24%
    Vague terms | 496 | 306 | 290 | 206 | 58.46%
    Modal adverbs | 476 | 373 | 331 | 145 | 69.53%
    Passive voice | 1265 | 615 | 1242 | 23 | 98.1%
    Excessive length | 16 | 16 | 16 | 0 | 100%
    Missing condition | 188 | 148 | 129 | 59 | 68.61%
    Missing unit of measurement | 0 | 0 | 0 | 0 | -
    Missing reference | 4 | 2 | 4 | 0 | 100%
    Undefined term | 54 | 49 | 43 | 11 | 79.62%
    Average | | | | | 79.24%

    Requirements: tpR = 1042, fpR = 175, pR = 85.6%
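The precision column can be cross-checked against the counts with pD = tpD / (tpD + fpD). The short Python check below copies a few rows from the slide. The dictionary layout and function name are only for illustration.

```python
# (tp, fp, precision in % as reported on the slide)
rows = {
    "Anaphoric ambiguity": (258, 129, 66.6),
    "Coordination ambiguity": (190, 73, 72.24),
    "Passive voice": (1242, 23, 98.1),
    "Undefined term": (43, 11, 79.62),
    "All requirements (tpR, fpR)": (1042, 175, 85.6),
}


def precision_pct(tp, fp):
    """Precision as a percentage."""
    return 100 * tp / (tp + fp)


for name, (tp, fp, reported) in rows.items():
    computed = precision_pct(tp, fp)
    print(f"{name}: computed {computed:.2f}%, slide {reported}%")
```

For these rows the computed values agree with the slide to within 0.1 percentage points.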
  13. Conclusion

    • In-House NLP: proper tool usage and customization matter
    • Requirements Language Counts: such tools are recommended for requirements editors (instead of VEs), because the issue is writing style (e.g. the major defect pattern, “passive voice”)
    • Validation Criteria Counts: make the criteria clear before annotation
    • NLP is Only a Part of the Answer: the authors were not able to detect testability/feasibility defects using the described techniques
    • Statistical NLP vs Lexical Techniques: if you want to use a lexical approach instead of a statistical one, you need to revise the rules more thoroughly