AI&Software Testing - Using NLP to Detect Requirements Defects

Exactpro
November 17, 2020

Murad Mamedov
AI Researcher, Exactpro

AI&Software Testing seminar series
17.11.20

Murad Mamedov will present the research paper "Using NLP to Detect Requirements Defects: an Industrial Experience in the Railway Domain".
The authors of the paper are a group of Italian researchers from the University of Florence (DINFO, Florence), the Alessandro Faedo Institute of Information Science and Technologies (ISTI-CNR, Pisa), and the company Alstom Ferroviaria.

Video: https://youtu.be/H1c1Dgmiusc

---
Follow Exactpro on social media:

LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Instagram https://www.instagram.com/exactpro/

Subscribe to the Exactpro YouTube channel http://www.youtube.com/c/ExactproVlog

Transcript

  1. Build Software to Test Software
    exactpro.com
    Research Seminar on AI in Test:
    Using NLP to Detect Requirements Defects
    10.11.2020
    Murad Mamedov, AI Researcher

  2. Using NLP to Detect Requirements Defects
    Topic: Using NLP to Detect Requirements Defects: an Industrial Experience in the Railway Domain
    Purpose: The authors want to verify requirements for completeness, clarity, preciseness, unequivocality, verifiability, testability, maintainability, and feasibility.
    Details:
    ● The research consists of two parts: a preliminary study and a large-set analysis
    ● The authors compared manual verification with NLP analysis
    Link: https://www.researchgate.net/publication/313867290_Using_NLP_to_Detect_Requirements_Defects_An_Industrial_Experience_in_the_Railway_Domain

  3. Workflow of the Research
    Preliminary Research
      Defect Classes Detection
        Role: VE1. Subtask: to assess the relevance of the defects
        + Role: NLP-E. Subtask: to assess the feasibility of the defects
        Subtask: to define patterns for each defect class using GATE
    Full Set Research
      Annotate Raw Data
        Role: VE3. Subtask: data markup
        + Role: VE1, VE2. Subtask: markup review
      Metrics Assessment
      Execution
      Two iterations

  4. GATE Tool Overview
    Tokenization: Splits a document into separate tokens, e.g. words, numbers, spaces, punctuation
    POS Tagging: Assigns a part of speech to each token, e.g. noun (NN), verb (VB), adjective (JJ)
    Shallow Parsing: Identifies noun and verb phrases, e.g. in the sentence “Messages are received by the system”, {messages, the system} are NPs and {are received} is a VP
    JAPE Rules: Let you define matching rules similar to regular expressions
    Gazetteer: Searches for terms from a predetermined list
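The tokenization and gazetteer steps can be illustrated with a minimal Python sketch. It is not GATE and does not reproduce its behavior: the regular expression, the function names, and the `VAGUE_TERMS` list are all assumptions made for illustration, and unlike the tokenizer described on the slide, this one drops spaces instead of emitting them as tokens.

```python
import re

# Hypothetical term list; the gazetteer actually used in the study is not shown on the slides.
VAGUE_TERMS = {"adequate", "appropriate", "sufficient", "reasonable"}

# Numbers (optionally with a decimal part), words, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def tokenize(text):
    """Split text into number, word, and punctuation tokens (whitespace is dropped)."""
    return TOKEN_RE.findall(text)


def gazetteer_hits(tokens, terms):
    """Return (index, token) pairs for tokens whose lowercase form is in the term list."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in terms]


tokens = tokenize("The system shall respond in an adequate time.")
hits = gazetteer_hits(tokens, VAGUE_TERMS)
print(tokens)
print(hits)
```

A lookup like this only handles single-word terms; multi-word entries would need matching over token sequences.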

  5. Patterns for Defects Prediction
    #   Pattern                         Description
    1   Anaphoric ambiguity             A pronoun refers back to earlier text where more than one referent is possible
    2   Coordination ambiguity          Conjunctions lead to multiple interpretations
    3   Vague terms                     A term has no precise semantics
    4   Modal adverb                    Adverbs that have the suffix -ly
    5   Passive voice                   A passive verb that is not followed by the subject performing the action
    6   Excessive length                Sentences longer than 60 tokens
    7   Missing condition               Each "if" should have an "else"/"otherwise"
    8   Missing a unit of measurement   Each number is required to have an associated unit of measurement, unless the number represents a reference
    9   Missing reference               A reference appears in the text of the requirements but not in the list of references
    10  Undefined term                  This company used camelCase notation for terms; the researchers checked whether every such term appears in the Glossary of the requirements document

  6. JAPE Rules for Patterns
    #   Defect Class                    JAPE Rule
    1   Anaphoric ambiguity             P_ANA = (NP)(NP)+ (Split)[0,1] (Token.POS == PP | Token.POS =~ PR*)
    2   Coordination ambiguity          P_CO1 = ((Token)+ (Token.string == AND | OR)) [2]
                                        P_CO2 = (Token.POS == JJ) (Token.POS == NN | NNS) (Token.string == AND | OR) (Token.POS == NN | NNS)
    3   Vague terms                     P_VAG = (Token.string ∈ Vague)
    4   Modal adverb                    P_ADV = (Token.POS == RB | RBR), (Token.string =~ "[.]*ly$")
    5   Passive voice                   P_PV = (AUXVERB)(NOT)?(Token.POS == RB | RBR)? (Token.POS == VBN)
    6   Excessive length                P_LEN = Sentence.len > 60
    7   Missing condition               P_MC = (IF)(Token, !Token.kind == punctuation)* (Token.kind == punctuation)(!(ELSE | OTHERWISE))
    8   Missing a unit of measurement   P_MU1 = (NUMBER)((Token)[0, 1](NUMBER))?(!MEASUREMENT)
                                        P_MU2 = (NUMBER)((Token)[0, 1](NUMBER))?(!PERCENT)
    9   Missing reference               P_MR = (Token.string == "Ref")(Token.string == ".") (SpaceToken)?(NUMBER)
    10  Undefined term                  P_UT = (Token.kind == word, Token.orth == mixedCaps)
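To make three of these rules more concrete, here is a rough Python approximation of the excessive length, modal adverb, and missing condition patterns. It is a sketch under strong assumptions, not a port of the JAPE rules: there is no POS tagging, so the modal adverb check relies on the -ly suffix alone, and the missing condition check only looks for "else"/"otherwise" anywhere in the requirement. All function names are hypothetical.

```python
import re

MAX_TOKENS = 60  # sentence length threshold taken from the slides


def tokens(text):
    """Split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def has_excessive_length(sentence):
    """Excessive length: more than 60 tokens in a sentence."""
    return len(tokens(sentence)) > MAX_TOKENS


def modal_adverb_candidates(sentence):
    """Words ending in -ly. Without a POS tagger this also flags non-adverbs such as 'supply'."""
    return [t for t in tokens(sentence) if re.fullmatch(r"[A-Za-z]+ly", t)]


def has_missing_condition(requirement):
    """An 'if' with no 'else' or 'otherwise' anywhere in the requirement text."""
    words = {t.lower() for t in tokens(requirement)}
    return "if" in words and not ({"else", "otherwise"} & words)


print(modal_adverb_candidates("The driver shall be promptly notified."))
print(has_missing_condition("If the signal is red, the train shall stop."))
print(has_missing_condition("If the signal is red, the train shall stop; otherwise it shall proceed."))
```

Dropping the POS constraint makes the adverb check noisier than the rule in the table, which requires an RB or RBR tag in addition to the suffix.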

  7. Dataset Structure
    Area: Railway signalling software consisting of 4 components
    Volume: The raw dataset has 1866 requirements; each requirement may have 0, 1, or more than 1 defect
    Data Markup Approach: Manual (in order to compare with GATE’s output)
    Markup Stages:
    a. Whether a requirement was accepted or rejected
    b. If it was rejected, why: completeness, clarity, preciseness, unequivocality, verifiability, testability, maintainability, feasibility
    c. If it was rejected due to completeness, clarity, or unequivocality, what exactly was lacking from the Patterns perspective
    Markup Output:
    ● 1733 accepted reqs
    ● 93 rejected reqs
    ● The majority of the defects are due to passive voice

  8. Dataset Structure
    Dataset A
    ReqID   Accepted   ...   Completeness   Clarity   Unequivocality
    req_1   1          ...   0              0         0
    req_2   0          ...   0              0         0
    ...
    req_n   0          ...   1              0         1
    req_1 = has no defects and was accepted
    req_2 = has defects, but none related to this research
    req_n = has 2 defects, one in Completeness and one in Unequivocality
    Dataset B
    ReqID   FragID   PatternID
    req_1   frag_1   pattern_4
    req_1   frag_2   pattern_2
    ...
    req_m   frag_n   pattern_7
    frag_1 = modal adverb defect
    frag_2 = coordination ambiguity
    frag_n = missing condition

  9. Measurement Approach
    Evaluation Measures by Defect
    Precision and recall are calculated over “defects”, where one defect is a fragment of a requirement that contains a flaw matching one of the Patterns
    tpD - number of requirement fragments labeled as defective and correctly identified by the pattern
    fpD - number of requirement fragments wrongly identified as defective by the pattern
    fnD - number of requirement fragments labeled as defective that are not discovered by the pattern
    Evaluation Measures by Requirement
    Focus on the requirements themselves, in order to track the effectiveness of the patterns applied together
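The counts defined on this slide map onto the standard precision and recall formulas, precision = tp / (tp + fp) and recall = tp / (tp + fn). The sketch below assumes that fragments can be compared by identifier; the function names and the fragment IDs are hypothetical and are not taken from the study's data.

```python
def confusion_counts(predicted, labeled):
    """Compare fragments flagged by a pattern with fragments labeled defective by an annotator."""
    tp = len(predicted & labeled)   # labeled defective and found by the pattern
    fp = len(predicted - labeled)   # flagged by the pattern but not labeled defective
    fn = len(labeled - predicted)   # labeled defective but missed by the pattern
    return tp, fp, fn


def precision(tp, fp):
    """Share of flagged fragments that are truly defective; None if nothing was flagged."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """Share of defective fragments that were flagged; None if nothing was labeled defective."""
    return tp / (tp + fn) if (tp + fn) else None


# Hypothetical fragment identifiers, for illustration only.
predicted = {"frag_1", "frag_2", "frag_3"}
labeled = {"frag_2", "frag_3", "frag_4"}

tp, fp, fn = confusion_counts(predicted, labeled)
print(tp, fp, fn, precision(tp, fp), recall(tp, fn))
```

Returning `None` for an empty denominator keeps a pattern that never fires from being reported as 0% precision.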

  10. Patterns Tuning after Preliminary Research
    #   Pattern                         Changes
    1   Anaphoric ambiguity             -
    2   Coordination ambiguity          -
    3   Vague terms                     Added new terms; added stop-words (domain-specific words)
    4   Modal adverb                    -
    5   Passive voice                   -
    6   Excessive length                Lists were also recognized as excessive-length sentences (won’t fix)
    7   Missing condition               An if-else construction can be replaced with an if-if option in some cases (won’t fix)
    8   Missing a unit of measurement   Measurements are not applicable for ranges
    9   Missing reference               -
    10  Undefined term                  -

  11. Evaluation of the Results
    False Negative
    FN errors arose from requirements whose defects are not covered by the patterns, e.g. testability and feasibility defects
    False Positive
    FP errors arose from a lack of standardization
    VE1 and VE3 had different understandings of how to annotate the data
    VE3 tolerated some of the linguistic defects (e.g. Vague Terms) and marked up only “tech” defects

  12. Evaluation of the Results
    Defect Class                  D      R     tpD    fpD   pD
    Anaphoric ambiguity           387    327   258    129   66.6%
    Coordination ambiguity        263    213   190    73    72.24%
    Vague terms                   496    306   290    206   58.46%
    Modal adverbs                 476    373   331    145   69.53%
    Passive voice                 1265   615   1242   23    98.1%
    Excessive length              16     16    16     0     100%
    Missing condition             188    148   129    59    68.61%
    Missing unit of measurement   0      0     0      0     -
    Missing reference             4      2     4      0     100%
    Undefined term                54     49    43     11    79.62%
    Average                                                 79.24%
    Requirements
    tpR    fpR   pR
    1042   175   85.6%
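The precision figures can be rechecked from the counts, assuming that pD = tpD / (tpD + fpD) and pR = tpR / (tpR + fpR). The sketch below recomputes three of the values; the helper name `precision_pct` is hypothetical, and the counts are copied from the slide.

```python
def precision_pct(tp, fp):
    """Precision as a percentage, rounded to two decimals."""
    return round(100 * tp / (tp + fp), 2)


# (tp, fp) pairs copied from the slide.
passive_voice = precision_pct(1242, 23)
coordination_ambiguity = precision_pct(190, 73)
by_requirement = precision_pct(1042, 175)

print(passive_voice, coordination_ambiguity, by_requirement)
```

The recomputed values are 98.18, 72.24, and 85.62, which agree with the slide's 98.1%, 72.24%, and 85.6% up to rounding.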

  13. Conclusion
    In-House NLP:
    Proper tool usage and customization matter
    Requirements Language Counts:
    Such tools are recommended for requirements editors (rather than VEs), because many defects concern writing style (e.g. the major defect pattern, “passive voice”)
    Validation Criteria Counts:
    Make the criteria clear before annotation
    NLP is Only a Part of the Answer:
    The authors were not able to detect testability/feasibility defects using the described techniques
    Statistical NLP vs Lexical Techniques:
    If you use a lexical-based approach instead of a statistics-based one, you need to revise and tune the patterns carefully.