Smart Like A Fox: How clever students trick dumb automated programming assignment assessment systems

This case study reports on two first-semester programming courses with more than 190 students. Both courses made use of automated assessment of students' code submissions. We observed how students trick these systems by analyzing the version history of suspect submissions. By analyzing more than 3,300 submissions we revealed four astonishingly simple cheat patterns (overfitting, evasion, redirection, and injection) that students can use to trick automated programming assignment assessment systems (APAAS), but we also propose corresponding countermeasures. This immaturity of existing APAAS solutions might have implications for courses that rely deeply on automation, like MOOCs. Therefore, we conclude that APAAS solutions should be considered much more from a security (code injection) point of view. Moreover, we identify the need to evolve existing unit testing frameworks into more evaluation-oriented teaching solutions that provide better cheat detection capabilities and differentiated grading support.

Nane Kratzke

March 02, 2019

Transcript

  1. How clever students trick dumb automated programming
    assignment assessment systems (APAAS)
    Nane Kratzke
    SMART LIKE A FOX
    1

  2. Introduction
    Methodology
    Analysis
    Discussion, Counter Measures
    Limitations, Conclusion
    Agenda
    2
    Presentation on SpeakerDeck
    Preprint on ResearchGate
    Presentation at CSEDU 2019, Heraklion, Crete, Greece (2 – 4 May 2019)

  3. • We are at a transition point between the
    industrialisation age and the digitisation age.
    • Computer science related skills are a vital asset
    in this context. One of these basic skills is
    practical programming.
    • The course sizes of university and college
    programming courses are steadily increasing.
    • Even MOOCs are used more frequently to
    convey necessary programming capabilities to
    students of different disciplines.
    • The coursework is composed of assignments
    that are highly suited to be assessed
    automatically.
    • However, it is very often underestimated how
    astonishingly easy it is to trick these systems!
    Introduction
    3
    The question arises
    whether “robots” certify
    the expertise to program
    or the expertise to cheat.

  4. A small example to get your attention ...
    4 VPL == Virtual Programming Lab
    • Count the occurrence of a character c in
    a String s.
    • Develop a method countChar().
    How to get full points in
    Moodle/VPL?
    The same works for every assignment!
    INTRODUCTION

  5. INTRODUCTION
    • APAAS solutions are systems that execute injected code
    (student submissions).
    • Code injection is known as a severe threat from a security
    point of view.
    • APAAS solutions protect the host system via sandbox
    mechanisms.
    • Much effort is invested in sophisticated code
    plagiarism detection and authorship control of
    student submissions.
    • But it was astonishing to see that APAAS solutions like VPL
    overlook the cheating cleverness of students.
    • The grading component can be cheated in a very
    straightforward way.
    • Unattended automated programming examinations must
    be rated as suspect.
    APAAS == Code Injection System
    5

  6. Introduction
    Methodology
    Analysis
    Discussion, Counter Measures
    Limitations, Conclusion
    Agenda
    6

  7. • Two first semester programming Java courses
    in the winter semester 2018/19:
    • A regular computer science study
    programme (CS)
    • An information technology and design
    focused study programme (ITD)
    • In both courses we searched for student
    submissions that intentionally trick the grading
    component.
    • APAAS: Moodle/VPL (Version 3.3.3)
    Methodology
    7
    • To minimise Hawthorne and experimenter effects, neither the students nor the advisers
    were aware that they were part of this study.
    • Even if cheating was detected, this had no consequences for the students. It was not
    even communicated.
    • Students were unaware that the version histories of their submissions were logged and
    analyzed.

  8. METHODOLOGY
    • VPL submissions were downloaded
    from Moodle
    • Python/Jupyter based sample selection
    • S1: triggered evaluations
    • S2: maximum versions
    • S3: low average high end
    • S4: condition related terms
    • S5: unusual terms (System.exit, ...)
    • S6: random submissions
    • NumPy, matplotlib, statistics,
    Javaparser libraries
    • Exported weekly into archived PDF
    documents (for manual analysis)
    Searching for cheats
    Automated sample selection, manual sample analysis
    8

  9. METHODOLOGY
    Analysis of submissions
    9
    Manual annotation
    Task description
    Result, workload, working
    phases, student identifier

  10. Introduction
    Methodology
    Analysis
    Discussion, Counter Measures
    Limitations, Conclusion
    Agenda
    10

  11. ANALYSIS
    Observed cheat-pattern frequency
    11

  12. ANALYSIS
    Continuous Example Assignment
    12
    Count the occurrence of a character c in a String s
    (not case-sensitive).
    We searched for solutions
    that differed significantly
    from this intended
    (reference) solution.
    The reference solution is used to check for correctness.
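The intended behaviour can be sketched as a plain loop. This is an assumed reconstruction for illustration, not necessarily the slide's exact reference code:

```java
public class Solution {
    // Count the occurrences of character c in String s, not case-sensitive.
    public static int countChar(String s, char c) {
        int count = 0;
        char target = Character.toLowerCase(c);
        for (char ch : s.toLowerCase().toCharArray()) {
            if (ch == target) count++;
        }
        return count;
    }
}
```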

  13. ANALYSIS
    CHEAT PATTERN (1)
    • Get a maximum of points but do not solve the given problem
    in a general way
    • The solution is completely useless outside the scope of the test
    cases
    • Simply map input parameters to expected output
    values
    (63%) Overfitting
    13
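The overfitting pattern can be sketched as follows. The grader's actual test values are unknown, so the hard-coded cases below are hypothetical:

```java
public class Solution {
    // Overfitting cheat (sketch): map the grader's known inputs directly
    // to the expected outputs instead of solving the problem generally.
    // The concrete input/output pairs below are hypothetical.
    public static int countChar(String s, char c) {
        if (s.equals("Banana") && c == 'a') return 3;
        if (s.equals("Hello World") && c == 'l') return 3;
        if (s.isEmpty()) return 0;
        return 0; // useless outside the scope of the test cases
    }
}
```

With fixed test cases this submission gets full points, although it cannot count characters at all.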

  14. ANALYSIS
    CHEAT PATTERN (2)
    (30%) Problem Evasion
    14
    Example assignment:
    Count the occurrence of a
    character c in a String s
    recursively.
    Solution pretends to be
    recursive, but it is merely a
    redirection to an overloaded
    method using loops (non-
    recursive).
    Intended solution Evasion solution
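A sketch of such an evasion submission for the continuous countChar example. The exact signatures are assumptions; the overload trick is the slide's point:

```java
public class Solution {
    // Problem evasion (sketch): the graded entry point looks like it
    // could be recursive, but it just redirects to a loop-based overload.
    public static int countChar(String s, char c) {
        return countChar(s, c, 0); // looks like a recursive call, is not
    }

    // The actual work is done iteratively in an overloaded helper.
    private static int countChar(String s, char c, int unused) {
        int count = 0;
        for (char ch : s.toLowerCase().toCharArray()) {
            if (ch == Character.toLowerCase(c)) count++;
        }
        return count;
    }
}
```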

  15. ANALYSIS
    CHEAT PATTERN (3)
    (6%) Redirection
    15
    (1) A small spelling error will
    result in compiler messages
    indicating that a specific
    method is expected by the test
    logic!
    (2) Compiler error messages
    can reveal the reference
    solution.
    (3) A clever student might
    now simply redirect the
    submission to the reference
    method (to let the grader
    evaluate itself).
    Redirecting solution
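A sketch of the redirection trick. The class name `Reference` is hypothetical, standing in for whatever name the leaked compiler messages reveal; the stand-in implementation below exists only to make the example self-contained:

```java
// Stand-in for the grader's reference solution (illustration only;
// in a real attack this class already exists inside the evaluator).
class Reference {
    static int countChar(String s, char c) {
        int n = 0;
        for (char ch : s.toLowerCase().toCharArray())
            if (ch == Character.toLowerCase(c)) n++;
        return n;
    }
}

public class Solution {
    // Redirection cheat (sketch): delegate straight to the reference
    // method revealed by compiler error messages, so the grader
    // effectively evaluates its own code and awards full points.
    public static int countChar(String s, char c) {
        return Reference.countChar(s, c);
    }
}
```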

  16. ANALYSIS
    CHEAT PATTERN (4)
    (2%) Injection
    16
    Simply print the
    points you want to
    have, in an APAAS-
    specific format, on
    standard out.
    • Change the intended workflow of
    the evaluation logic
    • Use the standard out stream to
    place text that is evaluated by the
    APAAS system
    • The evaluator calls the code to be evaluated.
    • The submission code can print to standard out and then terminate further
    evaluation calls.
    • The evaluator parses standard out's content and will give full points!
    Some strings with a specific
    meaning for VPL.
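A sketch of the injection pattern. It assumes `Grade :=>> <points>` is the token that VPL's evaluation scripts parse from standard out; the exact token format may depend on the VPL version:

```java
public class Solution {
    // Builds the grade token the APAAS parses from standard out.
    // "Grade :=>>" is assumed to be the token VPL recognizes.
    static String gradeToken(int points) {
        return "Grade :=>> " + points;
    }

    // Injection cheat (sketch): claim full points on standard out, then
    // terminate the JVM so no real test case can run afterwards.
    public static int countChar(String s, char c) {
        System.out.println(gradeToken(100));
        System.exit(0); // stops the evaluator before it can test anything
        return 0;       // never reached
    }
}
```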

  17. Introduction
    Methodology
    Analysis
    Discussion, Counter Measures
    Limitations, Conclusion
    Agenda
    17

  18. DISCUSSION
    • Randomize Test Cases
    Overfitting
    • AST-based code inspection
    Problem Evasion
    • AST-based code inspection
    Redirection
    • Separate standard out streams for
    evaluation and submission logic
    Injection
    Counter Measures
    18
    A more detailed discussion
    can be found in the paper.

  19. DISCUSSION
    JEdUnit
    19
    JEdUnit
    https://github.com/nkratzke/JEdUnit
    JEdUnit is a unit testing framework with a
    special focus on educational aspects. It
    strives to simplify automatic evaluation of
    (small) Java programming assignments
    using Moodle/VPL.
    It is used and developed for programming
    classes at the Lübeck University of Applied
    Sciences.
    However, this framework might be helpful
    for other programming instructors, so it has
    been open sourced.

  20. DISCUSSION
    Randomize Test Cases
    20
    Don't do that:
    Do that:
    JEdUnit DSL to express
    randomized test values. E.g.
    apply regular expressions
    inversely to generate random
    strings.
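The randomization idea can be sketched generically (this is not JEdUnit's actual DSL): generate fresh random inputs on every evaluation run and compare the submission against the reference solution, so hard-coded input/output mappings stop working:

```java
import java.util.Random;
import java.util.function.ToIntBiFunction;

public class RandomizedCheck {
    // Reference solution the submission is compared against.
    static int reference(String s, char c) {
        int n = 0;
        for (char ch : s.toLowerCase().toCharArray())
            if (ch == Character.toLowerCase(c)) n++;
        return n;
    }

    // Randomized check (generic sketch, not JEdUnit's DSL): test the
    // submission on 100 random strings; overfitted submissions that
    // hard-code fixed test inputs will almost surely fail.
    static boolean check(ToIntBiFunction<String, Character> submission) {
        Random rnd = new Random();
        for (int i = 0; i < 100; i++) {
            StringBuilder sb = new StringBuilder();
            int len = rnd.nextInt(20);
            for (int j = 0; j < len; j++)
                sb.append((char) ('a' + rnd.nextInt(26)));
            String s = sb.toString();
            char c = (char) ('a' + rnd.nextInt(26));
            if (submission.applyAsInt(s, c) != reference(s, c)) return false;
        }
        return true;
    }
}
```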

  21. DISCUSSION
    AST-based code inspections
    21
    E.g.: Don't allow recursion to be bypassed
    by inspecting and penalizing loop presence.
    The JEdUnit DSL is able to
    express selectors on abstract
    syntax trees (AST) to check for
    the presence or absence of
    language constructs.
    The selector model of
    JEdUnit works similarly to
    how CSS selectors work on
    DOM trees.
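Such an inspection can be sketched with the JavaParser library (which the study's tooling already uses for sample selection). This is a plain JavaParser sketch, not JEdUnit's selector DSL:

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.stmt.DoStmt;
import com.github.javaparser.ast.stmt.ForEachStmt;
import com.github.javaparser.ast.stmt.ForStmt;
import com.github.javaparser.ast.stmt.WhileStmt;

public class LoopInspection {
    // AST-based inspection (sketch): parse the submission source and
    // report whether it contains any kind of loop. A "solve it
    // recursively" assignment can penalize submissions where this
    // returns true, blocking the problem-evasion cheat.
    static boolean containsLoop(String sourceCode) {
        CompilationUnit cu = StaticJavaParser.parse(sourceCode);
        return !cu.findAll(ForStmt.class).isEmpty()
            || !cu.findAll(ForEachStmt.class).isEmpty()
            || !cu.findAll(WhileStmt.class).isEmpty()
            || !cu.findAll(DoStmt.class).isEmpty();
    }
}
```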

  22. DISCUSSION
    Isolation of submission and evaluation logic
    22
    Submission logic
    gets an isolated fake
    console
    Submission
    shares stdout
    with evaluation
    process
    JEdUnit
    approach
    VPL
    approach
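The isolation idea can be sketched with standard Java stream redirection (an illustration of the approach, not JEdUnit's actual implementation):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class ConsoleIsolation {
    // Isolation sketch: run the submission with System.out redirected to
    // a private buffer, so anything it prints (e.g. a forged grade
    // token) never reaches the stream the grader parses.
    static String runIsolated(Runnable submission) {
        PrintStream original = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setOut(new PrintStream(captured));
        try {
            submission.run();
        } finally {
            System.setOut(original); // evaluation output stays clean
        }
        return captured.toString(); // what the submission tried to print
    }
}
```

The evaluator can additionally inspect the captured buffer for suspicious grade tokens to flag injection attempts.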

  23. DISCUSSION
    Further Features of JEdUnit
    23
    JEdUnit
    https://github.com/nkratzke/JEdUnit
    • Weighting of test cases (by annotations)
    • Checkstyle integration (weighted rules)
    • DSL
    • to formulate test cases in a check,
    explain, onError pattern
    • to randomize test cases
    • to write arbitrary code inspections
    based on a selector model
    • Predefined code inspections (switch on/off):
    proper collection usage, loops, lambdas,
    inner classes, datafields, console output, etc.
    • Automated class structure comparison (OO
    use cases) to compare the structural equality
    of a multi-class submission with a multi-class
    reference solution.

  24. Introduction
    Methodology
    Analysis
    Discussion, Counter Measures
    Limitations, Conclusion
    Agenda
    24

  25. LIMITATIONS
    We searched qualitatively, not
    quantitatively, for cheat patterns
    • Do not draw any conclusions
    about which cheat patterns occur
    at what level of programming
    expertise
    • Do not draw any conclusions on
    the quantitative aspects of
    cheating
    • The study does not claim to
    have identified all kinds of cheat
    patterns
    The study does not claim that
    all APAAS solutions have the same
    set of vulnerabilities
    • Do not generalize Moodle/VPL-
    specific problems.
    • However, the overfitting,
    problem evasion, redirection,
    and injection patterns can be
    used to check for vulnerabilities
    in other APAAS solutions.
    Threats to Validity
    25

  26. • We have to be aware that (even first-year)
    students are clever enough to trick automated
    grading solutions.
    • Cheat patterns:
    • Overfitting
    • Problem Evasion
    • Redirection
    • Injection
    • Options we currently investigate:
    • Randomise test cases
    • Pragmatic code inspection
    • Isolation of submission and evaluation logic
    • Exactly these features seem to be provided
    only incompletely by current APAAS systems.
    Conclusion
    26
    JEdUnit
    https://github.com/nkratzke/JEdUnit

  27. Acknowledgement
    27
    Presentation on SpeakerDeck
    Preprint on ResearchGate
    Advisers of the practical courses
    • David Engelhardt, Thomas Hamer, Clemens Stauner,
    Volker Völz, Patrick Willnow
    Student tutors
    • Franz Bretterbauer, Francisco Cardoso, Jannik
    Gramann, Till Hahn, Thorleif Harder, Jan Steffen
    Krohn, Diana Meier, Jana Schwieger, Jake Stradling,
    and Janos Vinz
    Picture Reference
    • Hacker: Pixabay.com (CC0)
    • Robot: Pixabay.com (CC0)

  28. About
    28
    Nane Kratzke
    Web: http://nane.kratzke.pages.mylab.th-luebeck.de/about
    Twitter: @NaneKratzke
    LinkedIn: https://de.linkedin.com/in/nanekratzke
    GitHub: https://github.com/nkratzke
    ResearchGate: https://www.researchgate.net/profile/Nane_Kratzke
    SlideShare: http://de.slideshare.net/i21aneka
