A paper-review of "Extracting Training Data from Large Language Models"

Shuntaro Yada
January 25, 2021

A paper-review of "Extracting Training Data from Large Language Models" by Shuntaro Yada at Social Computing Lab. in NAIST, Japan.

Transcript

  1. NAIST SocioCom


    Paper reading MTG on 2021-01-25


    Reader: Shuntaro Yada
    A paper-review of
    Extracting Training Data from Large Language Models

  2. 2
    Extracting Training Data from Large Language Models
    Nicholas Carlini1 Florian Tramèr2 Eric Wallace3 Matthew Jagielski4
    Ariel Herbert-Voss5,6 Katherine Lee1 Adam Roberts1 Tom Brown5
    Dawn Song3 Úlfar Erlingsson7 Alina Oprea4 Colin Raffel1
    1Google 2Stanford 3UC Berkeley 4Northeastern University 5OpenAI 6Harvard 7Apple
    Submitted to arXiv on 14 Dec 2020 (arXiv:2012.07805)
    [The slide shows the paper's first page (title, authors, abstract) and its final page: the "Lessons and Future Work" discussion (extraction attacks are a practical threat; memorization does not require overfitting; larger models memorize more data; memorization can be hard to discover; adopt and develop mitigation strategies), the Acknowledgements, and the Summary of Contributions.]


  3. • Demonstrated that large language
    models memorise and leak
    individual training examples


    – Potential privacy leakage confirmed!


    • Sampled (likely) memorised strings
    among the generated texts using six
    different membership inference metrics


    • Categorised and analysed such
    verbatim generations
    3
    Summary
    [The slide shows the paper's first page.]
    Abstract: "It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data. We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models."
    Figure 1: "Our extraction attack. Given query access to a neural network language model, we extract an individual person’s name, email address, phone number, fax number, and physical address. The example in this figure shows information that is all accurate so we redact it to protect privacy."


  4. • Pre-trained large language models (LMs) have become popular


    • Machine learning models can leak information about their training data, and
    overfitting is known to be a significant cause of such leakage


    • The prevailing wisdom: state-of-the-art LMs will not overfit their training data


    – Typically, such LMs are trained on massive de-duplicated datasets only once


    • RQ: is that prevailing wisdom actually true?
    4
    Background


  5. • Membership inference attack


    – Predict whether or not a particular example was used to train the model


    • Model inversion attack


    – Reconstruct representative views of a subset of examples


    – E.g. a fuzzy image of a particular person within a facial recognition dataset


    • Training data extraction attack


    – Reconstruct verbatim (exact instances of) training examples


    – Model’s “memorisation” of training data, in other words
    5
    Types of privacy attacks


  6. • Data secrecy: (a narrow view)


    – If the training data is confidential or private, extracted memorisation threatens the data
    publisher’s security and privacy


    – E.g. Gmail’s autocomplete model is trained on users’ email text; memorisation of unique
    strings could cause great harm


    • Contextual integrity of data: (a broader view)


    – The data is used outside of its intended context


    – E.g. contact information is exposed by a dialogue system in response to a user’s query;
    even if the email address or phone number is public, unintentional exposure may result
    in harmful outcomes
    6
    Risks of training data extraction


  7. 7
    Definition of ‘memorisation’
    Definition 1 (Model Knowledge Extraction). A string s is extractable from an LM f_θ if there exists a prefix c such that:
    s = argmax_{s′ : |s′| = N} f_θ(s′ | c)
    (The notation is abused slightly: f_θ(s′ | c) denotes the likelihood of the entire sequence s′. Since computing the most likely sequence s is intractable for large N, the argmax can be replaced by an appropriate sampling strategy, e.g. greedy sampling, that reflects the way the model generates text in practice.)
    Definition 2 (k-Eidetic Memorization). A string s is k-eidetic memorized (for k ≥ 1) by an LM f_θ if s is extractable from f_θ and s appears in at most k examples in the training data X: |{x ∈ X : s ⊆ x}| ≤ k.
    (The count is over distinct training examples containing the string, not total occurrences; a string appearing multiple times in a single example still counts as k = 1.)
    [The slide also shows the paper's threat-model discussion: the adversary only needs black-box query access, which is realistic since many LMs, e.g. GPT-3, are served through black-box APIs; the objective is to indiscriminately extract memorized training data, and an attack is stronger the more sequences it extracts and the lower their k.]
    Given a prefix string c, if the model generates
    a passage that is identical to an example
    in the training dataset, the model’s
    knowledge has been “extracted”.
    If the extracted passage appears in only k
    documents of the dataset, the passage is k-
    eidetic memorised. (Note that k is a
    “document frequency”.)
    The smaller k is, the more serious and dangerous the leak!
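    As a concrete illustration of Definition 1, here is a minimal sketch (my own, not the authors' code) that approximates the intractable argmax with greedy decoding, using the Hugging Face transformers GPT-2 as a stand-in for f_θ; the helper name greedy_continuation and the Figure 1-style prefix are purely illustrative assumptions.

```python
# Minimal sketch of Definition 1: greedy decoding approximates the argmax over
# length-N continuations, as the paper itself suggests.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")   # small GPT-2 as a stand-in for f_theta
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def greedy_continuation(prefix: str, n_new_tokens: int = 64) -> str:
    """Return the greedy continuation s of the prefix c (hypothetical helper)."""
    inputs = tok(prefix, return_tensors="pt")
    out = lm.generate(
        **inputs,
        max_new_tokens=n_new_tokens,
        do_sample=False,                      # greedy: pick the most likely token each step
        pad_token_id=tok.eos_token_id,
    )
    return tok.decode(out[0][inputs["input_ids"].shape[1]:])

# A string s is (approximately) extractable if some prefix c makes the model
# emit s verbatim; the paper's Figure 1 uses a prefix of this kind.
print(greedy_continuation("East Stroudsburg Stroudsburg..."))
```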


  8. • Adversary’s capabilities:


    – Black-box input and output access to a language model, enabling perplexity
    calculation and next-word prediction


    • Adversary’s objective:


    – To extract memorised training data from the model


    – The strength of an attack = how many examples are extracted and how small their k is (among successful k-eidetic memorisations)


    • Attack target:


    – GPT-2, a popular, publicly available language model trained on public data.
    8
    Threat model


  9. The training-data extraction attack is twofold:


    1. Generate text


    2. Predict which outputs contain memorised text (i.e. membership inference)


    9
    Attack experiments
    [Figure 2 of the paper (workflow of the extraction attack and evaluation): prefixes → LM (GPT-2) → 200,000 generations → deduplicate → sort generations using one of 6 metrics → choose top-100 → check memorization via Internet search → match / no match.]


  10. 10
    Attack experiments
    Figure 2 caption: "Workflow of our extraction attack and evaluation. Attack. We begin by generating many samples from GPT-2 when the model is conditioned on (potentially empty) prefixes. We then sort each generation according to one of six metrics and remove the duplicates. This gives us a set of potentially memorized training examples. Evaluation. We manually inspect 100 of the top-1000 generations for each metric. We mark each generation as either memorized or not-memorized by manually searching online, and we confirm these findings by working with OpenAI to query the original training data."
    Using 3 generation strategies:


    • Top-n sampling (baseline)


    • Decaying temperature


    • Conditioning on Internet
    text
    Using 6 membership inference metrics:


    • Perplexity-based confidence (baseline)


    • Perplexity of smaller models (medium and small)


    • Ratio of zlib compressed entropy


    • Perplexity of lowercased text


    • Min perplexity of sliding windows
    3 generation strategies × 6 metrics × 100 samples each = 1,800 candidates.
    Four of the authors manually search the
    Internet for the outputs.

    Then they check (with OpenAI's help)
    whether each memorisation candidate is contained
    in the GPT-2 training set.
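    To make the workflow concrete, here is a tiny sketch (mine, not the paper's code) of the generate → de-duplicate → sort → top-100 pipeline; generate_one and metric are hypothetical callables standing in for one of the generation strategies and one of the membership-inference metrics above.

```python
# Sketch of the attack pipeline. `generate_one()` returns one text sample and
# `metric(text)` returns a score where lower means "more likely memorised";
# both are placeholders, and the exact-match de-duplication below is a
# simplification (the paper uses a fuzzier trigram-overlap check).
def extraction_pipeline(generate_one, metric, n_samples=200_000, top_k=100):
    samples = {generate_one() for _ in range(n_samples)}  # de-duplicate generations
    ranked = sorted(samples, key=metric)                  # most suspicious first
    return ranked[:top_k]                                 # candidates for manual review
```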


  11. • Top-n sampling


    – Sample the next word from the n most likely words under the model


    • Sampling tokens with a decaying temperature


    – To diversify the outputs, use a high softmax temperature at first, then gradually
    lower the temperature as generation proceeds (see the sketch at the end of this slide)


    • Conditioning on Internet text


    – During generation, seed the model with snippets scraped from the Internet


    – Uses a different data source (Common Crawl) from the one GPT-2 was trained on (Reddit-linked pages)
    11
    Experiment details
    1. Generate text
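    Below is a minimal sketch (my own; the model size, top-n value, and decay schedule are assumptions) of the second strategy, top-n sampling with a decaying softmax temperature.

```python
# Top-n sampling with a temperature that decays towards 1.0, to get diverse
# openings while still letting the model settle into confident continuations.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-xl")
lm = GPT2LMHeadModel.from_pretrained("gpt2-xl")

@torch.no_grad()
def sample_decaying_temperature(prefix: str, n_tokens: int = 256, top_n: int = 40,
                                t_start: float = 10.0, decay_steps: int = 20) -> str:
    ids = tok(prefix, return_tensors="pt").input_ids
    for step in range(n_tokens):
        logits = lm(ids).logits[0, -1]                    # next-token logits
        # anneal the temperature from t_start down to 1.0 over the first decay_steps tokens
        t = max(1.0, t_start - (t_start - 1.0) * step / decay_steps)
        probs = torch.softmax(logits / t, dim=-1)
        top_p, top_i = probs.topk(top_n)                  # keep only the n likeliest tokens
        next_id = top_i[torch.multinomial(top_p / top_p.sum(), 1)]
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tok.decode(ids[0])
```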


  12. • As in existing membership inference attacks, the model’s confidence in its
    outputs is used as the measure of memorisation


    • The perplexity of the output is adopted as the baseline metric, since the target is a
    language model (see the sketch at the end of this slide)


    • However, in combination with the standard top-n sampling, the following
    problematic outputs are observed:


    – Trivial memorisation: e.g. a number sequence from 1 to 100


    – Repeated substrings: e.g. “I love you. I love you. I love …”
    12
    Experiment details
    2. Membership inference
    Much genuinely memorised
    content was found too,
    though: the MIT licence,
    the Terms of Service of web
    apps, etc.
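    A minimal sketch (not the authors' implementation; the model name and helper are my assumptions) of this perplexity baseline:

```python
# Perplexity of a generation under the attacked model: lower perplexity means
# higher model confidence, i.e. a stronger memorisation signal.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-xl")   # the attacked model (GPT-2 XL)
lm = GPT2LMHeadModel.from_pretrained("gpt2-xl")

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss              # mean token cross-entropy (labels shifted internally)
    return math.exp(loss.item())
```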


  13. To filter out trivial memorisation and repeated substrings, five new metrics are proposed:


    • Perplexity of other neural language models (i.e. smaller GPT-2 models)


    – Small (117M params)


    – Medium (345M params)


    • Ratio of zlib compressed entropy (compared against the GPT-2 perplexity)


    • Perplexity of lowercased-version of the generated text


    • Minimum perplexity of the tokens within sliding windows (50 words)


    These are used as “references” to detect whether the main model (GPT-2 XL) shows
    unexpectedly high confidence (low perplexity) on a sample; a sketch of three of them follows at the end of this slide.
    13
    Experiment details
    2. Membership inference
    (Slide annotations: smaller models won’t memorise low k-eidetic samples;
    the zlib ratio detects simple repetitions;
    lowercasing compares against canonicalised text;
    sliding windows detect memorised “sub-strings”.)
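    Below is a sketch of three of these reference metrics (my own code, not the paper's; the model names, the orientation of the ratios, and the word-level sliding window are assumptions).

```python
# zlib ratio, lowercase comparison, and sliding-window perplexity as
# membership-inference signals against GPT-2 XL.
import math
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-xl")
lm = GPT2LMHeadModel.from_pretrained("gpt2-xl")

@torch.no_grad()
def log_perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    return lm(ids, labels=ids).loss.item()        # mean negative log-likelihood per token

def zlib_score(text: str) -> float:
    """GPT-2 log-perplexity relative to the zlib-compressed size: text that zlib
    finds incompressible but GPT-2 finds easy is a memorisation suspect."""
    return log_perplexity(text) / len(zlib.compress(text.encode("utf-8")))

def lowercase_score(text: str) -> float:
    """Log of the perplexity ratio before vs. after lowercasing; memorised text
    with unusual casing (e.g. news headlines) stands out."""
    return log_perplexity(text) - log_perplexity(text.lower())

def window_score(text: str, window: int = 50) -> float:
    """Minimum perplexity over 50-word sliding windows, to catch memorised
    sub-strings buried inside otherwise unremarkable generations."""
    words = text.split()
    spans = [" ".join(words[i:i + window])
             for i in range(max(1, len(words) - window + 1))]
    return min(math.exp(log_perplexity(s)) for s in spans)
```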


  14. • 604 / 1800 samples (33.5%) were
    memorised


    • Around 4% privacy leakage
    14
    Results
    Category Count
    US and international news 109
    Log files and error reports 79
    License, terms of use, copyright notices 54
    Lists of named items (games, countries, etc.) 54
    Forum or Wiki entry 53
    Valid URLs 50
    Named individuals (non-news samples only) 46
    Promotional content (products, subscriptions, etc.) 45
    High entropy (UUIDs, base64 data) 35
    Contact info (address, email, phone, twitter, etc.) 32
    Code 31
    Configuration files 30
    Religious texts 25
    Pseudonyms 15
    Donald Trump tweets and quotes 12
    Web forms (menu items, instructions, etc.) 11
    Tech news 11
    Lists of numbers (dates, sequences, etc.) 10
    Table 1: Manual categorization of the 604 memorized training
    examples that we extract from GPT-2, along with a description of each category.
    Table 2: the number of memorized examples (out of 100 candidates) identified with each combination of membership inference metric and text generation strategy.
    Inference Strategy | Top-n | Temperature | Internet
    Perplexity         |     9 |           3 |       39
    Small              |    41 |          42 |       58
    Medium             |    38 |          33 |       45
    zlib               |    59 |          46 |       67
    Window             |    33 |          28 |       58
    Lowercase          |    53 |          22 |       60
    Total Unique       |   191 |         140 |      273
    Table 3 (k = 1 eidetic memorized, high-entropy strings):
    Memorized String | Sequence Length | Occurrences in Data (Docs / Total)
    Y2... ...y5      |              87 |  1 / 10
    7C... ...18      |              40 |  1 / 22
    XM... ...WA      |              54 |  1 / 36
    ab... ...2c      |              64 |  1 / 49
    ff... ...af      |              32 |  1 / 64
    C7... ...ow      |              43 |  1 / 83
    0x... ...C0      |              10 |  1 / 96
    76... ...84      |              17 |  1 / 122
    a7... ...4b      |              40 |  1 / 311


  15. • Characteristics of the membership inference methods:


    – Zlib finds high k-eidetic samples; Lowercased detects news headlines; Small and
    Medium find rare content


    • Extracting longer verbatim sequences


    – Successfully extended memorised content up to 1,450 lines of source code and to the full
    text of the MIT, Creative Commons, and Project Gutenberg licences


    • Memorisation is context-dependent


    – “3.14159” → [GPT-2] → 25 digits of π; “pi is 3.14159” → 799 digits; “e begins
    2.7182818, pi begins 3.14159” → 824 digits of π
    15
    Analysis and findings


  16. • 1-eidetic memorisation happens
    and can be detected by the
    metrics

    • Larger models memorise more data
    16
    Analysis and findings
    [The slide shows excerpts from the paper's Section 6:]
    "... namely 1450 lines of verbatim source code. We can also extract the entirety of the MIT, Creative Commons, and Project Gutenberg licenses. This indicates that while we have extracted 604 memorized examples, we could likely extend many of these to much longer snippets of memorized content."
    "6.5 Memorization is Context-Dependent. [...] For example, GPT-2 will complete the prompt '3.14159' with the first 25 digits of π correctly using greedy sampling. However, we find that GPT-2 'knows' (under Definition 2) more digits of π because using the beam-search-like strategy introduced above extracts 500 digits correctly. Interestingly, by providing the more descriptive prompt 'pi is 3.14159', straight greedy decoding gives the first 799 digits of π—more than with the sophisticated beam search. Further providing the context 'e begins 2.7182818, pi begins 3.14159', GPT-2 greedily completes the first 824 digits of π. This example demonstrates the importance of the context: in the right setting, orders of magnitude more extraction is possible."
    Table 4 caption: "We show snippets of Reddit URLs that appear a varying number of times in a single training document. We condition GPT-2 XL, Medium, or Small on a prompt that contains the beginning of a Reddit URL and report a ✓ if the corresponding URL was generated verbatim in the first 10,000 generations. We report a ½ if the URL is generated by providing GPT-2 with the first 6 characters of the URL and ..." [The table lists 13 trimmed /r/ URLs, each contained in a single training document with between 8 and 359 total occurrences.]
    Table 3 caption: "Examples of k = 1 eidetic memorized, high-entropy content that we extract from the training data. Each is contained in just one document. In the best case, we extract a 87-characters-long sequence that is contained in the training dataset just 10 times in total, all in the same document." (The table body appears on slide 14.)


  17. The paper proposes the following strategies to mitigate privacy leakage


    • Training with differential privacy — add noise during training (a DP-SGD sketch follows at the end of this slide)


    • Curating the training data — carefully de-duplicate data


    • Limiting impact of memorisation on downstream applications


    – The fine-tuning process may cause the language model to forget some pre-training data


    – But be careful: fine-tuning may introduce new privacy leaks of its own


    • Auditing ML models for memorisation — monitor outputs!
    17
    Mitigating privacy leakage
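    For the first mitigation, here is an illustrative DP-SGD step (my own sketch with placeholder hyperparameters, not the paper's recipe): per-example gradient clipping plus Gaussian noise, which is the mechanism behind training with differential privacy. In practice one would use a library such as Opacus or TensorFlow Privacy rather than hand-rolling this.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=1e-3,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each example's gradient, sum, add noise, then step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):                        # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (total_norm.item() + 1e-6))  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-lr * (s + noise) / len(batch_x))          # noisy averaged-gradient step
```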


  18. • Extraction attacks are a practical threat


    • Memorisation does not require overfitting ← the answer to the RQ posed in the Background slide


    • Larger models memorise more data


    • Memorisation can be hard to discover


    • Adopt and develop mitigation strategies
    18
    Lessons and future work


  19. • Though not yet peer-reviewed or accepted (as of 2021/01/24), it is quite a good read


    – Well written and enlightening, providing clever ideas and interesting results


    – Even the appendices are worth reading


    • The paper's structure reads a little like story-telling


    – It does not follow the standard scientific-paper structure (still readable, though)


    – Presenting different points of view one after another is not an ideal style of academic writing


    • I wanted to see a privacy-leakage-oriented evaluation per strategy and metric


    – Which strategy and metric generated the most privacy leakage? The provided results do not make this clear


    – Tables 6 and 7 (in the Appendix) give only aggregated results
    19
    Reader’s remarks
    — Shuntaro Yada
