
Paper Intro: Human Rademacher Complexity

Masanari Kimura

August 21, 2023

Transcript

  1. Human Rademacher Complexity
    Created by: Masanari Kimura
    Institute: The Graduate University for Advanced Studies, SOKENDAI
    Dept: Department of Statistical Science, School of Multidisciplinary Sciences
    E-mail: [email protected]
  2. TL;DR
    ⊚ NIPS2009 [12];
    ⊚ In statistical learning theory, Rademacher complexity is one of the complexity measures of a function class, and it induces generalization bounds;
    ⊚ This work proposes using Rademacher complexity as a measure of human learning capacity.
  3. Introduction
    ⊚ Capacity is one of the main research questions in cognitive psychology:
      • How much information can humans hold [7, 6, 3]?
      • What kinds of functions can humans easily acquire [11, 4]?
      • How do humans avoid over-fitting [8]?
    ⊚ In statistical learning theory, there are several notions of the capacity of a function class:
      • Vapnik-Chervonenkis (VC) dimension [10];
      • Rademacher complexity [5].
    ⊚ These capacity notions provide generalization bounds and bounds on the probability of over-fitting;
    ⊚ Q. Are these capacity measures useful for evaluating human cognitive ability?
  4. Notations
    ⊚ $\mathcal{X}$: input space;
    ⊚ $x \in \mathcal{X}$: an instance from the input space;
    ⊚ $P_X$: underlying marginal distribution on $\mathcal{X}$;
    ⊚ $\mathcal{F}$: hypothesis space;
    ⊚ $f : \mathcal{X} \to \mathbb{R}$: a real-valued function (hypothesis).
  5. Rademacher complexity
    ⊚ Consider a sample of $n$ instances $x_1, \dots, x_n$ drawn i.i.d. from $P_X$.
    ⊚ Generate $n$ random variables $\sigma_1, \dots, \sigma_n \in \{-1, +1\}$.

    Definition (Rademacher complexity)
    For a set of real-valued functions $\mathcal{F}$ with input space $\mathcal{X}$, a distribution $P_X$ on $\mathcal{X}$, and sample size $n$, the Rademacher complexity $R(\mathcal{F}, \mathcal{X}, P_X, n)$ is

    $$R(\mathcal{F}, \mathcal{X}, P_X, n) = \mathbb{E}_{\substack{x_1, \dots, x_n \sim P_X \\ \sigma_1, \dots, \sigma_n \sim \mathrm{Ber}(1/2)}} \left[ \sup_{f \in \mathcal{F}} \left| \frac{2}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right| \right], \tag{1}$$

    where $\sigma_1, \dots, \sigma_n \sim \mathrm{Ber}(1/2)$ with values $\pm 1$.
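The expectation in the definition can be approximated by simulation when the function class is small enough to enumerate. The following is a minimal sketch, not taken from the slides: the threshold classifiers, the uniform distribution on [0, 1], and the names `rademacher_complexity`, `make_threshold`, and `uniform` are all illustrative assumptions.

```python
import random


def rademacher_complexity(functions, sample_x, n, trials=2000, seed=0):
    """Monte Carlo estimate of E[ sup_f | (2/n) * sum_i sigma_i * f(x_i) | ]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sample_x(rng) for _ in range(n)]             # x_1..x_n i.i.d. from P_X
        sigmas = [rng.choice((-1, 1)) for _ in range(n)]   # random +/-1 labels
        # Supremum over the (finite) function class for this draw.
        total += max(
            abs(2.0 / n * sum(s * f(x) for s, x in zip(sigmas, xs)))
            for f in functions
        )
    return total / trials


# Hypothetical function class: threshold classifiers on [0, 1].
def make_threshold(t):
    return lambda x: 1 if x >= t else -1


thresholds = [make_threshold(k / 10) for k in range(11)]


def uniform(rng):
    """Assumed P_X: uniform on [0, 1)."""
    return rng.random()


for n in (5, 10, 20, 40):
    print(n, round(rademacher_complexity(thresholds, uniform, n), 3))
```

For a fixed class like this one, the estimate should shrink as $n$ grows, because a small set of thresholds cannot keep up with more and more random labels.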
  6. ⊚ Rademacher complexity measures how easy it is for $\mathcal{F}$ to fit randomly flipped labels.
      • Flexible function class $\mathcal{F}$ ⇒ High complexity;
      • Inflexible function class $\mathcal{F}$ ⇒ Low complexity.
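The contrast between flexible and inflexible classes can be made concrete with two extreme cases. This is an illustrative sketch only: each function is represented by its outputs on a fixed sample of $n = 8$ points, the "inflexible" class holds the two constant functions, and the "flexible" class holds every possible ±1 labeling. The names `sup_correlation` and `estimate` are assumptions.

```python
import itertools
import random


def sup_correlation(labelings, sigmas):
    """sup over the class of | (2/n) * sum_i sigma_i * f(x_i) | for one label draw."""
    n = len(sigmas)
    return max(
        abs(2.0 / n * sum(s * y for s, y in zip(sigmas, labeling)))
        for labeling in labelings
    )


def estimate(labelings, n, trials=200, seed=0):
    """Average the supremum over random +/-1 label vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sigmas = [rng.choice((-1, 1)) for _ in range(n)]
        total += sup_correlation(labelings, sigmas)
    return total / trials


n = 8
inflexible = [(1,) * n, (-1,) * n]                      # constant functions only
flexible = list(itertools.product((-1, 1), repeat=n))   # every labeling of the sample

print("inflexible:", estimate(inflexible, n))
print("flexible:  ", estimate(flexible, n))
```

The flexible class always contains a function that matches the random labels exactly, so its value is the maximum possible, 2. The constant functions can only exploit an imbalance between +1 and −1 labels, so their value is much lower.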
  7. Empirical estimation of Rademacher complexity
    ⊚ We can estimate Rademacher complexity from random samples $\{x_i^{(1)}, \sigma_i^{(1)}\}_{i=1}^{n}, \dots, \{x_i^{(m)}, \sigma_i^{(m)}\}_{i=1}^{n}$.
    ⊚ From McDiarmid's inequality, we have the following theorem.

    Theorem
    Let $\mathcal{F}$ be a set of functions mapping to $[-1, 1]$. For any integers $n$ and $m$, we have

    $$\mathbb{P}\left[ \left| R(\mathcal{F}, \mathcal{X}, P_X, n) - \frac{1}{m} \sum_{j=1}^{m} \sup_{f \in \mathcal{F}} \left| \frac{2}{n} \sum_{i=1}^{n} \sigma_i^{(j)} f(x_i^{(j)}) \right| \right| \geq \epsilon \right] \leq 2 \exp\left\{ -\frac{\epsilon^2 n m}{8} \right\} \tag{2}$$
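To get a feel for how tight the concentration bound in (2) is, it can be evaluated numerically and inverted for $\epsilon$. The sketch below only restates the right-hand side $2\exp(-\epsilon^2 n m / 8)$; the function names are hypothetical, and the example values $n = 40$, $m = 10$ are taken from the experimental setup described later in the deck.

```python
import math


def deviation_probability_bound(epsilon, n, m):
    """Right-hand side of (2): 2 * exp(-epsilon^2 * n * m / 8), capped at 1."""
    return min(1.0, 2.0 * math.exp(-(epsilon ** 2) * n * m / 8.0))


def epsilon_for_confidence(delta, n, m):
    """Deviation epsilon at which the bound in (2) equals delta (0 < delta < 1)."""
    return math.sqrt(8.0 * math.log(2.0 / delta) / (n * m))


# Example: n = 40 training items, m = 10 subjects.
eps = epsilon_for_confidence(0.05, n=40, m=10)
print("epsilon at 95% confidence:", round(eps, 3))
print("bound at that epsilon:", deviation_probability_bound(eps, n=40, m=10))
```

Because the exponent depends on the product $nm$, small groups with short training lists give only a loose guarantee on the empirical estimate.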
  8. Generalization error bounds
    Theorem
    Let $\mathcal{F}$ be a set of functions mapping $\mathcal{X}$ to $\{-1, 1\}$. Let $P_{XY}$ be a probability distribution on $\mathcal{X} \times \{-1, 1\}$ with marginal distribution $P_X$ on $\mathcal{X}$. Let $\{(x_i, y_i)\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} P_{XY}$ be a training sample of size $n$. For any $\delta > 0$, with probability at least $1 - \delta$, every function $f \in \mathcal{F}$ satisfies

    $$e(f) - \hat{e}(f) \leq \frac{R(\mathcal{F}, \mathcal{X}, P_X, n)}{2} + \sqrt{\frac{\ln \frac{1}{\delta}}{2n}}, \tag{3}$$

    where $e(f) \coloneqq \mathbb{E}_{(x, y) \sim P_{XY}}\left[\mathbb{1}(y \neq f(x))\right]$ is the true error and $\hat{e}(f) \coloneqq \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(y_i \neq f(x_i))$ is the training error.
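The bound on the gap between true and training error can be evaluated directly once a Rademacher complexity value is available. The sketch below assumes the confidence term has the standard form $\sqrt{\ln(1/\delta) / (2n)}$; the function name and the example complexity value 0.5 are hypothetical and are not results from the paper.

```python
import math


def generalization_gap_bound(rademacher, n, delta):
    """Upper bound on e(f) - e_hat(f): R / 2 + sqrt(ln(1 / delta) / (2 n))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    return rademacher / 2.0 + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Illustrative values only: a complexity of 0.5 at several sample sizes.
for n in (5, 10, 20, 40):
    print(n, round(generalization_gap_bound(0.5, n, delta=0.05), 3))
```

With the complexity held fixed, the bound tightens as $n$ grows only through the confidence term. A large complexity keeps the bound loose regardless of sample size.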
  9. ⊚ Goal: measure the Rademacher complexity of the human learning system.
    ⊚ $H_\alpha$: the set of functions $\mathcal{F}$ that an average human subject can come up with in the experiments.
    ⊚ Two assumptions:
      • Universality [1]: every individual has the same $H_\alpha$.
      • Computability of the supremum on $H_\alpha$: when making classification judgements, participants use the best function at their disposal.
    ⊚ ⇒ Participants are doing their best to perform the task.
  10. Computation of Human Rademacher complexity
    ⊚ Each participant is presented with a training sample $\{(x_i, \sigma_i)\}_{i=1}^{n}$.
    ⊚ They are asked to learn the instance-label mapping.
      • The subject is not told that the labels are random.
    ⊚ Assume that the subject searches within $H_\alpha$ for the best rule, i.e. the one minimizing training error: $f^* = \operatorname{argmax}_{f \in H_\alpha} \sum_{i=1}^{n} \sigma_i f(x_i) = \operatorname{argmin}_{f \in H_\alpha} \hat{e}(f)$.
    ⊚ Later, ask the subject to classify the same training instances $\{x_i\}_{i=1}^{n}$ and approximate the supremum as

    $$\sup_{f \in H_\alpha} \left| \frac{2}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right| \approx \left| \frac{2}{n} \sum_{i=1}^{n} \sigma_i f^*(x_i) \right|. \tag{4}$$
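The approximation in (4) turns each participant's answers into a single number, and averaging over participants gives the estimate of Human Rademacher complexity. The following is a minimal sketch of that arithmetic; the function names and the response data are made up for illustration and are not data from the paper.

```python
def subject_estimate(sigmas, responses):
    """| (2/n) * sum_i sigma_i * f*(x_i) | for one participant.

    sigmas:    the random +/-1 labels shown during training.
    responses: the participant's +/-1 answers on the same items, in the same order.
    """
    if len(sigmas) != len(responses):
        raise ValueError("sigmas and responses must have the same length")
    n = len(sigmas)
    return abs(2.0 / n * sum(s * r for s, r in zip(sigmas, responses)))


def human_rademacher_complexity(subjects):
    """Average of the per-participant estimates over all m participants."""
    return sum(subject_estimate(s, r) for s, r in subjects) / len(subjects)


# Hypothetical data for two participants with n = 5 items each.
subjects = [
    ([1, -1, 1, 1, -1], [1, -1, 1, -1, -1]),   # 4 of 5 labels reproduced
    ([-1, -1, 1, 1, 1], [-1, -1, 1, 1, 1]),    # all labels reproduced
]
print(subject_estimate(*subjects[0]))          # 2/5 * 3, about 1.2
print(human_rademacher_complexity(subjects))
```

A participant who reproduces every random label scores 2, the maximum, while one who answers at chance scores close to 0.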
  11. Given domain $\mathcal{X}$, distribution $P_X$, training sample size $n$, and number of subjects $m$, generate $\{(x_i^{(1)}, \sigma_i^{(1)})\}_{i=1}^{n}, \dots, \{(x_i^{(m)}, \sigma_i^{(m)})\}_{i=1}^{n}$, where $x_i^{(j)} \overset{\text{i.i.d.}}{\sim} P_X$ and $\sigma_i^{(j)} \overset{\text{i.i.d.}}{\sim} \mathrm{Ber}(1/2, 1/2)$ with values $\pm 1$.
      1. Participant $j$ is shown $\{(x_i^{(j)}, \sigma_i^{(j)})\}_{i=1}^{n}$. The participant is informed that there are only two categories; the order does not matter; they have only three minutes to study; and later they will be asked to use what they have learned to categorize more instances.
      2. After three minutes the sheet is taken away. To prevent active maintenance of training items in working memory, the participant performs a filler task consisting of ten two-digit addition / subtraction questions.
      3. The participant is given another sheet with the same $\{x_i^{(j)}\}_{i=1}^{n}$ without labels. The order of the $n$ instances is randomized. The participant is not told that they are the same training instances, is encouraged to guess if necessary, and there is no time limit.
    Conduct a post-experiment interview where the subject reports any insights or hypotheses they may have about the categories.
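The stimulus-generation part of this procedure can be sketched in a few lines. This is an assumption-laden illustration, not the authors' materials: items are represented by integer indices into a hypothetical domain, $P_X$ is taken to be uniform, items are sampled with replacement (a literal reading of i.i.d.), and the name `make_sheets` is invented.

```python
import random


def make_sheets(domain, n, m, seed=0):
    """Build one training sheet and one shuffled, unlabeled test sheet per participant."""
    rng = random.Random(seed)
    sheets = []
    for _ in range(m):
        xs = [rng.choice(domain) for _ in range(n)]        # x_i i.i.d. uniform on the domain
        sigmas = [rng.choice((-1, 1)) for _ in range(n)]   # random +/-1 labels
        test_items = xs[:]
        rng.shuffle(test_items)                            # same items, randomized order
        sheets.append({"training": list(zip(xs, sigmas)), "test": test_items})
    return sheets


# Hypothetical domain of 321 items, identified by index only.
sheets = make_sheets(domain=list(range(321)), n=10, m=10)
print(sheets[0]["training"])
print(sheets[0]["test"])
```

Scoring a participant then requires matching each test response back to the label that item carried on the training sheet, since the test order differs.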
  12. Experimental setup
    ⊚ Materials: For simplicity, $P_X$ is uniform in all experiments.
      1) The "Shape" Domain: $\mathcal{X}$ consists of 321 computer-generated 3D shapes. The shapes are parametrized by a real number $x \in [0, 1]$, such that small $x$ produces spiky shapes, while large $x$ produces smooth ones.
      2) The "Word" Domain: $\mathcal{X}$ consists of 321 English words. Based on the Wisconsin Perceptual Attribute Ratings Database, the words are sorted by their emotional valence. The 161 most positive and the 160 most negative ones are used in the experiments.
    ⊚ Participants: 80 undergraduate students, participating for partial course credit. They are divided evenly into eight groups. Each group of $m = 10$ subjects worked on a unique combination of domain (Shape or Word) and training sample size $n \in \{5, 10, 20, 40\}$.
  13. ⊚ Observation 1: Human Rademacher complexities in both domains decrease as $n$ increases.
      • When $n = 5$, one subject thought the shape categories were determined by whether the shape faces downward; another thought the word categories indicated whether the word contains the letter T.
      • When $n = 40$, about half the participants believed the labels to be random.
    ⊚ Observation 2: Human Rademacher complexities are significantly higher in the Word domain than in the Shape domain.
      • One can speculate that Human Rademacher complexities reflect the richness of the participant's pre-existing knowledge about the domain.
    ⊚ Observation 3: Many of these Human Rademacher complexities are relatively large.
      • This means that humans have a large capacity to learn arbitrary labels.
  14. Conclusion and discussion
    ⊚ The authors suggest that complexity measures from statistical machine learning are useful for analyzing human cognitive ability.
    ⊚ Human Rademacher complexity may help explain the human tendency to discern patterns in random stimuli:
      • illusory correlations [2];
      • false memory effect [9].
    ⊚ Human Rademacher complexity can assist experimental psychologists in assessing the likelihood of overfitting in their stimulus materials.
      • Human Rademacher complexity exhibits significant variation across domains (from the experimental results).
  15. References
    [1] Alfonso Caramazza and Michael McCloskey. "The case for single-patient studies". In: Cognitive Neuropsychology 5.5 (1988), pp. 517–527.
    [2] Loren J Chapman. "Illusory correlation in observational report". In: Journal of Verbal Learning and Verbal Behavior 6.1 (1967), pp. 151–155.
    [3] Nelson Cowan. "The magical number 4 in short-term memory: A reconsideration of mental storage capacity". In: Behavioral and Brain Sciences 24.1 (2001), pp. 87–114.
    [4] Jacob Feldman. "Minimization of Boolean complexity in human concept learning". In: Nature 407.6804 (2000), pp. 630–633.
    [5] Michael J Kearns and Umesh Vazirani. An introduction to computational learning theory. MIT Press, 1994.
    [6] George A Miller. "Some limits on our capacity for processing information". In: Psychological Review 63 (1956), pp. 81–97.
  16. References
    [7] George A Miller. "The magical number seven, plus or minus two: Some limits on our capacity for processing information". In: Psychological Review 63.2 (1956), p. 81.
    [8] Randall C O'Reilly and James L McClelland. "Hippocampal Conjunctive Encoding, Storage, and Recall: Avoiding a Tradeoff, Parallel Distributed Processing and Cognitive Neuroscience Technical Report PDP". In: CNS. 1994.
    [9] Henry L Roediger and Kathleen B McDermott. "Creating false memories: Remembering words not presented in lists". In: Journal of Experimental Psychology: Learning, Memory, and Cognition 21.4 (1995), p. 803.
    [10] Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 1999.
    [11] William D Wattenmaker et al. "Linear separability and concept learning: Context, relational properties, and concept naturalness". In: Cognitive Psychology 18.2 (1986), pp. 158–194.
  17. References
    [12] Jerry Zhu, Bryan Gibson, and Timothy T Rogers. "Human Rademacher complexity". In: Advances in Neural Information Processing Systems 22 (2009).