The Philosophical Aspects of Data Modelling

Reading Group 2015 - Based on the article "Semantics of Object Representation in Machine Learning" by Birkan Tunç

Emir Muñoz

May 13, 2015

Transcript

  1. The Philosophical Aspects of Data Modelling
    Emir Muñoz, National University of Ireland Galway
    Semantics of Object Representation in Machine Learning
    Birkan Tunç, Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA
  2. 2

  3. INTRODUCTION
    Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)
    https://www.informatik.uni-hamburg.de/ML/
    Figure labels: Contribution; Philosopher
  4. ML APPLICATIONS
    Text recognition; recommender systems; face detection; self-driving cars
    http://commons.wikimedia.org/
  5. INTRODUCTION
    Philosopher; Researcher/Engineer

  6. INTRODUCTION
    Philosopher; Researcher/Engineer
    Idealization; abstraction; latent variables

  7. INTRODUCTION
    Philosopher; Researcher/Engineer
    New conceptual development; new insights into the source of knowledge; new aspects of the scientific methodology
  8. STATISTICAL LEARNING
    Regression: continuous labels
    Classification: discrete labels
    Clustering: densities
  9. STATISTICAL LEARNING
    • Author’s proposal:
      – Machine learning needs to be cultivated with the vocabulary of philosophy, to extend the range of questions that are raised when evaluating various aspects of machine learning pertaining to data representation
    Real entity (nature, structure) → mathematical object (properties)
  10. WHO CARES?
    Duck? Beaver? Otter? A platypus

  11. WHO CARES?
    • «The foundations of pattern recognition can be traced to Plato, later extended by Aristotle, who distinguished between an “essential property” […] from an “accidental property” […]»
    Pattern recognition → find such essential properties
  12. WHO CARES?
    Diagram labels: Training Data; Test Data; Machine Learning Algorithm; Hypothesis; Performance; Feedback
    What is the justification to use this model and object representation?
  13. WHO CARES?
    • “No free lunch” (The Supervised Learning No-Free-Lunch Theorems, Wolpert, 2002)
    Our model is a simplification of reality. Simplification is based on assumptions (model bias). Assumptions fail in certain situations.
    “No one model works best for all possible situations.”
  14. WHO CARES?
    • What is the justification to use this model and object representation?
    Absolute performance: quantified by probabilistic bounds of the generalization error. Examples:
      • Confusion matrix
      • Accuracy
      • Misclassification rate
    Relative performance: compared to the relative algorithms and other configurations. Examples:
      • Mahalanobis distance
      • Kolmogorov-Smirnov distance
      • ROC curves and AUC
      • Gini
    Need for philosophical attention
    (Varieties of Justification in Machine Learning, Corfield, 2010)
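Editorial sketch for the first group of examples on this slide (confusion matrix, accuracy, misclassification rate). The labels below are hypothetical and invented purely for illustration; the function names are ours, not from the talk or the article.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return {(true_label, predicted_label): count} for all label pairs."""
    counts = Counter(zip(y_true, y_pred))
    return {(t, p): counts.get((t, p), 0) for t in labels for p in labels}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_rate(y_true, y_pred):
    """Complement of accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical ground truth and predictions (illustration only).
y_true = ["A", "A", "A", "B", "B", "B", "B", "A"]
y_pred = ["A", "A", "B", "B", "B", "A", "B", "A"]

cm = confusion_matrix(y_true, y_pred, labels=["A", "B"])
acc = accuracy(y_true, y_pred)
err = misclassification_rate(y_true, y_pred)
print(cm, acc, err)
```

These numbers summarize how often a hypothesis is right on given data; by themselves they do not say why the chosen object representation is justified, which is the question the slide raises.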
  15. WHO CARES?
    Mental disorders vs. normality: f(X)

  16. WHO CARES?
    Which one is better now? I told you, we need to look beyond the accuracy, consistency, and relative performance…
  17. WHO CARES?
    Kernel Trick
    Linear separation: with errors
    Non-linear separation: no errors
    Non-linear surface corresponding to a linear surface in the feature space
    We boost the performance of our model, regardless of the nonlinearity of the original features
  18. WHO CARES?
    f(X): output prediction is not the main goal, but rather a more extensive comprehension of the interactions between the main players of the system.
  19. INDUCTIVE INFERENCE
    • Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true”
    • Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible”

  20. INDUCTIVE INFERENCE
    • Deductive reasoning (strong syllogism): “if A is true then B is true; A is true; therefore B is true” [Truth Preservation]
    • Inductive inference (weak syllogism): “if A is true then B is true; B is true; therefore A is plausible” [Truth Preservation]
  21. INDUCTIVE INFERENCE
    • Statistical learning (weaker than weak syllogism): “if A is true then B is plausible; B is true; therefore A is plausible”
    Tools to evaluate the degree of plausibility that corresponds to our credence in the truth of conclusions
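Editorial sketch: the slide does not name a specific tool, so the use of Bayes' rule here is our assumption, chosen as one common way to put a number on "therefore A is plausible". All probabilities are hypothetical.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) via Bayes' rule, for a binary hypothesis A and evidence B."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: "if A is true then B is plausible" is encoded as
# P(B | A) = 0.9, while B is less likely when A is false.
prior = 0.3
post = posterior(prior, p_b_given_a=0.9, p_b_given_not_a=0.2)
print(prior, post)
```

Observing B raises the credence in A above its prior without making A certain, which matches the weak form of the syllogism on the slide.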
  22. INDUCTIVE INFERENCE
    Aristotelian Epistemology (384-322 BC)
    Diagram: (1) observing facts (observations); (2) induction to explanatory principles; (3) deduction to an explanation of the observations
    Simplification in object representation:
      - Selecting primary/essential attributes
      - Avoiding the use of accidental attributes
  23. INDUCTIVE INFERENCE
    Aristotelian Epistemology (384-322 BC)
    Example: linear discriminant $f(x) = w^{T}x$, with $x \in \Re^{d}$, $w \in \Re^{d}$
    Observable; hyperplane
    Most objects of class A reside on the side of the hyperplane where $f(x) > 0.5$
    Definition of vector $x$, which needs feature extraction and selection
    “Most objects of class A reside on the side of the hyperplane where $f(x) > 0.5$; $f(x') > 0.5$ is true for an object $x'$; therefore $x'$ is plausibly of class A”
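Editorial sketch of the linear-discriminant example on this slide, assuming the discriminant is a plain dot product of a weight vector and a feature vector and using the slide's 0.5 threshold. The weights and objects are hypothetical values chosen for illustration.

```python
def discriminant(w, x):
    """Linear discriminant: dot product of weights w and features x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def plausibly_class_a(w, x, threshold=0.5):
    """True when x lies on the side of the hyperplane where the score exceeds the threshold."""
    return discriminant(w, x) > threshold


# Hypothetical weight vector and two hypothetical objects.
w = [0.4, 0.6]
x_new = [1.0, 0.5]    # score 0.4*1.0 + 0.6*0.5 = 0.7
x_other = [0.2, 0.1]  # score 0.4*0.2 + 0.6*0.1 = 0.14

print(plausibly_class_a(w, x_new), plausibly_class_a(w, x_other))
```

The boolean result is only a plausibility statement in the slide's sense: most, not all, objects of class A fall on that side of the hyperplane.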
  24. INDUCTIVE INFERENCE
    Galilean Epistemology (1564-1642)
    Unlike heavenly bodies, the mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours.
    Abstraction: representing an object with another object that is easier to handle. Example: 3D space to deal with the motion of particles.
    Idealization: simplifying properties of an object. Example: frictionless surface of rocks falling.
  25. INDUCTIVE INFERENCE
    Figure labels: Linear Algebra; Vector Space Model; Face Recognition; example of abstraction; example of idealization
    Galilean idealization is pragmatic and aims to reduce computational limitations, e.g. feature selection to facilitate the (otherwise infeasible) training of a classifier.
  26. INDUCTIVE INFERENCE
    Abstraction (a.k.a. Aristotelian idealization): given a class of individuals, an abstraction is a concept under which all of the individuals fall.
    Idealization (a.k.a. Galilean idealization): given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization.
  27. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Two main types of indeterminacy in learning problems:
      – Unknown nature of data
      – Unknown functional form between input and corresponding outputs
    • → These complicate the selection of the hypothesis space, and also hinder the identification of essential attributes!!
  28. OBJECT REPRESENTATION IN MACHINE LEARNING
    • More problems: high degree of freedom in the configuration of learning algorithms
    Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used for both:
      - Dimensionality reduction; and
      - Space transformation by identifying directions of maximum variance.
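Editorial sketch of the PCA idea mentioned on this slide: find the direction of maximum variance and project onto it. This is a minimal pure-Python version that uses power iteration on the covariance matrix (our choice for brevity; real libraries typically use an eigen or singular value decomposition). The data points are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_direction(cov, iterations=200):
    """Approximate the top eigenvector of cov by power iteration."""
    d = len(cov)
    # Arbitrary non-uniform start vector; it must not be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical 2-D points lying roughly along the line y = x.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centered = mean_center(data)
direction = first_principal_direction(covariance(centered))

# Dimensionality reduction: each 2-D point becomes one coordinate.
reduced = [sum(c * d for c, d in zip(row, direction)) for row in centered]
print(direction, reduced)
```

The projected coordinate is a new, constructed feature rather than one of the original measurements, which is exactly where the slide's representational question arises.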
  29. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Abstraction

  30. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Abstraction: Kernel Trick
    Real objects: $x_1 = (a_1, a_2, \ldots, a_n)$, $x_2 = (a'_1, a'_2, \ldots, a'_n)$
    Let $x \in \mathcal{X}$, and a mapping $\phi : \mathcal{X} \to \mathcal{H}$
    $k(x_1, x_2) \equiv \langle \phi(x_1), \phi(x_2) \rangle$
    The Kernel Trick (Rasmussen & Williams, 2005):
      - Enables us to work in very complex vector spaces without even knowing the mapping itself.
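Editorial sketch of the kernel trick for one concrete, well-known case: the degree-2 polynomial kernel on 2-D inputs. The kernel value computed in the original space equals the inner product of explicitly mapped vectors in a 3-D feature space. The choice of kernel and the sample vectors are our assumptions for illustration; the slide does not fix a particular kernel.

```python
import math


def poly_kernel(x, y):
    """k(x, y) = (x . y)^2, computed directly in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit feature map for this kernel: 2-D input to 3-D feature space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical objects.
x1 = (1.0, 2.0)
x2 = (3.0, -1.0)

via_kernel = poly_kernel(x1, x2)                      # never builds feature vectors
via_mapping = dot(feature_map(x1), feature_map(x2))   # builds them explicitly
print(via_kernel, via_mapping)
```

Both routes give the same number, so an algorithm that only needs inner products can use the kernel and skip the mapping entirely.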
  31. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Abstraction
    “Abstraction does not necessarily cause epistemic problems since in most cases it is a necessary step to take.”
    “Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.”
    Computational gains vs. representational issues
  32. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Idealization
    It does not only act over the features but is also realized during the model construction.
      - Remove irrelevant features to sort out the accidental attributes
      - Remove irrelevant features to alleviate computational issues, e.g. by reducing the dimensionality
  33. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Idealization
      – (Weisberg, 2007) identifies 3 kinds of idealization used in scientific models
    Multi-model idealization
      • Boosting, voting (ensemble methods)
      • Used when no single model can characterize the underlying causal structure
      • Small models with different sets of features
    Galilean idealization
      • Performed against technical difficulties
      • Deliberate distortions
      • Bayesian learning models struggle with computational complexity without idealization
    Minimalist (Aristotelian) idealization
      • ‘Stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand
      • Focus on a limited set of properties in isolation
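Editorial sketch of the voting idea listed under multi-model idealization: several small models, each looking at a different feature, combined by majority vote. The threshold "stumps", feature indices, and inputs are hypothetical and stand in for whatever small models one would actually train.

```python
from collections import Counter


def make_stump(feature_index, threshold):
    """Build a tiny model that inspects a single feature."""
    def predict(x):
        return "A" if x[feature_index] > threshold else "B"
    return predict


def majority_vote(models, x):
    """Return the label predicted by most models for input x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical small models, each using a different feature.
models = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0.5)]

first = majority_vote(models, (0.9, 0.2, 0.7))   # votes: A, B, A
second = majority_vote(models, (0.1, 0.8, 0.3))  # votes: B, A, B
print(first, second)
```

With an odd number of models and two classes there are no ties; no single stump describes the whole input, which is the situation the slide says motivates this kind of idealization.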
  34. OBJECT REPRESENTATION IN MACHINE LEARNING
    • Theoretical Variables
    A theoretical term is defined by the negation of observability, i.e. it denotes entities that cannot be perceived directly without the aid of technical instruments or inferences.
    Example: “This object is in cluster C”
    A theoretical/latent variable is any variable not included in the unprocessed feature set.
    Problematic in their semantics!! Does it refer to any real object or property? What is its meaning?
  35. OBJECT REPRESENTATION IN MACHINE LEARNING
    Latent Variables: How old am I?
    Based on teeth.
      • Count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth.
      • Cats younger than 8 weeks will still be developing their deciduous, or "baby" teeth.
    Based on fur.
      • Like humans, cats will also develop grey hairs with age.
    Based on paws, claws, and pads.
      • As cats age, their nails will harden and become brittle and overgrown.
    Based on eyes.
      • Older cats will develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes.
    Based on behaviour.
      • Younger cats, like younger people, are generally more energetic and attracted to play.
    Hidden variables: not directly observed but inferred
    http://www.wikihow.com/Know-Your-Cat%27s-Age
  36. WHAT IS NEXT?
    • Multiple successful applications of Machine Learning
      – Not mainly rooted in our glorious technological advancements
    Theory of kernels (Aronszajn, 1950)
    SVM first version (Vapnik & Lerner, 1963)
    Statistical learning (Vapnik & Chervonenkis, 1974)
    SVM final version (Cortes & Vapnik, 1995)
    30 years!!!!
    Success associated with strong foundations, not with increasing size of the computer memory
  37. WHAT IS NEXT?
    First steps into the relationship between Philosophy and Machine Learning
    Which one is better now?

  38. WHAT IS NEXT?
    What real entity corresponds to this?

  39. WHAT IS NEXT?

  40. HOW THIS IS RELATED TO MY PHD
    • RDF → method for conceptual description or modelling of information
    • Linked Data → method of publishing structured data
    • I want to apply ML techniques over Linked Data
    • What is the nature or structure of a Linked Data dataset?
    Thanks!