The Philosophical Aspects of Data Modelling

The Philosophical Aspects of Data Modelling Emir Muñoz National University
of Ireland Galway Semantics of Object Representation in Machine Learning Birkan Tunç Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA

3 Machine Learning Field of study that gives computers the
ability to learn without being explicitly programmed (Arthur Samuel, 1959) https://www.informatik.uni-hamburg.de/ML/ Contribution Philosopher INTRODUCTION “ ”

4 Text recognition Recommender Systems Face detection Self-driving Cars http://commons.wikimedia.org/
ML APPLICATIONS

5 INTRODUCTION Philosopher Researcher/ Engineer

6 INTRODUCTION Philosopher Researcher/ Engineer Idealization Abstraction Latent variables

7 INTRODUCTION Philosopher Researcher/ Engineer New conceptual development New insights
into the source of knowledge New aspects of the scientific methodology

8 Regression Classification Clustering STATISTICAL LEARNING Continuous labels Discrete labels
Densities

• Author’s proposal: – Machine learning needs to be cultivated
with the vocabulary of philosophy to extend the range of questions that raised when evaluating various aspects of machine learning, pertaining to data representation 9 STATISTICAL LEARNING Real Entity - Nature - Structure → () Mathematical Object - Properties

10 Duck? Beaver? Otter? A Platypus WHO CARES?

11 • «The foundations of pattern recognition can be traced
to Plato, later extended by Aristotle, who distinguished between an “essential property” […] from an “accidental property” […]» WHO CARES? Pattern recognition  find such essential properties

12 Training Data Test Data Machine Learning Algorithm Hypothesis Performance
Feedback What is the justification to use this model and object representation ? WHO CARES?

• “No free lunch” (The Supervised Learning No-Free-Lunch Theorems, Wolpert,
2002) 13 Our model is a simplification of reality Simplification is based on assumptions (model bias) Assumptions fail in certain situations “No one model works best for all possible situations.” WHO CARES?

14 • What is the justification to use this model
and object representation ? Absolute performance Relative performance Quantified by probabilistic bounds of the generalization error Compared to the relative algorithms and other configurations Examples: • Confusion matrix • Accuracy • Misclassification rate Examples: • Mahalanobis distance • Kolmogorov-Smirnov distance • ROC curves and AUC • Gini Need for philosophical attention WHO CARES? (Varieties of Justification in Machine Learning, Corfield, 2010)

15 WHO CARES? Mental disorders Vs. Normality f(X)

16 WHO CARES? Which one is better now? I told
you, we need to look beyond the accuracy, consistency, and relative performance…

17 WHO CARES? Kernel Trick Linear separation With errors Non-linear
separation No errors Non-linear surface corresponding to a linear surface in the feature space We boost the performance of our model, regardless of the nonlinearity of original features

18 WHO CARES? f(X) Output prediction is not the main
goal. But a more extensive comprehension of the interactions between the main players of the system.

19 INDUCTIVE INFERENCE • Deductive reasoning (strong syllogism) • Inductive
inference (weak syllogism) “if A is true then B is true; A is true; therefore B is true” “if A is true then B is true; B is true; therefore A is plausible”

20 INDUCTIVE INFERENCE • Deductive reasoning (strong syllogism) • Inductive
inference (weak syllogism) “if A is true then B is true; A is true; therefore B is true” “if A is true then B is true; B is true; therefore A is plausible” Truth Preservation Truth Preservation

21 INDUCTIVE INFERENCE • Statistical learning (weaker than weak syllogism)
“if A is true then B is plausible; B is true; therefore A is plausible” Tools to evaluate the degree of plausibility that corresponds to our credence on the truth of conclusions

22 INDUCTIVE INFERENCE Aristotelian Epistemology (384-322 BC) 1 2 3
induction deduction observations Observing facts Explanatory principles Explanation of the observations Simplification in object representation - Selecting primary/essential attributes - Avoiding the use of accidental attributes

23 INDUCTIVE INFERENCE Aristotelian Epistemology (384-322 BC) Example linear discriminant
= x ∈ ℜ w ∈ ℜ Observable Hyperplane Most objects of class A reside on the side of the hyperplane where > 0.5 Definition of vector , which needs feature extraction and selection “Most objects of class A reside on the side of the hyperplane where ()>0.5; (’)>0.5 is true for an object ’; therefore ’ is plausible of class A”

24 INDUCTIVE INFERENCE Galilean Epistemology (1564-1642) Unlike heavenly bodies, the
mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours. Abstraction Idealization representing an object with another object that is easier to handle simplifying properties of an object 3D space to deal with the motion of particles Frictionless surface of rocks falling

25 INDUCTIVE INFERENCE Linear Algebra Vector Space Model Face Recognition
Example of abstraction Example of idealization Galilean idealization is pragmatic and aims to reduce computational limitations. E.g., feature selection to facilitate –otherwise infeasible- training of a classifier.

26 INDUCTIVE INFERENCE Abstraction (a.k.a. Aristotelian idealization) Idealization (a.k.a. Galilean
idealization) Given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization Given a class of individuals, an abstraction is a concept under which all of the individuals fall.

27 OBJECT REPRESENTATION IN MACHINE LEARNING • Two main types
of indeterminacy in learning problems: – Unknown nature of data – Unknown functional form between input and corresponding outputs •  complicate the selection of hypothesis space, but also hinders the identification of essential attributes!!

• More problems: high degree of freedom in the configuration
of learning algorithms 28 OBJECT REPRESENTATION IN MACHINE LEARNING Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used for both: - Dimensionality reduction and; - Space transformation by identifying directions of maximum variance.

29 OBJECT REPRESENTATION IN MACHINE LEARNING • Abstraction

30 OBJECT REPRESENTATION IN MACHINE LEARNING • Abstraction Kernel Trick
1 = 1 , 2 , … , 2 = ′1 , ′2 , … , ′ Let ∈ , and a mapping ∶ → Real objects (1 , 2 ) ≡ 1 , (2 ) The Kernel Trick (Rasmussen & Williams, 2005): - Enable us to work in very complex vector spaces without even knowing the mapping itself.

31 OBJECT REPRESENTATION IN MACHINE LEARNING • Abstraction “Abstraction does
not necessarily cause epistemic problems since in most cases it is a necessary step to take.” “Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.” computational gains vs. representational issues

32 OBJECT REPRESENTATION IN MACHINE LEARNING • Idealization It does
not only act over the features but is also realized during the model construction. Remove irrelevant features to sort out the accidental attributes Remove irrelevant features to alleviate computational issues such as to reduce the dimensionality

33 OBJECT REPRESENTATION IN MACHINE LEARNING • Idealization – (Weisberg,
2007) identifies 3 kinds of idealization used in scientific models Multi model idealization • Boosting, voting (ensemble methods) • Used when no single model can characterize the underlying causal structure • Small models with different set of features Galilean idealization • Performed against technical difficulties • Deliberate distortions • Bayesian learning model struggles with computational complexities without idealization Minimalist (Aristotelian) idealization • ‘stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand. • focus on a limited set of properties in isolation

34 OBJECT REPRESENTATION IN MACHINE LEARNING • Theoretical Variables Theoretical
term is the negation of observability, i.e. entities that cannot be perceived directly without aid of technical instruments or inferences This object is in cluster C Theoretical/latent variable is any variable not included in the unprocessed feature set Problematic in their semantics!! Does it refer to any real object or property? What is its meaning?

35 How old am I? Latent Variables Based on teeth.
• Count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth. • Cats younger than 8 weeks will still be developing their deciduous, or "baby" teeth. http://www.wikihow.com/Know-Your-Cat%27s-Age Based on fur. • Like humans, cats will also develop grey hairs with age. Based on paws, claws, and pads. • As cats age, their nails will harden and become brittle and overgrown. Based on eyes. • Older cats will develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes. Based on behaviour. • Younger cats--like younger people--are generally more energetic and attracted to play. Hidden variables Not directly observed but inferred OBJECT REPRESENTATION IN MACHINE LEARNING

• Multiple successful applications of Machine Learning – Not mainly
rooted in our glorious technological advancements 36 WHAT IS NEXT? Theory of kernels (Aronszajn, 1950) SVM first version (Vapnik & Lerner, 1963) Statistical learning (Vapnik & Chervoneskis, 1974) SVM final version (Cortes & Vapnik, 1995) 30 years!!!! Success associated with strong foundations, not with increasing size of the computer memory

37 WHAT IS NEXT? First steps into the relationship between
Philosophy and Machine Learning Which one is better now?

38 What real entity corresponds this? WHAT IS NEXT?

39 WHAT IS NEXT?

40 HOW THIS IS RELATED TO MY PHD • RDF
 method for conceptual description or modelling of information • Linked Data  method of publishing structured data • I want to apply ML techniques over Linked Data • What is the nature or structure of a Linked Data dataset? Thanks!

The Philosophical Aspects of Data Modelling

The Philosophical Aspects of Data Modelling

More Decks by Emir Muñoz

Other Decks in Research

Featured

Transcript