of Ireland Galway Semantics of Object Representation in Machine Learning Birkan Tunç Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA

with the vocabulary of philosophy to extend the range of questions that raised when evaluating various aspects of machine learning, pertaining to data representation 9 STATISTICAL LEARNING Real Entity - Nature - Structure → () Mathematical Object - Properties

to Plato, later extended by Aristotle, who distinguished between an “essential property” […] from an “accidental property” […]» WHO CARES? Pattern recognition find such essential properties

2002) 13 Our model is a simplification of reality Simplification is based on assumptions (model bias) Assumptions fail in certain situations “No one model works best for all possible situations.” WHO CARES?

and object representation ? Absolute performance Relative performance Quantified by probabilistic bounds of the generalization error Compared to the relative algorithms and other configurations Examples: • Confusion matrix • Accuracy • Misclassification rate Examples: • Mahalanobis distance • Kolmogorov-Smirnov distance • ROC curves and AUC • Gini Need for philosophical attention WHO CARES? (Varieties of Justification in Machine Learning, Corfield, 2010)

separation No errors Non-linear surface corresponding to a linear surface in the feature space We boost the performance of our model, regardless of the nonlinearity of original features

inference (weak syllogism) “if A is true then B is true; A is true; therefore B is true” “if A is true then B is true; B is true; therefore A is plausible”

inference (weak syllogism) “if A is true then B is true; A is true; therefore B is true” “if A is true then B is true; B is true; therefore A is plausible” Truth Preservation Truth Preservation

“if A is true then B is plausible; B is true; therefore A is plausible” Tools to evaluate the degree of plausibility that corresponds to our credence on the truth of conclusions

induction deduction observations Observing facts Explanatory principles Explanation of the observations Simplification in object representation - Selecting primary/essential attributes - Avoiding the use of accidental attributes

= x ∈ ℜ w ∈ ℜ Observable Hyperplane Most objects of class A reside on the side of the hyperplane where > 0.5 Definition of vector , which needs feature extraction and selection “Most objects of class A reside on the side of the hyperplane where ()>0.5; (’)>0.5 is true for an object ’; therefore ’ is plausible of class A”

mundane objects of the earth were not suitable for mathematical models, as they did not manifest ideal behaviours. Abstraction Idealization representing an object with another object that is easier to handle simplifying properties of an object 3D space to deal with the motion of particles Frictionless surface of rocks falling

Example of abstraction Example of idealization Galilean idealization is pragmatic and aims to reduce computational limitations. E.g., feature selection to facilitate –otherwise infeasible- training of a classifier.

idealization) Given a class of individuals, an idealization is a concept under which all of the individuals almost fall (in some pragmatically relevant sense), while at least one individual is excluded by the idealization Given a class of individuals, an abstraction is a concept under which all of the individuals fall.

of indeterminacy in learning problems: – Unknown nature of data – Unknown functional form between input and corresponding outputs • complicate the selection of hypothesis space, but also hinders the identification of essential attributes!!

of learning algorithms 28 OBJECT REPRESENTATION IN MACHINE LEARNING Researchers play with the original feature space, for example using Principal Component Analysis (PCA). PCA is used for both: - Dimensionality reduction and; - Space transformation by identifying directions of maximum variance.

1 = 1 , 2 , … , 2 = ′1 , ′2 , … , ′ Let ∈ , and a mapping ∶ → Real objects (1 , 2 ) ≡ 1 , (2 ) The Kernel Trick (Rasmussen & Williams, 2005): - Enable us to work in very complex vector spaces without even knowing the mapping itself.

not necessarily cause epistemic problems since in most cases it is a necessary step to take.” “Without mathematical abstraction, it would not be possible to establish any foundation of statistical learning.” computational gains vs. representational issues

not only act over the features but is also realized during the model construction. Remove irrelevant features to sort out the accidental attributes Remove irrelevant features to alleviate computational issues such as to reduce the dimensionality

2007) identifies 3 kinds of idealization used in scientific models Multi model idealization • Boosting, voting (ensemble methods) • Used when no single model can characterize the underlying causal structure • Small models with different set of features Galilean idealization • Performed against technical difficulties • Deliberate distortions • Bayesian learning model struggles with computational complexities without idealization Minimalist (Aristotelian) idealization • ‘stripping away’ all properties from a concrete object that we believe are not relevant to the problem at hand. • focus on a limited set of properties in isolation

term is the negation of observability, i.e. entities that cannot be perceived directly without aid of technical instruments or inferences This object is in cluster C Theoretical/latent variable is any variable not included in the unprocessed feature set Problematic in their semantics!! Does it refer to any real object or property? What is its meaning?

• Count them. Kittens will have 26 deciduous teeth and adult cats will have 30 teeth. • Cats younger than 8 weeks will still be developing their deciduous, or "baby" teeth. http://www.wikihow.com/Know-Your-Cat%27s-Age Based on fur. • Like humans, cats will also develop grey hairs with age. Based on paws, claws, and pads. • As cats age, their nails will harden and become brittle and overgrown. Based on eyes. • Older cats will develop a cloudiness not present in kittens and younger cats, who have sharp, clear eyes. Based on behaviour. • Younger cats--like younger people--are generally more energetic and attracted to play. Hidden variables Not directly observed but inferred OBJECT REPRESENTATION IN MACHINE LEARNING

rooted in our glorious technological advancements 36 WHAT IS NEXT? Theory of kernels (Aronszajn, 1950) SVM first version (Vapnik & Lerner, 1963) Statistical learning (Vapnik & Chervoneskis, 1974) SVM final version (Cortes & Vapnik, 1995) 30 years!!!! Success associated with strong foundations, not with increasing size of the computer memory

method for conceptual description or modelling of information • Linked Data method of publishing structured data • I want to apply ML techniques over Linked Data • What is the nature or structure of a Linked Data dataset? Thanks!