Predicting hyperparameters from meta-features in binary classification problems

The presence of computationally demanding problems and the current inability to automatically transfer experience from the application of past experiments to new ones delays the evolution of knowledge itself. In this paper we present the Automated Data Scientist, a system that employs meta-learning for hyperparameter selection and builds a rich ensemble of models through forward model selection in order to automate binary classification tasks. Preliminary evaluation shows that the system is capable of coping with classification problems of medium complexity.
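To make the meta-learning idea concrete, here is a minimal sketch of predicting a hyperparameter from meta-features. It is an illustration under stated assumptions, not the system's implementation: the source does not say which prediction model is used, so a k-nearest-neighbour average stands in for it. The three meta-features, the `META_DB` records and the `log10(C)` target are all invented for the example.

```python
import math

def extract_meta_features(X, y):
    """Compute three simple (hypothetical) meta-features of a binary dataset."""
    n_rows, n_cols = len(X), len(X[0])
    positive_rate = sum(y) / n_rows
    return [math.log10(n_rows), math.log10(n_cols), positive_rate]

def predict_hyperparameter(meta_db, query, k=2):
    """Average the stored optimum of the k datasets nearest in meta-feature space."""
    ranked = sorted(meta_db, key=lambda rec: math.dist(rec[0], query))
    nearest = ranked[:k]
    return sum(best for _, best in nearest) / len(nearest)

# Toy meta-knowledge base of (meta-features, best log10(C)) pairs.
# In the described system such optima would come from Bayesian optimization
# on past datasets; these numbers are made up.
META_DB = [
    ([2.0, 1.0, 0.50], 0.0),
    ([2.1, 1.0, 0.45], 0.5),
    ([4.0, 2.0, 0.10], 2.0),
    ([4.2, 2.3, 0.05], 3.0),
]

# A new dataset with 100 rows, 10 columns and balanced classes.
X_new = [[0.0] * 10 for _ in range(100)]
y_new = [1] * 50 + [0] * 50

query = extract_meta_features(X_new, y_new)
predicted_log_c = predict_hyperparameter(META_DB, query, k=2)
print(predicted_log_c)
```

The point of the sketch is that the expensive search is paid once, when the meta-knowledge base is built, and a new dataset only costs a meta-feature extraction and a lookup. A real setup would normally scale the meta-features before measuring distances.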

Kyriakos Chatzidimitriou

July 17, 2018

Transcript

  1.

    Predicting hyperparameters from meta-features in binary classification problems

    AutoML 2018, Stockholm. Nisioti E., Chatzidimitriou K., Symeonidis A., Aristotle University of Thessaloniki

    • Proposal: Automated Data Scientist, a system that employs meta-learning for hyperparameter prediction and builds a rich ensemble of models through forward model selection in order to automate binary classification tasks.
    • Design requirement: the user just inserts a dataset. Default setting: full automation with opinionated choices
    • data cleaning (inappropriate value removal, data type recognition, compression)
    • data preprocessing (normalization, compression, feature engineering)
    • data splitting
    • Hyperparameter selection is performed using prediction models, trained on data consisting of meta-features and optimal hyperparameters, produced through Bayesian optimization on a repository of 100 binary classification datasets
    • 31 meta-features extracted from the dataset (selected from 77 studied)
    • Forward selection ensembler
    • Result: performance equivalent to both well-established and state-of-the-art hyperparameter optimization techniques, with the additional benefits of generated meta-knowledge and speed, as the time-consuming search is replaced by a simple prediction.
    • Autogenerated, intuitive reporting, to help guide manual tweaks
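The forward selection ensembler named on the slide can be sketched as a greedy loop over a library of already-trained models. The slide gives no details, so the choices below are assumptions: models may be selected more than once, member probabilities are averaged, validation accuracy is the selection metric, and the loop stops when no candidate improves the score. The model names and validation predictions are invented.

```python
def accuracy(probs, labels):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= 0.5) == bool(t) for p, t in zip(probs, labels))
    return hits / len(labels)

def average(prob_lists):
    """Element-wise mean of several probability vectors."""
    return [sum(col) / len(col) for col in zip(*prob_lists)]

def forward_select(library, labels, max_rounds=10):
    """Greedily grow an ensemble; a model may be picked more than once."""
    chosen, best_score = [], -1.0
    for _ in range(max_rounds):
        # Score every candidate ensemble formed by adding one more model.
        candidates = [
            (accuracy(average([library[m] for m in chosen + [name]]), labels), name)
            for name in sorted(library)
        ]
        score, name = max(candidates, key=lambda c: c[0])
        if score <= best_score:  # stop when no candidate improves the ensemble
            break
        chosen.append(name)
        best_score = score
    return chosen, best_score

# Hypothetical validation-set probabilities from three trained models.
labels = [1, 1, 1, 0, 0, 0]
library = {
    "a": [0.9, 0.8, 0.4, 0.1, 0.2, 0.3],
    "b": [0.6, 0.4, 0.9, 0.4, 0.6, 0.1],
    "c": [0.3, 0.3, 0.3, 0.7, 0.2, 0.1],
}

chosen, best_score = forward_select(library, labels)
print(chosen, best_score)
```

In this toy run no single model classifies every validation sample correctly, but averaging models "a" and "b" does, so the loop keeps both and leaves "c" out.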