The probabilistic formulation of the No Free Lunch Theorem is given by f∈F P (Dy|f, Λ, n) = f∈F+ P (Dy|f, Λ, n) + f∈F0 P (Dy|f, Λ, n) + f∈F− P (Dy|f, Λ, n) where Λ is an optimization algorithm, Dy is the data set of corresponding outputs, f is a function to be optimized, and n is the number of samples in the data set. This definition can be interpreted as f∈F P (Dy|f, Λ1 , n) = f∈F P (Dy|f, Λ2 , n) Anti-training can be viewed as a generalization of meta-learning that exploits the consequences of the NFL Theorem. Anti-training tailors learning and optimization algorithms to problem distributions (i.e., data sets) Bottleneck with Sacrificial Data Generating data to obtain measurements from F− is not trivial and how much sacrificial data do you need to generate? The real-problem: Execution of 10 fold cross validation can take up to 20 weeks for one data set! ECE523: Engineering Applications of Machine Learning and Data Analytics Learning What We Don’t Care About: Regularization with Sacrificial Functions