
Machine Learning Applications in Grid Computing



Daniel Jacob Bilar

September 27, 1999

Transcript

  1. Machine Learning Applications in Grid Computing
    George Cybenko, Guofei Jiang and Daniel Bilar
    Thayer School of Engineering, Dartmouth College
    22nd Sept., 1999, 37th Allerton Conference, Urbana-Champaign, Illinois
    Acknowledgements: This work was partially supported by AFOSR grant F49620-97-1-0382, NSF grant CCR-9813744 and DARPA contract F30602-98-2-0107.
  2. Grid vision
    ▪  Grid computing refers to computing in a distributed networked environment in which computing and data resources are located throughout the network.
    ▪  Grid infrastructures provide the basic infrastructure for computations that integrate geographically disparate resources and create a universal source of computing power supporting dramatically new classes of applications.
    ▪  Several efforts are underway to build computational grids, such as Globus, Infospheres and DARPA CoABS.
  3. Grid services
    ▪  A fundamental capability required in grids is a directory service or broker that dynamically matches user requirements with available resources.
    ▪  Prototype of grid services: [Diagram: the server advertises its service location to a matchmaker; the client sends a request to the matchmaker, receives a reply naming the server, and then requests the service directly from the server.]
  4. Matching conflicts
    ▪  Brokers and matchmakers use keywords and domain ontologies to specify services.
    ▪  Keywords and ontologies cannot be defined and interpreted precisely enough to make brokering or matchmaking between grid services robust in a truly distributed, heterogeneous computing environment.
    ▪  Matching conflicts exist between the client’s requested functionality and the service provider’s actual functionality.
  5. An example
    ▪  A client requires a three-dimensional FFT. A request is made to a broker or matchmaker for an FFT service, based on keywords and possibly parameter lists.
    ▪  The broker or matchmaker uses the keywords to search its catalog of services and returns the candidate remote services.
    ▪  There are literally dozens of different algorithms for FFT computations, with different assumptions, dimensions, accuracy, input-output formats and so on.
    ▪  The client must validate the actual functionality of these remote services before committing to use one of them.
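As an illustration, here is a minimal sketch of such a validation check in Python. The candidate service remote_fft3d is a hypothetical stand-in for whatever callable the broker returns; NumPy's FFT plays the role of the client's own reference answers.

    import numpy as np

    def validate_fft_service(remote_fft3d, trials=10, shape=(8, 8, 8), tol=1e-6):
        # Challenge the candidate service with random inputs and compare
        # its answers against the client's reference implementation.
        rng = np.random.default_rng(0)
        for _ in range(trials):
            x = rng.standard_normal(shape)      # challenge x_i
            expected = np.fft.fftn(x)           # client's answer fC(x_i)
            answer = remote_fft3d(x)            # provider's answer fR(x_i)
            if not np.allclose(answer, expected, atol=tol):
                return False                    # e.g. a different scaling convention
        return True                             # answers consistent: safe to commit

A provider whose keywords match but which uses, say, a different normalization or input-output format would fail this check, which is exactly the matching conflict described above.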
  6. Functional validation
    ▪  Functional validation means that a client presents to a prospective service provider a sequence of challenges. The service provider replies to these challenges with corresponding answers. Only after the client is satisfied that the service provider’s answers are consistent with the client’s expectations is an actual commitment made to using the service.
    ▪  Three steps:
      –  Service identification and location.
      –  Service functional validation.
      –  Commitment to the service.
  7. Our approach
    ▪  Challenge the service provider with some test cases x1, x2, ..., xk. The remote service provider R offers the corresponding answers fR(x1), fR(x2), ..., fR(xk). The client C may or may not have independent access to the answers fC(x1), fC(x2), ..., fC(xk).
    ▪  Possible situations and machine learning models:
      –  C “knows” fC(x) and R provides fR(x).
        •  PAC learning and Chernoff bounds theory
      –  C “knows” fC(x) and R does not provide fR(x).
        •  Zero-knowledge proof
      –  C does not “know” fC(x) and R provides fR(x).
        •  Simulation-based learning and reinforcement learning
  8. Mathematical framework
    ▪  The goal of PAC learning is to use as few examples as possible, and as little computation as possible, to pick a hypothesis concept which is a close approximation to the target concept.
    ▪  Define a concept to be a boolean mapping c: X → {0, 1}, where X is the input space. c(x) = 1 indicates x is a positive example, i.e. the service provider can offer the “correct” service for challenge x.
    ▪  Define an index function F(x) = 1 if |fC(x) − fR(x)| ≤ γ, and F(x) = 0 otherwise.
    ▪  Now define the error between the target concept c and the hypothesis h as error(h) = Prob_{x∈P}[c(x) ≠ h(x)].
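In code, the index function is just a thresholded comparison (a sketch, with f_C and f_R as callables and gamma the tolerance from the definition above):

    def index_F(x, f_C, f_R, gamma):
        # F(x) = 1 when the provider's answer agrees with the client's
        # answer to within tolerance gamma, and F(x) = 0 otherwise.
        return 1 if abs(f_C(x) - f_R(x)) <= gamma else 0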
  9. Mathematical framework (cont’d)
    ▪  The client can randomly pick m samples Sm = {(x1, F(x1)), (x2, F(x2)), ..., (xm, F(xm))} to PAC-learn a hypothesis h about whether the service provider can offer the “correct” service.
    ▪  Theorem 1 (Blumer et al.): Let H be any hypothesis space of finite VC dimension d contained in 2^X, let P be any probability distribution on X, and let the target concept c be any Borel set contained in X. Then for any 0 < ε, δ < 1, given m ≥ max((4/ε) log(2/δ), (8d/ε) log(13/ε)) independent random examples of c drawn according to P, with probability at least 1 − δ every hypothesis in H that is consistent with all of these examples has error at most ε.
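The bound can be evaluated directly; a sketch (assuming, as in Blumer et al., that the logarithms are taken base 2):

    import math

    def pac_sample_size(eps, delta, d):
        # m >= max((4/eps) * log2(2/delta), (8d/eps) * log2(13/eps))
        m1 = (4.0 / eps) * math.log2(2.0 / delta)
        m2 = (8.0 * d / eps) * math.log2(13.0 / eps)
        return math.ceil(max(m1, m2))

    # e.g. pac_sample_size(0.1, 0.1, d=1) -> 562 challenges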
  10. Simplified results
    ▪  Assume that, with regard to some concepts, all test cases have the same probability that the service provider offers the “correct” service.
    ▪  Theorem 2 (Chernoff bounds): Consider m independent identically distributed samples x1, x2, ..., xm from a Bernoulli distribution with expectation p. Define the empirical estimate of p based on these samples as p̂ = (1/m) Σ xi. Then for any 0 < ε, δ < 1, if the sample size m ≥ ln(2/δ) / (2ε²), then the probability Prob_m[|p̂ − p| ≥ ε] ≤ δ.
    ▪  Corollary 2.1: For the functional validation problem described above, given any 0 < ε, δ < 1, if the sample size m ≥ −ln δ / (2ε²), then the probability Prob_m[p ≤ 1 − ε] ≤ δ.
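Both bounds give closed-form sample sizes; a sketch:

    import math

    def chernoff_sample_size(eps, delta):
        # Theorem 2: m >= ln(2/delta) / (2 eps^2) ensures
        # Prob[|p_hat - p| >= eps] <= delta.
        return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

    def validation_sample_size(eps, delta):
        # Corollary 2.1 (one-sided): m >= -ln(delta) / (2 eps^2) correct
        # answers in a row give Prob[p <= 1 - eps] <= delta.
        return math.ceil(-math.log(delta) / (2.0 * eps ** 2))

    # e.g. validation_sample_size(0.1, 0.05) -> 150 test cases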
  11. Simplified results (cont’d)
    ▪  Given a target probability P, the client needs to know how many consecutive positive samples are required so that the next request to the service will be answered correctly with probability P.
    ▪  With probability at least 1 − δ the provider’s success rate p exceeds 1 − ε, so the probabilities ε, δ and P satisfy the inequality (1 − ε)(1 − δ) ≥ P.
    ▪  Formulate the sample size problem as the following nonlinear optimization problem: min over ε, δ of m = ⌈−ln δ / (2ε²)⌉ s.t. (1 − ε)(1 − δ) ≥ P and 0 < ε, δ < 1.
  12. Simplified results (cont’d)
    ▪  From the constraint inequality, δ ≤ 1 − P/(1 − ε).
    ▪  Substituting this largest admissible δ transforms the two-dimensional optimization problem into a one-dimensional one: min over ε of m = ⌈−ln(1 − P/(1 − ε)) / (2ε²)⌉ s.t. 0 < ε < 1 − P.
    ▪  Solve with elementary nonlinear functional optimization methods.
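Since m(ε) grows without bound at both ends of the interval (0, 1 − P), a simple grid scan is one such elementary method; a sketch:

    import math

    def min_validation_samples(P, steps=10000):
        # Minimize m(eps) = ceil(-ln(1 - P/(1 - eps)) / (2 eps^2))
        # over 0 < eps < 1 - P.
        best_m, best_eps = None, None
        for i in range(1, steps):
            eps = (1.0 - P) * i / steps
            delta = 1.0 - P / (1.0 - eps)   # largest delta the constraint allows
            m = math.ceil(-math.log(delta) / (2.0 * eps ** 2))
            if best_m is None or m < best_m:
                best_m, best_eps = m, eps
        return best_m, best_eps

    # e.g. min_validation_samples(0.95) returns the fewest consecutive
    # correct answers that make the next answer correct with prob. >= 0.95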
  13. Mobile Functional Validation Agent
    [Diagram: the user, through a user agent, creates a mobile agent (MA) and sends it to Machine A, where it tests Computing Server A’s service via the local interface agent. If A’s service is incorrect, the MA jumps to Machine B and tests Computing Server B’s service, and so on through Machines C, D, E, ..., until a correct service is found and reported back.]
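The agent’s itinerary logic amounts to a simple loop; a sketch, where machines and run_challenges_on are hypothetical stand-ins for the agent platform’s jump and test primitives:

    def mobile_validation_agent(machines, challenges, run_challenges_on):
        # Visit machines A, B, C, ... in turn, running the client's test
        # cases on each local computing server via its interface agent.
        for machine in machines:
            if run_challenges_on(machine, challenges):  # all answers correct?
                return machine       # report the validated service back
        return None                  # no provider passed functional validation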
  14. Future work and open questions
    ▪  Integrate functional validation into grid computing infrastructure as a standard grid service.
    ▪  Extend to the other situations described (like zero-knowledge proofs, etc.).
    ▪  Formulate functional validation problems in more appropriate mathematical models.
    ▪  Explore solutions for more difficult and complicated functional validation situations.
    ▪  Thanks!!