Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (HPCA18)

Slide 1

Slide 1 text

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
International Symposium on High-Performance Computer Architecture (HPCA) 18 | Shunya Ueta (@hurutoriya) 2018-03-07

Slide 2

Slide 2 text

Abstract
"This paper describes the hardware and software infrastructure that supports machine learning at global scale."
● Machine learning is provided as a service (ML-as-a-Service, MLaaS) to 2.1 billion users.
● Example workloads: ranking posts for News Feed, speech and text translation, and photo and real-time video classification.
FAIR System & Network

Slide 3

Slide 3 text

What Is the Contribution?
● ML is applied across nearly all services (MLaaS), and computer vision represents only a small fraction of the resource requirements.
● FB relies on an incredibly diverse set of ML approaches.
○ e.g. SVM, GBDT, Logistic Regression (LR)
● Inference runs mainly on CPUs; training uses both CPUs and GPUs.

Slide 4

Slide 4 text

MLaaS Pipeline Design at Facebook

Slide 5

Slide 5 text

Major Services Leveraging Machine Learning
1. News Feed: ranking algorithms, applied on almost every user visit.
2. Ads: ML determines which ads to display to a given user.
a. "Practical lessons from predicting clicks on ads at Facebook," ADKDD14
3. Search: videos, photos, people, events, etc.
4. Sigma: the general classification and anomaly detection framework.
5. Lumos: extracts high-level attributes and embeddings from an image and its content.
6. Facer: Facebook's face detection and recognition framework.
7. Language Translation: supports translations for more than 45 languages. [link]
8. Speech Recognition: provides automated captioning for video.

Slide 6

Slide 6 text

Machine Learning Models
- LR and SVM are efficient to train and use for prediction.
- MLP: News Feed ranking; CNN: computer vision (CV); RNN/LSTM: natural language processing (NLP)
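
To illustrate why LR is cheap to use for prediction, here is a minimal sketch of logistic regression inference in plain Python. It is not taken from the paper or from any Facebook system; the function name `predict_lr` and the weight and feature values are hypothetical, chosen only for illustration.

```python
import math


def predict_lr(weights, bias, features):
    """Return the predicted probability of the positive class."""
    # A single dot product plus a sigmoid: the cost is linear in the
    # number of features, which is why inference is inexpensive.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and feature values, for illustration only.
p = predict_lr([0.5, -0.25], 0.0, [2.0, 4.0])
```

In this example the two weighted features cancel out, so the score `z` is zero and the model is maximally uncertain about the class.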

Slide 7

Slide 7 text

MLaaS inside Facebook

Slide 8

Slide 8 text

FBLearner Flow

Slide 9

Slide 9 text

FBLearner Flow

Slide 10

Slide 10 text

FBLearner Flow

Slide 11

Slide 11 text

DNN Framework
● PyTorch is optimized for research. It focuses on flexibility, debugging, and dynamic neural networks, which enables rapid experimentation. It is not optimized for production and mobile deployments.
● Caffe2 is optimized for production. It focuses on performance, cross-platform support, and coverage for CNN, RNN, and MLP. It can use third-party packages such as cuDNN, MKL, and Metal.

Slide 12

Slide 12 text

Transferring Research Results to Production with ONNX
● Decoupling research and production frameworks (PyTorch ←→ Caffe2)

Slide 13

Slide 13 text

RESOURCE IMPLICATIONS OF MACHINE LEARNING [link]

Slide 14

Slide 14 text

Compute Type and Locality
Distributed training: P. Goyal et al.; Takuya Akiba et al.

Slide 15

Slide 15 text

RESOURCE REQUIREMENTS OF ONLINE INFERENCE WORKLOADS.

Slide 16

Slide 16 text

Future of MLaaS at Facebook
● ML workloads benefit from SIMD and from specialized convolution or matrix multiplication engines.
● Model compression, quantization, and high-bandwidth memory
○ "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", Song Han et al., ICLR16
○ "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux et al.
○ "Ternary Neural Networks for Resource-Efficient AI Applications", Hande Alemdar et al.
● Related work:
○ "TFX: A TensorFlow-Based Production-Scale Machine Learning Platform", KDD17
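
As a rough illustration of what quantization and binarization mean for model weights, the sketch below applies simple symmetric 8-bit linear quantization and sign-based binarization to a short list of weights. This is a simplified stand-in, not the trained quantization of Deep Compression or the training procedure of BinaryNet; the function names and weight values are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [int(round(w / scale)) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in quantized]


def binarize(weights):
    """Constrain each weight to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


# Hypothetical weight values, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
signs = binarize(weights)
```

Each 8-bit code takes a quarter of the storage of a 32-bit float, at the price of a rounding error of at most half the scale per weight; binarization goes further and keeps only one bit per weight.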

Slide 17

Slide 17 text

Conclusion
● MLaaS at Facebook serves 2.1 billion users!
● ML is applied across nearly all services (MLaaS), and computer vision represents only a small fraction of the resource requirements.
● FB relies on an incredibly diverse set of ML approaches.
○ e.g. SVM, GBDT, Logistic Regression (LR)
● Inference runs mainly on CPUs; training uses both CPUs and GPUs.