
Meta learning for fun and profit

LINE DevDay 2020

November 25, 2020

Transcript

  1. About me
     › Manager for the Security R&D team in LINE
     › Stanford MS in CS (network and system security)
     › Seoul Nat'l Univ BS in CSE w/ Economics, Psychology
     › I love interdisciplinary stuff
     › 10+ years of pen-testing in public and private sectors
     › CISSP
  2. Finding bugs
     › Code auditing is one of the best strategies to protect IT systems from malicious hackers
     › It is a highly sophisticated task: we need a team of ethical hackers
     › Scalability issue
     › Can we make an autonomous system for this?
  3. Deep learning?
     › We need an intelligent system
     › First of all, the system must be able to understand code
     › Natural Language Processing (NLP) has progressed a lot with deep learning!
     › Programming languages and natural languages share many properties
  4. Small dataset challenge
     › Deep learning requires a huge amount of data for training
     › More parameters mean more data to train them; GPT-3 used 499B tokens
     › It is almost impossible to collect that many legitimate security bug samples
     › This is the most common pain point in the security area
  5. Transformer and BERT
     [Diagram: the Transformer encoder: scaled dot-product attention (MatMul, Scale, SoftMax, MatMul over Q, K, V), Multi-Head Attention, Add & Norm, Feed Forward, Add & Norm, Input Embedding with Positional Encoding, and a stack of Encoder blocks producing the embeddings]
  6. Meta-Learning
     › Meta learning: learning a learner
     › Given experience on previous tasks, learn a new task quickly
     [Diagram: meta-training tasks vs. meta-testing tasks]
  7. Model-Agnostic Meta-Learning
     › MAML, Finn et al.
       › Optimization-based meta learning
       › min_θ ∑_{task i} L(θ − α ∇_θ L(θ, D_i^tr), D_i^ts)
     › Reptile, Nichol et al., 2018
       › First-order approximation (a minimal code sketch follows this slide)
       › θ ← θ + α (1/n) ∑_{i=1}^{n} (U_{τ_i}^k(θ) − θ)
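To make the Reptile update above concrete, here is a minimal PyTorch sketch. The toy linear model, the random regression tasks, and the learning rates are illustrative assumptions, not the setup used in the talk.

    import copy
    import torch

    # Minimal sketch of the Reptile outer update (Nichol et al., 2018).
    model = torch.nn.Linear(4, 1)
    outer_lr, inner_lr, inner_steps, n_tasks = 0.1, 0.01, 5, 8

    def make_task_batch():
        # Hypothetical task sampler: a fresh random linear-regression task.
        w = torch.randn(4, 1)
        x = torch.randn(24, 4)
        return x, x @ w

    theta = copy.deepcopy(model.state_dict())      # shared initialization θ
    deltas = []
    for _ in range(n_tasks):
        model.load_state_dict(theta)               # every task starts from θ
        opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
        x, y = make_task_batch()
        for _ in range(inner_steps):               # U^k_τ(θ): k inner SGD steps
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            deltas.append({k: v - theta[k] for k, v in model.state_dict().items()})

    # θ ← θ + α (1/n) ∑_i (U^k_τi(θ) − θ)
    with torch.no_grad():
        theta = {k: theta[k] + outer_lr * sum(d[k] for d in deltas) / n_tasks
                 for k in theta}
    model.load_state_dict(theta)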
  8. Architecture
     [Diagram: source codes → preprocessing (code slicing, JSON query) → BPE tokenizer → stacked ALBERT encoders → Bi-LSTM → FCNN + Softmax heads for the start and end positions, plus an FCNN head for "Is vuln?"]
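A minimal sketch of the model in the architecture diagram above, assuming PyTorch and the Hugging Face transformers package. The checkpoint name "albert-base-v2", the LSTM size, and the exact wiring of the heads (e.g., taking the "is vuln?" prediction from the [CLS] embedding) are assumptions for illustration, not the production model.

    import torch.nn as nn
    from transformers import AlbertModel

    class BugFinder(nn.Module):
        # Sketch: ALBERT encoder stack -> Bi-LSTM -> start/end heads + "is vuln?" head.
        def __init__(self, albert_name="albert-base-v2", lstm_hidden=256):
            super().__init__()
            self.encoder = AlbertModel.from_pretrained(albert_name)
            hidden = self.encoder.config.hidden_size
            self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
            self.start_head = nn.Linear(2 * lstm_hidden, 1)   # per-token start logit
            self.end_head = nn.Linear(2 * lstm_hidden, 1)     # per-token end logit
            self.vuln_head = nn.Linear(hidden, 2)             # binary "is vuln?"

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            seq, _ = self.bilstm(out.last_hidden_state)        # (batch, seq, 2*lstm_hidden)
            start_logits = self.start_head(seq).squeeze(-1)
            end_logits = self.end_head(seq).squeeze(-1)
            vuln_logits = self.vuln_head(out.last_hidden_state[:, 0])   # [CLS] token
            return start_logits, end_logits, vuln_logits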
  12. Prediction process
      [Diagram: a tokenized input such as "▁< script > ▁docu ment . onload = alert ( …" flows through the embedding layer, stacked Transformer layers, the LSTM layer, and the Softmax layer that predicts the answer span]
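As a rough illustration of the final step of this prediction pipeline, the sketch below decodes a span from the two per-token heads; the sequence length and the random placeholder logits are assumptions standing in for real model output.

    import torch

    seq_len = 16
    start_logits = torch.randn(seq_len)    # placeholder start-position logits
    end_logits = torch.randn(seq_len)      # placeholder end-position logits

    start_probs = torch.softmax(start_logits, dim=-1)
    end_probs = torch.softmax(end_logits, dim=-1)

    # Score every valid span (start <= end) and pick the best one.
    scores = torch.triu(start_probs.unsqueeze(1) * end_probs.unsqueeze(0))
    best = torch.argmax(scores).item()
    start_idx, end_idx = divmod(best, seq_len)
    print(f"predicted span: tokens {start_idx}..{end_idx}")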
  13. Training strategy
      [Diagram: three phases sharing the BPE tokenizer and ALBERT encoder stack.
       Phase 1 (Pre-training): the encoder stack alone, initialized from the English ALBERT, is the training target.
       Phase 2 (Meta-training): the full model (encoders + Bi-LSTM + FCNN/Softmax heads for start, end, and vuln), starting from the Phase 1 ALBERT, is the training target.
       Phase 3 (Fine-tuning): the same full model is the training target.]
  16. Experiment target
      › DOM-based XSS: the XSS happens in the DOM instead of the HTML
      › The HTML code is intact -> runtime investigation is necessary
      › Source and sink:
        <script> document.write("You are visiting: " + document.baseURI); </script>
        http://www.example.com/vuln.html#<script>alert('xss')</script>
  17. Experiment datasets: DOM-based XSS bug finding
      › Meta-learning data (foreign domain)
        › The Stanford Question Answering Dataset (SQuAD 2.0)
        › Generated mini-batch tasks (24 samples for each task; a sampling sketch follows this slide)
      › Fine-tuning data (XSS bug samples)
        › Patch history from public and private Git repos
        › 29 samples of the bug (23 for training, 6 for validation)
      › Pre-training data
        › HTML corpus from the web (367M)
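A minimal sketch of how 24-sample mini-batch tasks could be drawn from SQuAD 2.0 for the meta-training phase, assuming the Hugging Face datasets package; the uniform random sampling shown here is an assumption, not necessarily the scheme used in the experiment.

    import random
    from datasets import load_dataset

    squad = load_dataset("squad_v2", split="train")   # the foreign-domain QA data
    TASK_SIZE, N_TASKS = 24, 100

    def sample_task(rng):
        # Hypothetical sampler: one task = 24 random (question, context, answers) triples.
        idx = rng.sample(range(len(squad)), TASK_SIZE)
        return [(squad[i]["question"], squad[i]["context"], squad[i]["answers"]) for i in idx]

    rng = random.Random(0)
    meta_training_tasks = [sample_task(rng) for _ in range(N_TASKS)]
    print(len(meta_training_tasks), "tasks of", TASK_SIZE, "samples each")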
  20. Experiment
      › Experiment setup
        › Baseline: random init for the RNN and FCNN
        › Meta-learned: meta-trained parameters
      [Charts: meta-learning curve; fine-tuning curve comparison (EM/F1 score)]
  21. Is it promising?
      › Parameter size matters
        › In GPT-3's few-shot learning experiments, it achieved 32.1 (125M), 55.9 (2.7B), and 69.8 (175B) on SQuAD 2.0
        › The F1 score of human performance on SQuAD 2.0 is 89.452
        › Our model has 18M parameters; even though the task is different, ours got 40.1
      › The point of the experiment: our ingredients actually led to better performance
  22. Where are we?
      [Chart: number of parameters (billions) over time. ELMo 0.094 (2018.4), GPT 0.11 (2018.7), BERT-Large 0.34 (2018.10), GPT-2 1.5 (2019.2), T-NLG 17 (2020.1), GPT-3 175 (2020.6). "We are here": 0.018B]
  23. Conclusion
      › Meta-learning algorithms can be helpful for small-dataset problems
        › This is a huge point in the security area
      › A foreign domain can be used for meta-training
        › Structural similarity is required
      › The Transformer model is useful, but it requires lots of data
  24. Future work
      › Longer-term dependency
        › Handling nested structure
      › Problem extension
        › Polyglot code, different kinds of bugs
      › Ensemble models
        › Better performance w/o increasing the number of parameters (training is so expensive)
      › Leveraging the programming language's grammar and structure
  25. Thank you
      Paper: "Cross-domain meta-learning for bug finding in the source codes with a small dataset", presented at EICC 2020, France