exploits
Vulnerabilities not known or fixed prior to the attack

Specific Attack Signature
CA-2001-26 (IE/IIS vulnerability used by Nimda Worm)
• GET /scripts/root.exe
• GET /scripts/..\xc1\x1c../winnt/system32/cmd.exe
• GET /scripts/..%35c../winnt/system32/cmd.exe

Behavior Signature
A class of JS attacks [Karanth et al. MSR 2010]
• unescape()
• replace()
• new_array()

Stuxnet
• CVE-2010-2568 (Windows)
• CVE-2010-2729 (Windows)
• CVE-2010-2772 (Siemens)

EMC RSA Attack
• CVE-2011-0609 (Flash)

Operation Aurora
• CVE-2012-0779 (Flash)
• CVE-2012-1875 (IE)
• CVE-2012-1889 (MS XML)
• CVE-2012-1535 (Flash)
Observations
• Instruction Trace (Precise Program Trace): … push mov sub call mov sysenter ret …
• Function Call/Ret Trace (Practical Program Trace): … call call call ret ret ret call ret …
• System Call Trace (Practical Program Trace): … sysenter sysenter sysenter sysenter …
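The relation between these trace levels can be sketched as a projection that keeps only selected events of the instruction trace. This is a minimal illustration, not the authors' tooling: the `project` helper is a hypothetical name, and the sample trace is just the short instruction sequence shown on the slide.

```python
def project(trace, keep):
    """Return the subsequence of `trace` whose events are in `keep`."""
    return [event for event in trace if event in keep]


# Instruction-level events, copied from the slide's example sequence.
instruction_trace = ["push", "mov", "sub", "call", "mov", "sysenter", "ret"]

# Two coarser views of the same execution (hypothetical projections).
call_ret_trace = project(instruction_trace, {"call", "ret"})
syscall_trace = project(instruction_trace, {"sysenter"})

print(call_ret_trace)  # ['call', 'ret']
print(syscall_trace)   # ['sysenter']
```

Each projection discards information, so a detector working on the projected trace cannot see differences that exist only at the instruction level.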
• A program trace: a string
• The set of all normal program traces: a formal language (deterministic or stochastic)

Program anomaly detection is
• deciding whether the string is accepted by the language

A program anomaly detection approach is
• a formal language, described by its precision and its scope of the norm

Precision
• What level is the projection?
• How descriptive is the grammar?
• How hard is it to cheat the detection?

Scope of the Norm
• Which practical traces at the projection level are selected as practical normal traces?
Practical trace: … b g g b b …

3-grams:
• bgg
• ggb
• gbb

Two rules:
• ggb can follow bgg
• gbb can follow ggb

Finite State Automaton (FSA): bgg --b--> ggb --b--> gbb
Regular Language

int sum(int n){
  if(n==0){
    s1();
    s2();
    return n;
  }else
    return n+sum(n-1);
}

Call stack: sum sum … sum … main
Pushdown Automaton (PDA)
Context-free Language: { … : … ≥ 1 }
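The 3-gram example can be sketched as a small detector that records which 3-grams occur in normal traces and which 3-gram may follow which. This is an illustrative sketch only: `ngrams`, `train`, and `is_normal` are hypothetical names, and training on the single trace `bggbb` is an assumption taken from the slide's example.

```python
def ngrams(trace, n=3):
    """Return the list of overlapping n-grams of `trace` as tuples."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def train(normal_traces, n=3):
    """Collect the allowed n-grams and the allowed (gram, next gram) pairs."""
    grams, follows = set(), set()
    for trace in normal_traces:
        gs = ngrams(trace, n)
        grams.update(gs)
        follows.update(zip(gs, gs[1:]))
    return grams, follows


def is_normal(trace, model, n=3):
    """Accept a trace only if every n-gram and every succession was seen."""
    grams, follows = model
    gs = ngrams(trace, n)
    return (all(g in grams for g in gs)
            and all(pair in follows for pair in zip(gs, gs[1:])))


model = train(["bggbb"])
print(sorted("".join(g) for g in model[0]))  # ['bgg', 'gbb', 'ggb']
print(len(model[1]))                         # 2
print(is_normal("bggbb", model))             # True
print(is_normal("bgbgb", model))             # False
```

The model holds three 3-grams and two succession rules, which is exactly a finite state automaton with the 3-grams as states. Traces shorter than three events produce no 3-grams and are accepted trivially in this sketch.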
that share the same practical program trace.

Precision (Detection Capability)
• Normal precise trace: … a b c g e g a b b c f …
• Anomalous precise trace: … m b n g s g m b b n t …
• Shared practical trace: … b g g b b …
This real-world PAD lacks the precision to detect the anomalous execution.

More Descriptive Grammar -> Higher Precision
• Context-free languages are more descriptive than regular languages.
• Pushdown automaton (PDA) approaches can describe practical program traces better than n-gram methods.
• PDA approaches can give more accurate detection than n-gram approaches.
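The gap between n-gram and pushdown approaches can be illustrated with a toy call/return trace. The sketch below rests on assumptions that are not in the slides: events are encoded as `c` (call) and `r` (ret), normal traces of a hypothetical recursive function have the form c^k r^k, and the n-gram model is simplified to a set of seen 3-grams. The function names are likewise hypothetical.

```python
def ngram_set(trace, n=3):
    """Return the set of n-grams (as substrings) occurring in `trace`."""
    return {trace[i:i + n] for i in range(len(trace) - n + 1)}


def ngram_accepts(trace, normal_grams, n=3):
    """Accept if every n-gram of the trace was seen in normal traces."""
    return ngram_set(trace, n) <= normal_grams


def pushdown_accepts(trace):
    """Accept c^k r^k with k >= 1, using an explicit stack."""
    stack = []
    seen_ret = False
    for event in trace:
        if event == "c":
            if seen_ret:          # a call after a return breaks c^k r^k
                return False
            stack.append(event)
        elif event == "r":
            if not stack:         # a return with no matching call
                return False
            stack.pop()
            seen_ret = True
        else:
            return False
    return seen_ret and not stack


normal_traces = ["cccrrr", "ccccrrrr"]
normal_grams = set().union(*(ngram_set(t) for t in normal_traces))

anomalous = "cccrrrr"  # one return too many
print(ngram_accepts(anomalous, normal_grams))  # True
print(pushdown_accepts(anomalous))             # False
print(pushdown_accepts("cccrrr"))              # True
```

Every 3-gram of the anomalous trace also occurs in a normal trace, so the finite-state model cannot reject it. The stack lets the pushdown check count calls against returns.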
[Figure: a Probabilistic FSA over the symbols g and b (transition probabilities 0.1, 0.2, 0.7), shown next to two deterministic automata, FSA 1 and FSA 2]
Same precision, different decision on anomaly detection.

The scope of the normal can be defined deterministically or probabilistically.
[Figure: the scope of the norm, ranging from "One trace" to "All regular language traces", shown for two values of Λ]
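A deterministic and a probabilistic scope over the same automaton can be contrasted in a few lines. Everything numeric here is an assumption made for illustration: the transition probabilities in `P` and the cutoff `THRESHOLD` are hypothetical and are not the values from the slide's figure, and the model is simplified to a first-order chain over the symbols g and b.

```python
import math

# Hypothetical transition probabilities P(next | current); not from the slides.
P = {
    ("g", "g"): 0.7, ("g", "b"): 0.3,
    ("b", "b"): 0.9, ("b", "g"): 0.1,
}

# Hypothetical cutoff on the log-likelihood of a trace.
THRESHOLD = math.log(0.01)


def log_likelihood(trace):
    """Sum of log transition probabilities; -inf if a transition is unknown."""
    total = 0.0
    for pair in zip(trace, trace[1:]):
        p = P.get(pair, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def deterministic_normal(trace):
    """Deterministic scope: any trace the automaton can produce is normal."""
    return log_likelihood(trace) > float("-inf")


def probabilistic_normal(trace, threshold=THRESHOLD):
    """Probabilistic scope: only sufficiently likely traces are normal."""
    return log_likelihood(trace) >= threshold


for trace in ("ggbbb", "gbgbg"):
    print(trace, deterministic_normal(trace), probabilistic_normal(trace))
```

Both traces are accepted under the deterministic scope, but only `ggbbb` clears the assumed threshold. The automaton and its precision are unchanged; only the decision on which traces count as normal differs.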
N00014-13-1-0016.

• A PAD approach is a formal language
• Uniform framework to understand PAD precision
• Theoretical accuracy limit proved
• Future directions discussed