usually performed by dissecting the different resources of the binary file without executing it, and studying each component. The binary file can also be disassembled (or reverse engineered) using a disassembler such as IDA or radare. (Wikipedia) Another common static technique is to search for known signatures in the executable.
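A minimal sketch of what a signature search could look like; the signature set and the `scan` helper are illustrative (real engines use far richer matching, e.g. YARA rules), but the byte patterns shown are real markers: `MZ` is the DOS magic at the start of every PE file, and `UPX!` is left in sections by the UPX packer.

```python
# Naive static "signature" scan: look for known byte patterns in the raw file.
SIGNATURES = {
    "MZ header": b"MZ",     # DOS stub magic at the start of every PE file
    "UPX packer": b"UPX!",  # marker left in the binary by the UPX packer
}

def scan(data: bytes) -> list[str]:
    """Return the names of all signatures found in the raw bytes."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

# Example: a minimal fake PE-like blob containing both patterns.
blob = b"MZ" + b"\x90" * 64 + b"UPX!" + b"\x00" * 16
print(scan(blob))  # ['MZ header', 'UPX packer']
```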
consists of an agent and an environment. At each turn, the agent receives a state s and chooses one action a from a set of actions A. The policy is the agent's behavior, i.e., a mapping from states to actions. The agent then receives the next state s' and a scalar reward r. http://www.ausy.tu-darmstadt.de/Research/Research
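The agent-environment loop above can be sketched as follows; `ToyEnv` and `RandomAgent` are illustrative stand-ins, not part of any real framework, and the `step`/`reset` interface mirrors the Gym convention.

```python
import random

class RandomAgent:
    """Simplest possible policy: ignore the state, pick an action uniformly."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        return random.choice(self.actions)

class ToyEnv:
    """Minimal stand-in environment: rewards action 1, episode ends after 3 steps."""
    def reset(self):
        self.t = 0
        return 0  # initial state

    def step(self, action):
        self.t += 1
        # next state, scalar reward, done flag
        return self.t, float(action == 1), self.t >= 3

def run_episode(env, agent, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                # policy: state -> action
        state, reward, done = env.step(action)   # environment replies
        total_reward += reward
        if done:
            break
    return total_reward
```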
not very helpful while learning a game. So what we should aim for is the long-term reward. The long-term reward from step t will be: R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … = Σ_{k=0}^{∞} γ^k r_{t+k+1}. The agent aims to maximize the expectation of this long-term return from each state. The parameter γ ∈ [0, 1) is the discount factor that defines the weight of distant rewards relative to those obtained sooner. Discounting by γ ensures that this sum is finite.
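The discounted return can be computed with a single backward pass over the reward sequence; this helper is a sketch, not from the source.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute R_t = sum_k gamma**k * r_{t+k+1} for the start of the sequence."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```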
function: the Q-value Q(s, a), the expected long-term return of taking action a in state s. A neural network will be used to approximate this function. Next we can define the policy to choose an action: π(s) = argmax_a Q(s, a). The loss function to update the network is the squared TD error: L = (r + γ max_{a'} Q(s', a') − Q(s, a))². http://web.stanford.edu/class/cs20si/lectures/slides_14.pdf
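A sketch of that loss with Q as a plain lookup table rather than a neural network; the update rule is the same one a network would minimize by gradient descent. The states, actions, and values here are invented for illustration.

```python
from collections import defaultdict

def td_loss(Q, s, a, r, s_next, actions, gamma=0.99):
    """Squared TD error: (r + gamma * max_a' Q(s', a') - Q(s, a))**2."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    return (target - Q[(s, a)]) ** 2

Q = defaultdict(float)                 # unseen (state, action) pairs default to 0
Q[("s0", "append_zero")] = 0.5         # toy current estimate
loss = td_loss(Q, "s0", "append_zero", 10.0, "s1",
               actions=["append_zero", "upx_pack"])
# target = 10.0 + 0.99 * 0 = 10.0, so loss = (10.0 - 0.5)**2 = 90.25
```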
an action given the current state of the environment. The critic produces a TD (temporal-difference) error signal given the state and the resulting reward. If the critic is estimating the action-value function Q(s, a), it will also need the output of the actor. The critic's output drives learning in both the actor and the critic. In deep reinforcement learning, neural networks are used to represent both the actor and the critic.
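The critic's error signal can be sketched as a single scalar; the value estimates below are toy numbers, and the actor/critic updates are described in comments rather than implemented, just to show how one signal drives both.

```python
def td_error(r, v_s, v_s_next, gamma=0.99):
    """delta = r + gamma * V(s') - V(s): positive means 'better than expected'."""
    return r + gamma * v_s_next - v_s

# Critic update: move V(s) toward the TD target r + gamma * V(s').
# Actor update: increase the probability of the taken action when delta > 0,
# decrease it when delta < 0.
delta = td_error(r=10.0, v_s=2.0, v_s_next=0.0)  # 10.0 + 0.0 - 2.0 = 8.0
```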
malware sample. The environment emits the state in the form of a 2,350-dimensional feature vector:
* PE header metadata
* Section metadata: section name, size and characteristics
* Import & export table metadata
* Counts of human-readable strings
* Byte histogram
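One of the feature groups above, the byte histogram, is simple enough to sketch: a 256-bin count of byte values over the whole file, normalized to sum to 1. The helper name is illustrative.

```python
def byte_histogram(data: bytes) -> list[float]:
    """Normalized 256-bin histogram of byte values in the file."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero on an empty file
    return [c / total for c in counts]

hist = byte_histogram(b"\x00\x00\xff\xff")  # half zeros, half 0xFF bytes
```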
algorithm used to change the environment. The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score). The environment contains an anti-malware engine (the attack target). Each step provides:
* Reward (float): value of the reward scored by the previous action: 10.0 (pass, i.e., the sample evades the engine) or 0.0 (fail, still detected).
* Observation space (object): feature vector summarizing the composition of the malware sample.
* Done (bool): determines whether the environment needs to be reset; True means the episode was successful.
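The step contract above can be sketched as a single function; `extract_features` and `is_detected` are hypothetical stand-ins for the feature extractor and the anti-malware engine's verdict.

```python
PASS_REWARD, FAIL_REWARD = 10.0, 0.0

def step_result(sample_bytes, extract_features, is_detected):
    """Return the (observation, reward, done) triple for one environment step."""
    evaded = not is_detected(sample_bytes)
    reward = PASS_REWARD if evaded else FAIL_REWARD
    observation = extract_features(sample_bytes)  # e.g. the 2,350-dim vector
    done = evaded  # episode is successful (and env resets) once the sample evades
    return observation, reward, done

obs, r, done = step_result(
    b"MZ...",
    extract_features=lambda b: [len(b)],  # stand-in feature extractor
    is_detected=lambda b: False,          # stand-in engine verdict: evaded
)
```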
be performed on a malware sample in our environment consist of the following binary manipulations:
* append_zero
* append_random_ascii
* append_random_bytes
* remove_signature
* upx_pack
* upx_unpack
* change_section_names_from_list
* change_section_names_to_random
* modify_export
* remove_debug
* break_optional_header_checksum
Over time, the agent learns which combinations lead to the highest rewards, i.e., it learns a policy.
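Dispatching these actions by name might look like the sketch below; only the two append actions are implemented (the simplest, functionality-preserving ones), and the dispatch table and helper names are illustrative, not the environment's real code.

```python
import os

def append_zero(data: bytes) -> bytes:
    """Append a single zero byte to the end of the file."""
    return data + b"\x00"

def append_random_bytes(data: bytes, n: int = 16) -> bytes:
    """Append n random bytes to the end of the file."""
    return data + os.urandom(n)

# Map action names (as the agent sees them) to manipulation functions.
ACTIONS = {
    "append_zero": append_zero,
    "append_random_bytes": append_random_bytes,
}

def apply_action(data: bytes, name: str) -> bytes:
    return ACTIONS[name](data)

mutated = apply_action(b"MZ", "append_zero")  # b"MZ\x00"
```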