Automating Data Classification using ActiveModel

If you have been through PCI compliance, you know how painful it is to audit and maintain the data classification of Personally Identifiable Information (PII). In this talk, Rida will share how to use tools like Rails' ActiveModel to automate and maintain your classification as part of your application.

Rida Al Barazi

February 24, 2016

Transcript

  1. Data breach: "A data breach is a security incident in which sensitive, protected or confidential data is copied, transmitted, viewed, stolen or used by an individual unauthorized to do so." (Wikipedia, "Data breach")
  5. Types of data
    There are two types of data:
    • Data that someone wants to steal
    • Everything else
  6. The 3 Ps
    • PII = Personally Identifiable Information
    • PHI = Protected Health Information
    • PCI = Payment Card Industry
  7. PII (Personally Identifiable Information): "Information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context." (Wikipedia, "Personally identifiable information")
  8. PHI (Protected Health Information): "Any information about health status, provision of health care, or payment for health care that is created or collected by a 'Covered Entity' and can be linked to a specific individual." (Wikipedia, "Protected health information")
  9. PHI is protected by:
    • the Health Insurance Portability and Accountability Act (HIPAA) in the USA
    • the Personal Health Information Protection Act (PHIPA) in Ontario, Canada
  10. PCI DSS (Payment Card Industry Data Security Standard): "A proprietary information security standard for organizations that handle branded credit cards from the major card schemes including Visa, MasterCard, American Express, Discover, and JCB." (Wikipedia, "Payment Card Industry Data Security Standard")
  11. PCI DSS (Payment Card Industry Data Security Standard) is enforced by the major card brands.
  12. Data is normally classified based on:
    • Its nature: medical, financial, personal
    • The laws, policies and regulations that apply to it
    • Organizational and business needs
  13. How?
    1- Define the classification levels, for example: Prohibited, Restricted, Public.
    2- Define the level of security required for each level (the implementation).
    3- Define how data is assigned to each level. Normally this is based on data type (the 3 Ps) and threat level.
  14. Example
    • Restricted: Data should be classified as Restricted when the unauthorized disclosure, alteration or destruction of that data could cause a significant level of risk to the University or its affiliates. Examples of Restricted data include data protected by state or federal privacy regulations and data protected by confidentiality agreements. The highest level of security controls should be applied to Restricted data.
    • Private: Data should be classified as Private when the unauthorized disclosure, alteration or destruction of that data could result in a moderate level of risk to the University or its affiliates. By default, all Institutional Data that is not explicitly classified as Restricted or Public data should be treated as Private data. A reasonable level of security controls should be applied to Private data.
    • Public: Data should be classified as Public when the unauthorized disclosure, alteration or destruction of that data would result in little or no risk to the University and its affiliates. Examples of Public data include press releases, course information and research publications. While little or no controls are required to protect the confidentiality of Public data, some level of control is required to prevent unauthorized modification or destruction of Public data.
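The University scheme above maps naturally onto code. The sketch below is a minimal illustration, not part of the talk: the field names in `EXPLICIT` are hypothetical, and treating a record as being as sensitive as its most sensitive field is an assumption rather than something the slide states. The default-to-Private rule comes straight from the slide.

```ruby
# Levels from the University example, ordered least to most sensitive.
LEVELS = %i[public private restricted].freeze

# Hypothetical explicit assignments. Anything not listed defaults to :private,
# following "not explicitly classified as Restricted or Public ... Private".
EXPLICIT = {
  "press_release" => :public,
  "course_description" => :public,
  "date_of_birth" => :restricted
}.freeze

# Level for a single field.
def level_for(field)
  EXPLICIT.fetch(field, :private)
end

# Assumption: a record is protected at the level of its most sensitive field.
def record_level(fields)
  fields.map { |field| level_for(field) }.max_by { |level| LEVELS.index(level) }
end
```

For example, `record_level(%w[press_release date_of_birth])` returns `:restricted`, while a field nobody has classified yet lands in `:private`.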
  15. Database-backed applications
    • One database, accessed through a single database user
    • Different user types: end users, admins, support
    • Authentication and authorization aren't built in
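Because every application user reaches the database through the same database login, the database cannot tell an admin from a support agent, so access rules have to live in application code. A minimal sketch, assuming hypothetical role clearances (`ROLE_CLEARANCE`) and reusing the level names from the classification rules later in the talk. Attributes with no classification are treated as Prohibited, so the check fails closed.

```ruby
# Levels ordered from least to most sensitive (names from the talk's rules).
LEVEL_RANK = {
  unrestricted: 0,
  confidential: 1,
  restricted: 2,
  prohibited: 3
}.freeze

# Hypothetical clearances: the most sensitive level each role may read.
ROLE_CLEARANCE = {
  end_user: :confidential,
  support: :restricted,
  admin: :restricted
}.freeze

# Returns only the attributes of +record+ that +role+ is allowed to see.
# Attributes missing from +classifications+ count as :prohibited (fail closed).
def visible_attributes(record, classifications, role)
  clearance = LEVEL_RANK.fetch(ROLE_CLEARANCE.fetch(role))
  record.select do |attribute, _value|
    level = classifications.fetch(attribute, :prohibited)
    LEVEL_RANK.fetch(level) <= clearance
  end
end
```

In this sketch no role can read Prohibited data in clear text; that is a design choice for the example, not a rule from the talk.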
  16. Spreadsheets: problems
    • Manual: we had to maintain the spreadsheet by hand.
    • Integrity: we lost integrity really fast whenever developers forgot to update the spreadsheet after a schema change.
    • No enforcement.
  17. Automation
    • A Rake task generates the CSV file, and it regenerates automatically after migrations.
    • We added a classification YAML file for each database table.
    • We ran the check as part of our CI for enforcement.
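One way to get "regenerates after migrations" is Rake's `Rake::Task#enhance`, which appends an action to an existing task. A minimal, self-contained sketch: the `db:migrate` task here is a stand-in for the one Rails defines, and the `data_classification:export` task name, `CSV_PATH` and the `CLASSIFICATIONS` data are all hypothetical.

```ruby
require "rake"
require "csv"
require "tmpdir"

include Rake::DSL

# Hypothetical output location for the classification spreadsheet.
CSV_PATH = File.join(Dir.tmpdir, "data_classification.csv")

# Hypothetical data; an application would derive this from its classification rules.
CLASSIFICATIONS = {
  "employees" => { "id" => "Confidential", "first_name" => "Restricted", "sin" => "Prohibited" }
}.freeze

# Stand-in for Rails' db:migrate so the sketch runs on its own.
task "db:migrate" do
  # migrations would run here
end

namespace :data_classification do
  desc "Export one CSV row per classified column"
  task :export do
    CSV.open(CSV_PATH, "w") do |csv|
      csv << %w[table column classification]
      CLASSIFICATIONS.each do |table, columns|
        columns.each { |column, level| csv << [table, column, level] }
      end
    end
  end
end

# Append the export to db:migrate so every migration run refreshes the CSV.
Rake::Task["db:migrate"].enhance do
  Rake::Task["data_classification:export"].invoke
end
```

Calling `Rake::Task["db:migrate"].invoke` runs the (empty) migration action first and then writes the CSV, because `enhance` adds its block after the task's existing actions.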
  18. Automation: classification rules

      class DataClassificationRules
        CLASSIFICATIONS = [
          UNCLASSIFIED = 'UNCLASSIFIED',
          UNRESTRICTED = 'Unrestricted',
          CONFIDENTIAL = 'Confidential',
          RESTRICTED   = 'Restricted',
          PROHIBITED   = 'Prohibited'
        ].freeze

        # Column names known to belong to each level.
        COLUMN_CLASSIFICATIONS_LIST = {
          UNRESTRICTED => %w(),
          CONFIDENTIAL => %w(id country state province city),
          RESTRICTED   => %w(description date_of_birth first_name last_name),
          PROHIBITED   => %w(sin sxn ssn tin routing_number account_number)
        }.freeze

        def self.classify(column_name)
          # Timestamps (_at), dates (_on) and foreign keys (_id) are Confidential.
          return CONFIDENTIAL if /(_at|_on|_id)\z/.match?(column_name)

          COLUMN_CLASSIFICATIONS_LIST.each do |level, list|
            return level if list.include?(column_name)
          end

          UNCLASSIFIED
        end
      end
  19. Automation: idempotent export functionality

      ActiveRecord::Base.connection.tables.each do |table_name|
        columns = ActiveRecord::Base.connection.columns(table_name)

        # Auto-classify every column from its name.
        raw_columns = columns.each_with_object({}) do |column, out|
          out[column.name] = DataClassificationRules.classify(column.name)
        end

        file_path = DATA_CLASSIFICATION_PATH.join("#{table_name}.yml")
        # Classifications already in the file (possibly edited by hand) win.
        classified_columns =
          File.exist?(file_path) ? YAML.load_file(file_path).fetch(table_name, {}) : {}

        all_columns = raw_columns.merge(classified_columns)
        File.open(file_path, 'w') do |file|
          file << { table_name => all_columns }.to_yaml
        end
      end
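The merge order is what makes this export idempotent: generated values are computed first and anything already in the file overrides them, so hand-edited classifications survive re-runs. The sketch below reproduces that behavior with only the standard library. `SCHEMA` is a hypothetical stand-in for `ActiveRecord::Base.connection`, `auto_classify` is a cut-down version of the rules on slide 18, and, unlike the slide's code, it also drops entries for columns that no longer exist (via `Hash#slice`).

```ruby
require "yaml"

# Hypothetical stand-in for the database schema: table => column names.
SCHEMA = { "employees" => %w[id first_name sin created_at nickname] }.freeze

# Cut-down version of the name-based classification rules.
def auto_classify(column)
  return "Confidential" if column == "id" || column.match?(/(_at|_on|_id)\z/)
  return "Restricted" if %w[first_name last_name].include?(column)
  return "Prohibited" if %w[sin ssn].include?(column)

  "UNCLASSIFIED"
end

# Writes one "<table>.yml" per table into +dir+. Re-running it without
# schema or manual changes rewrites identical files.
def export(dir)
  SCHEMA.each do |table, columns|
    path = File.join(dir, "#{table}.yml")
    generated = columns.to_h { |column| [column, auto_classify(column)] }
    existing = File.exist?(path) ? YAML.load_file(path).fetch(table, {}) : {}
    # Values already in the file win; entries for dropped columns are discarded.
    merged = generated.merge(existing.slice(*columns))
    File.write(path, { table => merged }.to_yaml)
  end
end
```

A first run marks `nickname` as `UNCLASSIFIED`; once someone edits it to `Restricted` in the YAML file, later runs keep that value.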
  20. Automation: YAML file per database table

      ---
      employees:
        id: Confidential
        employer_id: Confidential
        last_name: Restricted
        sin: Prohibited
        address: Restricted
        created_at: Confidential
        updated_at: Confidential
        birthday: Restricted
        date_of_hire: Confidential
        province: Confidential
        status: Confidential
        city: Confidential
        postal: Confidential
        first_name: Restricted
        email: Restricted
        employee_wid: Confidential
        middle_initial: Confidential
        deleted_on: Confidential
        ignore_gnis_precision: Confidential
  21. Automation: CI enforcement

      Dir["#{DATA_CLASSIFICATION_PATH}/*.yml"].each do |database_file|
        YAML.load_file(database_file).each do |table_name, columns|
          columns.each do |column_name, classification|
            expect(classification).not_to eq(DataClassificationRules::UNCLASSIFIED)
          end
        end
      end
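The same check can also run without RSpec, for example as a plain Ruby step in CI. A minimal sketch, assuming the one-file-per-table YAML layout shown above; the method names are hypothetical, and loading is kept separate from checking so the check itself is easy to test.

```ruby
require "yaml"

UNCLASSIFIED = "UNCLASSIFIED"

# Loads every per-table classification file in +dir+ (one Hash per file).
def load_classification_files(dir)
  Dir[File.join(dir, "*.yml")].sort.map { |file| YAML.load_file(file) }
end

# Returns "table.column" for every column still marked UNCLASSIFIED.
def unclassified_columns(documents)
  documents.flat_map do |document|
    document.flat_map do |table_name, columns|
      columns.select { |_column, level| level == UNCLASSIFIED }
             .keys
             .map { |column| "#{table_name}.#{column}" }
    end
  end
end

# Raises, and so fails the CI job, if any column is left unclassified.
def enforce_classification!(dir)
  missing = unclassified_columns(load_classification_files(dir))
  raise "Unclassified columns: #{missing.join(', ')}" unless missing.empty?
end
```

Reporting every offending `table.column` at once, rather than stopping at the first one, lets a developer fix all new columns in a single pass.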
  23. Automation: problems
    • Manual: we had to maintain the spreadsheet by hand.
    • Integrity: we lost integrity really fast whenever developers forgot to update the spreadsheet after a schema change.
    • Even though we enforced classifying the data, we did not enforce the implementation.
  24. Enforcement
    • We moved the classification into the model.
    • We removed the YAML files but kept the CSV generation.
    • We centralized the implementation.
  25. Enforcement: model-level classification

      class Employee < ActiveRecord::Base
        # Provides restricted_attribute / prohibited_attribute (slide 26).
        include HasAttributeClassifications

        restricted_attribute :first_name
        restricted_attribute :last_name
        prohibited_attribute :sin
        restricted_attribute :address
        restricted_attribute :birthday
        restricted_attribute :email
      end
  26. Enforcement: centralized implementation

      module HasAttributeClassifications
        extend ActiveSupport::Concern

        included do
          class_attribute :attribute_classifications, instance_accessor: false
          # AttributeClassification is not shown on the slide; it must respond to #merge.
          self.attribute_classifications = AttributeClassification.new
        end

        module ClassMethods
          def classify_attribute(column_name, classification)
            # merge returns a new object, so a subclass never mutates its parent's registry.
            self.attribute_classifications =
              attribute_classifications.merge(column_name => classification)
          end

          def prohibited_attribute(column_name)
            classify_attribute(column_name, :prohibited)
            # Prohibited data is stored encrypted; attr_encrypted and the
            # symmetric_encryption validator come from the app's encryption library.
            attr_encrypted column_name, type: :json, compress: true
            validates :"encrypted_#{column_name}", symmetric_encryption: true
          end

          def restricted_attribute(column_name)
            classify_attribute(column_name, :restricted)
          end
        end
      end
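To see the moving parts without Rails, here is a plain-Ruby sketch of the same class-macro pattern. It is an approximation, not the talk's code: a hand-rolled reader stands in for ActiveSupport's `class_attribute`, the encryption and validation steps are left out, and the `AttributeClassifications` module name is hypothetical. As on the slide, every write builds a new Hash with `merge`, so a subclass never changes its parent's classifications.

```ruby
# Hypothetical plain-Ruby version of the classification macros.
module AttributeClassifications
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # This class's own classifications if it has written any, else its parent's.
    def attribute_classifications
      if instance_variable_defined?(:@attribute_classifications)
        @attribute_classifications
      elsif superclass.respond_to?(:attribute_classifications)
        superclass.attribute_classifications
      else
        {}
      end
    end

    def classify_attribute(name, level)
      # merge returns a new Hash, so the parent's Hash is never modified.
      @attribute_classifications =
        attribute_classifications.merge(name.to_sym => level).freeze
    end

    def restricted_attribute(name)
      classify_attribute(name, :restricted)
    end

    def prohibited_attribute(name)
      # The talk also encrypts and validates prohibited attributes here.
      classify_attribute(name, :prohibited)
    end
  end
end

class Employee
  include AttributeClassifications

  restricted_attribute :first_name
  prohibited_attribute :sin
end
```

`Employee.attribute_classifications` then returns `{ first_name: :restricted, sin: :prohibited }`, the kind of per-model data a CSV export could be built from.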
  28. Results
    • Developer-driven classification and enforcement
    • Maintainable, versioned data classification
    • Extensible, programmatic solution