Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain 2017

Security is a tradeoff between usability and safety and should be driven by the perceived threats.

https://www.bigdataspain.org/2017/talk/keeping-enterprises-big-data-secure

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

November 30, 2017

Transcript

  1. (No text on this slide)
  2. Keeping Your Enterprise’s Big Data Secure

    Owen O’Malley, owen@hortonworks.com, @owen_omalley
    Nov 2017, © Hortonworks Inc. 2017
  3. Security

    • What are the important threats?
    • How to minimize attack surfaces?
    • How do we control who has access?
    • What data do we have?
  4. Threat: Accidental File Deletion

    • Need HDFS permissions
    • Trash needs to be enabled too!
    • Accountability
    • Audit log is very important
    • Avoid group accounts
  5. Threat: Accidental Killing Tasks

    • Need Linux Container Executor
    • Small setuid executable
    • Enables isolation between users
    • Also provides local file protection
    • In the works: Docker containers
    • Stronger task killing
    • Does not require user accounts
  6. Threat: Pretending to be a User

    • Need Kerberos
    • Provides mutual authentication
    • User logs in to get ticket
    • Delegation tokens authenticate jobs
    • Good for 1 day, renewable for 7
    • Renewed and cancelled by YARN
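
The delegation-token lifetime described on this slide (good for 1 day, renewable for 7) can be sketched as simple date arithmetic. This is a minimal illustration, not Hadoop code: the function names and the cap-at-maximum renewal rule are assumptions made for the example.

```python
from datetime import datetime, timedelta

RENEW_INTERVAL = timedelta(days=1)  # "good for 1 day"
MAX_LIFETIME = timedelta(days=7)    # "renewable for 7"


def next_expiry(issued_at: datetime, renewed_at: datetime) -> datetime:
    """Expiry after a renewal at `renewed_at`, never past the maximum lifetime."""
    hard_limit = issued_at + MAX_LIFETIME
    return min(renewed_at + RENEW_INTERVAL, hard_limit)


def is_valid(now: datetime, expiry: datetime) -> bool:
    """A token is usable only strictly before its current expiry."""
    return now < expiry


issued = datetime(2017, 11, 16, 9, 0)
first_expiry = next_expiry(issued, issued)                           # one day after issue
late_expiry = next_expiry(issued, issued + timedelta(days=6, hours=12))  # capped at day 7
```

A job that outlives the seven-day cap cannot be kept alive by further renewals, which is one reason long-running tests matter.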
  7. Threat: User accesses private data

    • Need Kerberos
    • HDFS permissions are critical
    • ACLs add additional flexibility
    • Define user-to-group mapping
    • Need group directories
    • Minimize user or global spaces
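
To make the interplay of permissions, user-to-group mapping, and group directories concrete, here is a simplified sketch of a POSIX-style read check. It is an illustration only: the `GROUPS` table, the `Inode` type, and the user names are hypothetical, and real HDFS also evaluates ACL entries, which are omitted here.

```python
from dataclasses import dataclass

# Hypothetical user-to-group mapping; a real cluster would resolve this
# from its configured group mapping service.
GROUPS = {"alice": {"finance"}, "bob": {"marketing"}}


@dataclass(frozen=True)
class Inode:
    owner: str
    group: str
    mode: int  # POSIX-style mode bits, e.g. 0o750


def can_read(user: str, inode: Inode) -> bool:
    """Check the owner, group, or other read bit, in that order."""
    if user == inode.owner:
        return bool(inode.mode & 0o400)
    if inode.group in GROUPS.get(user, set()):
        return bool(inode.mode & 0o040)
    return bool(inode.mode & 0o004)


# A group directory: readable by the finance group, closed to everyone else.
finance_dir = Inode(owner="etl", group="finance", mode=0o750)
```

With mode `0o750` the "other" class has no access at all, which matches the advice to minimize global spaces.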
  8. Threat: Pretending to be a Service

    • Need Kerberos integration
    • Each service needs a keytab
    • Hadoop’s UserGroupInformation
    • loginFromKeytab
    • When you are testing, test for longer than 24 hours
  9. Threat: Remote Access

    • Need Service Level Authorization
    • Which users can use each service
    • Apache Ranger simplifies this
    • Need firewall around cluster
    • Never let users log in to masters
  10. Security Architecture

  11. Threat: Eavesdropping Inside Data Center

    • Super user can watch network traffic
    • Need wire encryption
    • RPC encryption
    • Data transfer
    • MapReduce Shuffle
    • Very expensive
    • Generally not recommended
  12. Threat: Eavesdropping Outside Data Center

    • Intranet hard to control
    • Need HTTPS encryption for outside
    • Set up SSL certificates
    • Create a master certificate to sign all of the others
    • Have users add master certificate to their browsers
  13. Threat: Physical access

    • Very rare
    • Attacker can get to physical box
    • Hopeless
    • Can remove hard drives
    • Includes access to retired drives
    • Need raw file system encryption
  14. Threat: Attacker Gets Hadoop Admin

    • Attacker is a Hadoop admin
    • But not root
    • Need HDFS Encryption Zones
    • Directory sub-tree encryption
    • Each file gets unique key
    • Client decrypts data
  15. HDFS Encryption

    • Each zone has master key
    • Each file gets unique sub-key
    • HDFS stores sub-key encrypted with master key
    • Client uses sub-key to decrypt file
    • Noticeable performance impact
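
The key hierarchy on this slide (a master key per zone, a unique sub-key per file, and the sub-key stored only in wrapped form) can be sketched as follows. This is an illustration under loud assumptions: `toy_cipher` is a hash-based XOR stand-in used only because the Python standard library has no AES, and it is NOT secure or what HDFS uses; `ToyKeyServer` and its method names are hypothetical, not the Hadoop KMS API.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream (NOT secure). Applying it twice restores the input."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKeyServer:
    """Hypothetical key server: zone master keys never leave this object."""

    def __init__(self) -> None:
        self._zone_keys: dict[str, bytes] = {}

    def create_zone_key(self, zone: str) -> None:
        self._zone_keys[zone] = os.urandom(32)

    def new_wrapped_file_key(self, zone: str) -> tuple[bytes, bytes]:
        """Create a per-file sub-key and return it encrypted with the zone key."""
        nonce, file_key = os.urandom(16), os.urandom(32)
        return nonce, toy_cipher(self._zone_keys[zone], nonce, file_key)

    def unwrap_file_key(self, zone: str, nonce: bytes, wrapped: bytes) -> bytes:
        return toy_cipher(self._zone_keys[zone], nonce, wrapped)


kms = ToyKeyServer()
kms.create_zone_key("finance")

# Write path: the file system keeps only (nonce, wrapped) as file metadata.
nonce, wrapped = kms.new_wrapped_file_key("finance")
data_nonce = os.urandom(16)
file_key = kms.unwrap_file_key("finance", nonce, wrapped)
ciphertext = toy_cipher(file_key, data_nonce, b"salary,100")

# Read path: the client unwraps the sub-key again and decrypts locally.
plaintext = toy_cipher(kms.unwrap_file_key("finance", nonce, wrapped),
                       data_nonce, ciphertext)
```

Because the storage layer only ever holds the wrapped sub-key and the ciphertext, an administrator of the storage layer who cannot reach the key server cannot read the data.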
  16. KeyProvider API

    • Key management
    • Allows 3rd party plugins
    • Named keys
    • Key versions and key rolling
    • Ranger provides a Key Management Server
  17. Ranger KMS: Transparent Data Encryption in HDFS

    [Diagram: HDFS clients, NameNodes (NN), DataNodes (DN) holding blocks A, B, C, D, Ranger KMS, SafeNet Luna HSM]
    Benefits
    • Selective encryption of relevant files/folders
    • Fine-grained access controls
    • Transparent to end application without changes
    • Ranger KMS integrated with external HSM (SafeNet Luna), adding to reliability/security of KMS
  18. Threat: Sensitive Tabular Data

    • Need to protect sensitive columns
    • Example: personal information
    • Restrict rows based on filters
    • Example: different regions
    • HDFS permissions don’t work
    • Need Hive Server 2
  19. Hive Architecture with Storage-Based Auth
  20. Hive Architecture with Hive Server 2
  21. Configuring Hive Server 2

    • Configure Ranger security policies
    • Or use Hive SQL Standard Auth
    • Restrict permissions in HDFS
    • Only Hive has access
    • All normal users use Hive Server 2
  22. Attribute-Based Access Control (ABAC)

    • Ranger policies allow you to control access with ABAC
    • Combination of the subject, action, resource, and environment
    • Uses descriptive attributes: AD group, Atlas-based tags or classifications, geo-location, etc.
    • Consistent with NIST 800-162
    • Avoid role proliferation and manageability issues
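
The four ABAC inputs named on this slide (subject, action, resource, environment) can be sketched as a small policy check. This is an illustrative model only, not Ranger's policy engine or API: the `Request` and `Policy` types, their fields, and the sample attribute values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    subject_groups: frozenset  # subject attribute, e.g. directory groups
    action: str                # e.g. "select"
    resource_tags: frozenset   # resource attribute, e.g. classification tags
    location: str              # environment attribute, e.g. geo-location


@dataclass(frozen=True)
class Policy:
    required_group: str
    allowed_actions: frozenset
    resource_tag: str
    allowed_locations: frozenset

    def permits(self, req: Request) -> bool:
        """All four attribute conditions must hold for the policy to match."""
        return (self.required_group in req.subject_groups
                and req.action in self.allowed_actions
                and self.resource_tag in req.resource_tags
                and req.location in self.allowed_locations)


def is_allowed(req: Request, policies: list) -> bool:
    """Default deny: access requires at least one matching policy."""
    return any(p.permits(req) for p in policies)


pii_policy = Policy("compliance", frozenset({"select"}), "PII", frozenset({"ES"}))
ok = Request(frozenset({"compliance"}), "select", frozenset({"PII"}), "ES")
wrong_place = Request(frozenset({"compliance"}), "select", frozenset({"PII"}), "US")
```

One policy keyed on the `PII` tag covers every resource carrying that tag, which is how attribute-based rules avoid a separate role per table.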
  23. Data Masking and Anonymization

    • Some users need limited access
    • Very difficult problem
    • Ranger’s dynamic masking
    • Redact: replace character sets
    • Hash: replace with hash
    • Tokenization: replace with ID
    • User defined
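
The three masking styles listed on this slide can be sketched in a few lines. This is a minimal illustration, not Ranger's implementation: the replacement characters in `redact`, the choice of SHA-256, and the in-memory `Tokenizer` are assumptions made for the example.

```python
import hashlib
import itertools


def redact(value: str) -> str:
    """Replace character sets: digits -> n, upper -> X, lower -> x; keep the rest."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)
    return "".join(out)


def hash_mask(value: str) -> str:
    """Replace the value with a hash; equal inputs stay joinable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class Tokenizer:
    """Replace each distinct value with a stable numeric ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._counter = itertools.count(1)

    def tokenize(self, value: str) -> int:
        if value not in self._ids:
            self._ids[value] = next(self._counter)
        return self._ids[value]


tokens = Tokenizer()
masked = redact("Ab-12")  # "Xx-nn"
```

Redaction keeps the shape of the value, hashing keeps equality, and tokenization keeps equality while allowing the mapping table to be guarded separately.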
  24. ORC Column Encryption

    • Encrypts just the sensitive columns
    • Columns may use different keys
    • Uses Ranger KMS
    • Local key for each column & file
    • Unencrypted data can be statically masked
    • Follow progress in ORC-14
  25. Tabular Access from Spark

    • Requires fine-grained access control
    • Hive has created LLAP
    • Live Long and Process
    • Caches hot columns and partitions
    • Runs sub-query on daemons
    • Created an LLAP Context for Spark
  26. Apache Atlas – Data Governance

    • As data and clusters grow, you need to track the data
    • Atlas provides Data Governance
    • What data do you have?
    • Where did it come from?
    • Extensible data modelling
    • Integrates with other systems
  27. Apache Atlas: Lineage and Impact

  28. Apache Atlas: Classification

    • Categorize and curate data assets for easier discovery
    • Associate context with data: Governance, Security, Business
  29. Metadata Catalog Search: Basic

    Search for a hive_table classified as ‘PII’ and name starting with ‘prov’
    • Filter by data asset type
    • Filter by classification
    • Search text
    • Wildcards: prov*, *sum*
    • Logical expressions: prov* AND *sum*
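
The basic search on this slide combines a type filter, a classification filter, and wildcard text joined with AND. The sketch below mimics that behaviour over an in-memory list; it does not call Atlas, and the `CATALOG` entries and `basic_search` function are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog entries standing in for assets tracked by a metadata store.
CATALOG = [
    {"type": "hive_table", "name": "provider_summary", "classifications": {"PII"}},
    {"type": "hive_table", "name": "provider_raw", "classifications": set()},
    {"type": "hive_column", "name": "prov_id", "classifications": {"PII"}},
    {"type": "hive_table", "name": "claims", "classifications": {"PII"}},
]


def basic_search(catalog, asset_type, classification, patterns):
    """Return names matching the type, the classification, and ALL wildcard patterns."""
    return [
        asset["name"]
        for asset in catalog
        if asset["type"] == asset_type
        and classification in asset["classifications"]
        and all(fnmatchcase(asset["name"], p) for p in patterns)
    ]


# Equivalent of: hive_table, classified PII, text "prov* AND *sum*"
hits = basic_search(CATALOG, "hive_table", "PII", ["prov*", "*sum*"])
```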
  30. Metadata Catalog Search: Advanced

    • Filter by data asset type: search for a hive_table named ‘employees’ and owner ‘hive’
    • DSL search with SQL-like syntax: select columns from impressions table in raw database
    • DSL query string: hive_column where table.name=‘impressions’ and table.db.name = ‘raw’
  31. Key Takeaways

    • Think about security in terms of threats.
    • Think holistically about security.
    • Consider encryption and masking.
    • Create Data Catalog with Atlas.
    • Identify and classify data.
    • Understand data propagation.
  32. Thank You!

    Owen O’Malley, @owen_omalley, owen@hortonworks.com