Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain 2017

Security is a tradeoff between usability and safety and should be driven by the perceived threats.

https://www.bigdataspain.org/2017/talk/keeping-enterprises-big-data-secure

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

November 30, 2017

Transcript

  1. © Hortonworks Inc. 2017 Security

    • What are the important threats?
    • How to minimize attack surfaces?
    • How do we control who has access?
    • What data do we have?
  2. Threat: Accidental File Deletion

    • Need HDFS permissions
    • Trash needs to be enabled too!
    • Accountability
    • Audit log is very important
    • Avoid group accounts
  3. Threat: Accidental Killing Tasks

    • Need Linux Container Executor
    • Small setuid executable
    • Enables isolation between users
    • Also provides local file protection
    • In the works, Docker containers
    • Stronger task killing
    • Does not require user accounts
  4. Threat: Pretending to be a User

    • Need Kerberos
    • Provides mutual authentication
    • User logs in to get ticket
    • Delegation tokens authenticate jobs
    • Good for 1 day, renewable for 7
    • Renewed and cancelled by YARN
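The token lifetime rule on this slide (good for 1 day, renewable for 7) can be illustrated with a small model. This is only a sketch: the class `DelegationToken` and its methods are hypothetical names invented for the illustration, not Hadoop's implementation, and the two intervals are simply the values quoted on the slide.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class DelegationToken:
    """Toy model of a renewable token (hypothetical; not Hadoop code)."""
    issued_at: float
    renew_interval: float = 1 * DAY  # "good for 1 day" (from the slide)
    max_lifetime: float = 7 * DAY    # "renewable for 7" (from the slide)
    expires_at: float = 0.0
    cancelled: bool = False

    def __post_init__(self):
        # A fresh token is valid for one renew interval.
        self.expires_at = self.issued_at + self.renew_interval

    def is_valid(self, now):
        return not self.cancelled and now < self.expires_at

    def renew(self, now):
        """Extend by one interval, never past issued_at + max_lifetime."""
        if not self.is_valid(now):
            raise ValueError("token expired or cancelled; cannot renew")
        hard_limit = self.issued_at + self.max_lifetime
        self.expires_at = min(now + self.renew_interval, hard_limit)
        return self.expires_at

    def cancel(self):
        self.cancelled = True
```

In this model a job that keeps renewing still loses its token seven days after issue, which is one reason long-running jobs need separate testing.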
  5. Threat: User accesses private data

    • Need Kerberos
    • HDFS permissions are critical
    • ACLs add additional flexibility
    • Define user-to-group mapping
    • Need group directories
    • Minimize user or global spaces
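How permissions, ACLs, and the user-to-group mapping combine can be sketched as a simplified POSIX-style check. The function `can_access`, the dictionary layout, and the example users and groups are all assumptions made for illustration; the sketch deliberately omits the ACL mask and the superuser, both of which a real file system also considers.

```python
def can_access(user, groups, want, inode):
    """Simplified owner / named-user / group / other check.

    `want` is a set such as {"r"}; `groups` is the set of groups the user
    maps to. Hypothetical data model, no ACL mask, no superuser.
    """
    if user == inode["owner"]:
        return want <= inode["owner_perms"]
    acl_users = inode.get("acl_users", {})
    if user in acl_users:
        return want <= acl_users[user]
    # Collect every group entry that applies to this user.
    group_entries = []
    if inode["group"] in groups:
        group_entries.append(inode["group_perms"])
    for name, perms in inode.get("acl_groups", {}).items():
        if name in groups:
            group_entries.append(perms)
    if group_entries:
        return any(want <= perms for perms in group_entries)
    return want <= inode["other_perms"]


# Hypothetical group directory: owned by an ETL account, readable by the
# finance group, with an ACL entry for auditors and nothing for "other".
finance_dir = {
    "owner": "fin_etl",
    "group": "finance",
    "owner_perms": {"r", "w", "x"},
    "group_perms": {"r", "x"},
    "other_perms": set(),
    "acl_groups": {"audit": {"r", "x"}},
}
user_to_groups = {
    "alice": {"finance"},
    "bob": {"marketing"},
    "carol": {"audit"},
}
```

Leaving the "other" permissions empty is what keeps the directory out of the global space; the ACL entry adds the auditors without widening the owning group.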
  6. Threat: Pretending to be a Service

    • Need Kerberos integration
    • Each service needs a keytab
    • Hadoop’s UserGroupInformation
    • loginFromKeytab
    • When you are testing, test for longer than 24 hours
  7. Threat: Remote Access

    • Need Service Level Authorization
    • Which users can use each service
    • Apache Ranger simplifies this
    • Need firewall around cluster
    • Never let users log in to masters
  8. Threat: Eavesdropping Inside Data Center

    • Super user can watch network traffic
    • Need wire encryption
    • RPC encryption
    • Data transfer
    • MapReduce Shuffle
    • Very expensive
    • Generally not recommended
  9. Threat: Eavesdropping Outside Data Center

    • Intranet hard to control
    • Need HTTPS encryption for outside
    • Set up SSL certificates
    • Create a master certificate to sign all of the others
    • Have users add master certificate to their browsers
  10. Threat: Physical access

    • Very rare
    • Attacker can get to physical box
    • Hopeless
    • Can remove hard drives
    • Includes access to retired drives
    • Need raw file system encryption
  11. Threat: Attacker gets Hadoop Admin

    • Attacker is a Hadoop admin
    • But not root
    • Need HDFS Encryption Zones
    • Directory sub-tree encryption
    • Each file gets unique key
    • Client decrypts data
  12. HDFS Encryption

    • Each zone has master key
    • Each file gets unique sub-key
    • HDFS stores sub-key encrypted with master key
    • Client uses sub-key to decrypt file
    • Noticeable performance impact
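The zone master key and per-file sub-key scheme above is a form of envelope encryption, which can be sketched as follows. Everything here is an assumption for illustration: `ToyKeyServer`, `encrypt_file`, and `decrypt_file` are hypothetical names, and `_xor_stream` is a toy cipher built from SHA-256 that stands in for a real vetted cipher such as AES. It is NOT secure and must not be used to protect data.

```python
import hashlib
import secrets


def _xor_stream(key, nonce, data):
    """Toy stream cipher (SHA-256 keystream). NOT secure; stand-in for AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKeyServer:
    """Hypothetical key server: zone master keys never leave this object."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, zone):
        self._zone_keys[zone] = secrets.token_bytes(32)

    def new_wrapped_file_key(self, zone):
        # Fresh sub-key per file, returned only in encrypted (wrapped) form.
        nonce = secrets.token_bytes(16)
        file_key = secrets.token_bytes(32)
        return nonce, _xor_stream(self._zone_keys[zone], nonce, file_key)

    def unwrap_file_key(self, zone, nonce, wrapped):
        return _xor_stream(self._zone_keys[zone], nonce, wrapped)


def encrypt_file(kms, zone, plaintext):
    nonce, wrapped = kms.new_wrapped_file_key(zone)
    file_key = kms.unwrap_file_key(zone, nonce, wrapped)
    # The file system stores only the wrapped sub-key next to the file.
    metadata = {"zone": zone, "nonce": nonce, "wrapped_key": wrapped}
    return metadata, _xor_stream(file_key, b"data", plaintext)


def decrypt_file(kms, metadata, ciphertext):
    file_key = kms.unwrap_file_key(
        metadata["zone"], metadata["nonce"], metadata["wrapped_key"])
    return _xor_stream(file_key, b"data", ciphertext)
```

The point of the design is that whoever holds only the stored metadata and ciphertext, such as a storage administrator, cannot read the file without also being authorized by the key server.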
  13. KeyProvider API

    • Key management
    • Allows 3rd party plugins
    • Named keys
    • Key versions and key rolling
    • Ranger provides a Key Management Server
  14. Ranger KMS: Transparent Data Encryption in HDFS

    [Diagram: HDFS clients, NameNodes (NN), DataNodes (DN) holding blocks A to D, Ranger KMS, and SafeNet Luna HSM]

    Benefits
    • Selective encryption of relevant files/folders
    • Fine grained access controls
    • Transparent to end application w/o changes
    • Ranger KMS integrated to external HSM (SafeNet Luna), adding to reliability/security of KMS
  15. Threat: Sensitive Tabular Data

    • Need to protect sensitive columns
    • Example - Personal information
    • Restrict rows based on filters
    • Example - Different Regions
    • HDFS permissions don’t work
    • Need Hive Server 2
  16. Configuring Hive Server 2

    • Configure Ranger security policies
    • Or use Hive SQL Standard Auth
    • Restrict permissions in HDFS
    • Only Hive has access
    • All normal users use Hive Server 2
  17. Attribute-Based Access Control (ABAC)

    • Ranger policies allow you to control access with ABAC.
    • Combination of the subject, action, resource, and environment
    • Uses descriptive attributes: AD group, Atlas-based tags or classifications, geo-location, etc.
    • Consistent with NIST 800-162
    • Avoid role proliferation and manageability issues
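The idea of deciding access from a combination of subject, action, resource, and environment attributes can be sketched generically. This is not Ranger's policy model or API: `Request`, `Policy`, `is_allowed`, and the attribute names (`groups`, `tags`, `country`) are hypothetical choices made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    subject: dict      # e.g. {"user": "alice", "groups": {"compliance"}}
    action: str        # e.g. "select"
    resource: dict     # e.g. {"name": "customers", "tags": {"PII"}}
    environment: dict  # e.g. {"country": "ES"}


@dataclass
class Policy:
    """Allow rule; an empty attribute set means "no constraint"."""
    name: str
    actions: set
    subject_groups: set = field(default_factory=set)
    resource_tags: set = field(default_factory=set)
    countries: set = field(default_factory=set)

    def permits(self, req):
        if req.action not in self.actions:
            return False
        groups = set(req.subject.get("groups", ()))
        if self.subject_groups and not (self.subject_groups & groups):
            return False
        tags = set(req.resource.get("tags", ()))
        if self.resource_tags and not (self.resource_tags & tags):
            return False
        country = req.environment.get("country")
        if self.countries and country not in self.countries:
            return False
        return True


def is_allowed(policies, req):
    """Default deny: access needs at least one matching allow policy."""
    return any(p.permits(req) for p in policies)
```

Because the rule is written against a tag such as `PII` rather than a named table or role, it applies to newly tagged data without a new role being created, which is how this style avoids role proliferation.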
  18. Data Masking and Anonymization

    • Some users need limited access
    • Very difficult problem
    • Ranger’s dynamic masking
    • Redact – Replace character sets
    • Hash – Replace with hash
    • Tokenization – Replace with ID
    • User defined
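The three masking styles listed above can be sketched in a few lines. These are generic illustrations rather than Ranger's implementation: the function and class names, the replacement characters, the salt, and the token format are all assumptions.

```python
import hashlib


def redact(value):
    """Replace character sets: digits -> 'n', upper -> 'X', lower -> 'x'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        else:
            out.append(ch)  # keep punctuation and spaces so the shape survives
    return "".join(out)


def hash_mask(value, salt=b"demo-salt"):
    """Replace the value with a salted SHA-256 digest (salt is illustrative)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


class Tokenizer:
    """Replace each distinct value with a stable, meaningless ID."""

    def __init__(self):
        self._ids = {}

    def tokenize(self, value):
        if value not in self._ids:
            self._ids[value] = f"tok-{len(self._ids) + 1}"
        return self._ids[value]
```

Hashing and tokenization both keep equal values equal, so joins and group-bys still work on the masked column; redaction keeps only the format.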
  19. ORC Column Encryption

    • Encrypts just the sensitive columns
    • Columns may use different keys
    • Uses Ranger KMS
    • Local key for each column & file
    • Unencrypted data can be statically masked
    • Follow progress in ORC-14.
  20. Tabular Access from Spark

    • Require fine-grain access control
    • Hive has created LLAP
    • Live Long and Process
    • Caches hot columns and partitions
    • Runs sub-query on daemons
    • Created an LLAP Context for Spark
  21. Apache Atlas – Data Governance

    • As data and clusters grow, you need to track the data.
    • Atlas provides Data Governance
    • What data do you have?
    • Where did it come from?
    • Extensible data modelling
    • Integrates with other systems
  22. Apache Atlas: Classification

    • Categorize and curate data assets for easier discovery
    • Associate context with data: Governance, Security, Business
  23. Metadata Catalog Search: Basic

    • Search for a hive_table classified as ‘PII’ and name starting with ‘prov’
    • Filter by data asset type
    • Filter by classification
    • Search text
    • Wildcards: prov*, *sum*
    • Logical expressions: prov* AND *sum*
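The basic search filters above (asset type, classification, and wildcard text joined with AND) can be mimicked over an in-memory catalog. This is a sketch of the matching logic only, not the Atlas search API; `basic_search`, the dictionary layout, and the asset names are hypothetical.

```python
from fnmatch import fnmatchcase


def text_matches(name, expression):
    """Evaluate 'pat1 AND pat2 ...' where each pattern may contain * wildcards."""
    patterns = [p.strip() for p in expression.split(" AND ")]
    return all(fnmatchcase(name, p) for p in patterns)


def basic_search(assets, type_name=None, classification=None, text=None):
    """Return names of assets passing every filter that was supplied."""
    hits = []
    for asset in assets:
        if type_name is not None and asset["type"] != type_name:
            continue
        if classification is not None and classification not in asset["classifications"]:
            continue
        if text is not None and not text_matches(asset["name"], text):
            continue
        hits.append(asset["name"])
    return hits


# Hypothetical catalog entries, invented for the example.
catalog = [
    {"type": "hive_table", "name": "provider_summary", "classifications": {"PII"}},
    {"type": "hive_table", "name": "provider_raw", "classifications": {"PII"}},
    {"type": "hive_table", "name": "claims_summary", "classifications": set()},
    {"type": "hive_column", "name": "provider_id", "classifications": {"PII"}},
]
```

For instance, `basic_search(catalog, "hive_table", "PII", "prov* AND *sum*")` keeps only the table whose name both starts with `prov` and contains `sum`.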
  24. Metadata Catalog Search: Advanced

    • Filter by data asset type
    • Search for a hive_table named ‘employees’ and owner ‘hive’
    • DSL search with SQL-like syntax
    • Select columns from impressions table in raw database
    • DSL query string: hive_column where table.name=‘impressions’ and table.db.name = ‘raw’
  25. Key Takeaways

    • Think about security in terms of threats.
    • Think holistically about security.
    • Consider encryption and masking.
    • Create Data Catalog with Atlas.
    • Identify and classify data.
    • Understand data propagation.