• Need Linux Container Executor • Small setuid executable • Enables isolation between users • Also provides local file protection • In the works, Docker containers • Stronger task killing • Does not require user accounts
• Need Kerberos • Provides mutual authentication • User logs in to get ticket • Delegation tokens authenticate jobs • Good for 1 day, renewable for 7 • Renewed and cancelled by YARN
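The token lifetimes on this slide come down to simple date arithmetic. The sketch below is only an illustration, assuming a 1-day renewal interval and a 7-day hard limit as stated above; the function and constant names are hypothetical and are not part of any Hadoop or YARN API.

```python
from datetime import datetime, timedelta

RENEW_INTERVAL = timedelta(days=1)  # assumption: each renewal is good for 1 day
MAX_LIFETIME = timedelta(days=7)    # assumption: no renewal past 7 days from issue

def expiry_after_renewal(issued_at, renewed_at):
    """Return the new expiry time, or None if the token can no longer be renewed."""
    hard_limit = issued_at + MAX_LIFETIME
    if renewed_at >= hard_limit:
        return None
    # A renewal never extends the token beyond the hard limit.
    return min(renewed_at + RENEW_INTERVAL, hard_limit)

issued = datetime(2024, 1, 1)
first = expiry_after_renewal(issued, issued)
late = expiry_after_renewal(issued, issued + timedelta(days=6, hours=12))
too_late = expiry_after_renewal(issued, issued + timedelta(days=7))
print(first, late, too_late)
```

A job that runs longer than a day therefore depends on renewal actually happening, and a job that runs longer than a week needs a fresh token.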
• Need Kerberos • HDFS permissions are critical • ACLs add additional flexibility • Define user-to-group mapping • Need group directories • Minimize user or global spaces
• Need Kerberos integration • Each service needs a keytab • Hadoop’s UserGroupInformation • loginUserFromKeytab • When you are testing, test for longer than 24 hours
Need Service Level Authorization • Which users can use each service • Apache Ranger simplifies this • Need firewall around cluster • Never let users log in to masters
• Super user can watch network traffic • Need wire encryption • RPC encryption • Data transfer • MapReduce Shuffle • Very expensive • Generally not recommended
• Intranet hard to control • Need HTTPS encryption for outside • Set up SSL certificates • Create a master certificate to sign all of the others • Have users add master certificate to their browsers
• Attacker is a Hadoop admin • But not root • Need HDFS Encryption Zones • Directory sub-tree encryption • Each file gets unique key • Client decrypts data
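The "each file gets unique key, client decrypts data" idea is a form of envelope encryption: a per-file data key is wrapped by a zone key that stays on the key server. The sketch below is a toy model of that flow only. `ToyKMS`, `toy_cipher`, and the zone name are hypothetical, and the XOR keystream is a stand-in for a real cipher such as AES; it must not be used to protect data.

```python
import hashlib
import hmac
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure). The same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class ToyKMS:
    """Hypothetical key server: zone keys never leave this object."""
    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, zone):
        self._zone_keys[zone] = os.urandom(32)

    def generate_edek(self, zone):
        # New random data key for one file, returned only in wrapped form.
        return toy_cipher(self._zone_keys[zone], os.urandom(32))

    def decrypt_edek(self, zone, edek):
        # An authorized client gets the plain data key back.
        return toy_cipher(self._zone_keys[zone], edek)

kms = ToyKMS()
kms.create_zone_key("finance")

# Write path: every file gets its own wrapped key, stored with the file metadata.
edek_a = kms.generate_edek("finance")
edek_b = kms.generate_edek("finance")
ciphertext = toy_cipher(kms.decrypt_edek("finance", edek_a), b"salary data")

# Read path: the client unwraps the key via the key server and decrypts locally.
plaintext = toy_cipher(kms.decrypt_edek("finance", edek_a), ciphertext)
print(plaintext)
```

In this model the storage layer only ever sees `ciphertext` and the wrapped key, which is why an admin without access to the key server cannot read the file contents.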
Encryption in HDFS • [Diagram: HDFS clients, NameNodes (NN), and DataNodes (DN) holding blocks A, B, C, D] • Benefits • Selective encryption of relevant files/folders • Fine-grained access controls • Transparent to end applications, without changes • Ranger KMS integrated with an external HSM (SafeNet Luna), adding to the reliability/security of the KMS
• Need to protect sensitive columns • Example: personal information • Restrict rows based on filters • Example: different regions • HDFS permissions don’t work • Need Hive Server 2
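File-level permissions cannot express "this group sees only its region and never the personal columns", because both restrictions apply inside a single file. A minimal sketch of what a query layer has to do instead is shown below; the table contents, group names, and `POLICIES` structure are all made up for illustration and do not reflect Hive or Ranger internals.

```python
ROWS = [
    {"name": "Ann", "ssn": "123-45-6789", "region": "EU", "sales": 120},
    {"name": "Raj", "ssn": "987-65-4321", "region": "US", "sales": 95},
]

# Hypothetical per-group policy: one row filter plus a set of hidden columns.
POLICIES = {
    "eu_analysts": {"region": "EU", "hidden_columns": {"ssn"}},
    "us_analysts": {"region": "US", "hidden_columns": {"ssn"}},
}

def query(group, rows):
    """Return only the rows and columns the group's policy permits."""
    policy = POLICIES[group]
    return [
        {col: val for col, val in row.items() if col not in policy["hidden_columns"]}
        for row in rows
        if row["region"] == policy["region"]
    ]

eu_view = query("eu_analysts", ROWS)
print(eu_view)
```

Both groups read the same underlying data, which is why the enforcement point has to sit in the query service rather than in the file system.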
• Configure Ranger security policies • Or use Hive SQL Standard Auth • Restrict permissions in HDFS • Only Hive has access • All normal users use Hive Server 2
• Ranger policies allow you to control access with ABAC. • Combination of the subject, action, resource, and environment • Uses descriptive attributes: AD group, Atlas-based tags or classifications, geo-location, etc. • Consistent with NIST 800-162 • Avoid role proliferation and manageability issues
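The four ABAC inputs named above (subject, action, resource, environment) can be sketched as a small policy check. This is an illustrative model only, not Ranger's policy engine or its policy format; the class names, attribute keys, and the deny-by-default rule are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    subject: dict      # e.g. {"groups": ["finance"]}
    action: str        # e.g. "select"
    resource: dict     # e.g. {"tags": ["PII"]}
    environment: dict  # e.g. {"country": "US"}

@dataclass
class Policy:
    actions: set
    subject_groups: set
    resource_tags: set
    allowed_countries: set

    def permits(self, req):
        """All four attribute conditions must hold for the policy to apply."""
        return (
            req.action in self.actions
            and bool(self.subject_groups & set(req.subject.get("groups", [])))
            and bool(self.resource_tags & set(req.resource.get("tags", [])))
            and req.environment.get("country") in self.allowed_countries
        )

def is_allowed(policies, req):
    # Assumption: deny unless at least one policy permits the request.
    return any(p.permits(req) for p in policies)

pii_policy = Policy({"select"}, {"finance"}, {"PII"}, {"US"})
at_home = Request({"groups": ["finance"]}, "select", {"tags": ["PII"]}, {"country": "US"})
abroad = Request({"groups": ["finance"]}, "select", {"tags": ["PII"]}, {"country": "FR"})
print(is_allowed([pii_policy], at_home), is_allowed([pii_policy], abroad))
```

One tag-based policy covers every resource carrying the tag, which is how attributes avoid the one-role-per-table growth mentioned on the slide.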
• Some users need limited access • Very difficult problem • Ranger’s dynamic masking • Redact – Replace character sets • Hash – Replace with hash • Tokenization – Replace with ID • User defined
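The three masking styles listed above differ in what they preserve. The sketch below illustrates each one under assumed rules: the replacement characters, the choice of SHA-256, and the `TOK-` ID format are hypothetical and are not taken from Ranger's implementation.

```python
import hashlib
import itertools

def redact(value):
    """Replace letters with 'x' and digits with 'n'; keep the overall shape."""
    return "".join("x" if c.isalpha() else "n" if c.isdigit() else c for c in value)

def hash_mask(value):
    """Replace the value with a hash; equal inputs stay comparable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

class Tokenizer:
    """Replace each distinct value with a stable, meaningless ID."""
    def __init__(self):
        self._ids = {}
        self._next = itertools.count(1)

    def tokenize(self, value):
        if value not in self._ids:
            self._ids[value] = f"TOK-{next(self._next):06d}"
        return self._ids[value]

tokens = Tokenizer()
print(redact("555-12-3456"))
print(hash_mask("555-12-3456")[:12])
print(tokens.tokenize("ann@example.com"), tokens.tokenize("ann@example.com"))
```

Redaction keeps the format, hashing keeps equality for joins and grouping, and tokenization keeps equality while allowing the mapping table to be held separately.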
Encrypts just the sensitive columns • Columns may use different keys • Uses Ranger KMS • Local key for each column & file • Unencrypted data can be statically masked • Follow progress in ORC-14.
• Require fine-grained access control • Hive has created LLAP • Live Long and Process • Caches hot columns and partitions • Runs sub-query on daemons • Created an LLAP Context for Spark
• As data and clusters grow, you need to track the data • Atlas provides Data Governance • What data do you have? • Where did it come from? • Extensible data modelling • Integrates with other systems
curate data assets for easier discovery • Associate context with data: Governance, Security, Business
as ‘PII’ and name starting with ‘prov’ • Filter by data asset type • Filter by classification • Search text • Wildcards: prov*, *sum* • Logical expressions: prov* AND *sum*
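The search semantics on this slide (type filter, classification filter, wildcards, and AND) can be mimicked over an in-memory list to show how the pieces combine. This sketch does not call Atlas; the asset records, the `search` function, and the use of shell-style matching from Python's `fnmatch` module are assumptions for illustration.

```python
from fnmatch import fnmatchcase

ASSETS = [
    {"name": "provider_summary", "type": "hive_table", "classifications": {"PII"}},
    {"name": "provider_raw", "type": "hive_table", "classifications": set()},
    {"name": "sales_summary", "type": "hive_table", "classifications": {"PII"}},
]

def search(assets, patterns=(), asset_type=None, classification=None):
    """Return names matching the filters and ALL wildcard patterns (logical AND)."""
    results = []
    for asset in assets:
        if asset_type is not None and asset["type"] != asset_type:
            continue
        if classification is not None and classification not in asset["classifications"]:
            continue
        if all(fnmatchcase(asset["name"], p) for p in patterns):
            results.append(asset["name"])
    return results

pii_prov = search(ASSETS, patterns=["prov*"], classification="PII")
both = search(ASSETS, patterns=["prov*", "*sum*"])
summaries = search(ASSETS, patterns=["*sum*"])
print(pii_prov, both, summaries)
```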
Filter by data asset type • Search for a hive_table named ‘employees’ and owner ‘hive’ • DSL search with SQL-like syntax • Select columns from impressions table in raw database • DSL query string: hive_column where table.name='impressions' and table.db.name='raw'
Think about security in terms of threats. • Think holistically about security. • Consider encryption and masking. • Create Data Catalog with Atlas. • Identify and classify data. • Understand data propagation.