The future of Hadoop security and its evolution by Alejandro González at Big Data Spain 2017

This talk outlines the state of the art in Hadoop security and describes the security features planned for the future. Hadoop was not initially designed with security in mind; multiple security features have been developed for individual components without designing an integrated security architecture.


Big Data Spain 2017
16th - 17th November Kinépolis Madrid


Big Data Spain

November 24, 2017


  1. None
  2. The future of Hadoop security and its evolution. Alex Gonzalez | Cloudera
  3. Do you know how much a security breach could cost?

  4. Security breaches. How much do stolen records cost? Over of organizations report having been compromised by a successful cyber attack in the past 12
  5. 2,900 companies surveyed reported losing customers. Beth Jacob, CEO. Steep fines. Lost jobs. Lost customers. Resigned.
  6. Security matters

  7. What do Hadoop users want to do? Protect; be in compliance; anonymize data; migrate to the cloud.
  8. Authorization, Auditing, Encryption at rest, Kerberos, HW - AUTH, Authentication methods. What has the community done? In-transit Encryption, HTTPS, Sentry-AUTHZ, Authentication.

    Zookeeper Hue Hive Impala Solr HDFS KMS HDFS Flume Oozie HBase Impala Hue Sqoop Hive Impala HBase Accum Sentry HDFS KMS Hue HDFS Logs MapReduce YARN MapReduce YARN HDFS HBase Zookeeper Oozie SPNEGO LDAP SAML Zookeeper HBase Accumulo Hue HDFS UI None

    Security is very confusing!
  9. What does the community want to do? Secure by default; regulatory compliance; vulnerability-free software.
  10. Financial institution: in-transit encryption, authorization, authentication, at-rest encryption, key management.
  11. Hadoop in Secure Mode: authentication, data confidentiality, configuration.

  12. Security dimensions. Perimeter: guarding access to the cluster itself. Data: protecting data in the cluster from unauthorized visibility. Access: defining what users and applications can do with data. Visibility: reporting on where data came from and how it’s being used.

    Kerberos, AD, LDAP, SAML, PAM, SSSD, Ranger, Sentry, HDFS ACL, HDFS Enc, Key Management, In transit Enc, Atlas
  13. What’s next?

  14. Scaling security administration; centralize identity management; common security across multiple clusters in the cloud; CISO visibility.

    Attribute based access control (ABAC); single sign on; federating multiple directories; security policies associated with data in object stores; centralized security dashboards.
  15. Attribute based access control (ABAC) Scaling security administration

  16. ABAC. User, informational asset, environment. Subject attributes, environmental attributes, resource and action attributes. ABAC authorization engine. Sentry RS. Permit users to … when … if … or … unless ... Policy: Permit, Deny.
  17. ABAC benefits. Fewer security policies per object. Central policies enforced while still delegating administration. Better protection, since more factors can be used. Major Canadian bank, large global bank, large healthcare provider.
  18. Phase 1: Single sign on (SSO) Centralize identity management

  19. Authentication Service HBase ... Hue

  20. SSO customer benefits. Improves security UX. Increases productivity. Strengthens security. Reduces support costs. Improves auditing. In the long term, engineering effort is reduced.
  21. Centralized security for cloud: common security across multiple clusters in the cloud.
  22. Diagram labels: Cluster 1, Cluster 2, Cluster 3, Cluster 4; Sentry, KMS, HMS.
  23. User benefits: reduced cost of ownership; increased end-user productivity; faster

  24. Security matters

  25. Thank you!
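
Slide 16 describes an ABAC authorization engine that combines subject, resource/action and environmental attributes and returns Permit or Deny for policies of the form "Permit users to … when … if … or … unless ...". The following is a minimal sketch of that idea in Python, not the engine from the talk and not Sentry's API. Every name in it (`Request`, `Policy`, `evaluate`, `POLICIES`, the attribute keys and the two example policies) is hypothetical, and the deny-overrides rule used to combine policies is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]


@dataclass
class Request:
    subject: Attributes      # subject attributes, e.g. the user's department
    resource: Attributes     # resource attributes, e.g. owner, sensitivity
    action: str              # action attribute, e.g. "read"
    environment: Attributes  # environmental attributes, e.g. hour, network


@dataclass
class Policy:
    effect: str                           # "Permit" or "Deny"
    condition: Callable[[Request], bool]  # True when the policy applies
    description: str = ""


def evaluate(policies: List[Policy], request: Request) -> str:
    """Deny-overrides combining (an assumption of this sketch): a matching
    Deny wins, otherwise a matching Permit grants access, otherwise Deny."""
    decision = "Deny"
    for policy in policies:
        if policy.condition(request):
            if policy.effect == "Deny":
                return "Deny"
            decision = "Permit"
    return decision


# Hypothetical policies in the "Permit users to ... when ... if ... unless ..." style.
POLICIES = [
    Policy(
        "Permit",
        lambda r: r.action == "read"
        and r.subject.get("department") == r.resource.get("owner_department")
        and 8 <= r.environment.get("hour", -1) < 18,
        "Permit users to read data when it belongs to their department, "
        "if the request is made during business hours",
    ),
    Policy(
        "Deny",
        lambda r: r.resource.get("sensitivity") == "pii"
        and r.environment.get("network") != "internal",
        "Deny access to PII unless the request comes from the internal network",
    ),
]

request = Request(
    subject={"department": "finance"},
    resource={"owner_department": "finance", "sensitivity": "pii"},
    action="read",
    environment={"hour": 10, "network": "internal"},
)
decision = evaluate(POLICIES, request)
print(decision)
```

Because decisions depend on attributes rather than on per-object grants, the same two policies cover every department and every table, which is the "fewer security policies per object" benefit listed on slide 17.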