
Introduction to Hadoop Security

Hadoop is central to many enterprises that want to store and process their sensitive data, so securing Hadoop deployments is more important than ever. This talk illustrates practical, production-tested techniques for protecting Hadoop data from unauthorized access and modification.

Vishnu Vettrivel

August 02, 2016

Transcript

  1. Hadoop is everywhere, Security not so much • Hadoop is central to a lot of Enterprises • 50% of Hadoop deployments store sensitive data • However, not many deployments implement Security • Even fewer do it right • Data lakes don’t need to be a free-for-all.
  2. HDFS Security: Not an Oxymoron • Fake vs Real • Kerberos is the only option • ACLs are your friend
  3. Apache Spark Security • Why YARN is your friend • Spark Security basics • TLS for Spark • Kerberos and User Impersonation
  4. Data Protection • Data in Transit • Data at Rest • Configuration Data • Data Lineaging
  5. Hue is your (Security) Daddy? • Single Sign-on for Hue • Managing Policies using Hue • Groups and Users • Apps and more
  6. Kerb(eros) your Enthusiasm • Overview of Kerberos • Usage in Hadoop Authentication • Hadoop impersonation • LDAP-based directory server integration
  7. HDFS Encryption • Transparent vs Block Level • Know your KMS • Impact on tools and apps • Other Gotchas
  8. Data Ingestion Security • Know your Zones • Groups and Others • Default Umask • Sharing Data across Zones.
  9. End-user Application Security • Perimeter Security • Token-based Authentication • Search and NoSQL access • Centralized Security Access Manager.
  10. Case Study: Data Warehouse • Best Practices • Gotchas • Tools and Apps • Realtime Querying • Visualization tools
  11. Conclusion • Security is Hard • Hadoop is not one thing • Different Access patterns, Different Security • Consider Data flow • Consider Threat modeling
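
The "Default Umask" bullet on slide 8 can be made concrete with a little arithmetic. The sketch below is not from the talk. It assumes the usual POSIX-style rule, which HDFS also follows, that new directories start from mode 777 and new files from 666, and that the umask then clears bits from that starting mode. The helper name `effective_mode` and the sample umask values (022 and 077) are illustrative assumptions, not settings recommended by the slides.

```python
import stat


def effective_mode(requested: int, umask: int) -> int:
    """Return the permission bits that remain after the umask clears its bits."""
    return requested & ~umask & 0o777


# Assumed starting modes: 0o777 for new directories, 0o666 for new files.
DIR_BASE = 0o777
FILE_BASE = 0o666

# Hypothetical umask values chosen for illustration only.
for umask in (0o022, 0o077):
    dir_mode = effective_mode(DIR_BASE, umask)
    file_mode = effective_mode(FILE_BASE, umask)
    print(
        f"umask {umask:03o}: "
        f"dirs {stat.filemode(stat.S_IFDIR | dir_mode)} ({dir_mode:03o}), "
        f"files {stat.filemode(stat.S_IFREG | file_mode)} ({file_mode:03o})"
    )
```

With a umask of 022, group and others keep read access to newly ingested data; with 077, only the owner can read it. That difference is why the default matters when data lands in a shared zone.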