
Implementing and Running a Secure Datalake from the Trenches

Talks from Hadoop Summit Europe 2015.

Rémy SAISSY

April 16, 2015

Transcript

  1. Implementing and running a secure datalake from the trenches
    Hervé Brunetaud, Orange France; Rémy Saissy, OCTO Technology
  2. Orange in figures
    236 million customers worldwide; 165,000 employees, of which 102,000 in France; 4G in 10 countries; @Orange with 70,000 followers on Twitter; more than 7 million fans across all of our local Facebook pages; 2,400 employees who volunteer in 32 countries for the Orange Foundation; 10 million Orange Money customers in 13 countries; more than 382,000 customers with fibre; 7,482 patents in our R&D portfolio; the brand is 20 years old; 4,000 permanent contracts in France between 2013 and 2015, of which 2,500 are younger than 30; 9,000 training programmes; 1,000 new recruits in 2014 working on very high broadband (fibre and 4G); 450,000 km of submarine cables (enough to circumnavigate the earth 10 times!); 60th strongest brand in 2013; 780 million euros invested in research and innovation; more than 1 million visits on Orange.com.
  3. key financial indicators for 2013
    €40.9 billion revenue; €1.873 billion net profit; €7.0 billion operational cash flow; €0.80 dividend per share (proposed to the Annual General Meeting of Shareholders on 27 May 2014).
    Turnover broken down by region: 46.8% France, 18.1% rest of the world, 17.9% businesses and services provided to operators, 9.8% Spain, 7.4% Poland.
    Turnover broken down by activity: 45.2% mobile, 31.7% internet and fixed line, 14.9% businesses, 3.2% mobile equipment sales, 3% services provided to operators, 2.0% other.
  4. over 236 million customers worldwide… (+2.4% in one year)
    Our Group provides services for residential customers in 30 countries and for business customers in 220 countries and territories.
  5. OCTO in a few words
    From startups to multinational firms, OCTO gets involved wherever information technology plays a decisive role in transforming companies. Continuous improvement, sharing, the pursuit of joy, fail fast, expertise, the key is the team.
  6. OCTO ID numbers
    IT consulting company established in April 1998; 190 employees; 19.5 million in turnover worldwide (2011); purely organic growth (20% annually); strong corporate culture; strong values; international locations. Our experienced team: 27% junior, 33% senior, 40% experienced. « We want to reproduce wherever possible what made us successful: a vision of IT, strong values and sharp skills. »
  7. What do we do?
    We use technology and creativity to turn your ideas into reality. IT consulting and expertise: it is the product of an ambitious business vision turned reality thanks to a pragmatic use of technology. Design of innovative applications: we are committed to fostering the fruition of your ideas and needs, making them concrete so that you can start benefitting from them in just a few weeks. You can trust us with the implementation of your software products from start to finish. We can also help you to design better applications:
  8. before building the Hadoop platform, a QoS use case had been implemented like this…
    (diagram: call centers, usage data, detailed data, datawarehouse, aggregates, analytical CRM, analysts)
  9. the existing solution in terms of technologies…
    (diagram: raw data; extract and load; large storage; filter, transform and aggregate; aggregates; business and operational apps)
  10. … works fine, but we need more
    – are there any new use cases based on that data?
    – more historical depth
    – ROI on Teradata not so sure…
  11. with Hadoop, the same use case is now implemented like this…
    (diagram: call centers, usage data, detailed data, aggregates, datawarehouse, analytical CRM, analysts)
  12. in terms of technologies, new questions are arising
    (diagram: the same pipeline of raw data, extract and load, large storage, filter/transform/aggregate, aggregates, and business and operational apps, with question marks over the technology choices)
  13. 2 SQL access types needed!
    1) batch oriented: process huge amounts of data to produce aggregates and perform data mining
    2) interactive access to detailed data given a predefined key
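The deck does not show the queries themselves; as an illustration of the two access patterns, here is a minimal sketch with hypothetical table and column names: a batch statement that scans detailed data to rebuild aggregates, and a keyed interactive lookup expected to answer in seconds.

```java
public class SqlAccessPatterns {

    // 1) Batch oriented: full scan over the detailed data to rebuild aggregates
    //    (table and column names are hypothetical).
    static final String BATCH_AGGREGATION =
        "INSERT OVERWRITE TABLE qos_daily_aggregates "
      + "SELECT call_center_id, to_date(event_time) AS day, COUNT(*) AS nb_calls "
      + "FROM detailed_calls "
      + "GROUP BY call_center_id, to_date(event_time)";

    // 2) Interactive: point lookup of detailed data given a predefined key,
    //    expected to return in seconds rather than minutes.
    static final String INTERACTIVE_LOOKUP =
        "SELECT * FROM detailed_calls WHERE customer_id = ?";
}
```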
  14. which SQL layer?
    – good SQL support, good batch capabilities, poor interactive queries
    – poor SQL support, interactive queries
    – good SQL support, interactive queries, poor batch capabilities
  15. chosen technologies
    (diagram: raw data; extract and load; detailed data; filter, transform and aggregate; aggregates; business and operational apps)
  16. evolve from a single use-case architecture to a data lake
    § challenges
      – secure performance for everybody
      – high security needs
      – choice of data manipulation / visualization tools
      – architecture availability
  17. availability: 10+ SPOFs identified, how to deal with them?
    § Ambari-managed components
    § "external" components
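The transcript does not detail how each SPOF was addressed; as one possible illustration of keeping an eye on the Ambari-managed side, the sketch below polls Ambari's REST API for the state of a single service. The host, cluster name, service and credentials are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AmbariServiceCheck {
    public static void main(String[] args) throws Exception {
        // Placeholders: Ambari host, cluster name, service and credentials.
        String ambari = "http://ambari.example.com:8080";
        String cluster = "datalake";
        String service = "HDFS";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ambari + "/api/v1/clusters/" + cluster
                        + "/services/" + service))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();

        // The JSON response contains ServiceInfo.state (e.g. STARTED);
        // a real check would parse it and alert when the state degrades.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```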
  18. 3rd party tool integration
    – local Unix accounts
    – crontabs to keep Kerberos tokens valid
    Login using local Unix accounts; at tool startup, retrieve the cluster-side Kerberos account for each local Unix account; the Hive JDBC driver then relies automatically on the Kerberos token.
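A minimal sketch of the pattern described above. The principal, keytab path, hosts and realm are placeholders, and it assumes the hive-jdbc driver is on the classpath and a valid krb5.conf on the machine: the Kerberos ticket of the local Unix account is refreshed from its keytab (the same command a crontab entry would run), and the Hive JDBC driver then authenticates with that ticket, with no password in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SecureHiveAccess {
    public static void main(String[] args) throws Exception {
        // Refresh the Kerberos ticket of the local Unix account from its keytab,
        // exactly what the crontab entry would do periodically
        // (principal and keytab path are placeholders).
        new ProcessBuilder("kinit", "-kt",
                "/etc/security/keytabs/analyst.keytab", "analyst@EXAMPLE.COM")
                .inheritIO().start().waitFor();

        // Ensure the Hive JDBC driver is registered.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // The driver picks the ticket up from the credential cache; only the
        // HiveServer2 principal is declared in the URL.
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default;"
                   + "principal=hive/_HOST@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```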
  19. how to prevent users from accessing prohibited data?
    § main principles
      – consider it on a per storage service basis
      – model after the least common denominator
      – manage at high level (database, namespaces, groups)
    (comparison table of authorization mechanisms per storage service: POSIX permissions (UGO), ACLs, SQL Auth, with notes such as "ACLs not working", "ACLs incompatible with Hive + bugs", "SQL Auth not working")
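In line with the "least common denominator" and "manage at high level" principles, here is a minimal sketch of granting a group read access to a whole Hive database directory with plain POSIX permissions; the warehouse path and group name are hypothetical, and the program must run as an HDFS superuser.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class GrantDatabaseAccess {
    public static void main(String[] args) throws Exception {
        // Hypothetical warehouse path and Unix group; the idea is to manage
        // access per database directory and per group, not per user or file.
        Path db = new Path("/apps/hive/warehouse/qos.db");
        String group = "qos_analysts";

        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            // Keep the current owner (null), change only the group.
            fs.setOwner(db, null, group);
            // rwx for the owning service account, r-x for the group, nothing for others.
            fs.setPermission(db, new FsPermission(
                    FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));
        }
    }
}
```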
  20. how to securely let users use the platform?
    § write standard documents, presentations, recommendations
    § code support scripts to help projects and production teams
    § communicate, follow people and projects
    § help, explain to people, ensure standards are properly implemented
  21. consolidate
    § SLA
      – response time
      – availability
    § security
      – Apache Ranger
      – Apache Knox
    § development methodology
      – continuous integration
      – development framework
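Apache Ranger is listed here as a consolidation target rather than something already in place; as one possible sketch of what centralized policy management offers, the snippet below lists the policies defined in a Ranger admin instance through its public REST API. Host and credentials are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RangerPolicyList {
    public static void main(String[] args) throws Exception {
        // Placeholders: Ranger admin host and credentials.
        String ranger = "http://ranger.example.com:6080";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes());

        // Ranger's public v2 REST API exposes the defined policies; auditing
        // them regularly is one way to keep authorization rules under control.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ranger + "/service/public/v2/api/policy"))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```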
  22. unlock the possibilities
    § streaming ingestion and processing
      – Spark
      – Storm
    § new tools for end users
      – data miners
      – report consumers
    § new development capabilities
      – machine learning
      – graph processing
      – …