Practical Advice to build a Data Driven Company (Hadoop Summit 2016)

Session I gave at the 2016 Hadoop Summit in Dublin. Original pitch: One Hadoop later, do we really feel Data Driven? What has changed since the BI Datawarehouse? What is the ROI of Big Data projects? If the answers to these questions are not obvious, it's because we focus too much on the technical capabilities of Big Data. It's not just about choosing the right technology: being data driven is more a matter of organization. How to generate new product ideas based on data? Where are the data scientists in the company and how are they connected to business and IT? What are the Data Science team rituals? What is the company's culture about data? What are the business objectives and benefits of having Hadoop versus Spark versus in-memory Python? I will focus on very practical lessons from the field, showing how companies can leverage Big Data in an incredibly lean manner. The datalake concept, datalabs, scaled agile, devops and machine learning will be key topics in this talk.

Simon Maby

April 14, 2016

Transcript

  1. HADOOP SUMMIT 2016 - DUBLIN
     PRACTICAL ADVICE TO BUILD A DATA DRIVEN COMPANY
     Simon MABY @simonmaby
     50 AVENUE DES CHAMPS-ÉLYSÉES 75008 PARIS > FRANCE > WWW.OCTO.COM
  2. A continuous improvement of all business processes, through a smart use of the data, all the time, everywhere and for all purposes
     OCTO TECHNOLOGY > THERE IS A BETTER WAY
  3. BEING DATA DRIVEN IS BEING LEAN
     [Diagram: loop of IDEA, CODE and DATA linked by BUILD, MEASURE and LEARN]
  4. REQUIREMENTS
     - DATA: Data must be easily accessible
     - IDEA: Business must be aware of opportunities to use algorithms
     - CODE: Datascience projects should have the lowest time to market possible
  5. Your Datalake is a service to your company. It should be managed like a startup.
     Your employees are your first clients. The more they use it, the more you are Data Driven.
  6. FOCUS ON USABILITY OVER ARCHITECTURE
     [Diagram: Datalake, Services, End Users and projects]
     Datalake Team: OPS - DEVs - DESIGNERS
     - Design services for usability and grant support
     - Gather requirements and usage metrics
  7. FOCUS ON USABILITY OVER ARCHITECTURE: EXAMPLES
     - How simple is it to share data with other projects?
     - How simple is it to subscribe to a data feed?
     - Is it possible to run a full search on available datasets?
     - Is it possible to ask other projects for details about their data through a social network?
     - Auto-completion over SQL requests from other projects?
     - Bookmarking, sharing, upvoting datasets, tagging metadata…
  8. CODE
     Datascience projects should have the lowest time to market possible
  9. EXPLORATION VERSUS PREDICTION
     - Explore as quickly as possible
     - Deliver frequently in production
  10. (Not so) Big Data Infrastructure (For exploration)
  11. WHAT IF WE GIVE LESS DATA TO OUR ALGORITHMS?
      Cf. Zoltan Prekopcsak, Hadoop Summit EU. 2015
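One possible way to act on this question during exploration is to work on a uniform random sample small enough to fit in memory. The sketch below uses reservoir sampling; the talk does not name this technique, so it is an assumption chosen for illustration, and `reservoir_sample` and its parameters are hypothetical names.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k records drawn uniformly at random from an iterable of unknown length.

    Illustrative helper (not from the talk): classic reservoir sampling,
    which reads the data once and keeps only k records in memory.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Explore on 1,000 records instead of the full 100,000.
exploration_set = reservoir_sample(range(100_000), k=1_000, seed=42)
```

Whether a sample of a given size is enough depends on the model and the data, so this is only a starting point for exploration.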
  12. FEATURE TEAMS TO DELIVER CODE READY FOR PRODUCTION
      [Diagram: Business rep., Developer, Data Scientist]
  13. MESSAGE BROKER TO REUSE DATA FLOWS
      [Diagram, two panels: App A, App B, DW, DB X connected directly; then App A, App B, App C, DW, DB X connected through Kafka]
      Direct connections (? ? ?):
      - Custom dev
      - Data formats?
      - SLA?
      - Scheduling? …
      Through Kafka:
      - Standard format
      - Prod Ready
      - Exploration and prod will share same formats
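The idea of reusing one data flow for several consumers can be sketched with a toy in-memory broker. This is not Kafka's API: the `Broker` class, its `publish` and `poll` methods, and the `orders` topic are hypothetical names used only to show the principle of one append-only log read independently by each consumer.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for a message broker (hypothetical API, not Kafka's)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def poll(self, topic, consumer):
        """Return the messages this consumer has not read yet and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, consumer)]
        self._offsets[(topic, consumer)] = len(log)
        return log[start:]


broker = Broker()

# App A publishes once, in one standard format (a plain dict here).
broker.publish("orders", {"order_id": 1, "amount": 30})
broker.publish("orders", {"order_id": 2, "amount": 12})

# The data warehouse and a new App C read the same flow independently.
dw_rows = broker.poll("orders", consumer="dw")
app_c_rows = broker.poll("orders", consumer="app_c")

# A later message is delivered to each consumer from its own offset.
broker.publish("orders", {"order_id": 3, "amount": 7})
dw_new_rows = broker.poll("orders", consumer="dw")
```

Adding App C required no custom development on App A's side, which is the point the slide makes about standard formats.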
  14. KAPPA ARCHITECTURE: EVERYTHING IS A STREAM
      [Diagram: Stream Data (Kafka topic) > Stream Processing (streaming app v1, streaming app v2) > Serving DB (result data v1, result data v2)]
      - Batch jobs are just historical data you send into a streaming app
      - Application code is decoupled from technical requirements
      - One shot exploration code respecting the stream abstraction can go in production easily
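A minimal sketch of the "everything is a stream" idea, assuming events are plain dicts and each application version is a function over an iterable of events. The event fields, `app_v1`, `app_v2` and the refund rule are hypothetical and only illustrate replaying history through a new version of the code.

```python
from itertools import chain


def app_v1(events):
    """Version 1 of a streaming app: total amount per user."""
    totals = {}
    for event in events:
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


def app_v2(events):
    """Version 2: same stream interface, but refunds (negative amounts) are ignored."""
    totals = {}
    for event in events:
        if event["amount"] > 0:
            totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


# Historical data retained in the topic (what a batch job would have processed).
historical = [
    {"user": "ann", "amount": 10},
    {"user": "bob", "amount": 5},
    {"user": "ann", "amount": -3},
]


def live_events():
    """New events arriving after the historical ones."""
    yield {"user": "bob", "amount": 7}


# "Batch" is just replaying history into the same code that handles live events.
result_v1 = app_v1(chain(historical, live_events()))
result_v2 = app_v2(chain(historical, live_events()))
```

Both versions read the same topic and write separate results, so v2 can be validated against v1 before the serving layer switches over.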
  15. IDEAS
      Business must be aware of the opportunities to use algorithms
  16. MIX THESE PEOPLE
      - Business: Knows what is valuable
      - Data Scientist: Knows what is feasible
      Culture & Collaboration
  17. FEATURE TEAMS ONCE AGAIN
      [Diagram: Business rep., Developer, Data Scientist]
  18. EXPLAIN TO THEM THAT MACHINE LEARNING IS EASY (IT’S MAGIC)
  19. SPEND TIME TOGETHER
      - Show them the data
      - Pair Programming
      - Swap roles for one day
  20. Story: Octo Datascience Competition Platform
  21. HOW WIDELY DATADRIVEN IS YOUR COMPANY?
      - Everybody is willing to make value out of the available data
      - Data serves not only the core business but every single function
      - Data is used in day-to-day activity in real-time
  22. HOW DEEPLY DATADRIVEN IS YOUR COMPANY?
      - You are using cutting-edge algorithms to automate processes
      - You are used to A/B testing based on data every week
      - You cross multiple data sources to build insights and models
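As an illustration of the weekly A/B testing mentioned above, here is a minimal sketch of a two-proportion z-test using only the standard library. The talk does not prescribe a statistical method, so the choice of test, the function name and the visitor and conversion counts are all assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided test under the pooled-variance
    normal approximation. Illustrative helper, not from the talk.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 5,000 visitors per variant, 200 vs 260 conversions.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
significant = p_value < 0.05
```

The normal approximation is reasonable for large samples like these; small samples or many simultaneous tests call for other methods.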