Impact of Big Data on Spanish Companies

Aimed at executives and analysts from medium and large companies, Big Data Spain held a talk ahead of the second edition of its conference (November 7 and 8, 2013, at CIBALL, Madrid).

Oscar Méndez, co-founder of Paradigma (www.paradigmatecnologico.com), spoke about Big Data from a business point of view and cleared up doubts about the cost and resources needed to take advantage of this technology.

Post-Hadoop v2.0 platforms allow the quick and simple deployment of integrated tools for data mining, data processing, data analysis and data visualization. The advances of the last 12 months leave behind the limitations of traditional Business Intelligence systems.

Big Data Spain

October 29, 2013

Transcript

  1. BIG DATA IN BUSINESS

  2. Is it a real need or just trendy? Why does it apply to my case? Big Data
  3. Petabytes: Google 300 PB, Facebook 45 PB, Yahoo! 180 PB. Exabytes: U.S. healthcare. Zettabytes: 1.8 ZB created in 2011; world information 9.57 ZB. Yottabyte, Brontobyte, Geopbyte: still to be reached. I do not have such a big volume of data. A big European company = terabytes
  4. But I could or will have it: an ever-increasing amount of data, and more heterogeneous: ubiquity, mobility, geolocation, social networks, internet, sensors, M2M, CRMs, call centers, emails, documents, logs, voice…
  5. "There were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days." Google CEO Eric Schmidt
  6. Unstructured or semi-structured data, equal to 85% of available data, is not used by companies. This represents the new fuel for companies
  7. 83% of the surveyed companies were able to do things with Big Data that seemed impossible to achieve before. “The art of possible”. “Impossible is not a fact, it’s an opinion”
  8. Value and real ROI are the best KPIs: • Increase in client acquisitions • Resource optimization • Increase in sales • Customer loyalty
  9. You can’t stay stuck in old paradigms

  10. When to use it?

  11. None
  12. None
  13. None
  14. None
  15. Extract value from data at any point of its life cycle: • Past: stored data, batch mode • Present: current data flows, real time • Future: data and future actions, predictive
  16. • Big volume of data • Get value from unstructured data • Get value from external data • Need to reduce processing time or cost • Need for real-time analysis of data streams • Algorithms, prediction or interactive analysis • Transform data into insights and value • Transformation into a data-driven company
  17. Customer Pain

  18. “I know I have to change to Big Data but…” How do I start using it? When? Which technology? How do I acquire the knowledge?
  19. How to use it?

  20. Iterative and cyclical vs. Big Bang. Iterative and cyclical: choose a particular use case with a clear ROI and time and budget limits. Big Bang: avoid building a generic Big Data system and then implementing projects on top of it
  21. Which Technology?

  22. A Technological Change: from Big Data 1.0 (Bigtable) to Big Data 2.0 (BigQuery, F1). 12-year gap
  23. ∙ Up to 100x faster than Big Data 1.0 ∙ Interactive analysis ∙ NoSQL with SQL interface ∙ No need to change the previous way of working. Customer solution: Big Data 2.0
  24. Which technology? Big Data 1.0: Cloudera CDH4*, Hortonworks HDP*, EMC Greenplum, Datastax Platform, EMC Pivotal HD, Microsoft HDInsight, MapR M3-M5-M7, Hadapt platform, IBM InfoSphere BigInsights, Intel Hadoop (legend: open source; closed; closed based on open source). Big Data 2.0: Cloudera Impala, Hortonworks Stinger, MapR Apache Drill, Google BigQuery, Amazon EMR & Redshift, Stratio. NoSQL: C-Store, Basho Riak, VMWare Redis, Memcached, Apache HBase, Apache Cassandra, Apache CouchDB, VoltDB, Voldemort, Espresso, HP Vertica. Stream processing: Apache Flume, Apache S4, Aurora, Kafka, Storm, Scribe, StreamBase Platform, IBM InfoSphere Streams, HStreaming Platform, EsperTech ESPER, SQLStream Platform. Graph database: Neo Technology Neo4j*, Apache Giraph, FlockDB. Storage: Cassandra FS, Apache HDFS, EMC Isilon OneFS
  25. Batch of new technologies that allow us to extract value out of a dataset which, due to its volume, variety or velocity, was not previously exploited. From Big Data 1.0. “Set of new technologies that extract value from all the available data of a company”. To Big Data 2.0
  26. Use Cases

  27. The Bubble filter

  28. You must enter the user's bubble

  29. Antena 3, nubeox: Big Data recommendation engine, monitoring of streaming videos. Description: recommendation engine based not only on the customer's purchase history but also on their navigation. Advantages: increase in clickthrough; increase in conversions; increase in sales
  30. Customizing Web Sites: Behavioural Customization. +160% clicks vs. one size fits all; +79% clicks vs. randomly selected; +43% clicks vs. editor selected. Recommended links, News Interests, Top Searches. Description: customizing homepages based on user navigation; analysis and customization of the homepage and site in real time for each user based on their browsing; modification of contents, highlights and ads in real time based on user history. Advantages: over 300% increase in clickthrough; millions of web pages created in real time; increase in conversions; increase in sales; cost ten times lower than other solutions
  31. Personalized Marketing with DataShake integration. Description: development of newsletters, email marketing or any other sent material segmented by individual preferences. Analyzes and takes into account: • Financial information and user data • Navigation and usage information from previous marketing mailings • Mobile app data (GPS, payments, browsing of offers…) • Users’ information from social networks. Advantages: increased clickthrough; increase in conversions and sales; natural language processing (semantics and sentiment); combines private and public data
  32. Complement private structured data with unstructured and public data. Description: complementing a company's internal data, by combining structured and unstructured data with the data generated by the web and social networks, allows us to determine the validity of the data about our brand, product or company. Comparing and analyzing internal and external (web) data increases the value of our data and gives us a competitive advantage. Advantages: improves sales; improves loyalty; increases conversions; detects errors or data manipulation; improves SEO with regard to users and public data; improves marketing and product promotion with regard to trends
  33. BI and data analytics. Description: creation and/or complementing of BI systems and data analytics; ETL and data-loading tools that handle a much higher volume than traditional ones; capacity for analysis and visualization of all types of data, including graphs and new data types. Advantages: ability to work with larger datasets without the need to add or delete; much faster and more reliable systems; massive reduction in cost (M € versus k €); natural language processing (semantics and sentiment); the possibility of combining internal data with external data (private and public data)
  34. Telefónica Dynamic Insights (Smart Steps). Description: Collect mobile data, anonymised and aggregated, to understand how segments of the population collectively behave. Trace trends and the behaviours of crowds, not individuals. Use this insight to enlighten the space between organisations and their users, enabling them to improve their propositions, and businesses. Focus: By being able to measure real behaviour, in near real-time, 24/7, 365 days a year, we can show the actual impact on society, therefore enabling businesses and local government to make better decisions.
  35. Security and fraud detection. Description: analysis of large volumes of data, logs, security systems and transactional systems. Faster correlation mechanisms and machine learning algorithms allow early detection of attacks and security risks, with extra care taken over false positives. Internal fraud detection by analyzing data and events from applications and risk operations. Advantages: combines data from transactional systems with the SIEM to help fight fraud; tracks and identifies new fraud methods and trends via user reviews; fraud detection techniques specified through the use of built-in patterns; much larger data volumes and much higher velocity; combines private and public data
  36. M2M IoT: PARK AIR SYSTEMS NORWAY (RMMS). Description: The Remote Maintenance & Monitoring System (RMMS) provides a powerful, scalable and flexible SCADA system to perform a wide range of tasks required by CNS agents, such as maintenance, supervision, configuration and operation. Integration of different systems and equipment shall be possible and straightforward using open standard protocols, real-time monitoring, data storage, testing, reporting, event notification,… Focus: The main task of the RMMS is to provide complete access to the supervised equipment in order to monitor every single available parameter, as a means of avoiding personnel mobilization to the remote location. Different levels of control over the system are also provided to cover the requirements of supervision, maintenance and control. Five main elements compose the RMM system: • RCSU: Remote Control and Status Unit. • TP: Tower Panel. • RMM: Remote Management & Monitoring. • LMT/RMT: Local / Remote Management Terminal. • CMMS: Central Management & Monitoring System.
  37. Search Engines. Description: Big Data Search Assist: search engines optimized for Big Data, with self-learning improvements based on use. Search engines for websites, intranets and apps, with instant real-time search, a single box with natural language processing, suggestions, highlighting, automatic corrections, “did you mean” tips, etc. Advantages: easy management for business users (order of results, filters, etc.); advanced search engine features at a cost ten times lower than other solutions; improved performance and scalability compared to other solutions; easy to integrate and use
  38. ORM and social dialogue. Description: It gives a full 360º view of a company or brand online, through a tool that integrates the three aspects that define your actual online image. How am I doing on social networks? Do I know how to use Facebook, Twitter, Google+, YouTube, LinkedIn? How many followers do I have, am I an influencer, do I generate content that spreads? What is my presence and reputation on the Internet? When people talk about me, how do they talk, what is said, how does it evolve over time, and what is my position on the Internet relative to my competitors in the different aspects that interest me? SEO: simple and practical analysis of both internal and external SEO, to complement and give an integrated view of the above aspects of reputation and social dialogue. Advantages: real improvement of the company or product by analysing the evolution over time of the three major aspects that define your online reputation; it improves the negative aspects and reinforces the positive ones. Increase in sales: helps optimize and follow marketing campaigns and improve sales; improves conversions and attracts new customers
  39. Social Mining. Description: analyzing various social networks and movements, looking for brand penetration, identifying influencers in conversations and building a static map of associated terms. Advantages: entering the social dialogue and hot topics at the right time multiplies viralization by 100; view how a social network moves as time goes by; lets you know what users are talking about when they refer to my products or my brand; detection of influencers and detractors; optimal visualization of the information; identification of the tags most frequently used by the network, to improve your SEO
  40. Social Network Tracking. Description: searching social networks for comments and mentions of interest about a particular issue or event, for further evaluation, influencer detection and graphical display of the conversation to facilitate analysis. Advantages: show a real-time event (symposium, forum, seminar, etc.) with visual information; get opinions and feelings about a topic in social networks in real time; identify the influencers of a hot topic; risk detection and prevention; emotional mining: know the most popular terms for a person, brand, event, etc., and in this way learn about the feelings generated by the most important terms
  41. Web Content Scraping. Description: searching the network for content and publications on specific subjects of interest, to detect, filter, collect and process relevant information in semi-real time or batch. Combined with semantic analysis, this allows content to be detected and classified effectively. Advantages: allows sites to be generated dynamically, without any intervention or exhaustive searches, from the collected and categorized content; unifies in a single website all the tasks that users would otherwise have to do manually, which saves them money and generates loyalty
  42. Tele5: Monitoring of logs for Streaming Videos. Description: monitoring the download and streaming of videos; analysis of streaming; quality of streaming; peaks of service and bottlenecks. Advantages: problem detection and alerts; optimization of service; tracking of campaigns
  43. Massive information tagging. Description: allows you to label and categorize any type of content or information automatically and at massive scale. Advantages: allows searching, categorization and clustering, and extracts value from information that would otherwise be hard to find and use; uses state-of-the-art tools to identify entities (NED systems, NERD), combined with entity disambiguation using a Big Data system containing Wikipedia and other sources of information; processing speed and data volume superior to those of other systems
  44. SUMMARY

  45. It is not about Big Data, it is about getting maximum value from data: • Get all the value data can give • Process and analyze new types of data: unstructured, semi-structured, streams of data • Convert data into big insights • Become a data-driven company
  46. “the best way to predict the future is to create it”
  47. Ride The “Big Data” wave

  48. Q&A