Data Discovery NLP

Summit
September 05, 2018

Data self-service with NLP and Deep Learning.

Transcript

  1. BIG DATA ANALYTICS SUMMIT 2018, August 24 and 25 | Lima, Peru

    #BIGDATASUMMIT2018
  2. Hi! I'm Ignacio Julio Marrero Hervás

    • My background is in Experimental Physics, specifically Astrophysics
    • Software engineering for more than 20 years
    • For the last 8 years, new data architectures based on Big Data technology
    • 5 years in innovation and entrepreneurship ecosystems, where I founded a Big Data & Analytics startup
    • Currently at Accenture as a manager and as a Big Data and analytics solutions architect
    • Over the past year I have been leading the Artificial Intelligence area in Digital Delivery
    You can find me at [email protected]
  3. Data self-service with NLP and Deep Learning

    The growth of data architectures does not always follow best practices in modeling and architecture. Data exploration tools provide features that let analysts search and analyze, but they require detailed knowledge of the data model. Improving the interaction between users and tools can radically change the productivity and efficiency of companies pursuing a "Data Centric" goal. Natural language is currently the most direct form of communication, and using it makes data analysis more precise and productive.
  4. Contents

    Classic View of Data Discovery • Data Centric View • Framework for a New Data Discovery • Business Case • Project Development Methodology • Solution Architecture • System Training • Operating Model
  5. CLASSIC VIEW OF DATA DISCOVERY: CLIENT TEAM AND ACCENTURE DATA DISCOVERY TEAM

    The client team provides requirements, data, reviews and feedback; the Accenture Data Discovery team returns analytic models and outcomes.
    1. Client Team (Business Team, Analytics & Data Team): Business Inputs • Analytics Feedback • Data Access • Infrastructure Access • Data Extraction • Data Requirements • Reviews
    2. Delivery Lead: Scope Management • Delivery Management • Schedule Management • Status Meetings
    3. Data Science Team: Domain Experts • Data Provisioning • Big Data Analytics • Data Visualization • Outcome Interpretation • Reviews
    4. Platform Support (Infrastructure Support, Software Support): Environment Provisioning • Environment Support • Vendor Interactions
    5. Data Discovery Platform: Hosted and Managed • Tools & Methodology • Automation • Re-usable Assets
    6. Vendor Support: ACP Support • AWS Support • Hadoop Support • Revolution Support • Tableau Support • …
    Copyright © 2018 Accenture. All rights reserved.
  6. THE NEW DATA SUPPLY CHAIN

    We address the client's data needs, meet its challenges and embrace market trends in order to provide a trusted, intelligent data foundation. Diagram inputs: CXO, CDO and CIO challenges • Business & operating needs • Converging market trends.
  7. Our approach focuses on five key capability pillars that together make up an end-to-end implementation, with detailed architectures, designs and technology options. In this runbook, we focus on "Data Curation"; it is the only pillar covered in this playbook.

    Key Data Supply Chain (DSC) capability pillars and our approach:
    • Data Sources: Relational Data, Streaming Data, Unstructured Data, Structured Data
    • Data Capture: Data Staging, Data Ingestion, Metadata, Classification
    • Data Curation: Standardization, Data Protection, Data Promotion, Anomaly & Validation
    • Data Provisioning: RDBMS, In-Memory Grid, Analytical Appliances, NoSQL
    • Data Exploration and Data Consumption: Self Service, Advanced Analytics, BI/Reporting, Analytics, Visualization
  8. BUSINESS CASE

    Diagram labels: Know Data Model → Find Data Value; Genius → Find Data Value.
  9. PROJECT DEVELOPMENT METHODOLOGY

    We normally define and follow this approach to deliver a Virtual Agent:
    1. Scope Analysis: define the purpose (functional scenario) and perimeter (touchpoints, interaction type, integration, handover) • identify high-level use cases and their business value • define the dialog structure, master questions and related answers • identify additional capabilities requested (multimedia content, authentication)
    2. Design: detail the dialog structure and the UX in general, including exit strategy and fault management, also involving a copywriter • involve key stakeholders for contribution and validation (Customer Care, Marketing & Communication) • assess the data required and the integrations with legacy systems • identify sources and the knowledge corpus for creating the training set
    3. Build: code dialogs and configure workflows • shape a decoupling-layer component to integrate APIs from legacy systems • classify the knowledge corpus and train the ML model • configure and tailor TTS/STT APIs and other added services
    4. Train & Test: execute performance, integration and functional end-to-end tests • involve users and stakeholders (e.g. Customer Care reps) operationally in the tests • use test outcomes to refine and strengthen the training of the model
    5. Roll out and continuous improvement: plan a business simulation and a progressive ramp-up • analyze and classify logs and transcripts in order to retrain the model • extend and refine the set of managed scenarios
  10. PROJECT DEVELOPMENT METHODOLOGY

    The proposed operating model will follow an agile methodology for RPA & AI delivery; Accenture has tested templates that support this agile delivery.
    Diagram elements: RPA-AI backlog • prioritization by the Product Owner • prioritized user stories • sprint backlog (process segments bundled in a bot increment) • delivery sprints (tasks) • daily stand-up meetings • testing • review and retrospective committee • confirmed acceptance tests • solution in production • defects and improvements • process automation backlog • incremental implementation • solution maintenance and support • detailed analysis and design of the solution • feedback: potential changes
    Accelerator assets of an agile delivery unit: Discovery Matrix • Questionnaire • 2-Pager • Risk Assessment Tool • Test Plan • Description of the Business Process to Automate • Solution Design Document • Best Practices for AI Solution Development • Daily Report Proposal • Test Report Proposal • Go-Live Manual • (…)
  11. SOLUTION ARCHITECTURE: INTELLIGENCE ENGINE

    Genius has two engines that give it intelligence:
    • The semantic engine finds the fit between the business terms in the user's question and the model metadata that matches the question, and then composes the answer as an SQL statement.
    • The neural network is trained incrementally on the results of the semantic engine and iteratively learns question-answer patterns (question business terms → answer data architecture).
    Execution architecture (Genius API):
    1. Genius consults the neural network first. Once trained, the neural network is much more efficient and faster than the semantic engine.
    2. If the answer does not reach the required precision, Genius invokes the semantic engine, which is much more flexible.
    Training:
    1. Supervised (semantic engine): the semantic engine is trained first. Experts provide the model semantics and generate business questions; Genius proposes a set of possible alternatives and the expert chooses the right one.
    2. Unsupervised (neural network): the experts' question set is augmented and used to train the neural network without supervision.
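The execution flow on this slide (neural network first, semantic engine as fallback, semantic results kept for incremental training) can be sketched in Python as follows. This is a minimal illustration, not the Genius implementation: `Answer`, `answer_question`, the toy engines and the 0.8 `PRECISION_THRESHOLD` are hypothetical, since the slide does not say how precision is measured.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Answer:
    sql: str
    confidence: float  # engine-reported score in [0, 1] (assumption)
    engine: str


# Hypothetical cut-off: the slide only speaks of "the right precision".
PRECISION_THRESHOLD = 0.8

# (question, sql) pairs produced by the semantic engine, kept so the
# neural network can later be retrained incrementally on them.
training_pairs: List[Tuple[str, str]] = []


def answer_question(
    question: str,
    neural_engine: Callable[[str], Optional[Answer]],
    semantic_engine: Callable[[str], Answer],
    threshold: float = PRECISION_THRESHOLD,
) -> Answer:
    # 1. Ask the fast neural engine first.
    candidate = neural_engine(question)
    if candidate is not None and candidate.confidence >= threshold:
        return candidate
    # 2. No answer or not precise enough: use the flexible semantic engine
    #    and record its result as a future training example.
    fallback = semantic_engine(question)
    training_pairs.append((question, fallback.sql))
    return fallback


# Toy engines, for illustration only.
def toy_neural(question: str) -> Optional[Answer]:
    known = {
        "total sales by region": Answer(
            "SELECT region, SUM(amount) FROM sales GROUP BY region", 0.95, "neural"
        )
    }
    return known.get(question)


def toy_semantic(question: str) -> Answer:
    return Answer(f"-- SQL composed from metadata for: {question}", 0.7, "semantic")


if __name__ == "__main__":
    print(answer_question("total sales by region", toy_neural, toy_semantic).engine)
    print(answer_question("customers per branch", toy_neural, toy_semantic).engine)
```

Keeping the threshold as a parameter makes it easy to tune as the neural network improves and more questions can be answered without the semantic engine.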
  12. SOLUTION ARCHITECTURE: SEMANTIC ENGINE

    The slide's figure shows the sequence of analytical processes that run in the semantic engine:
    • Query processing: standardization, handling of regular expressions, tokenization and stemming, classification of concepts, N-gram analysis
    • Search of tables and columns: obtaining tables, obtaining columns
    • Response system: obtaining tuples, generating the tuples file
    • Graph generation: graph construction, ingestion and persistence
    • Exploration: obtaining the shortest paths
    • SQL statement generation: chatbot help in the filtering process, generation of the JOIN and SELECT clauses, generation of the WHERE clause
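A toy Python sketch of the semantic-engine pipeline listed on this slide: standardization, regex tokenization with stemming, n-gram matching against column metadata, a table graph, shortest-path search, and JOIN/SELECT/WHERE generation. Everything here is an assumption for illustration: the `COLUMN_INDEX` metadata, the `JOINS` foreign keys, the crude plural-stripping stemmer and the function names are hypothetical, and the WHERE condition is passed in directly instead of being gathered by a chatbot.

```python
import re
import unicodedata
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical metadata: business term -> (table, column).
COLUMN_INDEX: Dict[str, Tuple[str, str]] = {
    "customer name": ("customers", "name"),
    "balance": ("accounts", "balance"),
    "branch": ("branches", "city"),
}
# Hypothetical foreign keys: (table_a, table_b) -> join condition.
JOINS: Dict[Tuple[str, str], str] = {
    ("customers", "accounts"): "customers.id = accounts.customer_id",
    ("accounts", "branches"): "accounts.branch_id = branches.id",
}


def standardize(text: str) -> str:
    # Lower-case and strip accents, e.g. "Año" -> "ano".
    text = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))


def tokenize(text: str) -> List[str]:
    # Regex tokenization plus crude plural stripping (a stand-in for a real stemmer).
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    # Longest n-grams first, so "customer name" is tried before "name".
    return [" ".join(tokens[i:i + n])
            for n in range(max_n, 0, -1)
            for i in range(len(tokens) - n + 1)]


def find_columns(question: str) -> List[Tuple[str, str]]:
    # Stem the metadata terms exactly like the question before matching.
    index = {" ".join(tokenize(standardize(k))): v for k, v in COLUMN_INDEX.items()}
    found: List[Tuple[str, str]] = []
    for gram in ngrams(tokenize(standardize(question))):
        if gram in index and index[gram] not in found:
            found.append(index[gram])
    return found


def build_graph() -> Dict[str, Set[str]]:
    # Undirected graph: tables are nodes, foreign keys are edges.
    graph: Dict[str, Set[str]] = {}
    for a, b in JOINS:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> List[str]:
    # Breadth-first search: the path with the fewest joins.
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in sorted(graph.get(node, set())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if goal not in previous:
        raise ValueError(f"no join path from {start} to {goal}")
    path: List[str] = []
    step: Optional[str] = goal
    while step is not None:
        path.append(step)
        step = previous[step]
    return path[::-1]


def join_condition(a: str, b: str) -> str:
    return JOINS[(a, b)] if (a, b) in JOINS else JOINS[(b, a)]


def to_sql(question: str, where: str = "") -> str:
    columns = find_columns(question)
    if not columns:
        raise ValueError("no business terms recognized")
    graph = build_graph()
    root = columns[0][0]
    ordered, clauses = [root], []
    for table, _ in columns[1:]:
        path = shortest_path(graph, root, table)
        for a, b in zip(path, path[1:]):
            if b not in ordered:  # join each table only once
                ordered.append(b)
                clauses.append(f"JOIN {b} ON {join_condition(a, b)}")
    select = ", ".join(f"{t}.{c}" for t, c in columns)
    parts = [f"SELECT {select}", f"FROM {root}", *clauses]
    if where:
        parts.append(f"WHERE {where}")
    return " ".join(parts)
```

For example, `to_sql("Customer name and balance by branch", "branches.city = 'Lima'")` joins `customers`, `accounts` and `branches` along the foreign-key path. Breadth-first search is used because it minimizes the number of joins on an unweighted graph; a schema with weighted edges would call for Dijkstra's algorithm instead.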
  13. SOLUTION ARCHITECTURE: NEURAL NETWORK ENGINE

    Deep Learning techniques are used to implement a neural network engine split into several layers and functional areas. The architecture rests on two pillars:
    • MAXIMIZATION: dividing the engine into layers maximizes Transfer Learning between layers, so that resources produced by the community or by third parties can be reused.
    • SPECIALIZATION: dividing the upper layers into functional areas allows the models to be specialized, minimizing the loss of precision that excessively broad models suffer.
    Layers shown in the diagram, from general to specific, linked by Transfer Learning: Natural Language Layer • Language Adaptation Layer (adaptation of natural language to the syntax of data architectures) • Sector/Domain Layer (Banking Sector, Insurance Sector) • Specific Customer Layer (two banks, two insurers)
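The layer split can be pictured as a stack in which the lower layers are shared and frozen and only the top, customer-specific layer is trained. The Python sketch below is structural only (no actual training); the `Layer` class, the layer names and `build_model` are hypothetical, and it assumes each sector layer has already been trained once and is frozen when a new customer is added.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Layer:
    name: str
    trainable: bool
    weights: Dict[str, float] = field(default_factory=dict)


# Lower layers: one shared instance reused by every model (MAXIMIZATION).
NATURAL_LANGUAGE = Layer("natural_language", trainable=False)
LANGUAGE_ADAPTATION = Layer("language_adaptation", trainable=False)

# Sector layers: shared by all customers of the same sector (SPECIALIZATION).
SECTOR_LAYERS: Dict[str, Layer] = {
    "banking": Layer("sector:banking", trainable=False),
    "insurance": Layer("sector:insurance", trainable=False),
}

_customer_layers: Dict[str, Layer] = {}


def build_model(sector: str, customer: str) -> List[Layer]:
    # Only the customer-specific top layer is trainable; everything below
    # is reused as-is, which is the transfer-learning part.
    top = _customer_layers.setdefault(
        customer, Layer(f"customer:{customer}", trainable=True)
    )
    return [NATURAL_LANGUAGE, LANGUAGE_ADAPTATION, SECTOR_LAYERS[sector], top]


def trainable_layers(model: List[Layer]) -> List[str]:
    return [layer.name for layer in model if layer.trainable]
```

Two banks built with `build_model("banking", ...)` share the same sector layer object, while a bank and an insurer share only the two lower layers.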
  14. SYSTEM TRAINING: TECHNICAL DESIGN OF OPERATIONAL MODEL PROCESSES

    The workflow to be implemented for each domain covers the phases Ingestion and Entity Modeling • Domain Consolidation • Training • Test • Modify.
    Diagram steps: preparation of the entity model • ingestion of entities • semantic training of a set of entities • training of the neural network • question-answer training • commit to the entity branch • consolidation of the model • refinement of the model • continuous integration: regression test • integration in the domain model • continuous integration: release (RELEASE only)
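The regression-test gate before release in this workflow could look like the following Python sketch. It treats a model as any function from a question to an SQL string and promotes a candidate only if its accuracy on a fixed question-answer suite does not fall below that of the current release. The names `regression_accuracy` and `can_release`, the exact-match comparison and the `tolerance` parameter are assumptions, not part of the described process.

```python
from typing import Callable, Dict, Optional

# A "model" is any function from a question to an SQL answer (assumption).
Model = Callable[[str], Optional[str]]


def regression_accuracy(model: Model, suite: Dict[str, str]) -> float:
    """Share of regression questions whose answer matches the expected SQL exactly."""
    if not suite:
        raise ValueError("empty regression suite")
    hits = sum(1 for question, expected in suite.items() if model(question) == expected)
    return hits / len(suite)


def can_release(candidate: Model, current: Model, suite: Dict[str, str],
                tolerance: float = 0.0) -> bool:
    """Promote the candidate only if it does not regress against the current release."""
    return regression_accuracy(candidate, suite) >= regression_accuracy(current, suite) - tolerance


# Example with dictionary lookups standing in for trained models.
suite = {
    "total sales by region": "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "customers per branch": "SELECT branch_id, COUNT(*) FROM customers GROUP BY branch_id",
}
current = {"total sales by region": suite["total sales by region"]}.get
candidate = dict(suite).get
```

Here `can_release(candidate, current, suite)` holds, while swapping the arguments does not, because the current release misses one of the two regression questions.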
  15. SYSTEM TRAINING: SEMANTIC ENGINE

    Within each domain, the training process is carried out independently.
    Data ingestion: identification of the metamodel of the data to be ingested • automatic metamodel analysis to identify business terms • identification of dictionaries associated with the specific metamodel • manual review and enrichment of the business terms automatically identified by Genius • consolidation of the enriched metamodel • validation of system integrity (automatic)
    Training: introduce questions • analyze the relevance of the business terms in the response • correct answer / wrong answer • update the labeling model or include new dictionaries
    Legend: some steps require the involvement of the customer team to complete and customize the task.
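The "automatic metamodel analysis to identify business terms" step could, in its simplest form, split column identifiers and expand abbreviations with a domain dictionary, leaving anything unresolved for the manual review step. This Python sketch assumes snake_case or camelCase column names; `DICTIONARY`, `KNOWN_WORDS`, `split_identifier` and `business_terms` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical domain dictionary: abbreviation -> business word.
DICTIONARY: Dict[str, str] = {
    "cust": "customer", "acct": "account", "bal": "balance",
    "dt": "date", "nm": "name", "br": "branch",
}
# Full words accepted as-is (assumption).
KNOWN_WORDS = set(DICTIONARY.values()) | {"open", "amount"}


def split_identifier(identifier: str) -> List[str]:
    # snake_case and camelCase -> lower-case parts ("acctBal" -> ["acct", "bal"]).
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", identifier)
    return [part.lower() for part in spaced.split("_") if part]


def business_terms(columns: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map columns to business terms; return unresolved columns for manual review."""
    terms: Dict[str, str] = {}
    needs_review: List[str] = []
    for column in columns:
        words: List[str] = []
        for part in split_identifier(column):
            if part in DICTIONARY:
                words.append(DICTIONARY[part])
            elif part in KNOWN_WORDS:
                words.append(part)
            else:
                needs_review.append(column)  # unknown fragment: a human decides
                break
        else:
            terms[column] = " ".join(words)
    return terms, needs_review
```

Enriching `DICTIONARY` after each manual review corresponds to the "update labeling model / include new dictionaries" loop on the slide.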
  16. SYSTEM TRAINING: NEURAL NETWORK ENGINE

    Within each domain, the training process is carried out independently.
    1. Assisted increase of the question set: starting from the question-answer pairs created by the experts during the supervised training of the semantic module, non-expert roles will expand this set by a factor of 2 to 3.
    2. Unattended increase of the question set: the augmented set will be used to generate questions automatically by combining terms and filters, producing a much larger number of questions with different degrees of ambiguity and quality.
    3. Getting answers: a test set will be run to obtain the semantic engine's answer to each question in the augmented set.
    4. Validation of the question-answer set: the non-expert team validates the semantic engine's results using analytical tools that identify the failed tests and the causes of the problem. The data set can then be regenerated.
    5. Generation of the training set: training sets will be generated using A/B testing models that mix real questions from the Model Training teams with the self-generated synthetic sets. This helps prevent bias in the network's learning.
    6. Neural network training: training will be run for each set of tests and the results compared. The confusion matrix will be analyzed and, if bias is detected, data generation will be iterated and the training repeated.
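Three of these steps lend themselves to a short Python sketch: unattended question generation by combining terms and filters (step 2), blending real and synthetic questions (step 5, shown here as a simple fixed-ratio mix rather than whatever A/B scheme the slide refers to), and the confusion-matrix check (step 6). All names, the question template and the labels (for instance, the target table of the generated SQL) are hypothetical.

```python
import itertools
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def synthesize_questions(measures: Sequence[str], dimensions: Sequence[str],
                         filters: Sequence[str]) -> List[str]:
    # Every combination of terms and filters becomes one synthetic question;
    # an empty filter yields a question without a "where" part.
    return [
        f"{m} by {d}" + (f" where {f}" if f else "")
        for m, d, f in itertools.product(measures, dimensions, filters)
    ]


def mix_training_set(real: Sequence[str], synthetic: Sequence[str],
                     real_share: float = 0.5, seed: int = 0) -> List[str]:
    # Keep real questions at a fixed share so the synthetic ones,
    # which may be skewed, do not dominate the training set.
    if not 0 < real_share <= 1:
        raise ValueError("real_share must be in (0, 1]")
    rng = random.Random(seed)
    wanted = round(len(real) * (1 - real_share) / real_share)
    sample = rng.sample(list(synthetic), min(wanted, len(synthetic)))
    mixed = list(real) + sample
    rng.shuffle(mixed)
    return mixed


def confusion_matrix(expected: Sequence[str],
                     predicted: Sequence[str]) -> Dict[Tuple[str, str], int]:
    # Counts of (expected label, predicted label) pairs.
    return dict(Counter(zip(expected, predicted)))


def misclassifications(matrix: Dict[Tuple[str, str], int]) -> List[Tuple[Tuple[str, str], int]]:
    # Off-diagonal cells, most frequent first: where a bias would show up.
    errors = [(pair, n) for pair, n in matrix.items() if pair[0] != pair[1]]
    return sorted(errors, key=lambda item: -item[1])
```

If `misclassifications` keeps pointing at the same label pair across runs, that is the signal to regenerate the synthetic data and retrain, as step 6 describes.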