EY Aleph: Deep Learning applied to jurimetrics practice

Jurimetrics practice in Brazil, where it exists, usually relies on local data from law offices: data collected from their own cases and maintained through time-consuming manual processes. Our approach collects all public data (not only the cases we work on) and combines it with our workforce of lawyers to create labeled training datasets for natural language processing deep learning models that extract attributes such as judge, lawyer, plaintiff, defendant, requests, and jurisprudence. Organizing the unstructured data gathered enabled EY to deliver unprecedented analysis of litigation deposit amounts, aiming to drastically reduce provisions, besides other benefits in the legal chain, especially in labor law.

Michel Fernandes

July 23, 2018

Transcript

  1. relevance

    Total lawsuits in the balance sheets (R$ billions)¹: Tax 283.4, Civil 76.9, Labor 39.1
    Tax litigation in relation to market value¹: 32%
    Top litigation taxes (R$ billions)¹: ICMS 90.4, Social Contribution on Net Income 65.2, Income Tax 22.8, PIS/Cofins 15.57, CIDE 11.72
    ¹ Source: “O contencioso tributário sob a perspectiva corporativa”, Ana Teresa L R Lopes. All data refer to 2014 and cover the top 30 companies in Brazil.
  2. real jurimetrics for labor law

    legal provisions, deal & defense models, deal pricing, operation optimization, law firms performance
  3. regional labor tribunal: 204 courts in region 2 (São Paulo), universal data

    physical lawsuits: more than 7 MM documents (pdf) available
    electronic lawsuits: all lawsuits from 2015, more structured (html)
  4. putting it all together

    stages: public data collection, normalization, attribute extraction, audit, visualization
    tools and techniques: scraping, rpa, deep learning, computer vision, document conversions (unrtf, poppler), regex, machine learning, human check support web app, web app, viztools
  5. rate OUR setup: stack defined by AI programming language &

    azure public cloud, monthly bill < 1K USD
  6. cosmosdb no friction with data structure alterations, but not good

    with dataviz tools for BI teams nor application stack
  7. mysql: perfect for dataviz tools and to support the app

    summary from the no-sql database, read-only scenario
  8. blob, queue, table

    blob: stores objects with possibility of local redundancy (3 copies) or global (6 copies)
    queue: has local redundancy (3 copies of the message); messages expire in 7 days
    table: storage of key-value type, "No-sql like"; does not allow map-reduce operations; filters only by key (recommended)
    contents: raw documents, cleaned documents, ml models, job management, orchestration, configurations
  9. ai solution in a box

    diagram components: cosmos no-sql, app insights, sql, aleph admin, ruby, functions, queue, blob, tables, jenkins, tfs, celery, users, api mgnt, redis cache, cloud for b2b customers, ey aleph, ruby + ember, mechanical turk, staff
  10. our challenge: real unstructured data…

    In 20/02/2017 was declared … 20/02/2017
    In 20 of December of 2017 … 20/12/2017
    In the second day of January of two thousand and seventeen … 02/01/2017
    In eleventh day of March of 2016 … 01/03/2016
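The date examples on this slide can be illustrated with a small rule-based normalizer. This is only a sketch under stated assumptions: the deck does not show how EY Aleph handles dates, and the function name `normalize_date`, the regular expressions, and the small English vocabularies below are hypothetical stand-ins for the Portuguese originals.

```python
import re

MONTHS = {name: i + 1 for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"])}

# Hypothetical, deliberately small vocabularies (assumptions for illustration).
ORDINALS = {name: i + 1 for i, name in enumerate(
    ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
     "eighth", "ninth", "tenth", "eleventh", "twelfth"])}
SMALL_NUMBERS = {name: i + 1 for i, name in enumerate(
    ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
     "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
     "seventeen", "eighteen", "nineteen", "twenty"])}

NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
WRITTEN = re.compile(
    r"\b(?:the\s+)?(\d{1,2}|[a-z]+)(?:\s+day)?\s+of\s+([a-z]+)\s+of\s+"
    r"(\d{4}|two thousand(?: and [a-z]+)?)")


def parse_year(token):
    """Turn '2017' or 'two thousand and seventeen' into an int, else None."""
    if token.isdigit():
        return int(token)
    rest = token.replace("two thousand", "", 1).replace(" and ", "", 1).strip()
    if not rest:
        return 2000
    return 2000 + SMALL_NUMBERS[rest] if rest in SMALL_NUMBERS else None


def normalize_date(text):
    """Return the first recognizable date in text as DD/MM/YYYY, or None."""
    lowered = text.lower()
    match = NUMERIC.search(lowered)
    if match:
        day, month, year = (int(g) for g in match.groups())
        return f"{day:02d}/{month:02d}/{year:04d}"
    for match in WRITTEN.finditer(lowered):
        day_token, month_token, year_token = match.groups()
        day = int(day_token) if day_token.isdigit() else ORDINALS.get(day_token)
        month = MONTHS.get(month_token)
        year = parse_year(year_token)
        if day and month and year:
            return f"{day:02d}/{month:02d}/{year:04d}"
    return None
```

Hand-written rules like these cover only the phrasings they anticipate, which is one reason free-text court documents are hard to structure.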
  11. …and it gets worse

    REQUESTS
    …. Foundation Extra Hours Worker claims that the hours after work were not ….
    II – FOUNDATION - Rescission sums The author postulates the payment of the amounts resulting from the unmotivated waiver …
    CONCLUSION
    D E C I S I O N Of the additional of unhealthiness. The author worked for the claimed ones …
    J u d g e m e n t … moral damages. The requester claimed that during his work at …
  12. classifiers

    1. TYPE OF PHRASE: • Requests • Sentences • Other
    2. DECISION: • Granted • Overruled
  13. traditional nlp approaches

    Text representation: word count, TF-IDF
    Tokenizers: n-grams + stopwords; unigram + stemming + stopwords; unigram + stopwords
    Tested algorithms: logistic regression, random forest, gradient boosting classifier, multinomial naïve Bayes, stochastic gradient descent, support vector classifier
  14. traditional nlp approaches

    text representation: word count, TF-IDF
    tokenizer: n-grams + stopwords; unigram + stemming + stopwords; unigram + stopwords
    classifier algorithms: logistic regression, random forest, gradient boosting classifier, multinomial naïve Bayes, stochastic gradient descent, support vector classifier
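To make the "unigram + stopwords" tokenizer and the TF-IDF representation concrete, here is a minimal standard-library sketch. It is not the deck's implementation: the stopword list, the function names `tokenize` and `tfidf`, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "a", "that", "for", "to", "were"}


def tokenize(text):
    """Lowercase, keep alphabetic unigrams, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = term frequency * smoothed inverse document frequency.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in few documents receive higher weights than terms spread across the corpus, which is what lets a linear classifier such as logistic regression separate phrase types.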
  15. first results: logistic regression + tf-idf + no stopwords + stemming

    training f1-score: 0.90
    testing f1-score: 0.81
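For readers unfamiliar with the metric on this slide, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class; the function name `f1_score` and the example labels are hypothetical, and the deck does not say which averaging scheme produced its figures.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = "granted", 0 = "overruled".
example_true = [1, 1, 1, 0, 0]
example_pred = [1, 1, 0, 1, 0]
example_f1 = f1_score(example_true, example_pred)
```

A gap between training and testing scores, as reported on the slide, usually indicates that the model fits the training phrases better than it generalizes to unseen ones.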
  16. classifiers: DEEP LEARNING, TRADITIONAL NLP

    1. TYPE OF PHRASE: • Demands • Sentences • Nothing
    2. DECISION: • Granted • Overruled
  17. recurrent neural networks

    “I do not find the defendant (in light of all available evidence and according to the law and the decision of the jury, and so it goes and yadda yadda) to be guilty.”
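The quoted sentence shows why word order and long-range context matter: "not" and "guilty" are far apart, yet together they decide the meaning. A recurrent network handles this by carrying a hidden state from token to token. The following is a minimal sketch of a vanilla (Elman-style) recurrent cell with hand-picked weights; it is an assumption for illustration only, since the deck does not describe the actual architecture, and nothing here is trained.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One recurrent update: h' = tanh(W_xh x + W_hh h + b)."""
    new_h = []
    for j in range(len(h)):
        total = b[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(total))
    return new_h


def encode(tokens, vocab, w_xh, w_hh, b):
    """Feed one-hot tokens through the cell and return the final hidden state."""
    h = [0.0] * len(b)
    for token in tokens:
        x = [1.0 if i == vocab[token] else 0.0 for i in range(len(vocab))]
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Hypothetical toy vocabulary and weights (not learned from data).
VOCAB = {"not": 0, "guilty": 1}
W_XH = [[0.5, -0.3], [0.1, 0.8]]
W_HH = [[0.2, 0.0], [0.0, 0.2]]
BIAS = [0.0, 0.0]

state_a = encode(["not", "guilty"], VOCAB, W_XH, W_HH, BIAS)
state_b = encode(["guilty", "not"], VOCAB, W_XH, W_HH, BIAS)
```

Because each step depends on the previous hidden state, the two orderings produce different final states, whereas a bag-of-words representation would treat them as identical.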