Text-Mining: Big Data Analytics for Unstructured Data

Marketing OGZ
September 20, 2022

Transcript

  1. Introduction
    • Full Professor of Text Mining (IR, TM & NLP), University of Maastricht
    • Co-creator of various legal technology courses, Leiden University & HBO-rechten
    • Chief Data Scientist at iPRO (ZyLAB)
    • https://www.linkedin.com/in/jscholtes/
  2. Unstructured versus Structured data
    • Most data is (textual) human-generated data.
    • This is what we call unstructured data.
    • Before we can do anything with this type of data, we first need to structure it.
  3. An Example: Financial Transactions
    1. Part of the transaction fits in a database (records that form rows and columns in a database).
    2. Part consists of emails, (draft) contracts (PDF, Word & TIFF), audio recordings, Zoom meetings, ….
  4. Big-Data analytics requires data to be structured
    • Eliminate noise
    • Recognize structure
    • Recognize meaning (semantics, role of a word)
    • Recognize intent, tone, sentiments, emotions, discourse, etc.
    • Feeds into data science algorithms and structures such as knowledge graphs
    • Ultimately, helps to recognize meaning & anomalies
  5. Text Mining
    Text Mining: The next step in Search Technology
    Finding without knowing exactly what you’re looking for, or finding what apparently isn’t there (or those who do not want to be found …).
  6. What is Text Mining?
    Scientific: Text mining concerns the development of various mathematical (statistical & geometrical), linguistic, and machine-learning techniques which allow automatic analysis of unstructured information as well as the extraction of high-quality and relevant metadata, all with the goal of making natural-language text easier to search.
    What this means: Any method we can use qualifies, and "high quality" refers to the combination of relevance and the acquisition of new and interesting insights.
    What we actually do: Not only finding a needle in a haystack, but also finding out where the haystack is and what the needle looks like!
  7. LANGUAGE: English
    CITY: New Brunswick, WASHINGTON
    COMPANY: J&J, Johnson & Johnson
    COUNTRY: Greece, Poland, Romania, United Kingdom
    CURRENCY: .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD
    DATE: 04-08
    DAY: Fri, Friday
    NOUN_GROUP: biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations, intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past year, paying kickbacks, same time, several new positions, similar violations, travel gifts
    ORGANIZATION: Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan
    PEOPLES: Iraqi
    PERSON: Erik Gordon, Mythili Raman, William Weldon
    PLACE_REGION: Europe
    PRODUCT: Benadryl, Tylenol
    PROP_MISC: Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil
    STATE: N.J.
    TIME: 1:32 pm ET
    TIME_PERIOD: 13 years, five years, six months, three years
    YEAR: 2007
    PROBLEM:
      • "We went to the government to report improper payments and have taken full responsibility for these actions," said William Weldon, Chairman and CEO of J&J.
      • Last month federal health regulators took legal control of the plant where millions of bottles of defective medication were produced.
      • The charges against J&J were brought under the Foreign Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain business.
      • The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in illegal profits, according to the government.
      • The SEC says J&J agents used fake contracts and sham companies to deliver the bribes.
    SENTIMENT:
      • giving meaningful credit to companies that self-report
      • We are committed to holding corporations accountable for bribing foreign officials
      • what is honest
    REQUEST:
      • make sure it complies with anti-bribery laws across its businesses
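As a rough illustration of how typed metadata like the listing on slide 7 can be pulled out of running text, here is a minimal rule-based sketch in Python. The pattern table `PATTERNS` and the function `extract_entities` are hypothetical names invented for this example, and the three regular expressions are deliberately naive assumptions. The slides do not say which extraction technique produced the listing, and production systems typically combine much richer rules with trained models.

```python
import re

# Hypothetical, deliberately simple patterns for three entity types.
PATTERNS = {
    "CURRENCY": re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"),
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
    "ORGANIZATION": re.compile(r"\b(?:SEC|Justice Department|Department of Justice)\b"),
}


def extract_entities(text):
    """Return {entity_type: sorted unique matches} for each pattern that matches."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[label] = matches
    return found


# Two sentences taken from the PROBLEM entries on the slide.
sample = (
    "The company will pay $21.4 million in criminal penalties for improper "
    "payments and return $48.6 million in illegal profits, according to the "
    "government. The SEC says J&J agents used fake contracts and sham "
    "companies to deliver the bribes."
)

entities = extract_entities(sample)
for label, values in entities.items():
    print(label, values)
```

Entity types with no match (here `YEAR`) are simply left out of the result, which keeps the output close in shape to the type-to-values listing on the slide.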
  8. Text Mining the Lord of the Rings
    • Automatic identification of key players (custodians).
    • Automatic identification of locations.
    • Automatic identification of travel patterns of key players.
    • Visualize in time.
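The Lord of the Rings idea (key players, locations, travel patterns over time) can be sketched with simple name lookups. Everything in this Python example is an assumption made for illustration: the gazetteers `CHARACTERS` and `LOCATIONS` are hand-written, whereas the slide describes automatic identification, and the three passages are toy sentences written for this sketch, not quotations from the book.

```python
from collections import Counter, defaultdict

# Hypothetical gazetteers; a real system would discover these names automatically.
CHARACTERS = {"Frodo", "Sam", "Gandalf"}
LOCATIONS = {"Shire", "Rivendell", "Moria"}

# Toy passages in story order (invented for this sketch).
passages = [
    "Frodo and Sam leave the Shire.",
    "Gandalf meets Frodo in Rivendell.",
    "Gandalf leads Frodo and Sam through Moria.",
]


def tokens(text):
    """Split on whitespace and strip surrounding punctuation."""
    return [word.strip(".,;:!?") for word in text.split()]


mentions = Counter()        # how often each character is mentioned
routes = defaultdict(list)  # ordered locations each character co-occurs with

for passage in passages:
    words = tokens(passage)
    people = [w for w in words if w in CHARACTERS]
    places = [w for w in words if w in LOCATIONS]
    mentions.update(people)
    for person in people:
        for place in places:
            # Record a location only when it differs from the previous one.
            if not routes[person] or routes[person][-1] != place:
                routes[person].append(place)

print(mentions.most_common())
for person, route in routes.items():
    print(person, "->", " -> ".join(route))
```

Because the passages are processed in order, each route doubles as a crude timeline, which is the raw material for the "visualize in time" step.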
  9. Big Data Analytics & the Law
    • Privacy: redaction (aka “aflakken”), subject access requests (aka “inzage verzoeken”)
    • Competition law (aka “mededinging”)
    • Fraud investigations
    • Legal Fact-Finding missions
    • Contract analysis (due diligence)
    • Compliance monitoring & audits
    • …
  10. 3x more relevant documents than Boolean search
    • No complex queries, just review documents
    • Only 2x the total number of relevant documents needs to be reviewed
    • Accurately estimate the percentage of all relevant documents found at the end
    • Teach the computer what to look for …
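One common way to estimate the percentage of relevant documents found at the end of a review is to label a random sample of the documents that were never reviewed and extrapolate. The Python sketch below shows that arithmetic. It is an assumption about how such an estimate could be made, not the method behind the figures on the slide; the function name `estimate_recall` and all numbers in the usage example are hypothetical.

```python
def estimate_recall(found_relevant, unreviewed_count, sample_labels):
    """Estimate recall from a random sample of the unreviewed documents.

    found_relevant:   number of relevant documents found during review
    unreviewed_count: number of documents that were never reviewed
    sample_labels:    booleans, True where a sampled unreviewed document
                      turned out to be relevant on inspection
    """
    if not sample_labels:
        raise ValueError("need at least one sampled document")
    # Share of the sample that was relevant but missed by the review.
    missed_rate = sum(sample_labels) / len(sample_labels)
    estimated_missed = missed_rate * unreviewed_count
    total_relevant = found_relevant + estimated_missed
    if total_relevant == 0:
        raise ValueError("no relevant documents found or estimated")
    return found_relevant / total_relevant


# Hypothetical numbers: 900 relevant documents found, 10,000 left unreviewed,
# and 5 of 500 sampled unreviewed documents turned out to be relevant.
sample = [True] * 5 + [False] * 495
recall = estimate_recall(900, 10_000, sample)
print(f"estimated recall: {recall:.0%}")
```

The estimate is only as good as the sample: a small sample gives a wide margin of error, so in practice a confidence interval would be reported alongside the point estimate.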
  11. Q&A