Kun Lu

This talk is about applying text mining to academic publications to extract information such as knowledge graphs of related concepts. The overall goal is to help researchers explore large collections of text documents, in this case academic publications, better and faster. Given the text of thousands or millions of publications, I will show 1) how to use natural language processing techniques to extract concepts, 2) how to use statistics and Bayesian theory to identify related concepts, and 3) how to use a long short-term memory mechanism to learn related concepts in a sequential manner. In the end, we will be able to obtain a "knowledge graph" of the concepts through this text-mining process. Part of the results can be found at: www.neuronbit.io. The results of this work can also be useful for semantic search, document classification, information retrieval, etc.

MunichDataGeeks

September 08, 2016

Transcript

  1. Mining Academic Papers Kun Lu

  2. source: www.hallaminternet.com

  3. 26,373,091 on PubMed, 4,011,868 on IEEE, as of 2016-09-05

  4. None
  5. None
  6. MOTIVATION SEARCH alone is not enough any more!

  7. MOTIVATION SEARCH gives you meat, MINING gives you kitchen.

  8. MOTIVATION What do I want? “Research intelligence”

  9. non-negative matrix factorization sparse coding independent component analysis (ICA) elastic

    bunch graph matching sparseness constraints face recognition predictive coding fisher linear discriminant model visual cortex rodent brain Gabor feature based classification … # papers
  10. non-negative matrix factorization independent component analysis (ICA) elastic bunch graph

    matching sparse coding sparseness constraints rodent brain predictive coding visual cortex Gabor feature based classification face recognition fisher linear discriminant model
  11. predictive coding visual cortex non-negative matrix factorization sparseness constraints face

    recognition … sparse coding
  12. non-negative matrix factorization sparse coding independent component analysis (ICA) elastic

    bunch graph matching sparseness constraints face recognition … … [terms plotted by year]
  13. MOTIVATION Shorten the learning curve from days to seconds

  14. MINING

  15. Statistical Modelling Natural Language Processing Part-of-Speech (POS) tagging Concepts/Term identification

    Bayesian inference Document frequency Contextual doc. frequency (title-abstract, neighbourhood)
  16. Part-of-speech Tagging We need to find out “Noun Phrases”: …

    One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding.
  17. Part-of-speech Tagging We need to find out “Noun Phrases”: …

    One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding.
  18. Part-of-speech Tagging Different approaches: Simply delimited by stop-words (noise words

    like “a”, “of”, etc.) Python NLTK (NLP toolkit) • Not always correct ◦ 'mechanism linking ...', 'alter gene expression', 'whereas’ all tagged as nouns Train a prediction model • Define features and calculate the statistics Look-up table based (using online dictionary)
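
A minimal sketch of the NLTK route named on the slide, applied to one of the example sentences from the earlier slides (an illustration only; it assumes the standard NLTK tokenizer and tagger models have been downloaded):

    import nltk

    # One-time setup (assumption): nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

    sentence = ("One approach to understanding such response properties of visual "
                "neurons has been to consider their relationship to the statistical "
                "structure of natural images in terms of efficient coding.")

    tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
    tagged = nltk.pos_tag(tokens)           # assign Penn Treebank POS tags

    # A first, crude pass at noun-phrase material: keep only the noun tokens.
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    print(nouns)   # e.g. ['approach', 'response', 'properties', 'neurons', ...]

As the slide warns, the tagger output is not always correct; the later statistical steps are what make a noisy, high-recall term list usable.
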
  19. This paper provides an introduction to mixed-effects models for the

    analysis of repeated measurement data with subjects and items as crossed random effects. A worked-out example of how to use recent software for mixed-effects modeling is provided. Simulation studies illustrate the advantages offered by mixed-effects analyses compared to traditional analyses based on quasi-F tests, by-subjects analyses, combined by-subjects and by-items analyses, and random regression. Applications and possibilities across a range of domains of inquiry are discussed.
  20. “the -” “- is” “- that” end-of-sentence “-tion” Elementary Features

    => Learned Combinatorial Features => Selection criterion - if a feature increases the certainty Example: I(f3) = I({f1,f2}) - I({f1,f2,f3}); take “f3” if I(f3) > 0, i.e. learning is based on “Positive Information” or “decreased Entropy”
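
A toy sketch of the "positive information" criterion, under the assumption that I(·) denotes the remaining entropy of the word's label given the active features; the feature names and training samples below are made up for illustration:

    import math
    from collections import Counter, defaultdict

    def entropy(labels):
        """Shannon entropy of a list of labels."""
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def cond_entropy(samples, feature_set):
        """Entropy of the label after conditioning on the chosen features."""
        groups = defaultdict(list)
        for feats, label in samples:
            groups[tuple(feats.get(f, 0) for f in feature_set)].append(label)
        n = len(samples)
        return sum(len(g) / n * entropy(g) for g in groups.values())

    # Hypothetical elementary features ("the -" before the word, "-tion" ending).
    samples = [
        ({"prev_the": 1, "suffix_tion": 1}, "noun"),
        ({"prev_the": 1, "suffix_tion": 0}, "noun"),
        ({"prev_the": 0, "suffix_tion": 1}, "noun"),
        ({"prev_the": 0, "suffix_tion": 0}, "verb"),
    ]

    selected = ["prev_the"]
    candidate = "suffix_tion"
    gain = cond_entropy(samples, selected) - cond_entropy(samples, selected + [candidate])
    if gain > 0:   # "positive information": keep the feature only if it decreases the entropy
        selected.append(candidate)
    print(selected, gain)   # ['prev_the', 'suffix_tion'] 0.5
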
  21. Part-of-speech Tagging POS look-up table: Crawl online dictionary Example: “drive”

    - [v, n], “driven” - [past participle] “generate” - [v]
  22. Part-of-speech Tagging • Summary ◦ Do not aim at 100%

    accuracy now ▪ (high recall, low precision) ◦ Statistics will help later ▪ (high recall, high precision)
  23. Term Identification (Adv* Adj* Noun+)+ Use POS look-up table for

    pruning, example: • If last word W has ending “-ed” and Lookup(W) = {past participle}, then drop it
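
One possible rendering of the (Adv* Adj* Noun+)+ pattern as an NLTK chunk grammar over Penn Treebank tags, together with the pruning rule from the slide; the small look-up table stands in for the crawled dictionary, and the grammar keeps a single repetition of the group for simplicity:

    import nltk

    # Hypothetical excerpt of the POS look-up table built from an online dictionary.
    lookup = {"drive": {"v", "n"}, "driven": {"past participle"}, "generate": {"v"}}

    # Adv* Adj* Noun+ over Penn Treebank tags (RB* adverbs, JJ* adjectives, NN* nouns).
    chunker = nltk.RegexpParser("TERM: {<RB.*>*<JJ.*>*<NN.*>+}")

    def extract_terms(sentence):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        terms = []
        for subtree in chunker.parse(tagged).subtrees(lambda t: t.label() == "TERM"):
            words = [w for w, _ in subtree.leaves()]
            # Pruning rule from the slide: drop a trailing "-ed" word that the
            # look-up table only knows as a past participle.
            if words[-1].endswith("ed") and lookup.get(words[-1].lower()) == {"past participle"}:
                words = words[:-1]
            if words:
                terms.append(" ".join(words))
        return terms

    print(extract_terms("Sparse coding explains the response properties of visual neurons."))
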
  24. Statistical Modelling Natural Language Processing Part-of-Speech (POS) tagging Concepts/Term identification

    Bayesian inference Document frequency Contextual doc. frequency (title-abstract, neighbourhood)
  25. Document Frequency DF(black) = 0.8 DF(yellow) = 0.5 DF(red) =

    0.4 DF(blue) = 0.2
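
Document frequency here is simply the fraction of documents that contain a term; a small sketch with made-up documents that reproduces two of the values on the slide:

    from collections import Counter

    def document_frequency(documents):
        """documents: a list of sets of terms, one set per document."""
        n = len(documents)
        counts = Counter(term for doc in documents for term in doc)
        return {term: count / n for term, count in counts.items()}

    docs = [
        {"black", "yellow", "red"},
        {"black", "yellow"},
        {"black", "blue"},
        {"black", "red"},
        {"yellow"},
    ]
    df = document_frequency(docs)
    print(df["black"], df["blue"])   # 0.8 0.2
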
  26. filtering is optional

  27. Co-occurrence statistics Along these lines, a number of studies have

    attempted to train unsupervised learning algorithms on natural images in the hope of developing receptive fields with similar properties, but none has succeeded in producing a full set that spans the image space and contains all three of the above properties. Data structure: a dictionary in Python
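
The "dictionary in Python" on the slide can be as simple as a nested dict of counts, updated once per document; a minimal sketch (the example terms are placeholders):

    from collections import defaultdict

    # cooc[term_i][term_j] = number of documents in which both terms occur
    cooc = defaultdict(lambda: defaultdict(int))

    def add_document(terms):
        """Update the co-occurrence counts with the terms found in one document."""
        terms = set(terms)
        for a in terms:
            for b in terms:
                if a != b:
                    cooc[a][b] += 1

    add_document(["sparse coding", "natural images", "receptive fields"])
    add_document(["sparse coding", "natural images"])
    print(dict(cooc["sparse coding"]))   # 'natural images' counted twice, 'receptive fields' once
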
  28. “Score” of statistical significance => (formula shown on slide)

  29. Equivalent to using “cosine distance” and “mutual information”
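
The transcript does not show the formula itself, but the two quantities named on this slide can be estimated directly from the co-occurrence counts above; the sketch below uses the standard cosine similarity (on binary document-incidence vectors) and pointwise mutual information, which may differ from the exact score used in the talk:

    import math

    def cosine_score(n_ij, n_i, n_j):
        """Co-occurrence count normalised by the geometric mean of the single counts."""
        return n_ij / math.sqrt(n_i * n_j) if n_i and n_j else 0.0

    def pmi_score(n_ij, n_i, n_j, n_docs):
        """Pointwise mutual information estimated from document counts."""
        if n_ij == 0:
            return float("-inf")
        return math.log2((n_ij * n_docs) / (n_i * n_j))

    # Made-up counts: 40 documents contain both terms, 200 and 80 documents contain
    # each term, out of 10,000 documents in total.
    print(cosine_score(40, 200, 80), pmi_score(40, 200, 80, 10_000))
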

  30. For Term_i and its candidate set of co-occurring terms {Term_j}:

    select the significant ones
  31. Contextual co-occurrence Not just bag-of-words or bag-of-terms: Terms in TITLE

    ⇒ Terms in ABSTRACT Neighborhood: within left/right X words/sentences
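
Two simple ways to restrict co-occurrence to the contexts named on the slide, sketched in plain Python (the term lists and window size are placeholders):

    def title_abstract_pairs(title_terms, abstract_terms):
        """Directed pairs: each term in the TITLE with every term in the ABSTRACT."""
        return [(t, a) for t in title_terms for a in abstract_terms if t != a]

    def neighbourhood_pairs(tokens, window=5):
        """Pairs of words that appear within `window` words of each other."""
        pairs = []
        for i, word in enumerate(tokens):
            for other in tokens[max(0, i - window):i]:
                if other != word:
                    pairs.append((other, word))
        return pairs

    print(title_abstract_pairs(["sparse coding"], ["natural images", "receptive fields"]))
    print(neighbourhood_pairs("efficient coding of natural images".split(), window=2))
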
  32. Examples: see www.neuronbit.io

  33. None
  34. None
  35. None
  36. None
  37. None
  38. What you need to know What you might need to

    know What you don't know Thank you!
  39. None
  40. Deep learning allows computational models that are composed of multiple

    processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
  41. [('red', 1.0), ('algal', 0.287), ('yellow', 0.211), ('blue', 0.190), ('panda', 0.188),

    ('orange', 0.176), ('dwarfs', 0.168), ('pink', 0.168), ('pigments', 0.167), ('algae', 0.156), ('green', 0.143), ('hemoglobin', 0.109), ('colors', 0.108), ('colours', 0.104), ('chloroplasts', 0.102), ('fox', 0.100), ('chloroplast', 0.096), ('wines', 0.091), ('white', 0.087), ('colour', 0.082), ('deer', 0.082466379091601122), ('grape', 0.081), ('black', 0.072), ('wore', 0.071), ('brown', 0.069), ('color', 0.069), ('flag', 0.069), ('oak', 0.068), ('gray', 0.068), ('blood', 0.067), ('wine', 0.054), ('flowers', 0.046), ('cameras', 0.046), ('light', 0.037), ('star', 0.035), ('cells', 0.030812517156189954), ('gold', 0.028138331669238448), ('skin', 0.0218), ('silver', 0.0197), ('arms', 0.015723270440251572), ('cell', 0.01505955665184512), ('symbol', 0.014), ('variety', 0.0125), ('turn', 0.012), ('derived', 0.0109), ('fish', 0.011), ('top', 0.0106), ('species', 0.0097), ('seen', 0.0086708334838686372), ('produce', 0.0084), ('plants', 0.00819), ('volume', 0.00812), ('wide', 0.00802), ('small', 0.007995), ('line', 0.006218), ('called', 0.00572), ('typically', 0.0005), ('described', 0.00052), ('short', 0.00051), ('low', 0.00047), ('region', 0.000460), ('right', 0.00036), ('common', 0.00033), ('large', 0.00032), ('high', 0.000322), ('found', 0.000315), ('word', 0.000307), ('study', 0.00030636490654032251), ('type', 0.000302), ('made', 0.00028), ('name', 0.000271), ('example', 0.00025246098934666376), ('known', 0.000245), ('years', 0.000227), ('different', 0.000212), ('part', 0.000208), ('use', 0.000179), ('people', 0.00015285), ('used', 0.000141), ('one', 0.000126)]
  42. DF(receptive fields) / DF(developing receptive fields) > a
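
One way to read the ratio on this last slide (an interpretation; the rule and the threshold `a` below are assumptions, not spelled out in the transcript): a longer nested phrase such as "developing receptive fields" is only kept as a term of its own when it is not too rare compared with the shorter term it contains:

    def keep_longer_term(df, longer, shorter, a=10.0):
        """Drop the longer nested phrase when DF(shorter) / DF(longer) > a."""
        return df[shorter] / df[longer] <= a

    df = {"receptive fields": 0.020, "developing receptive fields": 0.001}   # made-up values
    print(keep_longer_term(df, "developing receptive fields", "receptive fields"))   # False
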