GATE General Architecture for Text Engineering

GATE is a natural language processing platform written in Java. Take a look to see what it can do for you.

Ahmed Magdy

June 01, 2012

Transcript

  1. GATE General Architecture for Text Engineering • Presented by Ahmed Magdy Ezzeldin
  2. What is Text Engineering? • Text or Language Engineering means applying scientific principles to the design, construction and maintenance of tools to help deal with information that has been expressed in natural languages (the languages that people use for communicating with one another).
  3. Applications
    • Automatic summarization
    • Co-reference resolution
    • Discourse analysis (elaboration, explanation, contrast, question, statement, assertion)
    • Machine translation
    • Morphological segmentation
    • Named entity recognition
    • Natural language generation
    • Natural language understanding
    • Optical character recognition (OCR)
    • Part-of-speech tagging
    • Parsing
    • Question answering
    • Relationship extraction
    • Sentiment analysis (Polarity)
    • Speech recognition (Speech segmentation)
    • Sentence breaking, Word segmentation, Topic segmentation
    • Word sense disambiguation
  4. What is GATE?
    • General Architecture for Text Engineering
    • Java suite of NLP tools
    • University of Sheffield
    • Initial release: 1995 (17 years ago)
    • Last stable release: 6.1, May 6, 2011
    • Languages: English, Spanish, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian
    • Accepted input formats: TXT, HTML, XML, Doc, PDF, as well as Java Serial, PostgreSQL, Lucene and Oracle databases
    • GATE Developer is the GATE graphical user interface. Like Eclipse for Java programmers, it provides a graphical environment for research and development of language processing software.
  5. GATE Components and APIs

  6. ANNIE GATE Application
    • A Nearly-New Information Extraction System
    • Example application for English language engineering
    • A set of modules:
      • Tokenizer
      • Gazetteer
      • Sentence splitter
      • Part-of-speech tagger
      • Named entities transducer
      • Co-reference tagger
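To make the idea of a pipeline of modules concrete, here is a minimal sketch in plain Java of the first and third stages listed above (tokenizer, then sentence splitter), where each stage adds typed spans over the same text. It does not use GATE's API: the class `PipelineSketch`, the `Span` type and the regular expression are assumptions made for illustration, and the real ANNIE modules are far more thorough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {
    /** A typed span of text; a stand-in for an annotation, not a GATE class. */
    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // One token per run of letters, run of digits, or single punctuation mark.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|[^\\s\\p{L}\\p{N}]");

    // Stage 1: produce Token spans with character offsets into the text.
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new Span("Token", m.start(), m.end()));
        }
        return tokens;
    }

    // Stage 2: consume the Token spans and close a Sentence at '.', '!' or '?'.
    static List<Span> splitSentences(String text, List<Span> tokens) {
        List<Span> sentences = new ArrayList<>();
        int sentenceStart = -1;
        for (Span t : tokens) {
            if (sentenceStart < 0) {
                sentenceStart = t.start;
            }
            boolean terminator = t.end - t.start == 1
                    && ".!?".indexOf(text.charAt(t.start)) >= 0;
            if (terminator) {
                sentences.add(new Span("Sentence", sentenceStart, t.end));
                sentenceStart = -1;
            }
        }
        // Keep trailing text that has no closing punctuation.
        if (sentenceStart >= 0) {
            sentences.add(new Span("Sentence", sentenceStart,
                    tokens.get(tokens.size() - 1).end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "GATE is written in Java. It runs ANNIE.";
        List<Span> tokens = tokenize(text);
        for (Span s : splitSentences(text, tokens)) {
            System.out.println(text.substring(s.start, s.end));
        }
    }
}
```

The point of the design is that later stages read the spans produced by earlier ones instead of re-parsing the raw text, which is why the order of the modules matters.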
  7. ANNIE Architecture

  8. Demos
    • ANNIE Gazetteer: a list lookup component. The list files are located in $GATE_HOME/plugins/ANNIE/resources/gazetteer
    • JAPE Transducer: JAPE is the Java Annotation Patterns Engine. It provides finite state transduction over annotations based on regular expressions. Example files are located in $GATE_HOME/plugins/ANNIE/resources/NE
    • ANNIE NE Transducer: (ANNIE named entity grammar) a semantic tagger based on the JAPE language.
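The two ideas above, list lookup and rules that match over existing annotations, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use GATE's API or JAPE syntax, and the class `GazetteerSketch`, the `Span` type, the list entries and the single "title followed by a capitalised word" rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GazetteerSketch {
    /** A labelled span of text; a stand-in for an annotation, not a GATE class. */
    static final class Span {
        final String type;
        final String feature;
        final int start;
        final int end;

        Span(String type, String feature, int start, int end) {
            this.type = type;
            this.feature = feature;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical list entries mapped to a category, as list files might supply.
    static final Map<String, String> ENTRIES = new LinkedHashMap<>();
    static {
        ENTRIES.put("Dr", "title");
        ENTRIES.put("Prof", "title");
        ENTRIES.put("Sheffield", "location");
    }

    // List lookup: mark every whole-word occurrence of a list entry.
    static List<Span> lookup(String text) {
        List<Span> found = new ArrayList<>();
        for (Map.Entry<String, String> e : ENTRIES.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            Matcher m = p.matcher(text);
            while (m.find()) {
                found.add(new Span("Lookup", e.getValue(), m.start(), m.end()));
            }
        }
        return found;
    }

    // Optional full stop, whitespace, then a capitalised word.
    private static final Pattern CAPITALISED =
            Pattern.compile("\\.?\\s+\\p{Lu}\\p{L}+");

    // Rule over annotations: a "title" Lookup followed by a capitalised
    // word is marked as a Person covering both.
    static List<Span> findPersons(String text, List<Span> lookups) {
        List<Span> persons = new ArrayList<>();
        for (Span l : lookups) {
            if (!l.feature.equals("title")) {
                continue;
            }
            Matcher m = CAPITALISED.matcher(text);
            m.region(l.end, text.length());
            if (m.lookingAt()) {
                persons.add(new Span("Person", "rule=TitlePerson", l.start, m.end()));
            }
        }
        return persons;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith works in Sheffield.";
        for (Span p : findPersons(text, lookup(text))) {
            System.out.println(p.type + ": " + text.substring(p.start, p.end));
        }
    }
}
```

The rule never inspects the list entries themselves, only the Lookup spans and their category, which is the sense in which a grammar works "over annotations" rather than over raw text.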
  9. Mimir • Provides indexing and searching of the linguistic and semantic information generated by GATE • Demo
  10. Installing Mimir

  11. • Open GATE and load the ANNIE system with defaults • Then click Manage CREOLE Plug-ins
  12. Add Mimir Client Path • Add Mimir as a plugin and set the mimir-client directory
  13. • Make sure the Mimir plugin is set to load now and every time you open GATE
  14. • Add Mimir Indexing PR to Processing Resources

  15. • Create a New Corpus from Language Resource

  16. • Right Click the Corpus and populate it with Documents

  17. Edit the Default Index Template
    • Open http://localhost:8080/mimir-demo in your browser and go to the configuration page
    • Then go to the Index Templates section and manage them
    • Then click on the default Index Template to edit it.
  18. Add some annotations to the Default Index Template

  19. Add a new Index

  20. Edit the Index you created and set the Scorer Algorithm (1) (2) (3)
  21. Copy the Index URL

  22. Paste the Index URL in Mimir and run ANNIE on the Corpus
  23. Double click any document and check Annotations yourself

  24. Close and Search the Index (1) (2)

  25. Example Query

  26. Thank you

  27. References • http://gate.ac.uk GATE Website (it is huge) • http://www.wikipedia.com Mother of all Knowledge