Huib Zuidervaart: The ePistolarium

Transcript

  1. Huib J. Zuidervaart Cultures of Knowledge The Hague, The Netherlands

    Oxford, 30 October 2013 The ePistolarium: Experiences in the development of a digital tool for the research of 17th-century scholarly correspondences
  2. 2 Today's presentation:

    • I. Sketch of the development process (2008-2013) of the ePistolarium (purpose, participating parties)
    • II. Case study with automatically generated co-citations
  3. 4 Participants:

    • Descartes Centre, Utrecht University
    • Huygens ING (KNAW), The Hague
    • Amsterdam University
    • Royal Library, The Hague
    • Data Archiving and Networked Services (DANS) (KNAW), The Hague

    Funding:

    • NWO (Dutch governmental funding)
    • Additional: CLARIN NL & CLARIN-EU
  5. 6 Ambitions & Questions: • How can we combine and

    structure different sets of letters of 17th-century scholars, in such a way that the production, circulation and use of knowledge can be analyzed and visualized in a wider international context? • Can we recognize the themes and stakeholders that were important at the time in the scholarly debates in space and time?
  6. 7 Required: a large body of letter transcriptions (‘data’)

    Making transcriptions is too time-consuming to do in bulk. Only the development of the tool was funded.
  7. 8 Solution: OCR-files of edited correspondence

    Correspondent                          Letters
    Caspar van Baarle (Barlaeus)               505
    Isaac Beeckman                              21
    René Descartes                             727
    Hugo de Groot (Grotius)                  8,034
    Christiaan Huygens                       3,080
    Constantijn Huygens                      7,119
    Antoni van Leeuwenhoek                     282
    Dirck Rembrantsz van Nierop                 80
    Jan Swammerdam                             172
    Total                                   20,020
  8. 9 Problems:

    • No uniformity in structure of the printed editions
    • Corpus is not uniform in language (Latin, English, German, French, Italian, Dutch)
    • No uniformity in spelling
    • No uniformity in available metadata !!!
    • What to do with figures and formulae?
    • Etcetera …
    • CURATION REQUIRED A LOT OF (MANUAL) WORK
  9. 11 Problems to tackle:

    • Automatic language identification.
    • Spelling normalization.
    • Removal of definite and indefinite articles, conjugations, etc.
    • Topic modelling and keyword analysis.
    • Named Entity Recognition.
    • Network analysis.
    • Co-citation module.
    • Several visualization modules.
    • …
    • (Most of the work has been done by Dr. Walter Ravenek, our main ICT developer.)
  11. 13 Topic Modelling

    A document is divided into topics; a topic is a group of words. The co-occurrence of a word with a topic can be mathematically analyzed.

    Topics are used to identify in the corpus:

    • Similar words
    • Comparable documents
    • Documents that resemble a text fragment that the researcher offers to the tool
    • First experiments were carried out with Latent Dirichlet Allocation (LDA)
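One way to picture how topics can point to "comparable documents" is to compare the topic weights of two texts. The following is a minimal sketch, assuming each letter has already been reduced to a vector of topic weights; the letter names, the weights and the choice of cosine similarity are illustrative assumptions, not a description of how the ePistolarium computes similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return document ids, most similar to the query vector first."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical topic weights (three topics) for three invented letters.
letters = {
    "letter_a": [0.8, 0.1, 0.1],
    "letter_b": [0.1, 0.8, 0.1],
    "letter_c": [0.7, 0.2, 0.1],
}
# Hypothetical topic weights of a text fragment offered by the researcher.
fragment = [0.9, 0.05, 0.05]

ranking = most_similar(fragment, letters)
```

Here `ranking` lists the invented letters from most to least similar to the fragment; a real tool would first have to infer the topic weights from the text.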
  12. 14 Latent Dirichlet allocation (LDA)

    LDA is a generative mathematical model, proposed in 2003, that explains why some parts of the data are related. LDA is a variant of probabilistic latent semantic analysis (PLSA), a statistical technique for the analysis of the relations between data. However, LDA disregards the mutual proximity of words.
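The remark that LDA disregards the proximity of words follows from its input: LDA works on a "bag of words", in which only word counts per document are kept. A minimal sketch of that representation, with an invented sentence and a deliberately naive whitespace tokenizer (both are assumptions for illustration):

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case, split on whitespace and count tokens; word order is discarded."""
    return Counter(text.lower().split())


original = "the ring surrounds the planet"
shuffled = "planet the surrounds ring the"

# Both texts contain the same words the same number of times,
# so a bag-of-words model cannot tell them apart.
same_bag = bag_of_words(original) == bag_of_words(shuffled)
```

Any method that starts from such counts cannot tell which words stood next to each other in the letter.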
  13. 16 Evaluation after almost two years of work: 20 International

    historians of science were asked in a workshop: “is this something you could use in your historical research?”
  14. 17 Is this useful? No, not really !!!!!!!!!!!!

    • Give me the texts; then I can figure it out myself.
    • The tool provides no suggestions that I recognize as being important.
    • I cannot recognize words that the tool presents as existing in the text.
    • I cannot search the texts in a similar way as in Google.
  15. 18 Continuation: • Experiments were done with different methods of

    ‘topic modelling’, in combination with language technology. (e.g. mutual proximity of words) • The possibilities of how to search were enhanced. • Historical researchers were more involved in the development of the tool (frequent test sessions). • Several suggestions for a user interface and for visualizations were followed.
  16. 21 Result: the ePistolarium

    • A combined corpus of letters (unfortunately in many cases far too small for really good results).
    • Faceted search & ‘google search’.
    • The tool provides new suggestions after each search.
    • The possibility of looking for related paragraphs.
    • Visualizations & co-citations.
    • No ‘collaboratory’.
    • No annotations.
    • But very convenient to trace some discussions.
  17. 23 Suggestions of new keywords after each search Calculated from

    the whole corpus of letters in a certain language
  18. 25 Suggestions of new keywords after each search

    However, the quality of the suggestions is at the moment language dependent: good for French; bad for English!
  19. 26

    Language    Search word                            # results   Start   End
    Latin       perspicillum                                   5    1655   1657
                perspicillum + suggested variations           19    1643   1665
                perspicil*                                    23    1625   1665
    Latin       Telescopia                                    25    1652   1687
    Latin       Telescopia + suggested variations             92    1629   1693
    English     Telescope (no suggested variations)           50    1637   1692
    Lat & Eng   Telescop*                                    203    1629   1693
    French      Lunette (no suggested variations)            195    1622   1693
                Lunet*                                       413    1622   1693
    Dutch       Verrekijker                                    2    1645   1665
                Verrekijker + suggested variations            15    1645   1701
                Verrek*                                       12    1637   1701
    Italian     occhiale (no suggested variations)             6    1660   1660
    Italian     conspiciliorum                                 1    1659          Problem!
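The three figures reported per search word (number of results, first year, last year) can be reproduced in miniature. This is a sketch under stated assumptions: the four letters are invented, the trailing `*` is read as "any word starting with this prefix", and `wildcard_to_regex` and `summarize` are hypothetical helper names, not part of the ePistolarium.

```python
import re


def wildcard_to_regex(term):
    """Turn a search term with an optional trailing * into a case-insensitive regex."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def summarize(term, letters):
    """Return (number of matching letters, earliest year, latest year)."""
    regex = wildcard_to_regex(term)
    years = [year for year, text in letters if regex.search(text)]
    if not years:
        return (0, None, None)
    return (len(years), min(years), max(years))


# Invented mini-corpus of (year, text) pairs; not taken from the ePistolarium.
letters = [
    (1655, "Perspicillum novum accepi"),
    (1660, "de telescopio tuo"),
    (1662, "a new telescope was made"),
    (1670, "telescopia maiora"),
]

exact = summarize("telescope", letters)
prefix = summarize("telescop*", letters)
```

The prefix search catches the Latin and English spellings together, which is why wildcard rows tend to have more hits and a wider date range than exact search words.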
  21. 33 co-citation analysis

    • What is a co-citation?
    • In short: when a number of persons are found together in the same paragraph, this coincidence is termed a ‘co-citation’.
    • Since the introduction of ‘co-citation’ (in the 1970s), co-citation analysis has become an important method for the study of the structure of a scientific debate.
  22. 34 co-citation analysis

    • The ePistolarium tool automatically generates co-citations from each paragraph in the corpus of letters.
    • This has been made possible by the use of software-driven ‘Named-Entity Recognition’, followed by semi-automatic identification of the names of persons in the letters.
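Counting co-citations per paragraph can be sketched as follows. The sketch assumes that Named-Entity Recognition and name identification have already produced, for each paragraph, a list of person identifiers; the example paragraphs are invented, and treating the threshold as a minimum number of shared paragraphs is an assumption, since the slides do not define it.

```python
from collections import Counter
from itertools import combinations


def co_citations(paragraphs, threshold=1):
    """Count pairs of persons that are named in the same paragraph.

    `paragraphs` holds one collection of person identifiers per paragraph.
    Pairs found together in fewer than `threshold` paragraphs are dropped.
    """
    counts = Counter()
    for persons in paragraphs:
        # set() ignores repeated mentions; sorted() gives each pair one fixed order.
        for pair in combinations(sorted(set(persons)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Invented output of the name-identification step, one list per paragraph.
paragraphs = [
    ["Huygens", "Hevelius", "Boulliau"],
    ["Huygens", "Hevelius"],
    ["Huygens", "Fabri"],
    ["Hevelius"],
]

pairs = co_citations(paragraphs, threshold=2)
```

Raising the threshold keeps only pairs that recur, which trims incidental mentions at the cost of missing people who appear rarely.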
  25. 37 co-citation analysis

    • To test the usefulness of this functionality, I have looked at whom the ePistolarium identifies as the major figures in the scientific debate concerning the discovery of the ring structure and the moon of Saturn.
    • In this debate Christiaan Huygens played a crucial role.
    • Christiaan’s correspondence is part of the ePistolarium. So this case is well suited to test the co-citation functionality in the ePistolarium tool.
  26. 38 co-citation analysis

    • Moreover, this case has been studied very carefully in the past (Van Helden: 11 large papers from the years 1968-1996).

    Made with Wordle.com: Van Helden's papers together.
  27. 39 co-citation analysis

    • Such a check against ‘already available knowledge’ is necessary to evaluate the reliability of the ePistolarium as a research tool for such a complicated corpus of letters in various languages, partly also in archaic spelling.
  29. 42 The case of Saturn:

    I. > 1616: planet with handles
    II. 1642: planet WITHOUT handles
    III. 1642-1654: confusion; observations are collected
  30. 44 The case of Saturn:

    V. 1671/1672: Cassini discovers Moons 2 & 3 (Iapetus & Rhea)
    VI. 1675: Discovery of the separation in Saturn’s ring (Cassini Division)
    VII. 1684: Cassini discovers Moons 4 & 5 (Tethys & Dione)
  31. 48 The case of Saturn:

    [Figure: letters in the ePistolarium found with the search ‘Saturn*’ (395 letters in total), marked against the discoveries of the ring & Moon 1, Moons 2/3 and Moons 4/5. Legend: ePistolarium; by hand.]
  32. 52 III. PERIOD 1640-1654

    • ePistolarium: search word: ‘Saturn*’ - 15 letters
    • Co-citations with threshold 10:
    • None of the astronomical stakeholders is mentioned.
    • Problem: ‘Die[m] / Diebus Saturni’ = Saturday
  33. 53 III. PERIOD 1640-1654

    • Search word: Saturn* NOT "die Saturni" NOT "diem Saturni" NOT "diebus Saturni"
    • 6 letters
    • Only Galilei is mentioned as Saturn's first observer.
    • Problem: Saturn can also be used in an astrological context
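The refined query can be imitated with two regular expressions. This is a rough sketch, assuming that NOT is evaluated per letter (a letter containing any of the weekday phrases is dropped entirely); the function name `matches_query` and the sample texts are invented for illustration.

```python
import re

# 'Saturn*' read as: any word starting with "Saturn".
SATURN = re.compile(r"\bSaturn\w*", re.IGNORECASE)
# The three weekday phrases excluded on the slide.
WEEKDAY = re.compile(r"\b(?:die|diem|diebus)\s+Saturni\b", re.IGNORECASE)


def matches_query(text):
    """True if the text matches Saturn* and contains none of the weekday phrases."""
    return SATURN.search(text) is not None and WEEKDAY.search(text) is None


# Invented sample texts, not quotations from the corpus.
dated_letter = "Datum die Saturni, 12 Maii"
planet_letter = "Saturnus ansas amisit"
other_letter = "de luna et stellis"

results = [matches_query(t) for t in (dated_letter, planet_letter, other_letter)]
```

Under this reading a letter that is dated on a Saturday and also discusses the planet would be lost, and astrological uses of Saturn still pass the filter, which is the remaining problem the slide notes.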
  34. 56 IV. PERIOD 1654-1659 • ePistolarium: search word: ‘Saturn*’ -

    155 letters • Co-citations with threshold 10:
  35. 57 IV. PERIOD 1654-1659: Identified by ePistolarium & Literature

    1. Descartes - provided theoretical framework
    2. Boulliau - Observer & astronomer
    3. Gassendi - Observer & astronomer
    4. Hevelius - Observer & astronomer
    5. De Montmort - mathematician; leader of the ‘Montmor Academy’
    6. De Roberval - mathematician with his own theory of Saturn
    7. Heinsius - Go-between
    8. Van Schooten - Go-between
    9. Fermat - mathematician – worked on ellipses
    10. Pascal - mathematician – worked on ellipses
  36. 58 V. PERIOD 1660-1670 • ePistolarium: search word: ‘Saturn*’ -

    158 letters • Co-citations with threshold 10:
  37. 59 V. PERIOD 1660-1670: Identified by ePistolarium & Literature

    1. Boulliau - Observer & astronomer
    2. Hevelius - Observer & astronomer
    3. Huygens - Discussed Saturn’s ring
    4. De Bessy - Discussed Saturn’s ring
    5. Wren - Discussed Saturn’s ring
    6. Fabri - Opposed Huygens’ observations
    7. Divini - Telescope maker; opposed Huygens’ observations
    8. Boyle - Go-between
    9. De Medici - Dedication of the ‘Systema Saturni’
    10. Thévenot - Leader of an informal scholarly society in Paris
    11. Vossius - Go-between
  38. 62 VI. PERIOD 1671-1685 • ePistolarium: search word: ‘Saturn*’ -

    48 letters • Co-citations with threshold 10:
  39. 63 VI. PERIOD 1671-1685: Identified by ePistolarium & Literature

    1. Cassini - astronomer (discoverer of the moons of Saturn)
    2. Campani - constructor of large telescopes
    3. Colbert - Appointed Cassini at the Observatoire
    4. Wallis - mathematician; discussed the shape of Saturn
    5. Descartes - provided cosmological model of explanation
    6. Copernicus - provided cosmological model of explanation
    7. La Hire - French astronomer
    8. Alhazen - Arab natural philosopher & mathematician
    9. Sluze - mathematician; worked on curves
    10. Catelan - author of a book on scales
    11. Mariotte - ‘physicist’ (worked on watches)
    12. Galois - ‘physicist’ (worked on mechanics)
  40. 64 CONCLUSIONS

    • Co-citations in the ePistolarium reveal the most important players in the discussions on Saturn.
    • In a few cases, a person not mentioned earlier in the literature on Saturn still seems to have been involved in some way.
  42. 66 CONCLUSIONS

    • For a correct interpretation, intrinsic historical expertise of time and persons remains a prerequisite. Otherwise a false conclusion is very easily drawn:
  43. 68 CONCLUSIONS - Opportunities

    Digital tools, such as the ePistolarium, can:

    • provide an acceleration and enlargement of the possibilities for research in the humanities.
    • generate (in the future) new questions and answers.
    • offer a quick visual overview of relevant stakeholders and places (already now).
    • …..
  44. 69 CONCLUSIONS - Threats

    • The present corpus is too small for relevant results.
    • What is not digitally available is missed!
    • Expectations must not be too high (the question must fit the tool).
    • Digital maintenance and expansion of project-financed tools have not been satisfactorily guaranteed (at least in our case).
    • Standardisation of metadata is essential.
    • International cooperation is required.
    • …..
  45. 70 Future of Project ???

    • Implementation of a collaboratory for data enrichment and annotation, and of new user interfaces in a virtual research environment
    • Linking letters to other documents, such as:
      • Notes and working papers of scholars
      • Grotius Information Master
      • Early periodicals, to study the impact of the letter format on the development of the periodical
    • Adding more letters and metadata to create the critical mass necessary to test/falsify existing theories of the Republic of Letters and to develop new questions