Computer-assisted approaches in the humanities

Talk held at the workshop "Research questions in the humanities as challenges to computer science" (Max Planck Institute for the History of Science, Berlin, 2017/12/06-07).

Johann-Mattis List

December 06, 2017

Transcript

  1. 1.

    Computer-Assisted Approaches in the Humanities
    Reconciling Computational and Classical Research

    Johann-Mattis List and Simon Greenhill
    Matchmaking Workshop, December 6/7, 2017, Berlin
    Max Planck Institute for the Science of Human History
  7. 9.

    • They download the data from some server...
    • they rearrange the data according to their needs...
    • using simple command line tools...
    • they run an analysis with some software...
    • they refine the data manually, if needed...
    • they write a paper...
    • and publish it, along with their derived dataset.
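A minimal sketch of what such a workflow might look like in Python, using only the standard library. Everything in it is an assumption made for illustration: the sample rows stand in for a downloaded file, the column names `language`, `concept`, and `form` are hypothetical, and the "analysis" is deliberately trivial. It is not the software the talk refers to.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a downloaded file; the column
# names are assumptions made for this illustration only.
RAW = """language\tconcept\tform
German\thand\tHand
English\thand\thand
French\thand\tmain
German\twater\tWasser
English\twater\twater
"""


def rearrange(raw_tsv):
    """Group the rows by concept: {concept: {language: form}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(raw_tsv), delimiter="\t"):
        table[row["concept"]][row["language"]] = row["form"]
    return dict(table)


def coverage(table):
    """A deliberately simple analysis: languages attested per concept."""
    return {concept: len(forms) for concept, forms in table.items()}


def to_tsv(table):
    """Serialize the derived dataset as plain TSV, one concept per row."""
    languages = sorted({lang for forms in table.values() for lang in forms})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["concept"] + languages)
    for concept in sorted(table):
        # Missing forms are written as empty cells so gaps stay visible.
        writer.writerow([concept] + [table[concept].get(lang, "") for lang in languages])
    return out.getvalue()


table = rearrange(RAW)
print(coverage(table))
print(to_tsv(table))
```

The derived table stays a plain text file, so it can be opened in any editor, corrected by hand, and published alongside the paper.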
  8. 10.
  12. 15.

    • They hire a programmer to help them build a database...
    • the programmer creates an online interface, so they can easily insert the data...
    • but they end up using Excel, since the interface is too complicated to be used...
    • and they know how to use Excel anyway (more or less)...
    • the programmer writes an upload routine to import data from Excel...
  17. 21.

    • They promise their colleagues that they will publish the database soon...
    • but the programmer has left the project in order to join Google or Facebook...
    • the demo version still runs on an old server...
    • many colleagues know the URL and use the data occasionally...
    • but they cannot use it officially, since they don’t know how to cite it...
    • the linguists decide to apply for more funding to finish the database.
  18. 23.

    • They hire a programmer to build a database... the programmer creates an online interface, so they can easily insert the data... but they end up using Excel, since the interface is too complicated to be used... and they know how to use Excel anyway (more or less)... the programmer writes an upload routine to import data from Excel...
  19. 24.
  25. 31.

    Problems of Computational Approaches in the Humanities

    We have
    • a strong divide between computational and classical experts...
    • who often mistrust each other...
    • with classical scientists seeing computational scientists as servants to create their databases...
    • or as people who abuse their data for shiny but senseless publications...
    • and computational scientists seeing classical scientists as stubborn relics from the last century...
    • who don’t understand the usefulness of computational methods.
  30. 37.

    Problems of Computational Approaches in the Humanities

    Computational scientists often lack understanding
    • for the specifics of the problems in the humanities...
    • where scholars have been working for centuries on their individual problems...
    • and often have gained great insights into those problems...
    • and rightfully demand that computational scientists respect the nature of their problems...
    • and take classical approaches seriously.
  35. 43.

    Problems of Computational Approaches in the Humanities

    Classical scientists often do not understand
    • that computational approaches do not threaten their jobs...
    • or question the work they have done so far...
    • but instead could offer the chance to gain new insights...
    • or to speed up the tedious process of qualitative analysis...
    • by providing practical help in tasks which even classical scientists will consider repetitive and boring.
  38. 48.

    General Misunderstandings

    What computational scientists misunderstand or ignore:
    • problems in the humanities can be extremely hard
    • big data approaches do not necessarily work on small data
    • black box approaches are satisfying for industry applications but not for scientific endeavour
    • being a mathematician does not qualify one automatically to solve problems in the humanities
  41. 52.

    General Misunderstandings

    What classical scientists misunderstand or ignore:
    • computational approaches do not exclude qualitative approaches
    • computational solutions can increase the consistency of “manual” data inspection
    • computational approaches may provide a fresh perspective on long-standing problems in the humanities
    • there is no reason to be proud if one doesn’t understand basic mathematics
  42. 55.

    Instead of computer-based vs. classical computer-less approaches, we need a paradigm of computer-assisted approaches as they are already common in biology and other disciplines.
  45. 60.

    Main Features of CAAH

    • data must be human- and machine-readable, scholars should never lose contact with the original data (compare the talk by Robert Forkel)
    • interfaces must be lightweight and not disguise the nature of the real data
    • software must be adapted to the specific needs of research in the humanities and produce transparent results that can be manually inspected and corrected by the human researchers
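One way to picture these features is a table in which the program's proposal and the scholar's correction sit side by side, with the correction always taking precedence. The sketch below assumes this layout; the column names `auto` and `manual`, the threshold, and the use of plain string similarity from Python's `difflib` are hypothetical choices for illustration, not a method named in the talk.

```python
import difflib


def propose(pairs, threshold=0.6):
    """Score each word pair by surface similarity and record the proposal.

    The score is kept in the row so that the automatic decision stays
    transparent; the 'manual' column is left empty for the researcher.
    """
    rows = []
    for a, b in pairs:
        score = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
        rows.append({
            "a": a,
            "b": b,
            "score": round(score, 2),
            "auto": score >= threshold,  # the program's proposal
            "manual": "",                # filled in by hand: "yes" or "no"
        })
    return rows


def final_judgment(row):
    """A manual entry always overrides the automatic proposal."""
    if row["manual"] in ("yes", "no"):
        return row["manual"] == "yes"
    return row["auto"]


rows = propose([("Hand", "hand"), ("main", "hand")])
rows[1]["manual"] = "yes"  # a researcher corrects the second proposal
for row in rows:
    print(row, "->", final_judgment(row))
```

Because the automatic score, the proposal, and the manual entry are all stored in the same row, a reader can see both what the software suggested and where a human intervened.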
  49. 69.

    What We Can Learn From Biologists

    • foster training in computational basics (command line, shell programming, interfaces, data handling)
    • propagate standard formats for data
    • offer solutions for collaborative data storage accompanying publications
    • foster interdisciplinary teams in which classical and computational scientists collaborate
  55. 76.

    Collaborative Potential for the MM Workshop

    • rethink the big data vs. small data problem and the challenge for computer science
    • foster a smart application of machine learning tools (no proof of concept but actual help for data pre-processing in computer-assisted frameworks)
    • discuss generalizability of workflows for similar tasks (e.g., digitization)
    • improve computational training of scholars (with a focus on basic tasks like command line)
    • promote new scientific profiles (scholars who can bridge the gap and have training in classical and computational approaches)