Computer-assisted approaches in the humanities

Computer-Assisted Approaches in the Humanities: Reconciling Computational and Classical Research
Johann-Mattis List and Simon Greenhill
Matchmaking Workshop, December 6/7, 2017, Berlin
Max Planck Institute for the Science of Human History

What do biologists do if they want to make an analysis involving data?
• They download the data from some server...
• they rearrange the data according to their needs...
• using simple command line tools...
• they run an analysis with some software...
• they refine the data manually, if needed...
• they write a paper...
• and publish it, along with their derived dataset.
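The "download, rearrange, derive a dataset" steps above can be sketched in a few lines. This is only an illustration, written in Python rather than with shell tools; the inline table stands in for a downloaded file, and its column names and values are invented for the example.

```python
import csv
import io

# Hypothetical stand-in for a tab-separated file downloaded from a server.
RAW = (
    "sample\tgroup\tvalue\n"
    "s3\tB\t7\n"
    "s1\tA\t4\n"
    "s2\tA\t9\n"
)


def rearrange(text, keep_group):
    """Keep only the rows of one group, sorted by sample name."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return sorted(
        (row for row in rows if row["group"] == keep_group),
        key=lambda row: row["sample"],
    )


def to_tsv(rows, fieldnames):
    """Serialise rows back to tab-separated text (the derived dataset)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


subset = rearrange(RAW, "A")
derived = to_tsv(subset, ["sample", "group", "value"])
print(derived)
```

The derived table stays a plain text file, so it can be refined by hand and published next to the paper.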

What do linguists do if they want to make an analysis involving data?
• They hire a programmer to help them build a database...
• the programmer creates an online interface, so they can easily insert the data...
• but they end up using Excel, since the interface is too complicated to be used...
• and they know how to use Excel anyway (more or less)...
• the programmer writes an upload routine to import data from Excel...
• they promise their colleagues that they will publish the database soon...
• but the programmer has left the project in order to join Google or Facebook...
• the demo version still runs on an old server...
• many colleagues know the URL and use the data occasionally...
• but they cannot use it officially, since they don’t know how to cite it...
• the linguists decide to apply for more funding to finish the database.

What do philologists do if they want to make an analysis involving data?
• They hire a programmer to build a database...
• the programmer creates an online interface, so they can easily insert the data...
• but they end up using Excel, since the interface is too complicated to be used...
• and they know how to use Excel anyway (more or less)...
• the programmer writes an upload routine to import data from Excel...

Problems of Computational Approaches in the Humanities
We have
• a strong divide between computational and classical experts...
• who often mistrust each other...
• with classical scientists seeing computational scientists as servants to create their databases...
• or as people who abuse their data for shiny but senseless publications...
• and computational scientists seeing classical scientists as stubborn relics from the last century...
• who don’t understand the usefulness of computational methods.

Problems of Computational Approaches in the Humanities
Computational scientists often lack understanding
• for the specifics of the problems in the humanities...
• where scholars have been working for centuries on their individual problems...
• and often have gained great insights into those problems...
• and rightfully demand that computational scientists respect the nature of their problems...
• and take classical approaches seriously.

Problems of Computational Approaches in the Humanities
Classical scientists often do not understand
• that computational approaches do not threaten their jobs...
• or question the work they have done so far...
• but instead could offer the chance to gain new insights...
• or to speed up the tedious process of qualitative analysis...
• by providing practical help in tasks which even classical scientists will consider repetitive and boring.

General Misunderstandings Between the Two Camps
Taken from: https://xkcd.com/1831/
Thanks to Matthew Scarborough for sharing.

General Misunderstandings
What computational scientists misunderstand or ignore:
• problems in the humanities can be extremely hard
• big data approaches do not necessarily work on small data
• black box approaches are satisfying for industry applications but not for scientific endeavour
• being a mathematician does not qualify one automatically to solve problems in the humanities

General Misunderstandings
What classical scientists misunderstand or ignore:
• computational approaches do not exclude qualitative approaches
• computational solutions can increase the consistency of “manual” data inspection
• computational approaches may provide a fresh perspective on long-standing problems in the humanities
• there is no reason to be proud if one doesn’t understand basic mathematics
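As one small illustration of how a computational check can support the consistency of “manual” data inspection, the following Python sketch flags labels that differ only in case or spacing. The function names and the sample entries are invented for this example; the script only reports candidates, and the decision which spelling is correct stays with the scholar.

```python
from collections import defaultdict


def normalise(label):
    """Collapse case and whitespace; used for comparison only."""
    return " ".join(label.split()).casefold()


def find_variants(labels):
    """Group labels that differ only in case or spacing.

    Returns a dict mapping each normalised form to the sorted list of
    its distinct spellings, restricted to forms with several spellings.
    """
    groups = defaultdict(set)
    for label in labels:
        groups[normalise(label)].add(label)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical language names as they might be typed into a spreadsheet.
entries = [
    "Old High German",
    "old high german",
    "Gothic",
    "Old  High German",
    "Gothic",
]

for key, spellings in find_variants(entries).items():
    print(key, "->", spellings)
```

The original spellings are never changed by the script, so nothing is lost if a flagged difference turns out to be intentional.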

But how can we integrate the two camps?

... By shifting the paradigm!

Instead of computer-based vs. classical, computer-less approaches, we need a paradigm of computer-assisted approaches, as they are already common in biology and other disciplines.

Computer-Assisted Approaches in the Humanities
Computer-Assisted Language Comparison (List 2017-2022, ERC STG)

Main Features of CAAH
• data must be human- and machine-readable; scholars should never lose contact with the original data (compare the talk by Robert Forkel)
• interfaces must be lightweight and not disguise the nature of the real data
• software must be adapted to the specific needs of research in the humanities and produce transparent results that can be manually inspected and corrected by the human researchers
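What "human- and machine-readable" can mean in practice is sketched below in Python, under some assumptions: the data is kept as a plain tab-separated wordlist, and the column names (ID, DOCULECT, CONCEPT, FORM, COGID) are chosen for illustration rather than taken from a specific tool. The same text can be opened in any editor or spreadsheet, parsed by a script, and written back unchanged.

```python
import csv
import io
from collections import defaultdict

# Illustrative wordlist; COGID marks which words are judged cognate.
WORDLIST = (
    "ID\tDOCULECT\tCONCEPT\tFORM\tCOGID\n"
    "1\tGerman\thand\tHand\t1\n"
    "2\tEnglish\thand\thand\t1\n"
    "3\tDutch\thand\thand\t1\n"
    "4\tFrench\thand\tmain\t2\n"
)


def read_wordlist(text):
    """Parse the tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def write_wordlist(rows):
    """Write the rows back to tab-separated text, header first."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def cognate_sets(rows):
    """Collect the languages that share each cognate identifier."""
    sets = defaultdict(list)
    for row in rows:
        sets[row["COGID"]].append(row["DOCULECT"])
    return dict(sets)


rows = read_wordlist(WORDLIST)
print(cognate_sets(rows))
# The round trip leaves the text untouched, so nothing is hidden from the scholar.
print(write_wordlist(rows) == WORDLIST)
```

Because the script never transforms the file into an opaque internal format, a manual correction in the text file and an automatic analysis operate on exactly the same data.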

Examples for CAAH: CALC-Project (List 2017-2022)
• EDICTOR: Etymological Dictionary Editor (List 2017)
• Data underlying the EDICTOR
• LingPy software for sequence comparison in linguistics (List et al. 2017)
• Cookbook: Recipes for LingPy (List 2016)

What We Can Learn From Biologists
• foster training in computational basics (command line, shell programming, interfaces, data handling)
• propagate standard formats for data
• offer solutions for collaborative data storage accompanying publications
• foster interdisciplinary teams in which classical and computational scientists collaborate

Collaborative Potential for the MM Workshop
• rethink the big data vs. small data problem and the challenge for computer science
• foster a smart application of machine learning tools (no proof of concept but actual help for data pre-processing in computer-assisted frameworks)
• discuss generalizability of workflows for similar tasks (e.g., digitization)
• improve computational training of scholars (with a focus on basic tasks like command line)
• promote new scientific profiles (scholars who can bridge the gap and have training in classical and computational approaches)

http://calc.digling.org
Thank you for your attention!
