
Deciphering Maya Hieroglyphic Writing using Scala


The field of Mesoamerican hieroglyphic writing systems has seen ground-breaking decipherments in recent decades. One key to this success has been the application of methodological approaches developed and tested on other writing systems of the world. The purpose of this presentation is to show how some of these methodologies, drawn from Natural Language Processing and Hermeneumatics, can be successfully applied to the Maya hieroglyphic writing system thanks to the power and versatility of Scala. In particular, the research framework makes extensive use of the facilities Scala provides for building external Domain Specific Languages (DSLs). Epigraphers can use these external DSLs to produce accurate transcriptions and transliterations of hieroglyphic texts and to obtain relevant quantitative and qualitative results from the linguistic corpora.

Ignacio Cases

June 11, 2013

Transcript

  1. Scala Days 2013: Deciphering Maya Hieroglyphic Writing using Scala

    Ignacio Cases, Department of Anthropology, University at Albany, State University of New York
    Alfonso Lacadena, Universidad Complutense de Madrid
  2. A Crash Course: Decipherment of Maya Writing, 1950s: Yurii Knorozov

    ▪ Made assumptions about the nature of the writing system by analysing the statistical distribution of signs and comparing it with other scripts
      ▪ Logograms
      ▪ Phonetic signs
    ▪ Used a 16th-century document by Diego de Landa, a Spanish Franciscan, that contained an "alphabet" and some examples
    ▪ Used pre-Columbian codices to propose some decipherments with a substitution method
  3. Deciphering Maya Hieroglyphic Writing using Scala: On-Site Use of Scala

    ▪ Calendrical calculation and reconstruction: Spire
    ▪ Multi-ambient photography
    ▪ Photomosaic composition
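The slide lists Spire for calendrical work but does not show the calculations. As an illustration only, the sketch below computes the Tzolk'in and Haab' positions of a Long Count date with plain Long arithmetic instead of Spire. It assumes the standard place values (1 winal = 20 k'in, 1 tun = 18 winals, 1 k'atun = 20 tuns, 1 b'ak'tun = 20 k'atuns) and the conventional era base date 13.0.0.0.0 4 Ajaw 8 Kumk'u; the object and method names are hypothetical.

```scala
object MayaCalendar {
  final case class LongCount(baktun: Int, katun: Int, tun: Int, winal: Int, kin: Int) {
    /** Days elapsed since the era base date (written 13.0.0.0.0, counted here as day 0). */
    def days: Long = baktun * 144000L + katun * 7200L + tun * 360L + winal * 20L + kin
  }

  private val dayNames = Vector("Imix", "Ik'", "Ak'bal", "K'an", "Chikchan", "Kimi", "Manik'",
    "Lamat", "Muluk", "Ok", "Chuwen", "Eb", "Ben", "Ix", "Men", "Kib", "Kaban", "Etz'nab",
    "Kawak", "Ajaw")
  private val months = Vector("Pop", "Wo", "Sip", "Sotz'", "Sek", "Xul", "Yaxk'in", "Mol",
    "Ch'en", "Yax", "Sak", "Keh", "Mak", "K'ank'in", "Muwan", "Pax", "K'ayab", "Kumk'u", "Wayeb")

  /** Tzolk'in: the base date is 4 Ajaw (number 4, day-name index 19). */
  def tzolkin(days: Long): String = {
    require(days >= 0, "dates before the base date are not handled in this sketch")
    val number = ((days + 3) % 13 + 1).toInt
    val name = dayNames(((days + 19) % 20).toInt)
    s"$number $name"
  }

  /** Haab': the base date 8 Kumk'u is position 17 * 20 + 8 = 348 of the 365-day year. */
  def haab(days: Long): String = {
    require(days >= 0, "dates before the base date are not handled in this sketch")
    val pos = ((days + 348) % 365).toInt
    s"${pos % 20} ${months(pos / 20)}"
  }
}
```

For example, the Long Count 9.12.11.5.18 is 1,386,478 days after the base date, and the functions above place it on 6 Etz'nab 11 Yax.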
  4. Blocks A10 to A16: transcription, transliteration and translation

    Block   Transcription   Transliteration   Translation
    A10     wa-WA'-ni       wa'-wan-i         'walked'
    A11     ti-su-tz'i-li   ti' suutz'il      '...'
    A12     'a-pa-ka-la     Aj Pakal          'Aj Pakal'
    A13     TAN-na          Tahn              'Tahn'
    A14     yi-[chi]NAL     y-ichnal          'in the company of'
    A15     IX-pa-ka-la     Ix Pakal          'Ix Pakal'
    A16     TUN-ni ?        Tuun ?            'Stone ?'
  5. Corpus Linguistics: Objectives

    Solve, or provide insights into, essential aspects:
    ▪ What percentage of signs has been deciphered?
    ▪ What are the most frequent signs?
    ▪ What are the least frequent signs?
    ▪ What is the proportion of logograms to syllabograms
      ▪ over time?
      ▪ within a text?
      ▪ by genre?
    ▪ What are the typical collocations of signs (signs-in-context)?
    ▪ What spelling rules can be inferred?
    ▪ And many more!
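One way to approach the question of typical collocations is to count pairs of adjacent signs (bigrams) within graphemic chains. The sketch below is an assumption about how this could be done, not the framework's code; it takes hyphen-separated transcriptions as input, never pairs signs across chain boundaries, and the object name is hypothetical.

```scala
object Collocations {
  /** Counts ordered pairs of adjacent signs within each hyphen-separated chain. */
  def bigrams(chains: Seq[String]): Map[(String, String), Int] =
    chains
      .flatMap { chain =>
        val signs = chain.split("-").toVector.filter(_.nonEmpty)
        signs.zip(signs.drop(1)) // (sign_i, sign_i+1) pairs inside this chain only
      }
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }

  /** The n most frequent pairs, ties broken by the pair's string order. */
  def top(chains: Seq[String], n: Int): Seq[((String, String), Int)] =
    bigrams(chains).toSeq.sortBy { case (pair, count) => (-count, pair) }.take(n)
}
```

Ranking pairs by raw counts is the simplest choice; association measures such as pointwise mutual information would correct for signs that are frequent on their own.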
  6. Corpus Linguistics: Objectives (Construction)

    ▪ A corpus of Classic Mayan texts
    ▪ Tools to study the writing system and the language with a corpus-based approach
      ▪ Quantitatively, e.g. statistical analysis
      ▪ Qualitatively, e.g. concordances
    Draws on previous work by Lacadena and Cases (n.d.)
  7. Corpus Linguistics: Objectives (Disciplines)

    ▪ Linguistics
    ▪ Philological and Historical-Critical Methods
    ▪ Semiotics
    ▪ Artificial Intelligence
    ▪ Machine Learning
    ▪ Computational Linguistics / Natural Language Processing
    ▪ Hermeneumatics
    ▪ Developing the framework makes us think about the nature of the problem itself
  9. Corpus Construction: Transcriptions

    ▪ Typically encode information that is epigraphically important
    ▪ But not always (we still need epigraphic drawings)
    ▪ We need a way to feed the system with data that is:
      ▪ Easy to generate by non-programmer epigraphers
      ▪ Easy to maintain
      ▪ Easy to export (no lock-in technologies)
      ▪ Sufficiently annotated
      ▪ Sufficiently rigorous (from our own experience with our own and others' corpora)
      ▪ Normalised
  10. Corpus Construction: Transcriptions

    ▪ Transliterations can be obtained through a set of rules
      ▪ proposed by scholars
      ▪ learned using supervised learning (logistic regression and neural networks)
    Transcriptions + Spelling Rules = Transliterations
    wa-['i]ja WI'-na-li → wa'iij wi'naal
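The slides do not show the spelling rules themselves. To make "Transcriptions + Spelling Rules = Transliterations" concrete, here is a sketch built on a deliberately simplified, assumed version of the harmony/disharmony principle discussed in the epigraphic literature: in a run of CV syllabograms the last vowel is not pronounced, and if it differs from the preceding vowel, that vowel is read long. It is not the authors' rule set. Bracketed infixes such as ['i]ja must be linearized by hand (here to 'i-ja), phonetic complements such as TUN-ni are not handled, and the names are hypothetical.

```scala
object Transliterator {
  // Convention from the transcriptions: upper case marks a logogram, lower case a syllabogram.
  private def isLogogram(token: String): Boolean = token.exists(_.isUpper)

  /** Simplified rule for a run of CV syllabograms: drop the final vowel; if it differs
    * from the preceding vowel (disharmony), write the preceding vowel long. */
  private def spellRun(run: List[String]): String = run match {
    case Nil           => ""
    case single :: Nil => single
    case _ =>
      val body = run.init
      val prevVowel = body.last.last
      val lastSyllable = run.last
      val lengthened = if (lastSyllable.last == prevVowel) "" else prevVowel.toString
      body.mkString + lengthened + lastSyllable.init // keep only the final consonant
  }

  def transliterate(chain: String): String = {
    val tokens = chain.split("-").toList.filter(_.nonEmpty)
    // Logograms are lower-cased and passed through; consecutive syllabograms form runs.
    val (done, pending) = tokens.foldLeft((Vector.empty[String], List.empty[String])) {
      case ((acc, run), tok) if isLogogram(tok) =>
        (acc :+ spellRun(run.reverse) :+ tok.toLowerCase, Nil)
      case ((acc, run), tok) => (acc, tok :: run)
    }
    (done :+ spellRun(pending.reverse)).mkString
  }
}
```

Under these assumptions, wa-'i-ja becomes wa'iij and WI'-na-li becomes wi'naal, matching the examples on the slide.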
  11. Source Transcription: Transcription Header

    /*!
    @site: CML
    @mon: Urn 26
    @object: Spine
    @objectOther: 3
    @facture:
    @collation: ic
    @colVersion: 0.1
    @since: 20/6/2012
    @notes:
    @references: Zender (xxxx)
    @biblio:
    @imageFile:
    @drawingAuthor: Marc U. Zender
    @textDisposition: column
    @textDimensions: 1x13
    */
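The header is a block comment made of @key: value fields. The slides do not show how it is read; assuming that a field's value runs until the next @key: or the end of the comment, a minimal reader into a Map could look like this (names hypothetical). A value that itself contained text of the form @word: would be misread.

```scala
object HeaderParser {
  // A field starts with "@name:"; its value runs until the next field or the end of the comment.
  private val Field = """@(\w+):""".r

  def parse(src: String): Map[String, String] = {
    val start = src.indexOf("/*!")
    val end = if (start < 0) -1 else src.indexOf("*/", start + 3)
    require(start >= 0 && end > start, "no /*! ... */ header found")
    val body = src.substring(start + 3, end)
    val fields = Field.findAllMatchIn(body).toVector
    fields.indices.map { i =>
      val valueEnd = if (i + 1 < fields.length) fields(i + 1).start else body.length
      fields(i).group(1) -> body.substring(fields(i).end, valueEnd).trim // empty fields map to ""
    }.toMap
  }
}
```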
  12. Source Transcription: Transcription Body

    A1: {13-#AJAW} // Tzolk'in Eroded
    A2: 18-HUL-OHL-la
    A3: CHUM TUN-ni
    A4: 'u-17-WINIK-HAB'
    A5: wa-['i]ja [K'IN]TUN-ni
    A6: wa-['i]ja WI'-na-li
    A7: tu-13-TUN-ni // $1:TUN? $2:ni?
    [...]
  13. Source Transcription: Transcriptions

    Two things are needed:
    ▪ A proper ontology for the signs
      ▪ Logograms, syllabograms, diacritics, etc.
      ▪ Defines a Grapheme Object Notation
      ▪ A research problem per se
    ▪ A set of rules governing the combination and production of graphemic chains
      ▪ Important for the epigraphic field
      ▪ Defines the external DSL
  14. Source Transcription: Grapheme Object Notation

    Example entry:
    ▪ Reading value: WINIK
    ▪ Catalog entries: T683
    ▪ Tag: Moon
    ▪ Numeric value / morphograph: 20
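The slides name the fields of a Grapheme Object Notation entry but not its concrete syntax or types. One possible encoding, assumed here, models the sign ontology as an algebraic data type and each entry as a case class whose fields mirror the labels above. All names and types are hypothetical.

```scala
object GraphemeNotation {
  /** The broad sign kinds listed on the slides, as a closed (sealed) ontology. */
  sealed trait SignKind
  case object Logogram extends SignKind
  case object Syllabogram extends SignKind
  case object Diacritic extends SignKind

  /** One catalogue record. */
  final case class Grapheme(
      readingValue: String,         // e.g. "WINIK"
      catalogEntries: List[String], // e.g. List("T683")
      tag: Option[String],          // e.g. Some("Moon")
      numericValue: Option[Int],    // e.g. Some(20) when the sign doubles as a number
      kind: SignKind
  )

  /** Assumed transcription convention: upper case = logogram, lower case = syllabogram.
    * Diacritics cannot be recognised from letter case and need an explicit entry. */
  def kindOf(token: String): SignKind =
    if (token.exists(_.isUpper)) Logogram else Syllabogram

  val winik: Grapheme = Grapheme("WINIK", List("T683"), Some("Moon"), Some(20), kindOf("WINIK"))
}
```

Making SignKind sealed lets the compiler warn when a pattern match over sign kinds misses a case, which helps as the ontology grows.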
  15. Implementation Details: Status Monad

    Status of knowledge of the reading value:
    ▪ Known
    ▪ Uncertain
    ▪ NotKnown
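The slide names a Status monad with three cases but shows no code. A minimal sketch follows. It assumes that combining readings propagates the weakest certainty: anything combined with NotKnown is NotKnown, and anything combined with Uncertain is at best Uncertain. That policy is an assumption, and the object name is hypothetical.

```scala
object Readings {
  /** Status of knowledge of a reading value; certainty can only decrease when combined. */
  sealed trait Status[+A] {
    def flatMap[B](f: A => Status[B]): Status[B] = this match {
      case Known(a) => f(a)
      case Uncertain(a) =>
        f(a) match {
          case Known(b) => Uncertain(b) // an uncertain input taints the result
          case other    => other
        }
      case NotKnown => NotKnown
    }
    def map[B](f: A => B): Status[B] = flatMap(a => Known(f(a)))
  }
  final case class Known[+A](value: A) extends Status[A]
  final case class Uncertain[+A](value: A) extends Status[A]
  case object NotKnown extends Status[Nothing]
}
```

With Known as the unit, a for-comprehension can assemble a block's reading sign by sign while carrying the epigrapher's doubts, such as the $1:TUN? $2:ni? annotations on block A7 of the transcription body.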
  16. Source Transcription: Transcription as a Domain Specific Language

    ▪ Parser implemented using parser combinators (Odersky, Spoon & Venners 2010)
    ▪ Abstract Syntax Tree (AST)
      ▪ Implemented using Algebraic Data Types (Odersky, Spoon & Venners 2010)
      ▪ Scalaz Tree (Kleisli, Huet Zipper)
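The framework uses Scalaz's Tree and its zipper. To show the idea without depending on Scalaz (whose API is not reproduced here), the following is a minimal, hypothetical Huet zipper over a rose tree: a focused subtree plus the path back to the root. This allows a local edit to one AST node followed by a rebuild of the whole tree.

```scala
object TreeZipper {
  final case class Tree[A](label: A, children: List[Tree[A]])

  /** One step of the path: left siblings (nearest first), the parent's label, right siblings. */
  final case class Crumb[A](left: List[Tree[A]], parentLabel: A, right: List[Tree[A]])

  final case class Loc[A](focus: Tree[A], path: List[Crumb[A]]) {
    def firstChild: Option[Loc[A]] = focus.children match {
      case c :: rest => Some(Loc(c, Crumb(Nil, focus.label, rest) :: path))
      case Nil       => None
    }
    def right: Option[Loc[A]] = path match {
      case Crumb(l, p, r :: rs) :: up => Some(Loc(r, Crumb(focus :: l, p, rs) :: up))
      case _                          => None
    }
    /** Rebuilds the parent node from the crumb and moves the focus up one level. */
    def parent: Option[Loc[A]] = path match {
      case Crumb(l, p, r) :: up => Some(Loc(Tree(p, l.reverse ::: focus :: r), up))
      case Nil                  => None
    }
    def modifyLabel(f: A => A): Loc[A] = copy(focus = focus.copy(label = f(focus.label)))
    def root: Tree[A] = parent.fold(focus)(_.root)
  }

  def fromTree[A](t: Tree[A]): Loc[A] = Loc(t, Nil)
}
```

Each move costs constant time and the original tree is never mutated, which is the main attraction of zippers for editing immutable ASTs.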
  17. Source Transcription: Production Rules

    document            ::= {block}
    block               ::= (blockIdentifier ":") graphemicChains
    graphemicChains     ::= {graphemicChain, "/"}
    graphemicChain      ::= {grapheme | compound, "-"}
    compound            ::= ("[" grapheme "]" grapheme) | (grapheme "[" grapheme "]")
    grapheme            ::= mora | morphogram
    morphogram          ::= ["'"] morphographicString
    mora                ::= ["'"] moraString
    morphographicString ⟶ [A-Z0-9']+
    moraString          ⟶ [a-z']+
    blockIdentifier     ⟶ [p?A-Z0-9]+
  18. Source Transcription: Parsing

    The full source transcription (the header shown in slide 11, followed by the body below) is the input to the Parser, which produces a SyntaxTree:

    A1: {13-#AJAW} // Tzolk'in Eroded
    A2: 18-HUL-OHL-la
    A3: CHUM / TUN-ni
    A4: 'u-17-WINIK-HAB'
    A5: wa-['i]ja / [K'IN]TUN-ni
    A6: wa-['i]ja / WI'-na-li
    A7: tu-13-TUN-ni // $1:TUN? $2:ni?
    [...]

    Source transcription → Parser → SyntaxTree
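The slides implement the production rules with Scala's parser combinators. To keep this sketch free of external dependencies (the combinator library ships as a separate module in recent Scala versions), the version below is a hand-written recursive-descent parser for the same rules, with the AST as algebraic data types. It reads blockIdentifier as p?[A-Z0-9]+, which is our interpretation of the rule as written, and it does not handle the comments (//), braces, # and $ annotations that appear in the transcription body. All names are hypothetical.

```scala
object TranscriptionParser {
  // AST as algebraic data types
  sealed trait Grapheme
  final case class Morphogram(value: String) extends Grapheme
  final case class Mora(value: String) extends Grapheme
  /** "[x]y" (bracketFirst = true) or "y[x]" (bracketFirst = false); x is the bracketed grapheme. */
  final case class Compound(bracketed: Grapheme, host: Grapheme, bracketFirst: Boolean) extends Grapheme
  final case class Block(id: String, chains: List[List[Grapheme]])

  def parse(src: String): List[Block] = new Impl(src).document()

  private final class Impl(src: String) {
    private var pos = 0
    private def peek: Option[Char] = if (pos < src.length) Some(src(pos)) else None
    private def skipWs(): Unit = while (peek.exists(_.isWhitespace)) pos += 1
    private def fail(msg: String): Nothing = throw new IllegalArgumentException(s"$msg at offset $pos")
    private def expect(c: Char): Unit = {
      skipWs()
      if (peek.contains(c)) pos += 1 else fail(s"expected '$c'")
    }
    private def takeWhile(p: Char => Boolean): String = {
      val start = pos
      while (peek.exists(p)) pos += 1
      src.substring(start, pos)
    }
    private def isUpperOrDigit(c: Char): Boolean = (c >= 'A' && c <= 'Z') || c.isDigit
    private def isLowerAscii(c: Char): Boolean = c >= 'a' && c <= 'z'

    // document ::= {block}
    def document(): List[Block] = {
      val blocks = List.newBuilder[Block]
      skipWs()
      while (peek.isDefined) { blocks += block(); skipWs() }
      blocks.result()
    }

    // block ::= blockIdentifier ":" graphemicChains
    private def block(): Block = {
      val prefix = if (peek.contains('p')) { pos += 1; "p" } else ""
      val id = prefix + takeWhile(isUpperOrDigit)
      if (id.length == prefix.length) fail("expected block identifier")
      expect(':')
      Block(id, separated('/', () => chain()))
    }

    // {item, sep}: one or more items separated by sep (whitespace allowed before sep)
    private def separated[T](sep: Char, item: () => T): List[T] = {
      val items = List.newBuilder[T]
      items += item()
      skipWs()
      while (peek.contains(sep)) { pos += 1; items += item(); skipWs() }
      items.result()
    }

    // graphemicChain ::= {grapheme | compound, "-"}
    private def chain(): List[Grapheme] = separated('-', () => element())

    // compound ::= "[" grapheme "]" grapheme | grapheme "[" grapheme "]"
    private def element(): Grapheme = {
      skipWs()
      if (peek.contains('[')) {
        pos += 1
        val inner = grapheme()
        expect(']')
        Compound(inner, grapheme(), bracketFirst = true)
      } else {
        val host = grapheme()
        if (!peek.contains('[')) host
        else { pos += 1; val inner = grapheme(); expect(']'); Compound(inner, host, bracketFirst = false) }
      }
    }

    // grapheme ::= mora | morphogram, each with an optional leading "'"
    private def grapheme(): Grapheme = {
      skipWs()
      val lead = if (peek.contains('\'')) { pos += 1; "'" } else ""
      peek match {
        case Some(c) if isUpperOrDigit(c) => Morphogram(lead + takeWhile(ch => isUpperOrDigit(ch) || ch == '\''))
        case Some(c) if isLowerAscii(c)   => Mora(lead + takeWhile(ch => isLowerAscii(ch) || ch == '\''))
        case _                            => fail("expected grapheme")
      }
    }
  }
}
```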
  19. Corpus Linguistics: Objectives Redux

    Solve, or provide insights into, essential aspects:
    ▪ What percentage of signs has been deciphered?
    ▪ What are the most frequent signs?
    ▪ What are the least frequent signs?
    ▪ What is the proportion of logograms to syllabograms
      ▪ over time?
      ▪ within a text?
      ▪ by genre?
    ▪ What are the typical collocations of signs (signs-in-context)?
    ▪ What spelling rules can be inferred?
    ▪ And many more!
  20. Quantitative Analysis: Sign Frequency (CML - PAL 96G)

    [Chart: sign frequency counts, axis 0 to 30. Sign labels as extracted: la, AJAW, na, li, ka, K'INICH, mu, TUN, ta, b'a, 13, u, ku, TZ'AK, 5, tzi, TZUTZ, 10, hi, 17, chi, MO', ne, ki, TE', 18, CHAK, 9, ISIG, WAH, HUL, WI', MUWAN, b'o, yu, 15, JU'N, nu, yo, B'IH, HA', WAXAK, JAL, ICH'AK, MAHK]
  21. Quantitative Analysis: Sign Frequency (CML - PAL 96G)

    [Chart: sign frequency counts, axis 0 to 30. Sign labels as extracted: la, u, ya, AJAW, wa, ni, na, ja, ti, li, WINIKHAB', ji, ka, CHUM, le, K'INICH, i, B'AK, mu, WINIK, HAB', TUN, K'UH, AJ, ta, a, 7, b'a, NAH, pa, 13, K'AN]
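Charts like these can be produced from token counts per sign. A minimal sketch, assuming the input is a sequence of hyphen-separated graphemic chains and that a compound such as [K'IN]TUN is counted as a single token (a fuller count would first resolve compounds with a parser for the production rules); the names are hypothetical.

```scala
object SignFrequency {
  /** Token count per sign, most frequent first; ties broken by the sign's string order. */
  def frequencies(chains: Seq[String]): Seq[(String, Int)] =
    chains
      .flatMap(_.split("-").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (sign, occurrences) => sign -> occurrences.size }
      .toSeq
      .sortBy { case (sign, count) => (-count, sign) }

  /** Share of all tokens covered by the k most frequent signs. */
  def topShare(chains: Seq[String], k: Int): Double = {
    val freqs = frequencies(chains)
    val total = freqs.map(_._2).sum
    if (total == 0) 0.0 else freqs.take(k).map(_._2).sum.toDouble / total
  }
}
```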
  22. Quantitative Analysis: Sign Frequency, Western Maya Lowlands

    [Chart: sign frequency grid with vowel labels a, e, i, o, u and consonant labels ', b', ch, ch', h, j, k, k', l, m, n, p, p', s, t, t', tz, tz', w, x, y]
  23. Quantitative Analysis: Graphemic Distribution

    [Charts: Corpus Analysis: Morphographicity for CML Urn 26, Spine 2. Panels show logogram and phonogram counts (axis 0 to 30) and their relative proportions (0 to 1.00) by glyph number (0 to 39).]
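These charts plot logogram and phonogram counts and their relative proportions against glyph number. The slides do not define the measures; one plausible reading, assumed here, is running (cumulative) counts per glyph block, with the relative value being each class's share of the signs seen so far. Signs are classified by letter case, with numerals counted as morphograms because the production rules place digits in morphographicString. The names are hypothetical.

```scala
object Morphographicity {
  final case class Counts(logograms: Int, phonograms: Int) {
    def total: Int = logograms + phonograms
    def logogramRatio: Double = if (total == 0) 0.0 else logograms.toDouble / total
    def +(other: Counts): Counts = Counts(logograms + other.logograms, phonograms + other.phonograms)
  }

  // Upper-case letters or digits mark morphograms (logograms, numerals); lower case marks moras.
  private def isLogogram(sign: String): Boolean = sign.exists(c => c.isUpper || c.isDigit)

  /** Counts for each glyph block, in text order. */
  def perBlock(blocks: Seq[String]): Seq[Counts] =
    blocks.map { block =>
      val signs = block.split("-").filter(_.nonEmpty)
      val logos = signs.count(isLogogram)
      Counts(logos, signs.length - logos)
    }

  /** Running totals up to and including each glyph number. */
  def cumulative(blocks: Seq[String]): Seq[Counts] =
    perBlock(blocks).scanLeft(Counts(0, 0))(_ + _).tail
}
```

Plotting cumulative(...).map(_.logogramRatio) against glyph number would give a curve comparable in kind to the relative panels.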
  24. Quantitative Analysis: Graphemic Distribution

    [Charts: logogram and phonogram counts (axis 0 to 30) and relative proportions (0 to 1.00) by glyph number (1 to 46) for CML Urn 26, Spine 3, shown next to the Corpus Analysis: Morphographicity chart for CML Urn 26, Spine 2.]
  25. Quantitative Analysis: Graphemic Distribution

    [Charts: logogram and phonogram counts (axis 0 to 30) and relative proportions (0 to 1.00) by glyph number (1 to 46) for CML Urn 26, Spine 4, shown next to the Corpus Analysis: Morphographicity chart for CML Urn 26, Spine 2.]
  26. Quantitative Analysis: Graphemic Distribution

    [Charts: logogram and phonogram counts (axis 0 to 400) and relative proportions (0 to 1.00) by glyph number (0 to 600) for PAL TC Main, shown next to the Corpus Analysis: Morphographicity chart for CML Urn 26, Spine 2.]
  27. Quantitative Analysis: Graphemic Distribution

    [Charts: logogram and phonogram counts (axis 0 to 300) and relative proportions (0 to 1.00) by glyph number (1 to 376) for PAL TFC Main, shown next to the Corpus Analysis: Morphographicity chart for CML Urn 26, Spine 2.]
  28. Quantitative Analysis: Frequency vs Rank in CML and PAL

    [Chart: Frequency vs Rank in CML Urn 26 and PAL (96G + TC + TFC); frequency axis 0 to 115, rank axis 1 to 210.]
  29. Overview: Zipf Coefficients

    ▪ Zipf's law, f(r) ≈ a · r^(-λ): λ ≈ 0.98, 1.026, 1.218; a ≈ 127, 183.95, 816.38
    ▪ Miller (1965): randomly generated texts also exhibit Zipf's law
    ▪ Li (1992): Zipf's law is related to the particular representation
    ▪ Ferrer-i-Cancho & Solé (2001)
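The slide reports λ and a without stating how they were estimated. A common choice, assumed here, is an ordinary least-squares line through log frequency against log rank, since log f = log a - λ log r. Least squares on log-log data is known to be sensitive to the sparse tail, so maximum-likelihood estimators are often preferred; this sketch shows only the simple version, and its names are hypothetical.

```scala
object ZipfFit {
  /** Least-squares fit of log f = log a - λ log r over ranked frequencies; returns (a, λ). */
  def fit(frequencies: Seq[Double]): (Double, Double) = {
    val ranked = frequencies.filter(_ > 0).sortWith(_ > _) // rank 1 = most frequent
    require(ranked.size >= 2, "need at least two non-zero frequencies")
    val xs = ranked.indices.map(i => math.log(i + 1.0)) // log rank
    val ys = ranked.map(math.log)                      // log frequency
    val n = xs.size.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (math.exp(meanY - slope * meanX), -slope) // intercept gives log a; λ is minus the slope
  }
}
```

On synthetic data generated exactly from f(r) = 1000 · r^(-1.2), the fit recovers a = 1000 and λ = 1.2 up to floating-point error.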
  30. Requirements: Concordancer

    [Concordance for WAY, keyword-in-context display. Sources: unknown provenience (partial count #2); PAL, Temple of the Cross, Shrine Façade (partial count #1); PAL, Temple of the Cross, Alfardas (partial count #1); PAL, Temple of the Foliated Cross, Main (partial count #3). Contexts include G3: B'AK-le-WAY-la; I2: B'AK-le-wa-WAY-la; M3: B'AK-le-WAY-wa; J1: B'AK-WAY-la; D8: 3-WAY-HAB'; H3: B'AK-WAY-wa-la; G11: 'IK'-WAY-CHAHK.]
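A concordancer of this kind is a keyword-in-context (KWIC) index. The sketch below is a hypothetical minimal version, assuming each text is a sequence of glyph blocks in reading order, each with a hyphen-separated chain. Context may cross block boundaries. It matches whole signs only, so a keyword inside a bracketed compound would need a parser-based tokenizer first.

```scala
object Concordancer {
  /** One keyword-in-context line: up to `width` signs on each side of the keyword. */
  final case class Hit(source: String, block: String, left: Seq[String], keyword: String, right: Seq[String])

  /** corpus: source name -> glyph blocks in reading order, each (blockId, hyphen-separated chain). */
  def concordance(corpus: Map[String, Seq[(String, String)]], keyword: String, width: Int = 3): Seq[Hit] =
    corpus.toSeq.sortBy(_._1).flatMap { case (source, blocks) =>
      // Flatten the text into (blockId, sign) pairs so context can cross block boundaries.
      val signs: Vector[(String, String)] =
        blocks.toVector.flatMap { case (id, chain) => chain.split("-").toVector.map(sign => (id, sign)) }
      signs.indices.collect {
        case i if signs(i)._2 == keyword =>
          Hit(source, signs(i)._1,
            signs.slice(math.max(0, i - width), i).map(_._2),
            keyword,
            signs.slice(i + 1, i + 1 + width).map(_._2))
      }
    }
}
```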
  31. Deciphering Maya Hieroglyphic Writing using Scala: Credits

    "Nacho en la noche" (Nacho at night) courtesy of Clemence Lallemand. Izaña Observatory (IAC) photograph by Jonay González Hernández. Writing Systems in Mesoamerica courtesy of Alfonso Lacadena. Monkeys and scorpion photographs by Julio Cotom. Tarantula and deck photographs by Divina Perla. 96G tablet photograph from mesoweb.org. Temple of the Inscriptions (Palenque) photograph from Wikipedia (http://en.wikipedia.org/Palenque). K6751 photograph courtesy of Justin Kerr (http://mayavase.com). Dresden Codex from a facsimile of the Förstemann Edition. Knorozov figures courtesy of Harri Kettunen and Christophe Helmke. Comalcalco Urn 26 photographs and drawings courtesy of Marc Zender and archaeologist Ricardo Armijo. Palenque epigraphic drawings courtesy of David Stuart, Linda Schele, and Merle Greene Robertson. How to dial telephones from "Training film for users of the new dial telephones", public domain film from the Library of Congress Prelinger Archive, edited by Jeff Quitney. Switchboard caption from AT&T Archives (http://techchannel.att.com). Logos from their respective owners (typesafe.com; cappuccino-project.org; lesscss.org; www.w3.org/html/logo; www.mongodb.org).
  32. Deciphering Maya Hieroglyphic Writing using Scala: Credits

    Barring unintended omissions, all other photographs and illustrations are by Ignacio Cases. The author wishes to thank Proyecto Arqueológico Comalcalco (Ricardo Armijo), Proyecto Arqueológico Calakmul (Ramón Carrasco), Proyecto Arqueológico Río Bec (Philippe Nondedeo, Dominique Michelet), and Proyecto Arqueológico Petén Norte Naachtún (Carlos Morales-Aguilar, Philippe Nondedeo, Dominique Michelet).