Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales

Presentation of Aereo: a project made for the 2019 DX Lab Fellowship of the State Library of New South Wales.

More info at: https://dxlab.sl.nsw.gov.au/blog/building-aereo

Mauricio Giraldo

July 02, 2020

Transcript

  1. Mauricio Giraldo Arteaga - July 2020
    An experimental bird’s eye view of the
    digital collections from the State Library of NSW

  2. hello

  4. Universidad de los Andes 2000

  6. Carnegie Mellon University 2011

  12. i was there — Photo: Paula Bray

  24. collection.sl.nsw.gov.au/digital

  25. about 4 million images
    about 2 million at the time of the fellowship

  26. flickr.com/photos/prasenberg - Patrick Rasenberg

  31. serendipitous exploration
    bounded by the physical space

  32. how does this work digitally?

  34. 554,030 results

  35. 13,000+ “pages”*
    *not really possible due to technical limitations in pagination depth
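
A rough sketch of the arithmetic behind the page count, and of why deep pages can be out of reach. The 554,030 figure comes from the slide; the page size of 40 and the offset cap of 10,000 are assumptions made up for illustration, not values confirmed for the Library's catalogue.

```python
def page_count(total_results: int, page_size: int) -> int:
    """Number of pages needed to list every result (integer ceiling)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_results + page_size - 1) // page_size


def is_reachable(page: int, page_size: int, max_offset: int) -> bool:
    """True if the whole 1-based page fits inside the backend's offset cap."""
    return page * page_size <= max_offset


TOTAL_RESULTS = 554_030  # from the slide
PAGE_SIZE = 40           # hypothetical page size
MAX_OFFSET = 10_000      # hypothetical cap on how deep a search can page

pages = page_count(TOTAL_RESULTS, PAGE_SIZE)
deepest_page = MAX_OFFSET // PAGE_SIZE

print(f"{pages:,} pages in total, but only the first {deepest_page} can be opened")
```

Under these assumed numbers, almost all of the result set sits behind pages that nobody can actually open.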

  36. this is a common paradigm

  37. “pagination”

  42. 13,000+ pages

  43. you miss the forest for the trees

  45. pages.gseis.ucla.edu/faculty/bates/berrypicking.html

  46. “the query is satisfied not by a single final
    retrieved set, but by a series of selections of
    individual references and bits of information at
    each stage of the ever-modifying search”
    - Marcia J. Bates

  47. research as a serendipitous process

  48. digital serendipity is limited
    constrained by the small set of results on view

  53. see the forest and the trees

  56. “big picture” vs. detail

  57. this is my attempt at addressing this*
    *that i could research, design, and implement in eight weeks

  59. “An experimental bird’s eye view of the
    digital collections from the State
    Library of NSW”

  60. but first some background

  63. dependent on text metadata

  68. comprehensive metadata is hard

  69. we could automate metadata creation

  70. we could automate some* metadata creation
    *the one that computers are good at

  71. already being done at the Library

  72. sl.nsw.gov.au/blogs/tiger-using-artificial-intelligence-discover-our-collections

  73. collection.sl.nsw.gov.au

  76. computers get (a lot) wrong

  80. take it with a grain of salt*
    *there are way more problematic tags (that reflect algorithm biases)

  81. Microsoft COCO: Common Objects in Context

  82. computer “learns” how to “see”

  84. = “dog”

  85. cannot see…
    what it hasn’t seen before
    an issue in a collection with centuries-old stuff

  86. cannot see…
    what its creator didn’t train* it to see
    *non-W.E.I.R.D. stuff (Western, Educated, Industrialized, Rich, “Democratic”)

  87. but it is consistent
    useful for finding similar-looking things, not for the categories themselves
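
A minimal sketch of what "finding similar-looking things" can mean in code, assuming each image has already been reduced to a numeric feature vector by a pretrained model. The vectors, the ids, and the choice of cosine similarity are illustrative assumptions, not the pipeline the project actually used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, k=2):
    """Return the ids of the k images whose vectors are closest to the query's."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vector), other_id)
        for other_id, vector in features.items()
        if other_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other_id for _, other_id in scored[:k]]


# Hypothetical 3-dimensional vectors; real model outputs have hundreds of dimensions.
features = {
    "photo_a": [0.9, 0.1, 0.0],
    "photo_b": [0.8, 0.2, 0.1],
    "map_c": [0.0, 0.1, 0.9],
    "coin_d": [0.1, 0.9, 0.2],
}

print(most_similar("photo_a", features))
```

The model's labels never enter the comparison: only the distances between vectors are used, which is why consistency matters more here than whether a category name is right.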

  90. a way to group images together
    with computers, avoiding closed or proprietary algorithms

  91. image-net.org

  92. ml4a.github.io/guides - Gene Kogan

  95. “similarity metadata”
    or, in math parlance, uniform manifold approximation and projection (umap)

  96. dhlab.yale.edu/projects/pixplot

  100. SFMoMA Artscope - Stamen (2014)

  101. bertspaan.nl/semia - Bert Spaan

  102. amnh-sciviz.github.io/image-collection - Brian Foo

  103. we can also use color as metadata
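
A small sketch of colour used as sortable metadata, assuming each image is available as a list of RGB pixels. Averaging pixels is a deliberately crude summary chosen for brevity, and the image names and pixel values are made up; a real palette extractor would cluster colours instead.

```python
import colorsys


def average_color(pixels):
    """Mean RGB colour of a list of (r, g, b) tuples with 0-255 channels."""
    count = len(pixels)
    if count == 0:
        raise ValueError("need at least one pixel")
    red = sum(p[0] for p in pixels) / count
    green = sum(p[1] for p in pixels) / count
    blue = sum(p[2] for p in pixels) / count
    return (round(red), round(green), round(blue))


def hue(rgb):
    """Hue of an RGB colour as a float in [0, 1), via the standard HSV conversion."""
    r, g, b = (channel / 255 for channel in rgb)
    h, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return h


# Hypothetical two-pixel "images"
images = {
    "sunset": [(250, 120, 30), (240, 100, 20)],
    "ocean": [(10, 60, 200), (30, 80, 220)],
    "forest": [(20, 160, 40), (40, 180, 60)],
}

summary = {name: average_color(pixels) for name, pixels in images.items()}
by_hue = sorted(summary, key=lambda name: hue(summary[name]))

print(by_hue)
```

Sorting by hue places warm images first, then greens, then blues, with no text metadata involved.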

  105. mkweb.bcgsc.ca/colorsummarizer

  107. artsexperiments.withgoogle.com/artpalette

  108. collection.cooperhewitt.org

  109. publicdomain.nypl.org/pd-visualization - Brian Foo

  111. “I propose enhancing the Manuscripts, Oral History and
    Pictures Catalogue with computer-generated metadata to
    create new pathways for patron exploration. By focusing on
    the 300,000+ digitised image set that is currently
    available…”

  113. larger than other sets that i had seen
    but seemed technically achievable in a modern web browser

  114. little did i know…

  115. 2 million images…
    the scale of the dataset was a big challenge in this project

  117. 1,199,477 things: manuscripts, photographs, negatives, published
    maps, drawings, objects, pictures, ephemera, journals,
    prints, medals, unpublished maps, paintings, coins,
    architectural drawings, newspapers, posters, and stamps.
    emphasis on visual materials

  118. 1,199,477 manuscripts, photographs, negatives, published
    maps, drawings, objects, pictures, ephemera, journals,
    prints, medals, unpublished maps, paintings, coins,
    architectural drawings, newspapers, posters, and stamps.

  136. “unsorted”, year, colour, similarity

  139. four different ways of looking at the forest
    “unsorted”, year, colour, similarity
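
One way to picture these views, as a sketch only: sort the items by the chosen criterion, then place them row by row in a near-square grid. The field names, the sample records, and the row-major layout are assumptions for illustration and are not taken from the Aereo source code.

```python
import math


def grid_layout(items, key, columns=None):
    """Sort items by `key`, then map each item id to a (column, row) grid cell."""
    ordered = sorted(items, key=key)
    if columns is None:
        # Near-square grid: as many columns as the rounded-up square root of the count
        columns = max(1, math.ceil(math.sqrt(len(ordered))))
    return {
        item["id"]: (index % columns, index // columns)
        for index, item in enumerate(ordered)
    }


# Hypothetical records; a real collection would have over a million of these.
items = [
    {"id": "a", "year": 1950},
    {"id": "b", "year": 1880},
    {"id": "c", "year": 1990},
    {"id": "d", "year": 1901},
    {"id": "e", "year": 1925},
]

positions = grid_layout(items, key=lambda item: item["year"])
print(positions)
```

Swapping the `key` function (a hue value, a similarity coordinate) rearranges the same items into a different view.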

  143. 5 pages
    page 1

  144. 8 pages
    page 1

  145. 27 pages
    page 1

  146. 688 pages
    page 1

  147. 27,749 pages

  149. demo

  171. serendipitous exploration

  175. unbounded by physical space

  176. pushes the limits of web browsers
    try this at home… memory limitations apply

  177. future work?
    more sorting criteria, custom groupings, filtering from library metadata

  178. the process

  180. The Process of Design Squiggle by Damien Newman, thedesignsquiggle.com

  181. 2 million images! ⏳
    ended up using 1.2 million… still a lot for any web browser, let alone a smartphone

  189. these are all “just” pixels
    single pixels can convey a lot of information… but i wanted everything

  190. “but how am i going to show
    two million images‽”
    - Frustrated Fellow

  196. technical details

  197. Python + Node.js
    for image similarity and color palettes
    …with lots of data cleanup and cloud computing time

  198. Three.js + Vue.js
    for Aereo interface
    cloud-hosted static assets, using the Library’s API for image details

  199. github.com/slnsw/dxlab-fellowship-2019

  200. dxlab.sl.nsw.gov.au/blog/building-aereo

  201. Special thanks to
    • DX Lab team: Paula Bray, Kaho Cheung, and Luke Dearnley
    • Web team: Jenna Bain and Robertus Johansyah
    • Mitchell Librarian Richard Neville
    • State Librarian Dr John Vallance
    • Staff from the State Library of New South Wales
    • The State Library of NSW Foundation

  202. Acknowledgments
    • Cyril Diagne
    • Douglas Duhaime - Yale DH Lab
    • Gene Kogan
    • Mario Klingemann
    • Ricardo Cabello - Three.js

  203. Mauricio Giraldo Arteaga
    @mgiraldo
