Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales

Presentation of Aereo: a project made for the 2019 DX Lab Fellowship of the State Library of New South Wales.

More info at: https://dxlab.sl.nsw.gov.au/blog/building-aereo

Mauricio Giraldo

July 02, 2020

Transcript

  1. An experimental bird’s eye view of the digital collections from the State Library of NSW (Mauricio Giraldo Arteaga - July 2020)
  2. hello

  3. None
  4. Universidad de los Andes 2000

  5. None
  6. Carnegie Mellon University 2011

  7. None
  8. None
  9. None
  10. None
  11. None
  12. i was there — Photo: Paula Bray

  13. None
  14. None
  15. None
  16. None
  17. None
  18. None
  19. None
  20. None
  21. None
  22. None
  23. None
  24. collection.sl.nsw.gov.au/digital

  25. about 4 million images (about 2 million at the time of the fellowship)
  26. flickr.com/photos/prasenberg - Patrick Rasenberg

  27. None
  28. None
  29. None
  30. None
  31. serendipitous exploration, bounded by the physical space

  32. how does this work digitally?

  33. None
  34. 554,030 results

  35. 13,000+ “pages”* (*not really possible due to technical limitations in pagination depth)
  36. this is a common paradigm

  37. “pagination”

  38. None
  39. None
  40. None
  41. None
  42. 13,000+ pages
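
The page counts on these slides follow from simple arithmetic. The sketch below shows how a result total turns into a page count, and how small a share of the results a reader sees after a few pages. The 554,030 figure comes from the slides; the page size of 40 is an assumption (the talk does not state it), and the function names are illustrative only.

```python
def page_count(total_results: int, page_size: int) -> int:
    """Number of pages needed to list every result (ceiling division)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_results + page_size - 1) // page_size


def share_seen(pages_viewed: int, total_results: int, page_size: int) -> float:
    """Fraction of all results a reader has seen after viewing some pages."""
    seen = min(pages_viewed * page_size, total_results)
    return seen / total_results


TOTAL_RESULTS = 554_030  # figure quoted on the slides
PAGE_SIZE = 40           # assumption: the page size is not stated in the talk

print(page_count(TOTAL_RESULTS, PAGE_SIZE))
print(f"{share_seen(10, TOTAL_RESULTS, PAGE_SIZE):.4%}")
```

Under that assumed page size the count lands in the "13,000+" range, and ten pages of browsing cover well under a tenth of a percent of the results.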

  43. you miss the forest for the trees

  44. None
  45. pages.gseis.ucla.edu/faculty/bates/berrypicking.html

  46. “the query is satisfied not by a single final retrieved set, but by a series of selections of individual references and bits of information at each stage of the ever-modifying search” - Marcia J. Bates
  47. research as a serendipitous process

  48. digital serendipity is limited: constrained by small set of results on view
  49. None
  50. None
  51. None
  52. None
  53. see the forest and the trees

  54. None
  55. None
  56. “big picture” vs. detail

  57. this is my attempt at addressing this* (*that i could research, design, and implement in eight weeks)
  58. None
  59. “An experimental bird’s eye view of the digital collections from the State Library of NSW”
  60. but first some background

  61. None
  62. None
  63. dependent on text metadata

  64. None
  65. None
  66. None
  67. None
  68. comprehensive metadata is hard

  69. we could automate metadata creation

  70. we could automate some* metadata creation (*the one that computers are good at)
  71. already being done at the Library

  72. sl.nsw.gov.au/blogs/tiger-using-artificial-intelligence-discover-our-collections

  73. collection.sl.nsw.gov.au

  74. None
  75. None
  76. computers get (a lot) wrong

  77. None
  78. None
  79. None
  80. take it with a grain of salt* (*there are way more problematic tags that reflect algorithm biases)
  81. Microsoft COCO: Common Objects in Context

  82. computer “learns” how to “see”

  83. None
  84. = “dog”

  85. cannot see… what it hasn’t seen before (an issue in a collection with centuries-old stuff)
  86. cannot see… what its creator didn’t train* it to see (*non-W.E.I.R.D. stuff: Western, Educated, Industrialized, Rich, “Democratic”)
  87. but it is consistent: useful for finding similar-looking things, not for the categories themselves
  88. None
  89. None
  90. a way to group images together with computers, avoiding closed or proprietary algorithms
  91. image-net.org

  92. ml4a.github.io/guides - Gene Kogan

  93. None
  94. None
  95. “similarity metadata” or, in math parlance, uniform manifold approximation and projection (UMAP)
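
The idea of "similarity metadata" can be made concrete with a small example. UMAP itself needs a dedicated library and is not reproduced here; the sketch below only shows the underlying notion that each image is described by a vector of numbers and that images with nearby vectors look alike. It uses plain cosine similarity as the comparison, which is an assumption for illustration and not necessarily what Aereo does. The identifiers and the four-number vectors are hypothetical; real feature vectors from a trained network are much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, k=2):
    """Return the ids of the k images whose vectors are closest to the query's."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vector), other_id)
        for other_id, vector in features.items()
        if other_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other_id for _, other_id in scored[:k]]


# Hypothetical feature vectors, invented for this example.
FEATURES = {
    "harbour_photo_a": [0.9, 0.1, 0.0, 0.2],
    "harbour_photo_b": [0.8, 0.2, 0.1, 0.3],
    "coin_scan": [0.0, 0.9, 0.8, 0.1],
    "map_sheet": [0.1, 0.2, 0.9, 0.7],
}

print(most_similar("harbour_photo_a", FEATURES, k=1))
```

A projection method such as UMAP goes one step further: it places every image on a 2D plane so that pairs like the two harbour photos end up near each other.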
  96. dhlab.yale.edu/projects/pixplot

  98. None
  99. None
  100. SFMoMA Artscope - Stamen (2014)

  101. bertspaan.nl/semia - Bert Spaan

  102. amnh-sciviz.github.io/image-collection - Brian Foo

  103. we can also use color as metadata

  104. None
  105. mkweb.bcgsc.ca/colorsummarizer

  106. None
  107. artsexperiments.withgoogle.com/artpalette

  108. collection.cooperhewitt.org

  109. publicdomain.nypl.org/pd-visualization - Brian Foo
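
As a rough illustration of colour as metadata, the sketch below reduces each image to one average colour and then orders the images by hue. This is a simplification chosen for brevity, not the method used by Aereo or by the tools linked above, which extract fuller palettes. The image names and pixel values are made up.

```python
import colorsys


def average_colour(pixels):
    """Mean RGB of a sequence of (r, g, b) tuples with 0-255 channels."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    count = len(pixels)
    return tuple(round(sum(p[i] for p in pixels) / count) for i in range(3))


def hue_of(rgb):
    """Hue in [0, 1) of an RGB colour with 0-255 channels."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue


# Hypothetical images, each reduced to two pixels for the example.
IMAGES = {
    "blue_print": [(10, 20, 240), (30, 40, 220)],
    "red_poster": [(250, 10, 10), (230, 30, 30)],
    "green_map": [(20, 200, 40), (40, 180, 20)],
}

summary = {name: average_colour(pixels) for name, pixels in IMAGES.items()}
by_hue = sorted(summary, key=lambda name: hue_of(summary[name]))
print(by_hue)
```

Sorting on a single hue value is the simplest way to get a rainbow-like ordering; it ignores brightness and saturation, so greys and near-blacks land in arbitrary places.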

  110. None
  111. “I propose enhancing the Manuscripts, Oral History and Pictures Catalogue with computer-generated metadata to create new pathways for patron exploration. By focusing on the 300,000+ digitised image set that is currently available…”
  113. larger than other sets that i had seen, but seemed technically achievable in a modern web browser
  114. little did i know…

  115. 2 million images… the scale of the dataset was a big challenge in this project
  116. None
  117. 1,199,477 things, with an emphasis on visual materials: photographs, negatives, published maps, drawings, objects, pictures, ephemera, journals, prints, medals, unpublished maps, paintings, coins, architectural drawings, newspapers, posters, and stamps.
  118. 1,199,477 manuscripts, photographs, negatives, published maps, drawings, objects, pictures, ephemera, journals, prints, medals, unpublished maps, paintings, coins, architectural drawings, newspapers, posters, and stamps.
  136. “unsorted”, year, colour, similarity

  139. four different ways of looking at the forest: “unsorted”, year, colour, similarity
  140. None
  141. None
  142. None
  143. page 1, 5 pages

  144. page 1, 8 pages

  145. page 1, 27 pages

  146. page 1, 688 pages

  147. 27,749 pages

  148. None
  149. demo

  150. None
  151. None
  152. None
  153. None
  154. None
  155. None
  156. None
  157. None
  158. None
  159. None
  160. None
  161. None
  162. None
  163. None
  164. None
  165. None
  166. None
  167. None
  168. None
  169. None
  170. None
  171. serendipitous exploration

  172. None
  173. None
  174. None
  175. unbounded by physical space

  176. pushes the limits of web browsers: try this at home… memory limitations apply
  177. future work? more sorting criteria, custom groupings, filtering from library metadata
  178. the process

  179. None
  180. The Process of Design Squiggle by Damien Newman, thedesignsquiggle.com

  181. 2 million images! ⏳ ended up using 1.2 million… still a lot for any web browser, let alone a smartphone
  182. None
  183. None
  184. None
  185. None
  186. None
  187. None
  188. None
  189. these are all “just” pixels: single pixels can convey a lot of information… but i wanted everything
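
To get a feel for the scale involved, the sketch below works out how large a square picture would need to be if every one of the 1,199,477 items were drawn as a single pixel. It is a back-of-the-envelope illustration, not a description of how Aereo lays out its view; the row-major placement and the function names are assumptions.

```python
import math


def grid_side(n_items: int) -> int:
    """Smallest side length such that side * side >= n_items."""
    side = math.isqrt(n_items)
    return side if side * side == n_items else side + 1


def cell_for(index: int, side: int):
    """(column, row) of an item when the grid is filled row by row."""
    return index % side, index // side


N_IMAGES = 1_199_477  # item count quoted on the slides

side = grid_side(N_IMAGES)
print(side, side * side - N_IMAGES)   # grid side and number of unused cells
print(cell_for(N_IMAGES - 1, side))   # position of the last item
```

A square of roughly 1,100 pixels per side is enough to give every item its own pixel, which is why a one-pixel-per-image overview is feasible while full thumbnails for everything are not.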
  190. “but how am i going to show two million images‽” - Frustrated Fellow
  191. None
  192. None
  193. None
  194. None
  195. None
  196. technical details

  197. Python + Node.js for image similarity and color palettes …with lots of data cleanup and cloud computing time
  198. Three.js + Vue.js for Aereo interface (cloud-hosted static assets, using Library’s API for image details)
  199. github.com/slnsw/dxlab-fellowship-2019

  200. dxlab.sl.nsw.gov.au/blog/building-aereo

  201. Special thanks to
    • DX Lab team: Paula Bray, Kaho Cheung, and Luke Dearnley
    • Web team: Jenna Bain and Robertus Johansyah
    • Mitchell Librarian Richard Neville
    • State Librarian Dr John Vallance
    • Staff from the State Library of New South Wales
    • The State Library of NSW Foundation
  202. Acknowledgments
    • Cyril Diagne
    • Douglas Duhaime - Yale DH Lab
    • Gene Kogan
    • Mario Klingemann
    • Ricardo Cabello - Three.js
  203. Mauricio Giraldo Arteaga @mgiraldo