Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales

Presentation of Aereo: a project made for the 2019 DX Lab Fellowship of the State Library of New South Wales.

More info at: https://dxlab.sl.nsw.gov.au/blog/building-aereo

Mauricio Giraldo

July 02, 2020

Transcript

  1. An experimental bird’s eye view of the
    digital collections from the State Library of NSW
    Mauricio Giraldo Arteaga - July 2020

  2. Universidad de los Andes 2000

  3. Carnegie Mellon University 2011

  4. i was there — Photo: Paula Bray

  5. collection.sl.nsw.gov.au/digital

  6. about 4 million images
    about 2 million at the time of the fellowship

  7. flickr.com/photos/prasenberg - Patrick Rasenberg

  8. serendipitous exploration
    bounded by the physical space

  9. how does this work digitally?

  10. 554,030 results

  11. 13,000+ “pages”*
    *not really possible due to technical limitations in pagination depth

  12. this is a common paradigm

  13. “pagination”

  14. 13,000+ pages

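The “13,000+ pages” figure follows from simple arithmetic: the page count is the number of results divided by the page size, rounded up. The sketch below assumes 40 results per page (the deck does not state the catalogue’s page size) and a hypothetical `MAX_RESULT_WINDOW`, an illustrative cap on how deep a search backend will page, to show how a pagination-depth limit can leave most of those pages unreachable. Neither value is the Library’s actual configuration.

```python
import math

# Assumption: the deck does not give the page size; 40 is illustrative only.
PAGE_SIZE = 40
# Hypothetical cap on (offset + page size) that a search backend will serve.
MAX_RESULT_WINDOW = 10_000


def page_count(total_results: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to list every result."""
    return math.ceil(total_results / page_size)


def reachable_pages(total_results: int, page_size: int = PAGE_SIZE,
                    max_window: int = MAX_RESULT_WINDOW) -> int:
    """Pages a user can actually open if the backend refuses deep offsets."""
    return min(page_count(total_results, page_size), max_window // page_size)


if __name__ == "__main__":
    results = 554_030  # the search shown on slide 10
    print(page_count(results))       # 13851
    print(reachable_pages(results))  # 250
```

Under these assumptions, 554,030 results span 13,851 pages, but only the first 250 could be opened.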
  15. you miss the forest for the trees

  16. pages.gseis.ucla.edu/faculty/bates/berrypicking.html

  17. “the query is satisfied not by a single final
    retrieved set, but by a series of selections of
    individual references and bits of information at
    each stage of the ever-modifying search”
    - Marcia J. Bates

  18. research as a serendipitous process

  19. digital serendipity is limited
    constrained by a small set of results on view

  20. see the forest and the trees

  21. “big picture” vs. detail

  22. this is my attempt at addressing this*
    *that i could research, design, and implement in eight weeks

  23. “An experimental bird’s eye view of the
    digital collections from the State
    Library of NSW”

  24. but first some background

  25. dependent on text metadata

  26. comprehensive metadata is hard

  27. we could automate metadata creation

  28. we could automate some* metadata creation
    *the kind that computers are good at

  29. already being done at the Library

  30. sl.nsw.gov.au/blogs/tiger-using-artificial-intelligence-discover-our-collections

  31. collection.sl.nsw.gov.au

  32. computers get (a lot) wrong

  33. take it with a grain of salt*
    *there are way more problematic tags (that reflect algorithmic biases)

  34. Microsoft COCO: Common Objects in Context

  35. computer “learns” how to “see”

  36. cannot see…
    what it hasn’t seen before
    an issue in a collection with centuries-old stuff

  37. cannot see…
    what its creator didn’t train* it to see
    *non-W.E.I.R.D. stuff (Western, Educated, Industrialized, Rich, “Democratic”)

  38. but it is consistent
    useful for finding similar-looking things, not for the categories themselves

  39. a way to group images together
    with computers, avoiding closed or proprietary algorithms

  40. image-net.org

  41. ml4a.github.io/guides - Gene Kogan

  42. “similarity metadata”
    or, in math parlance, uniform manifold approximation and projection (umap)

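A minimal sketch of “similarity metadata”, assuming each image has already been turned into a feature vector (for example, activations from a late layer of an ImageNet-trained network, the approach the ml4a guides describe). It ranks images by cosine similarity to a query image. This is plain nearest-neighbour search, not UMAP: UMAP goes a step further and projects such vectors into two dimensions so that similar images land near each other on screen. The image IDs and tiny vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, features: Dict[str, List[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k images whose feature vectors are closest to the query's."""
    query = features[query_id]
    scored = [
        (image_id, cosine_similarity(query, vec))
        for image_id, vec in features.items()
        if image_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional "features"; real network features have
    # hundreds or thousands of dimensions.
    features = {
        "harbour_1": [0.9, 0.1, 0.0],
        "harbour_2": [0.8, 0.2, 0.1],
        "portrait_1": [0.0, 0.1, 0.9],
        "portrait_2": [0.1, 0.0, 0.8],
    }
    print(most_similar("harbour_1", features, k=2))
```

Cosine similarity ignores vector length and compares only direction, which is a common choice for comparing network features; for a collection of a million images a brute-force loop like this would be replaced by an approximate nearest-neighbour index.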
  43-44. dhlab.yale.edu/projects/pixplot

  45. SFMOMA Artscope - Stamen (2014)

  46. bertspaan.nl/semia - Bert Spaan

  47. amnh-sciviz.github.io/image-collection - Brian Foo

  48. we can also use color as metadata

  49. mkweb.bcgsc.ca/colorsummarizer

  50. artsexperiments.withgoogle.com/artpalette

  51. collection.cooperhewitt.org

  52. publicdomain.nypl.org/pd-visualization - Brian Foo

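Colour metadata can be as simple as counting which colours dominate an image. The sketch below is an assumption, not the method used by Aereo or by the tools listed above: it snaps every RGB pixel to a coarse grid of `levels` steps per channel and returns the most frequent buckets as a small palette. Pixels are passed as a plain list of `(r, g, b)` tuples; reading them from an image file is left out to keep the example dependency-free.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def quantize(pixel: RGB, levels: int = 4) -> RGB:
    """Snap a 0-255 RGB pixel to the centre of one of `levels` bins per channel."""
    step = 256 // levels
    return tuple(min(c // step, levels - 1) * step + step // 2 for c in pixel)


def dominant_colours(pixels: Iterable[RGB], n: int = 3,
                     levels: int = 4) -> List[Tuple[RGB, float]]:
    """Return up to n (colour, share of pixels) pairs, most frequent first."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return [(colour, count / total) for colour, count in counts.most_common(n)]


if __name__ == "__main__":
    # Hypothetical pixels: mostly red, some blue, one grey.
    pixels = [(250, 10, 10)] * 6 + [(10, 10, 240)] * 3 + [(128, 128, 128)]
    print(dominant_colours(pixels, n=2))
```

Coarse bins trade precision for robustness: slightly different shades of the same red fall into one bucket, which makes palettes comparable across images. Clustering methods such as k-means are a common, more precise alternative.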
  53-54. “I propose enhancing the Manuscripts, Oral History and
    Pictures Catalogue with computer-generated metadata to
    create new pathways for patron exploration. By focusing on
    the 300,000+ digitised image set that is currently
    available…”

  55. larger than other sets that i had seen
    but seemed technically achievable in a modern web browser

  56. little did i know…

  57. 2 million images…
    the scale of the dataset was a big challenge in this project

  58. 1,199,477 things: photographs, negatives, published
    maps, drawings, objects, pictures, ephemera, journals,
    prints, medals, unpublished maps, paintings, coins,
    architectural drawings, newspapers, posters, and stamps.
    emphasis on visual materials

  59-76. 1,199,477 manuscripts, photographs, negatives, published
    maps, drawings, objects, pictures, ephemera, journals,
    prints, medals, unpublished maps, paintings, coins,
    architectural drawings, newspapers, posters, and stamps.

  77-79. “unsorted”, year, colour, similarity

  80. “unsorted”, year, colour, similarity
    four different ways of looking at the forest

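The four views can be thought of as four sort keys applied to the same set of images, each followed by laying the images out on a grid. A minimal sketch, assuming each record already carries a year, a hue between 0 and 1 (for example from `colorsys.rgb_to_hsv` applied to a dominant colour) and a position in a one-dimensional similarity ordering; the `Record` fields, including `similarity_rank`, are hypothetical, and Aereo’s actual layout code may work differently.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    image_id: str
    year: Optional[int]   # None when the catalogue has no date
    hue: float            # 0.0-1.0, e.g. from colorsys.rgb_to_hsv
    similarity_rank: int  # hypothetical 1D ordering from similarity analysis


SORT_KEYS = {
    "unsorted": None,  # keep catalogue order
    "year": lambda r: (r.year is None, r.year or 0),  # undated records last
    "colour": lambda r: r.hue,
    "similarity": lambda r: r.similarity_rank,
}


def grid_layout(records: List[Record], mode: str) -> Dict[str, Tuple[int, int]]:
    """Map image_id -> (column, row) on a roughly square grid, filled row by row."""
    key = SORT_KEYS[mode]
    ordered = list(records) if key is None else sorted(records, key=key)
    columns = max(1, math.ceil(math.sqrt(len(ordered))))
    return {r.image_id: (i % columns, i // columns) for i, r in enumerate(ordered)}


if __name__ == "__main__":
    records = [
        Record("a", 1900, 0.5, 2),
        Record("b", None, 0.1, 0),
        Record("c", 1850, 0.9, 1),
        Record("d", 1875, 0.3, 3),
    ]
    for mode in SORT_KEYS:
        print(mode, grid_layout(records, mode))
```

Keeping every view as a pure function of the same records means switching views only moves images to new grid cells, which suits an interface that animates between orderings.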
  81. page 1
    5 pages

  82. page 1
    8 pages

  83. page 1
    27 pages

  84. page 1
    688 pages

  85. 27,749 pages

  86. serendipitous exploration

  87. unbounded by physical space

  88. pushes the limits of web browsers
    try this at home… memory limitations apply

  89. future work?
    more sorting criteria, custom groupings, filtering from library metadata

  90. The Process of Design Squiggle by Damien Newman, thedesignsquiggle.com

  91. 2 million images! ⏳
    ended up using 1.2 million… still a lot for any web browser, let alone a smartphone

  92. these are all “just” pixels
    single pixels can convey a lot of information… but i wanted everything

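Rough arithmetic shows why displaying every image pushes a browser. The sketch below assumes uncompressed RGBA thumbnails (4 bytes per pixel) packed into square texture atlases of 4096×4096 pixels; both values are illustrative assumptions, since the deck does not give Aereo’s thumbnail or texture sizes, and the maximum texture size varies between GPUs.

```python
from typing import Dict

BYTES_PER_PIXEL = 4  # assumption: uncompressed RGBA
ATLAS_SIDE = 4096    # assumption: square texture atlas, in pixels


def memory_budget(n_images: int, thumb_side: int) -> Dict[str, float]:
    """Estimate raw pixel bytes and atlas count for n square thumbnails."""
    if not 0 < thumb_side <= ATLAS_SIDE:
        raise ValueError("thumb_side must be between 1 and ATLAS_SIDE")
    per_atlas = (ATLAS_SIDE // thumb_side) ** 2  # thumbnails that fit in one atlas
    atlases = -(-n_images // per_atlas)          # ceiling division
    raw_bytes = n_images * thumb_side ** 2 * BYTES_PER_PIXEL
    return {
        "thumbnails_per_atlas": per_atlas,
        "atlases": atlases,
        "megabytes": raw_bytes / 1_000_000,
    }


if __name__ == "__main__":
    # 1,199,477 is the image count quoted in the deck.
    for side in (1, 8, 32):
        print(side, memory_budget(1_199_477, side))
```

Under these assumptions, one pixel per image needs under 5 MB of raw pixel data and fits in a single atlas, while 32×32 thumbnails need roughly 4.9 GB spread over 74 atlases, which is consistent with the deck’s warning that memory limitations apply.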
  93. “but how am i going to show
    two million images‽”
    - Frustrated Fellow

  94. technical details

  95. Python + Node.js
    for image similarity and color palettes
    …with lots of data cleanup and cloud computing time

  96. Three.js + Vue.js
    for Aereo interface
    cloud-hosted static assets using Library’s API for image details

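Because the interface is served as static assets, layout data has to be precomputed and shipped compactly. One possible approach, sketched here as an assumption and not as Aereo’s actual file format, is to pack each layout’s x/y positions as little-endian 32-bit floats, which a browser can load into a `Float32Array` (the typed array that Three.js buffer attributes commonly wrap) on the little-endian machines browsers typically run on. Image details would still come from the Library’s API on demand, which this sketch does not model.

```python
import struct
from typing import List, Tuple


def pack_positions(positions: List[Tuple[float, float]]) -> bytes:
    """Serialise (x, y) pairs as consecutive little-endian float32 values."""
    flat = [coord for xy in positions for coord in xy]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_positions(blob: bytes) -> List[Tuple[float, float]]:
    """Inverse of pack_positions, useful for checking a generated file."""
    count = len(blob) // 4
    flat = struct.unpack(f"<{count}f", blob)
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    layout = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    blob = pack_positions(layout)
    print(len(blob))  # 3 points x 2 coordinates x 4 bytes = 24
```

A flat binary buffer avoids parsing millions of JSON numbers in the browser; the trade-off is that float32 rounds coordinates to about seven significant digits, which is ample for screen positions.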
  97. github.com/slnsw/dxlab-fellowship-2019

  98. dxlab.sl.nsw.gov.au/blog/building-aereo

  99. Special thanks to
    • DX Lab team: Paula Bray, Kaho Cheung, and Luke Dearnley
    • Web team: Jenna Bain and Robertus Johansyah
    • Mitchell Librarian Richard Neville
    • State Librarian Dr John Vallance
    • Staff from the State Library of New South Wales
    • The State Library of NSW Foundation

  100. Acknowledgments
    • Cyril Diagne
    • Douglas Duhaime - Yale DH Lab
    • Gene Kogan
    • Mario Klingemann
    • Ricardo Cabello - Three.js

  101. Mauricio Giraldo Arteaga
    @mgiraldo