Greig Roulston: Find All the Books

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Greig Roulston:
Find All the Books
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
@ Kiwi PyCon 2014 - Saturday, 13 Sep 2014 - Track 1
http://kiwi.pycon.org/

**Audience level**

Novice

**Description**

This is the story of how I stumbled into using Python, took on a project that was harder than I first thought, and learned more than I ever wanted to know about bibliographic records, py(MARC), and how MongoDB blew my mind.

**Abstract**

The Digitisation team at the National Library of New Zealand is looking at NZ books, lots of books. What needs to be digitised, what is already digitised, and how do we get access to these books? This has been a journey of learning the archaic arts of MARC, learning and loving Python, and working out how to handle large amounts of library data. This is still a work in progress two years later, and what started as checking data sets against each other in Excel has grown into me falling in love with dictionaries (JSON), making Flask web interfaces to explore data, map-reducing data sets, making sweet graphs with Matplotlib, using Python whenever I get a chance, and even teaching some of what I have learned to colleagues.
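As a rough illustration of "checking data sets against each other" with dictionaries, here is a minimal sketch in plain Python. It is not the code from the talk: the field names (`title`, `author`, `year`), the sample records, and the helper functions `normalise`, `match_key`, and `find_matches` are all assumptions made up for this example. It also shows why matching on title alone can give false hits when two different books share a title.

```python
import re


def normalise(text):
    """Lower-case, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(record, fields):
    """Build a hashable key from the chosen fields of a record (a dict)."""
    return tuple(normalise(str(record.get(field, ""))) for field in fields)


def find_matches(wanted, digitised, fields):
    """Split `wanted` into (found, not_found) by looking each record up
    in an index of `digitised` built on the same fields."""
    index = {}
    for rec in digitised:
        index.setdefault(match_key(rec, fields), []).append(rec)

    found, not_found = [], []
    for rec in wanted:
        if match_key(rec, fields) in index:
            found.append(rec)
        else:
            not_found.append(rec)
    return found, not_found


# Hypothetical sample data, invented for this sketch.
nz_records = [
    {"title": "The Story of New Zealand", "author": "Smith, Will", "year": 1901},
    {"title": "Birds of the Bush", "author": "Jones, Mary", "year": 1910},
]
digitised_records = [
    {"title": "The story of New Zealand.", "author": "Brown, Anne", "year": 1955},
    {"title": "Birds of the bush", "author": "Jones, Mary", "year": 1910},
]

# Title only: both records look "found", but one is a different book.
by_title, _ = find_matches(nz_records, digitised_records, ["title"])

# Title + author + year: only the genuine match remains.
by_all, missing = find_matches(
    nz_records, digitised_records, ["title", "author", "year"]
)

print(len(by_title), len(by_all), len(missing))
```

Building the index as a dictionary keeps each lookup cheap, so the same approach stays workable as the record sets grow; the choice of which fields go into the key is the part that needs real bibliographic judgement.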

**YouTube**

https://www.youtube.com/watch?v=VOEbhYxWimU

New Zealand Python User Group

September 13, 2014

Transcript

  1. None
  2. Greig Roulston, Digitisation Advisor, National Library of New Zealand

  3. None
  4. None
  5. None
  6. None
  7. None
  8. None
  9. None
  10. @willsmith

  11. None
  12. The Will Smith Problem
    • People have made incorrect assumptions
    • With the right metadata the problem becomes easier to solve.
    • Books like people often share the same title/name.
  13. In the Beginning

  14. Analysis of Publications NZ Bibliography data done by a consultant

    I casually offered to try to make sense of it
  15. What if I matched the Publications NZ records to the HathiTrust digitised records?
  16. How hard could it be?

  17. None
  18. None
  19. Something something python

  20. First outputs are text files of found or not found

    Hardly the most elegant solution: it requires scripts to make sense of scripts
  21. Greig’s Protips: www.regexr.com

  22. Need more Data

  23. None
  24. None
  25. None
  26. None
  27. None
  28. 46,229 About NZ • 144,237 By NZ’ers • 736,732 Published in NZ

  29. None
  30. Greig’s Protips: jsoneditoronline.org

  31. You have literally just described a database

  32. None
  33. None
  34. None
  35. Still not the right data

  36. None
  37. None
  38. None
  39. None
  40. None
  41. None
  42. None
  43. None
  44. None
  45. None
  46. None
  47. So hot right now

  48. None
  49. @NLNZ #WeWantJam Or find Jay Gattuso (may or may not have a horse or giant jar)
  50. Thanks • Email: Greig.roulston@dia.govt.nz • Twitter: @dontforgettheeye
