
HathiTrust Research Center overview and tutorial

Workshop on the HathiTrust Research Center presented at THATCamp Gainesville in April 2014.

Harriett Green

April 24, 2014

Transcript

  1. Outline
     • HathiTrust and HathiTrust Research Center overview
     • How to Use the HTRC Portal
       – Workset Builder
       – Algorithm Analysis
     • Opportunities to connect you with the HathiTrust Research Center
  2. HathiTrust “Wow” Numbers
     • 11,135,776 total volumes
     • 5,801,121 book titles
     • 290,893 serial titles
     • 3,897,521,600 pages
     • 499 terabytes
     • 132 miles
     • 9,048 tons
     • Public Domain: 3,743,574 volumes (~34% of total)
     http://www.hathitrust.org
  3. [Diagram: HathiTrust organization]
     • Board of Governors, Executive Committee, Executive Director
     • HathiTrust Digital Library: 90+ partners
     • HathiTrust Research Center: University of Illinois and Indiana University
     • Data Copy #1: University of Michigan; Data Copy #2: Indiana University
  4. Why Worksets?
     • The result of a first-level, rough filter
     • Better scale for intensive analytics
     • Provides essential scope for certain analytics
       – e.g., word-frequency analysis scoped to Bacon’s essays
     • Some tools work best on (or are trained on) a narrow, homogeneous workset
     • Eliminates noise that would otherwise arise from asking questions across the whole of HathiTrust
  5. Looking into the future
     • Non-consumptive research on copyrighted texts
     • Bookworm tool development: http://sandbox.htrc.illinois.edu/bookworm/
     • Improvement of metadata through the Workset Creation for Scholarly Analysis (WCSA) study
     • Documentation and user guides forthcoming
  6. Acknowledgements: HTRC Team
     • HTRC @ Illinois (GSLIS and the University Library): Stephen Downie, Tim Cole, Loretta Auvil, Sayan Bhattacharyya, Boris Capitanu, Colleen Fallaw, Katrina Fenlon, Harriett Green, Peter Organisciak, Megan Senseney, Craig Willis
     • Indiana University: led by Beth Plale
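The workset rationale on slide 4 (a first-level, rough filter that scopes an analysis such as word frequency over Bacon’s essays) can be illustrated with a small Python sketch. This is not the HTRC Portal’s Workset Builder or its data format: the `Volume` record, the `build_workset` and `word_frequencies` helpers, and the sample texts are all hypothetical, chosen only to show the filter-then-analyze idea.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical, simplified volume record; NOT the HTRC workset format.
    volume_id: str
    author: str
    title: str
    text: str


def build_workset(volumes, predicate):
    """Hypothetical first-level, rough filter: keep volumes matching the predicate."""
    return [v for v in volumes if predicate(v)]


def word_frequencies(workset):
    """Count lowercase word tokens across every volume in the workset."""
    counts = Counter()
    for vol in workset:
        counts.update(re.findall(r"[a-z']+", vol.text.lower()))
    return counts


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    corpus = [
        Volume("demo.001", "Bacon, Francis", "Essays",
               "Studies serve for delight, for ornament, and for ability."),
        Volume("demo.002", "Bacon, Francis", "Essays", "Reading maketh a full man."),
        Volume("demo.003", "Austen, Jane", "Emma", "Emma Woodhouse, handsome, clever, and rich."),
    ]
    # Scope the analysis to Bacon's essays before counting words.
    bacon_essays = build_workset(
        corpus, lambda v: "Bacon" in v.author and "Essays" in v.title
    )
    freqs = word_frequencies(bacon_essays)
    print(len(bacon_essays))       # 2
    print(freqs.most_common(1))    # [('for', 3)]
```

Because the Austen volume is filtered out first, words like “emma” never enter the counts, which is the noise reduction slide 4 describes.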