Overview of Jerome

A quick overview
Nick Jackson, Alex Bilbie, Paul Stainthorp, Chris Leach
@jacksonj04, @alexbilbie, @pstainthorp, @chrisl1953

Today’s XKCD

The Jerome Philosophy
Just get it done

If I had asked people what they wanted, they would have said faster horses.
Henry Ford

Getting Data Out, Then In
or: Why Horizon Sucks, And Why Mongo Is Cool

Horizon
• Getting data out of Horizon in a sensible way is impossible without sacrifices at the full moon.
• MARC isn’t a sensible way, but it’s the best we’ve got.
• marcout.exe splurges 99% good stuff, 1% nonsense.

Parsing It All
• File_MARC is a PEAR (PHP) library which takes care of it all.
• Requires some TLC on the output to deal with character encoding (MARC-8 sucks).
• Build a huge array of stuff for each record, using MARC tags as index names.

Sensible Output
• Storing the whole parsed MARC is useful, but:
• The vast majority of users don’t care about the MARC record.
• They just want the information needed to find and cite a work.
• Extract this ‘simple’ information in its own array.
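The "extract the simple information" step can be sketched roughly as follows. This is a Python illustration, not Jerome's PHP code: the record shape (a dict keyed by MARC tag, each holding a list of subfield dicts), the helper names, and the author and ISBN values are assumptions made up for the example. The tag choices (245 for title, 100 for main author, 020 for ISBN, 082 for Dewey number) follow common MARC 21 usage.

```python
# Hypothetical shape of a parsed record: MARC tag -> list of fields,
# each field a dict of subfield code -> value.
def first_subfield(record, tag, code):
    """Return the first value of subfield `code` in field `tag`, or None."""
    for field in record.get(tag, []):
        if code in field:
            # MARC values often carry trailing ISBD punctuation; trim it.
            return field[code].strip(" /:;,")
    return None


def simple_record(record):
    """Pull the 'find and cite' details into their own small dict."""
    candidates = {
        "title": first_subfield(record, "245", "a"),
        "author": first_subfield(record, "100", "a"),
        "isbn": first_subfield(record, "020", "a"),
        "dewey": first_subfield(record, "082", "a"),
    }
    # Leave out anything the record does not have.
    return {key: value for key, value in candidates.items() if value is not None}


# Made-up sample data (placeholder author and ISBN).
parsed = {
    "020": [{"a": "0000000000"}],
    "100": [{"a": "Example, Author,"}],
    "245": [{"a": "Problems with badgers? /", "c": "Author Example."}],
}

# Keep the full parsed MARC alongside the simplified view.
document = {"bib": 21084, "marc": parsed, "simple": simple_record(parsed)}
```

Keeping both views in one document means the simple fields can drive search and citation while the full MARC stays available for anyone who does care about it.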

MongoDB 101
• Document database: No fixed data structure.
• Accepts pure JSON as the input method.
• PHP library accepts nested arrays and JSONifies them for you.
• Makes getting data into the database dead easy.
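To illustrate "no fixed data structure" and the nested-array-to-JSON step, here is a small Python sketch using only the standard `json` module. The two records and their field names are invented for the example; a real driver would take the nested structure directly, so the explicit serialisation here is only to show what the JSON looks like.

```python
import json

# Two made-up records with different shapes; a document database
# does not require them to share a schema.
book = {
    "bib": 21084,
    "simple": {"title": "Problems with badgers?"},
    "marc": {"245": [{"a": "Problems with badgers? /"}]},
}
journal = {
    "bib": 21085,
    "simple": {"title": "An example journal", "issues": [1, 2, 3]},
}

# Nested dicts and lists map straight onto JSON objects and arrays.
as_json = [json.dumps(record, sort_keys=True) for record in (book, journal)]

# Round-tripping shows nothing is lost along the way.
round_tripped = [json.loads(text) for text in as_json]
```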

Getting At The Good Stuff
Retrieving items from Jerome

Mongo Makes APIs Easy
• Mongo also accepts queries as JSON.
• {"bib":21084}
• {"simple":{"title":"Problems with badgers?"}}
• Users can use preformed query fields, or potentially write their own.
• No need to add complex query builders to APIs.
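A rough sketch of how an API might accept user-written JSON queries safely. One detail worth knowing: in MongoDB a query such as {"simple":{"title":...}} matches the embedded document as a whole, so matching on a single nested field is usually written in dot notation ("simple.title"). The helper names and the whitelist of allowed fields below are assumptions for illustration, not part of Jerome.

```python
import json

# Hypothetical whitelist of fields the API is willing to query on.
ALLOWED_FIELDS = {"bib", "simple.title", "simple.author"}


def flatten_query(query, prefix=""):
    """Rewrite nested query dicts into dot-notation keys."""
    flat = {}
    for key, value in query.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_query(value, path))
        else:
            flat[path] = value
    return flat


def build_filter(raw_json):
    """Parse a user-supplied JSON query and reject unknown fields."""
    flat = flatten_query(json.loads(raw_json))
    unknown = set(flat) - ALLOWED_FIELDS
    if unknown:
        # Operators such as $gt also end up here, so this sketch
        # only supports simple equality queries.
        raise ValueError(f"unsupported query fields: {sorted(unknown)}")
    return flat


by_bib = build_filter('{"bib": 21084}')
by_title = build_filter('{"simple": {"title": "Problems with badgers?"}}')
```

The resulting dict is in the form a MongoDB driver expects as a filter, so the API layer stays thin: validate, flatten, pass through.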

Searching in 0.000 seconds
Finding what you’re looking for with Sphinx

Sphinx 101
• Lightning fast search server.
• Indexes at up to 15MB per second per core.
• Searches a 1,000,000-record, 1.2GB test index at 500 searches a second.
• Largest known index is 5 billion records.
• Powers Craigslist.

XML in, Index Out
• Export an XML file from Mongo (about 4 seconds).
• Sphinx will also happily index SQL databases.
• Tell the Sphinx indexer to reindex it (0.5 seconds).
• We have around 64585 records in our test set, searchable on title, author, ISBN and Dewey number.
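The export step might look something like the sketch below, which writes records in Sphinx's xmlpipe2 format (a `sphinx:docset` containing a schema and one `sphinx:document` per record). It assumes the export uses xmlpipe2 and that each record carries a numeric `bib` and a `simple` dict; the function name and sample record are made up, and the four field names come from the searchable fields listed above.

```python
from xml.sax.saxutils import escape

# Full-text fields, matching the searchable fields listed on the slide.
FIELDS = ("title", "author", "isbn", "dewey")


def to_xmlpipe2(records):
    """Render records as an xmlpipe2 document for the Sphinx indexer."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    lines += [f'<sphinx:field name="{name}"/>' for name in FIELDS]
    lines.append("</sphinx:schema>")
    for record in records:
        # Sphinx needs a unique integer id per document; the bib number
        # is assumed to be suitable here.
        lines.append(f'<sphinx:document id="{int(record["bib"])}">')
        simple = record.get("simple", {})
        for name in FIELDS:
            value = escape(str(simple.get(name, "")))
            lines.append(f"<{name}>{value}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


# Made-up record to show escaping of XML special characters.
xml = to_xmlpipe2([{"bib": 21084, "simple": {"title": "Badgers & <friends>"}}])
```

The output would be handed to the indexer through an xmlpipe2 source in the Sphinx configuration, which is not shown here.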

Searching...
• Supports quite complex query forms.
• OR, NOT, exact form, field-specific, strict order, proximity, quorum, phrase...
• Supports custom field weightings.
• Even does SQL queries.
• Average search completes in under 0.0005 secs.
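For a feel of what those query forms look like, the sketch below builds query strings in Sphinx's extended query syntax. The helper functions are hypothetical conveniences written for this example (Sphinx itself just takes the string), the search terms are invented, and no escaping of special characters is attempted.

```python
def phrase(words):
    """Phrase search: the words must appear together, in order."""
    return f'"{words}"'


def proximity(words, distance):
    """Proximity search: all words within `distance` words of each other."""
    return f'"{words}"~{distance}'


def quorum(words, threshold):
    """Quorum search: at least `threshold` of the words must match."""
    return f'"{words}"/{threshold}'


def in_field(field, expression):
    """Field-specific search: limit the expression to one field."""
    return f"@{field} {expression}"


def any_of(*terms):
    """OR: match any of the terms."""
    return " | ".join(terms)


def without(expression, term):
    """NOT: match the expression but exclude documents containing term."""
    return f"{expression} -{term}"


def strict_order(*terms):
    """Strict order: terms must appear in this order."""
    return " << ".join(terms)


def exact_form(word):
    """Exact form: match the word as written, without stemming."""
    return f"={word}"


examples = {
    "phrase": in_field("title", phrase("problems with badgers")),
    "or_not": without(any_of("badgers", "otters"), "weasels"),
    "proximity": proximity("badgers problems", 5),
    "quorum": quorum("problems with angry badgers", 3),
    "order": strict_order("problems", exact_form("badgers")),
}
```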

Distributed Goodness
• Sphinx adds horizontal scaling in the form of distributed indexes.
• Can also be used to provide ‘universal search’, since indexes can be non-homogeneous.
• ePrints, blog posts, journals and more are indexed individually but can be searched collectively.

It’s Demo Time!

Everything you're about to see on screen was generated on the computer inside this bag...
Steve Jobs

Not a server

To The Web!
• Portal: Specific work
• Portal: Live search
• API: Specific work
• API: Works matching a Dewey number
• API: Searching

What Else?
The Labs

What’s Next For Jerome?
• 3rd party integration (Amazon, Google, Copac, MOSAIC, LibraryThing...)
• E-Journals integration (including full TOC searching)
• EPrints integration (including full summary, and possibly full text searching)

Want More?
• Follow our progress on Twitter: #Jerome
• Read our blog: http://jerome.blogs.lincoln.ac.uk
