"Don't Be Afraid of Elasticsearch." (The University of Washington)

Elastic Co
December 03, 2015

From proof of concept to implementation in a minimal amount of time: the story of how a small team of engineers at the University of Washington introduced Elasticsearch into its development stack to replace the Oracle WebCenter Content search engine, how it made their lives easier, and how it improved the user experience.

Maxime Deravet | Elastic{ON} Tour Seattle | December 3, 2015

Transcript

  1. Who am I?
    > From Belgium
    > Java software engineer
    > Working at UW-IT since March 2015:
      > UW Information Technology is the central IT organization for the University of Washington, providing critical technology support to all three campuses, UW medical centers and global research operations
      > 600 employees in central UW-IT
      > Several thousand distributed IT employees at UW
    > User of Elasticsearch for the past 2 years
  2. Disclaimer
    • In this presentation, you won’t see:
      • A 1,000-node Elasticsearch cluster.
      • Tips to optimize an ES cluster to handle 1,000 PB of data.
      • Tips to make ES respond within a millisecond.
  3. However!
    • You will see:
      • How easily a small team (3 developers) was able to introduce Elasticsearch into their application stack.
      • How this change made their lives easier and their clients happier.
  4. Our Project: Replacing Facilities Services’ 10-year-old document management system
    • Facilities Services keeps track of all documents related to the buildings at the University of Washington, dating back to 1861.
  5. Our Project: Replacing Facilities Services’ 10-year-old document management system
    • Functionality:
      • Scan old paper documents
      • Upload new electronic files
      • Add metadata
      • Search and download documents
    • There are different levels of security, the lowest being read-only access for almost anyone with a UW account.
  6. Facilities: a specific tenant on our shared Oracle WebCenter Content installation
    • Multiple “tenants” (UW departments) use the same WCC installation.
    • Every tenant needs a specific profile configuration.
    • The metadata is in a single table, shared by all the tenants.
  7. Oracle WebCenter Content limitations
    • The search engine (Oracle Text Search) is pretty rigid.
      • Searchable fields need to be defined in advance.
      • Configuration by profile is not supported.
        • For example: if facets are used, the same facets are defined for all executed queries, regardless of the tenant’s profile.
    • Search is not as fast as we would like.
    • Reindexing is relatively slow, as the whole database is reindexed instead of only the profile of interest.
  8. Why do we want to improve our search engine?
    • The Facilities UI can be used by anyone at UW.
    • People using the search UI won’t be trained to do so.
    • There isn’t any specific pattern in how users search for documents.
    => We wanted a “Google”-like search:
      • with a single search box
      • with relevant results
      • fast
  9. Here comes Elasticsearch
    • There must be an easier way to search.
    • We started a proof of concept with ES.
  10. Proof of concept
    • We called the REST API on top of WCC to fetch all the Facilities documents (~170,000) and index them in ES.
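
One way to picture this bulk-loading step is the sketch below. It only builds the newline-delimited body that Elasticsearch's `_bulk` endpoint expects and does not call any server. The index name `facilities_v1`, the `document` type, and the sample fields are assumptions made for illustration; the slides do not show the actual document shape or how the team sent the data. The `_type` key reflects 2015-era Elasticsearch and is not used in current versions.

```python
import json


def build_bulk_body(documents, index_name, doc_type="document"):
    """Serialize documents into a newline-delimited _bulk request body."""
    lines = []
    for doc in documents:
        # Action line: says where the source line that follows should be indexed.
        action = {"index": {"_index": index_name, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action, sort_keys=True))
        # Source line: the JSON document itself, unchanged.
        lines.append(json.dumps(doc, sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical documents standing in for what the WCC REST API returns.
docs = [
    {"id": "doc-1", "title": "Floor plan", "building": "Hypothetical Hall"},
    {"id": "doc-2", "title": "Roof inspection", "building": "Hypothetical Hall"},
]
body = build_bulk_body(docs, "facilities_v1")
print(body)
```

The resulting string would be sent with a POST to `/_bulk`; for ~170,000 documents it would normally be split into several smaller requests.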
  11. Proof of concept (2)
    • We indexed the JSON as-is in Elasticsearch.
    • Then we started executing a few simple “Query String Queries” on “_all”.
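
A minimal sketch of what such a request body might look like, built as a plain Python dictionary. The function name and the sample search text are hypothetical. The `_all` catch-all field existed in the Elasticsearch versions available in 2015 and was removed in later releases, so this shape applies to that era only.

```python
import json


def simple_search_body(user_input, size=10):
    """Build a search body that runs the user's text against the _all field."""
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": user_input,
                # _all concatenates every field of a document (2015-era feature).
                "default_field": "_all",
            }
        },
    }


search_body = simple_search_body("roof inspection 1998")
print(json.dumps(search_body, indent=2))
```

The body would be sent to the index's `_search` endpoint. A single free-text input maps directly onto the single search box the team was aiming for.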
  12. It works!
    • And it’s fast!
      • ~50 ms per query (was ~1-2 s for the same type of queries in WCC).
      • Slightly slower with 3 aggregations, but most of the time it’s below 100 ms.
    • Elasticsearch: 2-node cluster, with 2 cores and 1 GB of memory
    • Oracle 12c Data Guard: 4 CPUs and 24 GB of memory
  13. Implementation Details: A few microservices
    • Content-API:
      • Provides CRUD over WCC.
      • Sends every document update to RabbitMQ.
    • Indexer:
      • Reads from the queue and sends all updates to Elasticsearch.
      • Allows us to do a full rebuild of the indexes.
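
The Indexer's read-and-forward loop could look roughly like the sketch below. To stay self-contained it uses Python's in-memory `queue.Queue` as a stand-in for RabbitMQ, and `send_batch` is a hypothetical callback standing in for whatever sends updates to Elasticsearch. Batching is also an assumption; the slides do not say whether updates were sent one at a time or grouped.

```python
import queue


def drain_updates(update_queue, send_batch, batch_size=100):
    """Read pending updates from the queue and forward them in batches."""
    batch = []
    sent = 0
    while True:
        try:
            update = update_queue.get_nowait()
        except queue.Empty:
            break  # Nothing left to read for now.
        batch.append(update)
        if len(batch) >= batch_size:
            send_batch(batch)
            sent += len(batch)
            batch = []  # Start a fresh list so the sent batch is not mutated.
    if batch:
        # Forward the final, partially filled batch.
        send_batch(batch)
        sent += len(batch)
    return sent


# In-memory stand-in for the message queue, holding five hypothetical updates.
pending = queue.Queue()
for n in range(5):
    pending.put({"id": "doc-%d" % n, "title": "Updated title %d" % n})

received = []
total = drain_updates(pending, received.append, batch_size=2)
print(total, [len(b) for b in received])
```

Keeping the queue between Content-API and the Indexer means document writes do not wait on Elasticsearch, and the same forwarding code can be reused for a full rebuild by feeding it every document.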
  14. Implementation Details: A few microservices (2)
    • Search-API:
      • Provides a thin layer of abstraction over Elasticsearch APIs.
      • Handles access control:
        • Based on the UW identity and access management system.
        • Overrides queries by adding a user-specific “Terms filter” based on the user’s access privileges.
        • Forbids access to non-aliased indexes.
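
The query override can be sketched as a small wrapper that takes the caller's search body and returns a restricted copy. The field name `access_level` and the level values are hypothetical; the slides do not name the field the filter applies to. This sketch uses a `bool` query with a `filter` clause, the form available from Elasticsearch 2.0; the 1.x releases expressed the same idea with a `filtered` query, and the slides do not say which one the team used.

```python
import json


def restrict_to_user(query_body, allowed_levels, field="access_level"):
    """Return a copy of the search body limited to the user's access levels."""
    restricted = dict(query_body)  # Shallow copy; the caller's body stays untouched.
    restricted["query"] = {
        "bool": {
            # The user's original query still decides matching and scoring.
            "must": query_body.get("query", {"match_all": {}}),
            # The terms clause only narrows the result set; it does not affect scores.
            "filter": {"terms": {field: sorted(allowed_levels)}},
        }
    }
    return restricted


user_query = {"query": {"query_string": {"query": "floor plan"}}}
safe_query = restrict_to_user(user_query, {"staff", "public"})
print(json.dumps(safe_query, indent=2))
```

Because the filter is added on the server side, a client cannot widen its own access by editing the query it sends.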
  15. Implementation Details: Elasticsearch configuration
    • Elasticsearch configuration:
      • 3-node cluster in production.
      • Mostly out of the box.
      • A few mapping customizations:
        • “.raw” fields for aggregations
        • “.lowercase” field for alphabetical sorting
        • Search-as-you-type analyzer
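
These three customizations might be expressed as index settings along the lines of the sketch below. Everything named here is an assumption: the `document` type, the `building` and `title` fields, and the analyzer names are invented for illustration. The syntax follows 2.x-era Elasticsearch (`string` fields with `not_analyzed` sub-fields); later versions use `text` and `keyword` types and no mapping types, and the deck does not show the team's real mapping.

```python
import json


def build_index_settings():
    """Build settings and mappings with raw, lowercase and autocomplete variants."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Emits prefixes of each token (r, ro, roo, roof) for search-as-you-type.
                    "autocomplete_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
                },
                "analyzer": {
                    # Keeps the whole value as one lowercased token, for case-insensitive sorting.
                    "lowercase_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    },
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "autocomplete_filter"],
                    },
                },
            }
        },
        "mappings": {
            "document": {
                "properties": {
                    "building": {
                        "type": "string",
                        "fields": {
                            # Exact, unanalyzed value: one aggregation bucket per building.
                            "raw": {"type": "string", "index": "not_analyzed"},
                            "lowercase": {"type": "string", "analyzer": "lowercase_keyword"},
                        },
                    },
                    # Prefixes are generated at index time only; queries use the standard analyzer.
                    "title": {
                        "type": "string",
                        "analyzer": "autocomplete",
                        "search_analyzer": "standard",
                    },
                }
            }
        },
    }


index_settings = build_index_settings()
print(json.dumps(index_settings, indent=2))
```

With a mapping like this, an aggregation would target `building.raw` and a sort would target `building.lowercase`, while full-text search keeps using the analyzed `building` field.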
  16. Conclusions
    • First version of the implementation in ~1 man-month.
    • Good feedback from users: the search is fast and accurate.
    • Good Elasticsearch documentation makes it easy to learn.
    • We will expand this implementation for our next customer.
    • You should try it!