"Don't Be Afraid of Elasticsearch." (The University of Washington)

Elastic Co
December 03, 2015

From proof of concept to implementation in a minimal amount of time, here is a story of how a small team of engineers at the University of Washington introduced Elasticsearch in its development stack to replace the Oracle WebCenter content search engine, how it made their life easier, and how it improved the user experience.

Maxime Deravet | Elastic{ON} Tour Seattle | December 3, 2015

Transcript

  1. Don't Be Afraid of Elasticsearch
    Maxime Deravet

    Java Software Engineer | University of Washington

  2. Who am I?
    > From Belgium
    > Java software engineer
    > Working at UW-IT since March 2015:
    > UW Information Technology is the central IT organization for the
    University of Washington, providing critical technology support to
    all three campuses, UW medical centers and global research
    operations
    > 600 employees in central UW-IT
    > several thousand distributed IT employees at UW
    > Elasticsearch user for the past 2 years

  3. Disclaimer
    • In this presentation, you won’t see:
    • A 1,000-node Elasticsearch cluster.
    • Tips to optimize an ES cluster to handle 1,000 PB of data.
    • Tips to make ES respond within a millisecond.

  4. However!
    • You will see:
    • How easily a small team (3 developers) was able to introduce
    Elasticsearch into their application stack.
    • How this change made their life easier, and their clients happier.

  5. Our Project: Replacing Facilities Services’ 10-year-old
    document management system
    • Facilities Services keeps track of every document related to the
    buildings at the University of Washington, dating back to 1861.

  6. Facilities: One of the oldest documents in the system

  7. Our Project: Replacing Facilities Services’ 10-year-old
    document management system
    • Features:
    • Scan old paper documents
    • Upload the new electronic files
    • Add metadata
    • Search and download documents
    • Documents have different security levels, the lowest being read-only
    access for almost anyone with a UW account.

  8. Facilities: a specific tenant on our shared Oracle WebCenter
    Content installation.
    • Multiple “tenants” (UW departments) use the same WCC
    installation.
    • Every tenant needs a specific profile configuration.
    • The metadata is in a single table, shared by all the tenants.

  9. Oracle WebCenter Content limitations
    • The search engine (Oracle Text Search) is pretty rigid.
    • Searchable fields need to be defined in advance.
    • Configuration by profile is not supported.
    • For example: if facets are used, the same facets are defined
    for all executed queries, regardless of the tenant’s profile.
    • Search is not as fast as we would like.
    • Reindexing is relatively slow, because the whole database is reindexed
    rather than only the profile of interest.

  10. Why do we want to improve our search engine?
    • The Facilities UI can be used by anyone at UW.
    • People using the search UI won’t be trained to do so.
    • There is no specific pattern in how users search for documents.
    => We wanted a “Google-like” search:
    • with a single search box.
    • with relevant results.
    • that is fast.

  11. Here comes Elasticsearch
    • There must be an easier way to search.
    • We started a proof-of-concept with ES.

  12. Proof of concept
    • We called the REST API on top of WCC to index all the
    Facilities documents (~170,000) in ES.
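
As a rough illustration of this step, here is a minimal sketch of how documents already fetched from WCC could be packed into a payload for the Elasticsearch `_bulk` endpoint. The slides do not say whether the bulk API was used; the function name, the `facilities` index name, and the `id` key are assumptions made for the example. Elasticsearch versions from 2015 also expect a `_type`, either in each action line or in the request URL.

```python
import json


def build_bulk_body(documents, index_name="facilities"):
    """Return a newline-delimited JSON payload for the _bulk endpoint.

    `documents` is a list of dicts as returned by some upstream API;
    each dict is assumed (hypothetically) to carry its identifier under "id".
    """
    lines = []
    for doc in documents:
        # Action line: says which index and id the next line belongs to.
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document JSON, passed through unchanged.
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"id": "a1", "title": "Floor plan"}, {"id": "b2", "title": "Site map"}]
    print(build_bulk_body(sample), end="")
```

The resulting string would be sent as the body of a POST to `/_bulk`; batching a few hundred documents per request is a common starting point.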

  13. Proof of concept (2)
    • We indexed the JSON as-is in Elasticsearch.
    • And we started executing a few simple “Query String
    Queries” on “_all”
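
A query of this kind can be sketched as a small request body. The helper name and the sample search text are made up for illustration; `query_string` and its `default_field` parameter are part of the Elasticsearch query DSL. Note that the `_all` field belongs to the Elasticsearch versions of that period and was removed in later major versions.

```python
import json


def query_string_search(user_input, size=10):
    """Build a search body that runs the raw user input against `_all`."""
    return {
        "size": size,
        "query": {
            "query_string": {
                # The text typed into the single search box, passed as-is.
                "query": user_input,
                # `_all` is the catch-all field of 2015-era Elasticsearch.
                "default_field": "_all",
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(query_string_search("library roof 1926"), indent=2))
```

The body would be sent to the `_search` endpoint of the index holding the documents.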

  14. It works!
    • And it’s fast!
    • ~50 ms per query (versus ~1-2 s for the same type of query
    in WCC).
    • Slightly slower with 3 aggregations, but most of the time it’s
    below 100 ms.
    • Elasticsearch: 2-node cluster, with 2 cores and 1 GB of memory
    • Oracle 12c Data Guard: 4 CPUs and 24 GB of memory
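
For readers unfamiliar with aggregations, a search combined with three facets might look like the sketch below. The field names (`building`, `discipline`, `year`) are hypothetical, since the slides do not list the actual facets, and the `.raw` suffix assumes a not-analyzed sub-field exists for each of them.

```python
def search_with_facets(user_input, facet_fields, facet_size=10):
    """Build a search body with one terms aggregation per facet field."""
    aggs = {
        # Aggregate on the untouched ".raw" value so buckets are whole values,
        # not individual analyzed tokens.
        field: {"terms": {"field": field + ".raw", "size": facet_size}}
        for field in facet_fields
    }
    return {"query": {"query_string": {"query": user_input}}, "aggs": aggs}


if __name__ == "__main__":
    body = search_with_facets("roof", ["building", "discipline", "year"])
    print(sorted(body["aggs"]))
```

Each aggregation returns the most frequent values of its field among the matching documents, which is what a facet sidebar typically displays.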

  15. Implementation Details: A few microservices
    • Content-API:
    • Provides CRUD over WCC.
    • Sends every document update to RabbitMQ.
    • Indexer:
    • Reads from the queue and sends all updates to
    Elasticsearch.
    • Allows us to do a full rebuild of the indexes.
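
The Indexer's read-and-forward loop can be sketched as follows. This is only an illustration of the pattern: a standard-library `queue.Queue` stands in for RabbitMQ, and `send_batch` is a hypothetical callable that would forward a batch of updates to Elasticsearch. Neither reflects the team's actual code.

```python
import queue


def run_indexer(update_queue, send_batch, batch_size=100):
    """Drain pending updates and forward them in batches.

    Returns the number of updates forwarded. `send_batch` is a stand-in
    for whatever sends a list of updates to the search cluster.
    """
    total = 0
    batch = []
    while True:
        try:
            batch.append(update_queue.get_nowait())
        except queue.Empty:
            break  # nothing left to read right now
        if len(batch) >= batch_size:
            send_batch(batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        send_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    q = queue.Queue()
    for doc_id in range(5):
        q.put({"id": doc_id})
    print(run_indexer(q, lambda b: print("sending", len(b)), batch_size=2))
```

Decoupling writes from indexing through a queue means the Content-API never waits on the search cluster, and the same forwarding path can be reused for a full rebuild.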

  16. Implementation Details: A few microservices (2)
    • Search-API:
    • Provides a thin layer of abstraction over Elasticsearch APIs.
    • Handles access control:
    • Based on the UW identity and access management
    system.
    • Overrides queries by adding a user-specific “Terms
    filter” based on the user’s access privileges.
    • Forbids access to non-aliased indexes.
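
One way to picture the access-control step is the sketch below, which wraps the caller's query in a `bool` query with a `terms` filter and rejects index names that are not known aliases. The field name `security_group`, the function names, and the alias set are assumptions for illustration. The `filter` clause of `bool` arrived with Elasticsearch 2.0; earlier versions expressed the same idea with a `filtered` query.

```python
def restrict_query(user_query, allowed_groups, acl_field="security_group"):
    """Wrap a query so only documents the user may see can match."""
    return {
        "bool": {
            "must": [user_query],
            # The terms filter keeps documents whose ACL field holds
            # at least one of the user's groups.
            "filter": [{"terms": {acl_field: sorted(allowed_groups)}}],
        }
    }


def assert_aliased(index_name, aliases):
    """Reject requests that target anything other than a known alias."""
    if index_name not in aliases:
        raise PermissionError("index %r is not an exposed alias" % index_name)


if __name__ == "__main__":
    assert_aliased("facilities", {"facilities"})
    print(restrict_query({"query_string": {"query": "roof"}}, {"staff", "public"}))
```

Because the filter is added on the server side, a client cannot widen its own access by editing the query it submits.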

  17. Implementation Details: Elasticsearch configuration
    • Elasticsearch configuration:
    • 3-node cluster in production.
    • Mostly out of the box.
    • A few mapping customizations:
    • “.raw” fields for aggregations
    • “.lowercase” field for alphabetical sorting
    • Search-as-you-type analyzer
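
The three customizations could be expressed roughly as the index settings below. This is a guess at the general shape, not the team's configuration: the `title` field, the `document` type, and the analyzer names are hypothetical, and the syntax follows the `string` field type of 2015-era Elasticsearch (newer versions use `text` and `keyword` and drop mapping types).

```python
import json

# Hypothetical index settings illustrating the three customizations.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                # Emits prefixes of each token ("r", "ro", "roo", ...).
                "autocomplete_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                # Search-as-you-type: index prefixes of lowercased tokens.
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                },
                # Whole value as one lowercased token, for case-insensitive sorting.
                "lowercase_sort": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "document": {
            "properties": {
                "title": {
                    "type": "string",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard",
                    "fields": {
                        # Exact, unanalyzed value for aggregations.
                        "raw": {"type": "string", "index": "not_analyzed"},
                        # Lowercased whole value for alphabetical sorting.
                        "lowercase": {"type": "string", "analyzer": "lowercase_sort"},
                    },
                }
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
```

With a mapping like this, facets would aggregate on `title.raw`, sorting would use `title.lowercase`, and the main `title` field would match partial words as the user types.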

  18. Architecture Overview

  19. Demo Time!

  20. Conclusions
    • First version of the implementation in ~1 person-month.
    • Good feedback from users: the search is fast and
    accurate.
    • Good Elasticsearch documentation makes it easy to
    learn.
    • We will expand this implementation for our next customer.
    • You should try it!

  21. Questions?
    Maxime Deravet:
    [email protected]
    • @maximeder
