"Don't Be Afraid of Elasticsearch." (The University of Washington)

Don't Be Afraid of Elasticsearch
Maxime Deravet, Java Software Engineer | University of Washington

Who am I?
• From Belgium
• Java software engineer
• Working at UW-IT since March 2015:
  • UW Information Technology is the central IT organization for the University of Washington, providing critical technology support to all three campuses, UW medical centers and global research operations
  • 600 employees in central UW-IT
  • Several thousand distributed IT employees at UW
• User of Elasticsearch for the past 2 years

Disclaimer
In this presentation, you won’t see:
• A 1,000-node Elasticsearch cluster.
• Tips to optimize an ES cluster to handle 1,000 PB of data.
• Tips to make ES respond within a millisecond.

However!
You will see:
• How easily a small team (3 developers) was able to introduce Elasticsearch into its application stack.
• How this change made their lives easier and their clients happier.

Our Project: Replacing Facilities Services’ 10-year-old document management system
• Facilities Services keeps track of all documents related to the buildings at the University of Washington, dating back to 1861.

Facilities: One of the oldest documents in the system

Our Project: Replacing Facilities Services’ 10-year-old document management system
• Features:
  • Scan old paper documents
  • Upload new electronic files
  • Add metadata
  • Search and download documents
• Documents have different security levels; the lowest grants read-only access to almost anyone with a UW account.

Facilities: a specific tenant on our shared Oracle WebCenter Content (WCC) installation
• Multiple “tenants” (UW departments) use the same WCC installation.
• Every tenant needs a specific profile configuration.
• The metadata is in a single table, shared by all the tenants.

Oracle WebCenter Content limitations
• The search engine (Oracle Text Search) is pretty rigid.
  • Searchable fields need to be defined in advance.
  • Configuration by profile is not supported.
  • For example: if facets are used, the same facets are defined for all executed queries, regardless of the tenant’s profile.
• Search is not as fast as we would like.
• Reindexing is relatively slow, because the whole database is reindexed instead of only the profile of interest.

Why do we want to improve our search engine?
• The Facilities UI can be used by anyone at UW.
• People using the search UI won’t be trained to do so.
• There isn’t any specific pattern in how users search for documents.
=> We wanted a “Google”-like search:
• With a single search box.
• With relevant results.
• Fast.

Here comes Elasticsearch
• There must be an easier way to search.
• We started a proof of concept with ES.

Proof of concept
• We called the REST API on top of WCC to index all the Facilities documents (~170,000) in ES.

Proof of concept (2)
• We indexed the JSON as-is in Elasticsearch.
• And we started executing a few simple “Query String Queries” on “_all”.
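As a rough illustration of the kind of request this slide describes, the sketch below builds the JSON body of a query string query against the `_all` field. The function name, the `size` default and the sample search text are made up for this example and are not taken from the talk; `_all` existed in the Elasticsearch versions of that period and was removed in later releases.

```python
import json


def build_query_string_search(user_input, size=10):
    """Build a search body that matches user_input against the catch-all _all field.

    The function name and the size default are illustrative assumptions.
    """
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": user_input,
                # _all is the field named on the slide; newer versions dropped it.
                "default_field": "_all",
            }
        },
    }


if __name__ == "__main__":
    # The body would be sent to the _search endpoint of the index.
    print(json.dumps(build_query_string_search("floor plan"), indent=2))
```

A single search box maps naturally onto this shape: whatever the user types becomes the `query` value, with no need to know which metadata field holds the match.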

It works!
• And it’s fast!
  • ~50 ms per query (was ~1-2 s for the same type of queries in WCC).
  • Slightly slower with 3 aggregations, but most of the time it’s below 100 ms.
• Elasticsearch: 2-node cluster, with 2 cores and 1 GB of memory
• Oracle 12c Data Guard: 4 CPUs and 24 GB of memory
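To make "3 aggregations" concrete, here is a minimal sketch of a search body that adds one terms aggregation per facet. The facet field names (`building`, `document_type`, `year`) are hypothetical, and the `.raw` suffix assumes an un-analyzed sub-field like the one listed under the mapping customizations; the talk does not show the actual request.

```python
def build_faceted_search(user_input, facet_fields, bucket_count=10):
    """Build a search body with one terms aggregation per facet field.

    Field names are supplied by the caller; the ".raw" suffix assumes an
    un-analyzed sub-field exists for each of them.
    """
    aggs = {
        field: {"terms": {"field": field + ".raw", "size": bucket_count}}
        for field in facet_fields
    }
    return {
        "query": {"query_string": {"query": user_input, "default_field": "_all"}},
        "aggs": aggs,
    }


if __name__ == "__main__":
    # Hypothetical facet names, used only to show the shape of the request.
    request = build_faceted_search("roof", ["building", "document_type", "year"])
    print(sorted(request["aggs"]))
```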

Implementation Details: A few microservices
• Content-API:
  • Provides CRUD over WCC.
  • Sends every document update to RabbitMQ.
• Indexer:
  • Reads from the queue and sends all updates to Elasticsearch.
  • Allows us to do a full rebuild of the indexes.
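The Indexer's job of turning queued updates into Elasticsearch writes could look roughly like the sketch below, which converts a batch of update messages into the newline-delimited body expected by the `_bulk` endpoint. The message shape (`action`, `id`, `document`) and the index name are assumptions made for this example; the talk does not describe the real message format, and older Elasticsearch versions also expect a `_type` in the action metadata.

```python
import json


def to_bulk_lines(updates, index_name):
    """Translate update messages into a newline-delimited _bulk request body.

    Each message is assumed (hypothetically) to look like
    {"action": "upsert" | "delete", "id": "...", "document": {...}}.
    """
    lines = []
    for update in updates:
        meta = {"_index": index_name, "_id": update["id"]}
        if update["action"] == "delete":
            # A delete action has no source line after it.
            lines.append(json.dumps({"delete": meta}))
        else:
            # An index action is followed by the document itself.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(update["document"]))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    batch = [
        {"action": "upsert", "id": "42", "document": {"title": "Roof plan"}},
        {"action": "delete", "id": "7"},
    ]
    print(to_bulk_lines(batch, "facilities"), end="")
```

Because the same function works on any list of updates, a full rebuild can reuse it by feeding it every document read back from the Content-API instead of messages from the queue.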

Implementation Details: A few microservices (2)
• Search-API:
  • Provides a thin layer of abstraction over the Elasticsearch APIs.
  • Handles access control:
    • Based on the UW identity and access management system.
    • Overrides queries by adding a user-specific “Terms filter” based on the user’s access privileges.
    • Forbids access to non-aliased indexes.
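The access-control step can be sketched as a function that wraps the caller's query in a `bool` query with a `terms` filter and rejects any index name that is not a known alias. The alias set, the `security_level` field name and the level values are hypothetical; the talk only states that a user-specific terms filter is added and that non-aliased indexes are forbidden. The `bool`/`filter` form shown here is the syntax of Elasticsearch 2.x and later.

```python
# Hypothetical alias names exposed by the search layer.
ALLOWED_ALIASES = {"facilities"}


def secure_search(alias, query, user_security_levels):
    """Wrap a client query so it only matches documents the user may see.

    The "security_level" field and the alias whitelist are assumptions
    made for this sketch.
    """
    if alias not in ALLOWED_ALIASES:
        raise PermissionError(f"index {alias!r} is not an exposed alias")
    return {
        "query": {
            "bool": {
                # The client's own query still decides relevance.
                "must": [query],
                # The filter restricts results without affecting scoring.
                "filter": [
                    {"terms": {"security_level": sorted(user_security_levels)}}
                ],
            }
        }
    }


if __name__ == "__main__":
    client_query = {"query_string": {"query": "floor plan"}}
    print(secure_search("facilities", client_query, {"public", "staff"}))
```

Putting the restriction in the filter clause means a client cannot widen its own access by crafting a query, since the wrapper is applied server-side to whatever query arrives.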

Implementation Details: Elasticsearch configuration
• Elasticsearch configuration:
  • 3-node cluster in production.
  • Mostly out of the box.
  • A few mapping customizations:
    • “.raw” fields for aggregations
    • “.lowercase” field for alphabetical sorting
    • Search-as-you-type analyzer
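One way to express these three customizations is sketched below as an index definition built in Python. It is written with the `text`/`keyword` types, normalizers and `edge_ngram` filter of current Elasticsearch releases, which is an assumption: the version used in the talk predates some of these and would have expressed `.raw` as a not-analyzed string field. The sub-field name `autocomplete`, the analyzer and normalizer names, and the gram sizes are all made up for this example.

```python
def text_field_with_subfields():
    """Mapping for one text field with raw, lowercase and autocomplete sub-fields."""
    return {
        "type": "text",
        "fields": {
            # Exact value, used for aggregations.
            "raw": {"type": "keyword"},
            # Lowercased exact value, used for case-insensitive sorting.
            "lowercase": {"type": "keyword", "normalizer": "lowercase_normalizer"},
            # Prefix-indexed value, used for search-as-you-type.
            "autocomplete": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard",
            },
        },
    }


def build_index_definition(field_names):
    """Build settings and mappings for the given (hypothetical) field names."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20,
                    }
                },
                "normalizer": {
                    "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "autocomplete_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {name: text_field_with_subfields() for name in field_names}
        },
    }


if __name__ == "__main__":
    definition = build_index_definition(["title", "building"])
    print(sorted(definition["mappings"]["properties"]))
```

Sorting on `title.lowercase` avoids the uppercase-before-lowercase ordering that a plain keyword field would give, while `title.raw` keeps the original casing for facet labels.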

Architecture Overview

Demo Time!

Conclusions
• First version of the implementation in ~1 man-month.
• Good feedback from users: the search is fast and accurate.
• Good Elasticsearch documentation makes it easy to learn.
• We will expand this implementation for our next customer.
• You should try it!

Questions?
Maxime Deravet:
• [email protected]
• @maximeder
