Streamlining Healthcare & Research at UCLA with Elasticsearch

UCLA Health (@UCLAHealth)
Date: March 7th, 2017
Streamlining Healthcare and Research: The Story of Elasticsearch at UCLA Health
Shehzad Sheikh, Manager, Analytical Solutions
Paul Tung, Solutions Architect
Vivek Katakwar, Analytics Application Developer

Agenda
1 Who we are
2 Use Case
3 Architecture and Infrastructure
4 Elasticsearch Implementation
5 Demo

WHO WE ARE

• 4 hospitals
• 250+ outpatient practices
• 60,000 hospital encounters
• 1.9 million outpatient visits
• 795 inpatient beds
2013, 2014, & 2015

"If you’re changing the world, you’re working on important things. You’re excited to get up in the morning…"
Larry Page

USE CASE: PROVIDING ADVANCED SEARCH FEATURES ON PATHOLOGY DATA

Patient Case Data (master records) Case details (child records)

Background
• Before 2013, UCLA had more than 63 discrete systems to manage and support patient care at hospitals and clinics.
  - Lots of manual duplication of data across systems.
• In 2013, UCLA implemented an EMR (Electronic Medical Record) from Epic; we call it UCLA CareConnect.
  - Eliminated manual data duplication.
  - Replaced numerous legacy systems with one integrated system.
  - All data in one place.
• Still some limitations: no way to search on Pathology cases.

Why do we need Case Search?
• Crucial business need to be able to search on Pathology cases for QA/QC, College of American Pathologists (CAP) inspections, and clinical trials, and to have advanced text search.
• The EMR did not have this advanced search capability within its pathology application.
• The Pathology Department considered this to be a must-have feature.
  - This missing feature has held some institutions back from implementing that part of their EMR.

What is Case Search?
• Custom-developed application, powered by Elasticsearch, that enables advanced search of Pathology cases.
• Deeply integrated with our EMR (Epic), UCLA CareConnect.
• Used by clinicians and lab technicians to search lab results.

Why did we choose Elasticsearch?
• High performance, distributed.
• Provides Google-like search with advanced features
  - Fuzzy search, ranking, suggestions, autocomplete, text highlighting
• Ability to search structured, semi-structured, and unstructured text/documents (Document Store)
• Scalability as a platform to provide search not only for our current project, but also for other use cases

Some of the many companies using Elasticsearch:

ARCHITECTURE AND INFRASTRUCTURE

Elasticsearch Implementation
• Elasticsearch 2.3.2 (planning to upgrade to 5.2.1)
• Production:
  - Web Tier
    ๏ 2 Web Servers behind load balancer
      • @ 8GB RAM / 2 vCPU / 40GB SSD SAN storage
      • Session Affinity/Sticky Sessions
  - Elastic Tier: 5 Node Cluster
    ๏ 2 Client Nodes, shared with Web Servers
      • Heap size = 4GB
    ๏ 3 Master/Data Nodes
      • @ 64GB RAM / 4 vCPU / 300GB SSD SAN storage
      • Heap size = 30500m
• Non-Production: Dev/Test similar to Production without HA

Architecture: Production (diagram)
Components shown: NetScaler load balancer; Case Search Application Servers (Web Application, Web Services API, Elasticsearch Client Node); Case Search API Server; Elasticsearch Master/Data Nodes; oAuth Authentication Servers; Logstash; Production DB; Compliance Audit Log.

Elasticsearch Configuration
• Lessons Learned (Thanks Jared!):
  - Set the heap size < 30500m if you have 64GB RAM
  - SSL issues
    ๏ Conversion from PFX to JKS: long story short, get your certs in the correct format to begin with; otherwise use KeyStore Explorer
  - Separate your installation directories from your data
  - Keep your installation generic, and feed in your configuration through the command line on startup
  - Use NSSM (Non-Sucking Service Manager) to run Elastic and Kibana as services.
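
The heap-size lesson can be written down as a small rule of thumb. The sketch below is only an illustration: it assumes the common guidance of giving Elasticsearch about half of physical RAM, and uses the 30500m figure from the slides as the cap (treated here as inclusive, since that is the value listed for the production data nodes). The function names are hypothetical.

```python
def recommended_heap_mb(ram_mb, cap_mb=30500):
    """Return half of physical RAM in MB, capped at cap_mb.

    The half-of-RAM rule is an assumption; the default cap is the
    30500m value quoted in the slides.
    """
    return min(ram_mb // 2, cap_mb)


def heap_setting(ram_gb):
    """Format the heap size as a JVM-style size string such as '30500m'."""
    return f"{recommended_heap_mb(ram_gb * 1024)}m"


# The two machine sizes mentioned in the deck.
data_node_heap = heap_setting(64)   # 64GB master/data nodes
client_node_heap = heap_setting(8)  # 8GB web servers hosting client nodes
print(data_node_heap, client_node_heap)
```

Under these assumptions the rule reproduces both values listed for production: 30500m on the 64GB data nodes and 4GB (4096m) on the 8GB client nodes.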

ELASTICSEARCH IMPLEMENTATION

Mapping Challenges

Parent Child Mapping

Pros and Cons of Parent Child Mapping

Pros:
• The parent document can be updated without re-indexing the children.
• Good for frequent updates to child documents without affecting the parent document.
• Child documents can be returned as the results of a search request.
• Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.

Cons:
• The parent document and all of its children must live on the same shard, which leads to hot-spots.
• To include parent information you always have to use inner_hits, and traversing the resulting hits is time consuming and inefficient.
• An update to a parent document causes a lot of disk I/O, because the existing record is deleted and a new one is created.
• Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
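
To make the inner_hits point concrete, here is a minimal sketch of a parent-child query and of the traversal it forces on the client. It assumes a parent type `branch` and a child type `employee`; the `name` field, the inner-hits name `matched_employees`, and the hand-written response are hypothetical, and nothing is sent to a cluster.

```python
import json


def has_child_query(child_type, child_query, inner_hits_name):
    """Build a has_child search body that also returns the matching children."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": child_query,
                "inner_hits": {"name": inner_hits_name},
            }
        }
    }


def children_per_parent(response, inner_hits_name):
    """Walk a search response and collect child sources under each parent id."""
    result = {}
    for parent in response["hits"]["hits"]:
        child_hits = parent["inner_hits"][inner_hits_name]["hits"]["hits"]
        result[parent["_id"]] = [child["_source"] for child in child_hits]
    return result


body = has_child_query("employee", {"match": {"name": "Alice"}}, "matched_employees")
print(json.dumps(body, indent=2))

# Hand-written stand-in for a search response, only to show the traversal.
sample_response = {
    "hits": {"hits": [{
        "_id": "london",
        "_source": {"city": "London"},
        "inner_hits": {"matched_employees": {"hits": {"hits": [
            {"_id": "1", "_source": {"name": "Alice Smith"}},
        ]}}},
    }]}
}
grouped = children_per_parent(sample_response, "matched_employees")
print(grouped)
```

Every parent hit carries its own nested list of child hits, which is the extra walk the slide describes as time consuming.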

Sample: Child Document (Employee)
Company (Top Level Parent) --- Branch (Second Level Parent) --- Employee (Child)

PUT /company
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": { "type": "branch" }
    }
  }
}

De-Normalizing Data

Pros and Cons of De-normalization

Pros:
• The advantage of data de-normalization is speed. Because each document contains all of the information that is required to determine whether it matches the query, there is no need for expensive joins.

Cons:
• The biggest disadvantage is that the index will be bigger and there are more indexed fields. This usually isn’t a huge problem: the data written to disk is highly compressed, and disk space is cheap. Elasticsearch can happily cope with the extra data.
• The more important issue is that if the parent or child record needs to be updated, you need to re-index the whole document.
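
As an illustration of what de-normalizing means in practice, the sketch below folds child rows into their master record before indexing, so each document is self-contained. It is not the deck's actual pipeline: the `CaseId` join key and all sample values are made up, while `BillingSummary`, `ChargeCodeId`, `ChargeCodeDescription`, and `Quantity` follow the field names used in the mapping sample.

```python
from collections import defaultdict


def denormalize(cases, billing_rows, key="CaseId"):
    """Embed each case's billing rows as a BillingSummary object array."""
    children = defaultdict(list)
    for row in billing_rows:
        # Drop the join key from the child; the parent already carries it.
        child = {name: value for name, value in row.items() if name != key}
        children[row[key]].append(child)
    # Cases without billing rows get an empty array.
    return [dict(case, BillingSummary=children.get(case[key], [])) for case in cases]


cases = [
    {"CaseId": "C1", "PatientLastName": "Smith"},
    {"CaseId": "C2", "PatientLastName": "Jones"},
]
billing_rows = [
    {"CaseId": "C1", "ChargeCodeId": "CC-001", "ChargeCodeDescription": "Example charge A", "Quantity": 2},
    {"CaseId": "C1", "ChargeCodeId": "CC-002", "ChargeCodeDescription": "Example charge B", "Quantity": 1},
]
documents = denormalize(cases, billing_rows)
print(documents[0])
```

The trade-off named above is visible here: changing a single billing row means rebuilding and re-indexing the whole case document.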

Sample: Object Array Type (BillingSummary)

PUT /patientcaseindex
{
  "mappings": {
    "patientcase": {
      "properties": {
        …
        "BillingSummary": {
          "properties": {
            "ChargeCodeDescription": { "type": "string", "index": "not_analyzed" },
            "ChargeCodeId": { "type": "string", "index": "not_analyzed" },
            "Quantity": { "type": "integer" }
            …

Nested Object

Object Array storage: Actual data vs. Stored in Elastic (key-value pair)

Need of nested objects:
• When we use object arrays, the JSON documents in Elastic indexes are flattened into a key-value format, which loses the correlation between fields of the same object. This leads to false-positive results when you are looking for a unique set of data.
    "comments.name": [ alice, john, smith, white ],
    "comments.comment": [ article, great, like, more, please, this ],
• Nested objects are designed to solve this problem: each nested object is stored as a separate hidden document.
• Because each nested object is indexed separately, the fields within it maintain their relationships. A query now matches only when all conditions are met within the same nested object.
• Reference from Elastic:
  https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
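
The false-positive problem can be reproduced without a cluster. The sketch below is a simplified model, not Elasticsearch itself: it compares exact values and ignores text analysis, and the sample comments and the `age` field are made up for illustration.

```python
def flatten(doc, field):
    """Mimic object-array flattening: one multi-valued list per sub-field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(doc, field, conditions):
    """Each condition is checked against the pooled values of all objects."""
    flat = flatten(doc, field)
    return all(value in flat.get(f"{field}.{key}", []) for key, value in conditions.items())


def matches_nested(doc, field, conditions):
    """All conditions must hold inside one and the same object."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in doc[field]
    )


doc = {"comments": [{"name": "john", "age": 28}, {"name": "alice", "age": 31}]}
wanted = {"name": "alice", "age": 28}  # no single comment satisfies both

flat_view = flatten(doc, "comments")
flattened_hit = matches_flattened(doc, "comments", wanted)
nested_hit = matches_nested(doc, "comments", wanted)
print(flat_view, flattened_hit, nested_hit)
```

The flattened view reports a match because "alice" and 28 both occur somewhere in the array, while the per-object check correctly rejects the document.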

Nested Object storage: Actual data vs. Stored in Elastic (hidden documents)

Pros and Cons of Nested Objects

Pros:
• The nested type is a specialized version of the object array data type that allows arrays of objects to be indexed and queried independently of each other.
• Nested query and filter provide fast query-time joins.

Cons:
• Nested objects are stored as separate hidden documents, so we can’t query them directly; we have to use a nested query to access them.
• To add, change, or delete a nested document, the whole document must be re-indexed. This becomes more costly the more nested documents there are.
• Search requests return the whole document, not just the matching nested documents.
• It cannot be used when you need a complete separation between the main document and its associated entities.

Sample: Nested Type Mapping (CaseResults)

PUT /patientcaseindex
{
  "mappings": {
    "patientcase": {
      "properties": {
        …
        "CaseResults": {
          "type": "nested",
          "properties": {
            "ResultSection": { "type": "string", "index": "not_analyzed" },
            "ResultText": { "type": "string", "term_vector": "with_positions_offsets" }
            …
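
A nested field such as CaseResults has to be searched through a nested query. The sketch below only builds a request body of that shape; the helper name and the search values ("Final Diagnosis", "carcinoma") are hypothetical, and the actual queries used by Case Search are not shown in the deck.

```python
import json


def nested_case_results_query(section, text):
    """Build a nested query where both conditions must hold in the same CaseResults entry."""
    return {
        "query": {
            "nested": {
                "path": "CaseResults",
                "query": {
                    "bool": {
                        "must": [
                            # ResultSection is not_analyzed, so an exact term lookup fits.
                            {"term": {"CaseResults.ResultSection": section}},
                            # ResultText is analyzed, so a full-text match fits.
                            {"match": {"CaseResults.ResultText": text}},
                        ]
                    }
                },
            }
        }
    }


body = nested_case_results_query("Final Diagnosis", "carcinoma")
print(json.dumps(body, indent=2))
```

Because both clauses sit inside the same nested query, a case matches only if one result entry has that section and that text together.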

What else we have implemented?

Synonym Tokenizer

"settings": {
  "index": {
    "analysis": {
      "filter": {
        "name_synonyms": {
          "type": "synonym",
          "synonyms_path": "path//synonyms.txt"
        }
      },
      "analyzer": {
        "name_synonym": {
          "filter": ["name_synonyms", "standard", "lowercase"],
          "type": "custom",
          "tokenizer": "standard"
        }
        …

Fuzzy Search

{
  "query": {
    "match": {
      "PatientLastName": {
        "query": "smth",
        "fuzziness": 1,
        "operator": "and"
      }
    }
  }
}

Autocomplete Tokenizer

"settings": {
  "index": {
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "token_chars": ["letter", "digit"],
          "min_gram": "2",
          "type": "ngram",
          "max_gram": "15"
        }
      },
      "analyzer": {
        "autocomplete": {
          "filter": "lowercase",
          "tokenizer": "ngram_tokenizer"
        }
      }
    }
  }
}
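
To see what this analyzer puts into the index, the following sketch approximates it in plain Python: split on anything that is not a letter or digit, lowercase, and emit every substring of 2 to 15 characters. It is an approximation for illustration, not Elasticsearch's implementation, and the function name is hypothetical.

```python
import re


def ngram_tokens(text, min_gram=2, max_gram=15):
    """Approximate an ngram tokenizer (letters and digits) followed by lowercasing."""
    tokens = []
    # [^\W_]+ matches runs of letters and digits; everything else splits words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break
                tokens.append(word[start:start + size])
    return tokens


smith_grams = ngram_tokens("Smith")
print(smith_grams)
```

Since fragments such as "mit" and "ith" are indexed as terms, a partially typed name can match from anywhere inside the word, at the cost of a larger index.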

Multi-field Search

{
  "query": {
    "query_string": {
      "query": "818*",
      "fields": [ "Fax.raw", "Phone.raw" ]
    }
  }
}

Other Query Types we used
• Range
• Wildcard
• Term
• Terms
• Match
• Phrase Match

APPLICATION DEMO

Demo screenshots: Search Tab and Results Tab. Callouts: Date Range, Auto Complete, Advanced Search, Multi-field Search options.

Questions?
Shehzad Sheikh, Manager, Analytic Solutions
Vivek Katakwar, Analytics Application Developer
Paul Tung, Solutions Architect

www.elastic.co
