Streamlining Healthcare & Research at UCLA with Elasticsearch

As healthcare institutions generate more data, they need a way to search through electronic health records (EHR) and find meaningful insights. UCLA Health chose Elasticsearch to index and search that data and to deliver more thorough, actionable results for clinicians and researchers.

Vivek Katakwar | Analytics Application Developer | UCLA Health
Shehzad Sheikh | Manager, Analytical Solutions | UCLA Health
Paul Tung | Solutions Architect | UCLA Health


Elastic Co

March 08, 2017


  1. Streamlining Healthcare and Research: The Story of Elasticsearch at UCLA Health

    UCLA Health (@UCLAHealth). Date: March 7, 2017.
    Shehzad Sheikh, Manager, Analytical Solutions
    Paul Tung, Solutions Architect
    Vivek Katakwar, Analytics Application Developer
  2. Agenda

    1. Who we are
    2. Use Case
    3. Architecture and Infrastructure
    4. Elasticsearch Implementation
    5. Demo

  4. 4 hospitals; 250+ outpatient practices; 60,000 hospital encounters; 1.9 million outpatient visits; 795 inpatient beds

    2013, 2014, & 2015
  5. "If you’re changing the world, you’re working on important things. You’re excited to get up in the morning…" (Larry Page)

  7. Patient Case Data (master records) Case details (child records)

  8. Background

    • Before 2013, UCLA had > 63 discrete systems to manage and support patient care at hospitals and clinics.
      - Lots of manual duplication of data across systems.
    • In 2013, UCLA implemented an EMR (Electronic Medical Record) from Epic, which we call UCLA CareConnect.
      - Eliminated manual data duplication.
      - Replaced numerous legacy systems with one integrated system.
      - All data in one place.
    • Still some limitations:
      - No way to search on Pathology cases.
  9. Why do we need Case Search?

    • Crucial business need to be able to search on Pathology cases for QA/QC, College of American Pathologists (CAP) inspections, and clinical trials, and to have advanced text search.
    • The EMR did not have this advanced search capability within its pathology application.
    • The Pathology Department considered this to be a must-have feature.
      - This missing feature has held some institutions back from implementing that part of their EMR.
  10. What is Case Search?

    • Custom-developed application, powered by Elasticsearch, that enables advanced search of Pathology Cases.
    • Deeply integrated with our EMR (Epic), UCLA CareConnect.
    • Used by Clinicians and Lab Technicians to search lab results.
  12. Why did we choose Elasticsearch?

    • High performance, distributed.
    • Provides Google-like search with advanced features.
      - Fuzzy search, ranking, suggestions, autocomplete, text highlighting
    • Ability to search structured, semi-structured, and unstructured text/documents (Document Store).
    • Scalability as a platform to provide search not only for our current project, but also for other use cases.
  13. Some of the many companies using Elasticsearch:


  15. Elasticsearch Implementation

    • Elasticsearch 2.3.2 (planning to upgrade to 5.2.1)
    • Production:
      - Web Tier
        ๏ 2 Web Servers behind load balancer
          • 8GB RAM / 2 vCPU / 40GB SSD SAN storage each
          • Session Affinity/Sticky Sessions
      - Elastic Tier: 5-Node Cluster
        ๏ 2 Client Nodes, shared with Web Servers
          • Heap size = 4GB
        ๏ 3 Master/Data Nodes
          • 64GB RAM / 4 vCPU / 300GB SSD SAN storage each
          • Heap size = 30500m
    • Non-Production: Dev/Test similar to Production without HA
  16. Architecture (Production) [diagram]. Components shown: load balancer (NetScaler); web application; web services API; Elasticsearch client node; Case Search application server; Case Search API server; Case List Search application server; oAuth authentication servers; Elasticsearch master/data nodes; Logstash; production DB; compliance audit log.
  17. Elasticsearch Configuration

    • Lessons Learned (Thanks Jared!):
      - Set the heap size < 30500m if you have 64GB RAM.
      - SSL issues
        ๏ Conversion from PFX to JKS: long story short, get your certs in the correct format to begin with; otherwise use KeyStore Explorer.
      - Separate your installation directories from your data.
      - Keep your installation generic, and feed in your configuration through the command line on startup.
      - Use NSSM (Non-Sucking Service Manager) to run Elastic and Kibana as services.
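
    The heap sizes quoted in the deck are consistent with a simple rule of thumb, sketched here in Python. The 50% figure is general Elastic guidance rather than something stated on the slides, the 30500m cap is the slide's own lesson for 64GB machines (it keeps the JVM under the roughly 32GB point where compressed object pointers stop working), and the helper names are hypothetical.

```python
# Assumption: give the JVM at most half of physical RAM (general Elastic
# guidance), and never more than the 30500m ceiling quoted on the slide.
HEAP_CAP_MB = 30500


def recommended_heap_mb(ram_gb):
    """Return a heap size in MB for a machine with ram_gb of physical RAM."""
    half_ram_mb = ram_gb * 1024 // 2
    return min(half_ram_mb, HEAP_CAP_MB)


def jvm_heap_flags(ram_gb):
    """Return matching minimum/maximum heap flags in standard JVM syntax."""
    heap = recommended_heap_mb(ram_gb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


if __name__ == "__main__":
    for ram_gb in (8, 64):
        print(ram_gb, "GB RAM ->", jvm_heap_flags(ram_gb))
```

    Under these assumptions an 8GB client node gets a 4GB heap and a 64GB master/data node gets 30500m, which matches the production figures on the implementation slide.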

  19. Mapping Challenges

  20. Parent Child Mapping

  21. Pros and Cons of Parent Child Mapping

    Pros:
    • The parent document can be updated without re-indexing the children.
    • Good for frequent updates to child documents without affecting the parent document.
    • Child documents can be returned as the results of a search request.
    • Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.

    Cons:
    • The parent document and all of its children must live on the same shard, which can lead to hot spots.
    • To include parent information you always have to use inner_hits, and traversing the resulting hits is time-consuming and inefficient.
    • Updating a parent document causes a lot of disk read/write I/O, because the existing record is deleted and a new one is created.
    • Parent-child queries can be 5 to 10 times slower than the equivalent nested query.
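
    To make the inner_hits point concrete, here is a minimal sketch of a parent-child search body built as a Python dict. It assumes a parent type `branch` and a child type `employee`; the field name `name`, the search text, and the helper name are hypothetical. The body follows the Elasticsearch 2.x query DSL and is only printed, not sent to a cluster.

```python
import json


def has_child_query(child_type, field, text):
    """Build a search body that returns parents having at least one matching child."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"match": {field: text}},
                # inner_hits asks Elasticsearch to also return the children that
                # matched; without it only the parent documents come back.
                "inner_hits": {},
            }
        }
    }


body = has_child_query("employee", "name", "Alice Smith")

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

    Each hit then carries its matching children in a separate inner_hits section, which is the extra traversal the cons list refers to.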
  22. Sample: Child Document (Employee)

    Company (Top Level Parent) --- Branch (Second Level Parent) --- Employee (Child)

    PUT /company
    {
      "mappings": {
        "branch": {},
        "employee": {
          "_parent": { "type": "branch" }
        }
      }
    }
  23. De-Normalizing Data

  24. Pros and Cons of De-normalization

    Pros:
    • The advantage of data de-normalization is speed. Because each document contains all of the information required to determine whether it matches the query, there is no need for expensive joins.

    Cons:
    • The biggest disadvantage is that the index will be bigger and there are more indexed fields. This usually isn’t a huge problem: the data written to disk is highly compressed, disk space is cheap, and Elasticsearch can happily cope with the extra data.
    • The more important issue is that if the parent or a child record needs to be updated, you need to re-index the whole document.
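
    As an illustration of what de-normalizing means in practice, the following sketch folds child rows into their master record before indexing. The record layout (`CaseId`, `CaseDetails`), the sample values, and the helper name are assumptions made for the example; the deck does not show how the real documents are assembled.

```python
def denormalize(case, details):
    """Return one self-contained document: the master record plus its child rows."""
    doc = dict(case)  # copy so the master record itself is not modified
    doc["CaseDetails"] = [
        # drop the join key from each child, since it now lives on the parent
        {key: value for key, value in row.items() if key != "CaseId"}
        for row in details
        if row["CaseId"] == case["CaseId"]
    ]
    return doc


# Hypothetical sample data
case = {"CaseId": "C-1", "PatientLastName": "Smith"}
details = [
    {"CaseId": "C-1", "ResultSection": "Diagnosis", "ResultText": "sample text A"},
    {"CaseId": "C-2", "ResultSection": "Diagnosis", "ResultText": "sample text B"},
    {"CaseId": "C-1", "ResultSection": "Comment", "ResultText": "sample text C"},
]

document = denormalize(case, details)

if __name__ == "__main__":
    print(document)
```

    The trade-off from the list above is visible here: a query needs only `document`, but changing any single child row means rebuilding and re-indexing the whole thing.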
  25. Sample: Object Array Type (BillingSummary)

    PUT /patientcaseindex
    {
      "mappings": {
        "patientcase": {
          "properties": {
            …
            "BillingSummary": {
              "properties": {
                "ChargeCodeDescription": { "type": "string", "index": "not_analyzed" },
                "ChargeCodeId": { "type": "string", "index": "not_analyzed" },
                "Quantity": { "type": "integer" }
                …
  26. Nested Object

  27. Object Array storage: Actual data; Stored in Elastic (key-value pair)

  28. Need of nested objects:

    • With object arrays, the JSON documents in Elastic indexes are flattened into a key-value format, which loses the correlation between the fields of each object. This leads to false-positive results when you are looking for a specific combination of values.

      "comments.name": [ alice, john, smith, white ],
      "comments.comment": [ article, great, like, more, please, this ],

    • Nested objects are designed to solve this problem: each nested object is stored as a separate hidden document.
    • Because each nested object is indexed separately, the fields within it keep their relationships, so a document matches only when all conditions are met within the same nested object.
    • Reference from Elastic: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
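
    The false positive can be reproduced without a cluster. The sketch below is a plain-Python simulation, not Elasticsearch code: it flattens an object array the way the key-value listing above shows, then compares that with per-object matching. The two sample comments are assumed values chosen to reproduce that listing, and the tokenization is deliberately simplistic.

```python
import re


def tokens(text):
    """Lowercase a string and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def flatten(prefix, objects):
    """Merge every object's values into one token set per field name."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", set()).update(tokens(value))
    return flat


def matches_flattened(prefix, objects, wanted):
    """Object-array behaviour: each term only has to appear somewhere in its field."""
    flat = flatten(prefix, objects)
    return all(term in flat.get(f"{prefix}.{field}", set()) for field, term in wanted.items())


def matches_nested(objects, wanted):
    """Nested behaviour: all terms have to appear within the same object."""
    return any(
        all(term in tokens(obj.get(field, "")) for field, term in wanted.items())
        for obj in objects
    )


# Assumed sample data
comments = [
    {"name": "John Smith", "comment": "Great article"},
    {"name": "Alice White", "comment": "More like this please"},
]
wanted = {"name": "alice", "comment": "great"}  # Alice never wrote "great"

if __name__ == "__main__":
    print("flattened match:", matches_flattened("comments", comments, wanted))
    print("nested match:   ", matches_nested(comments, wanted))
```

    The flattened check reports a match even though no single comment has both values, while the per-object check does not.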
  29. 29 Nested Object storage: Actual data Stored in Elastic (hidden

  30. Pros and Cons of Nested Objects

    Pros:
    • The nested type is a specialized version of the object array data type that allows arrays of objects to be indexed and queried independently of each other.
    • Nested query and filter provide fast query-time joins.

    Cons:
    • Nested objects are stored as separate hidden documents, so we can’t query them directly. Instead we have to use a nested query to access them.
    • To add, change, or delete a nested document, the whole document must be re-indexed. This becomes more costly the more nested documents there are.
    • Search requests return the whole document, not just the matching nested documents.
    • It cannot be used when you need complete separation between the main document and its associated entities.
  31. Sample: Nested Type Mapping (CaseResults)

    PUT /patientcaseindex
    {
      "mappings": {
        "patientcase": {
          "properties": {
            …
            "CaseResults": {
              "type": "nested",
              "properties": {
                "ResultSection": { "type": "string", "index": "not_analyzed" },
                "ResultText": { "type": "string", "term_vector": "with_positions_offsets" }
                …
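
    A nested field like this has to be searched with a nested query. The sketch below builds one as a Python dict in Elasticsearch 2.x query DSL, using the `CaseResults` field names from the mapping; the section value, the search text, and the helper name are hypothetical, and this is not necessarily the query the Case Search application issues.

```python
import json


def nested_case_results_query(section, text):
    """Match cases where one single result has both the given section and text."""
    return {
        "query": {
            "nested": {
                "path": "CaseResults",
                "query": {
                    "bool": {
                        "must": [
                            # not_analyzed field, so an exact term match is used
                            {"term": {"CaseResults.ResultSection": section}},
                            # analyzed field, so a full-text match is used
                            {"match": {"CaseResults.ResultText": text}},
                        ]
                    }
                },
                # return only the nested results that matched, with highlights
                "inner_hits": {
                    "highlight": {"fields": {"CaseResults.ResultText": {}}}
                },
            }
        }
    }


body = nested_case_results_query("Final Diagnosis", "carcinoma")

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

    Requesting inner_hits is one way around the limitation that a search otherwise returns the whole document rather than just the matching nested entries.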
  32. What else have we implemented?

  33. Synonym Tokenizer

    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "name_synonyms": {
              "type": "synonym",
              "synonyms_path": "path//synonyms.txt"
            }
          },
          "analyzer": {
            "name_synonym": {
              "filter": ["name_synonyms", "standard", "lowercase"],
              "type": "custom",
              "tokenizer": "standard"
            }
            …
  34. Fuzzy Search

    {
      "query": {
        "match": {
          "PatientLastName": {
            "query": "smth",
            "fuzziness": 1,
            "operator": "and"
          }
        }
      }
    }
  35. Autocomplete Tokenizer

    "settings": {
      "index": {
        "analysis": {
          "tokenizer": {
            "ngram_tokenizer": {
              "token_chars": ["letter", "digit"],
              "min_gram": "2",
              "type": "ngram",
              "max_gram": "15"
            }
          },
          "analyzer": {
            "autocomplete": {
              "filter": "lowercase",
              "tokenizer": "ngram_tokenizer"
            }
          }
        }
      }
    }
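
    To see what such an analyzer puts in the index, the following Python sketch imitates it: split the input on anything that is not a letter or digit, lowercase it, and emit every substring between `min_gram` and `max_gram` characters long. It is an approximation for illustration (ASCII only), not Elasticsearch's implementation, and the function name is hypothetical.

```python
import re


def ngrams(text, min_gram=2, max_gram=15):
    """Return all n-grams of each letter/digit run in text, lowercased."""
    grams = []
    # token_chars ["letter", "digit"]: any other character splits the input
    for run in re.findall(r"[a-z0-9]+", text.lower()):
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(run):
                    break
                grams.append(run[start:start + size])
    return grams


if __name__ == "__main__":
    print(ngrams("Smith"))
```

    For "Smith" this yields ten grams, including "sm", "smi", and "mit", so a partial entry matches as the user types. The cost is a much larger index, since the number of grams grows quickly with term length.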
  36. Multi-field Search

    {
      "query": {
        "query_string": {
          "query": "818*",
          "fields": [
            "Fax.raw",
            "Phone.raw"
          ]
        }
      }
    }
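
    The `.raw` suffix suggests that each field also has an un-analyzed sub-field, so a wildcard like `818*` is compared against the whole stored value. The deck does not show that mapping, so the sketch below is an assumption based on the common Elasticsearch 2.x multi-field convention; the helper names are hypothetical.

```python
import json


def string_with_raw():
    """Assumed 2.x mapping: analyzed string plus a not_analyzed 'raw' sub-field."""
    return {
        "type": "string",
        "fields": {
            "raw": {"type": "string", "index": "not_analyzed"}
        },
    }


def prefix_search(prefix, field_names):
    """Build a query_string search for a prefix across the raw sub-fields."""
    return {
        "query": {
            "query_string": {
                "query": f"{prefix}*",
                "fields": [f"{name}.raw" for name in field_names],
            }
        }
    }


properties = {"Fax": string_with_raw(), "Phone": string_with_raw()}
body = prefix_search("818", ["Fax", "Phone"])

if __name__ == "__main__":
    print(json.dumps({"properties": properties}, indent=2))
    print(json.dumps(body, indent=2))
```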
  37. Other Query Types we used

    • Range
    • Wildcard
    • Term
    • Terms
    • Match
    • Phrase Match
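
    Several of these query types are typically combined in one request. As a minimal sketch, the following builds a bool query with a range clause and a term clause in Elasticsearch 2.x query DSL. The date field `CaseDate`, the dates, and the charge code value are hypothetical; `BillingSummary.ChargeCodeId` is the not_analyzed field from the object array mapping sample.

```python
import json


def case_filter(date_field, start, end, term_field, term_value):
    """Build a search body that filters on a date range and one exact value."""
    return {
        "query": {
            "bool": {
                # filter clauses restrict the result set without affecting scoring
                "filter": [
                    {"range": {date_field: {"gte": start, "lte": end}}},
                    {"term": {term_field: term_value}},
                ]
            }
        }
    }


body = case_filter("CaseDate", "2016-01-01", "2016-12-31",
                   "BillingSummary.ChargeCodeId", "ABC123")

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```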

  39. [Screenshots] Search Tab and Results Tab. Labels: Date Range, Auto Complete, Advanced Search, Multi-field Search options.
  41. Questions?

    Shehzad Sheikh, Manager, Analytical Solutions: SSheikh@mednet.ucla.edu
    Vivek Katakwar, Analytics Application Developer: VKatakwar@mednet.ucla.edu
    Paul Tung, Solutions Architect: PTung@mednet.ucla.edu
  42. www.elastic.co