Elastic Co
March 08, 2017

Streamlining Healthcare & Research at UCLA with Elasticsearch

As healthcare institutions generate more data, they need a way to search through electronic health records (EHR) and find meaningful insights. UCLA Health chose Elasticsearch to index and search this data and to produce more thorough, actionable results for clinicians and researchers.

Vivek Katakwar | Analytics Application Developer | UCLA Health
Shehzad Sheikh | Manager, Analytical Solutions | UCLA Health
Paul Tung | Solutions Architect | UCLA Health

Transcript

  1. Streamlining Healthcare and Research: The Story of Elasticsearch at UCLA Health

    UCLA Health (@UCLAHealth), March 7, 2017. Shehzad Sheikh, Manager, Analytical Solutions; Paul Tung, Solutions Architect; Vivek Katakwar, Analytics Application Developer
  2. Agenda

    (1) Who we are, (2) Use Case, (3) Architecture and Infrastructure, (4) Elasticsearch Implementation, (5) Demo
  3. 4 hospitals; 250+ outpatient practices; 60,000 hospital encounters; 1.9 million

    outpatient visits; 795 inpatient beds; 2013, 2014, & 2015
  4. "If you're changing the world, you're

    working on important things. You're excited to get up in the morning…" (Larry Page)
  5. Background

    • Before 2013, UCLA had more than 63 discrete systems to manage and support patient care at its hospitals and clinics.
      - Lots of manual duplication of data across systems.
    • In 2013, UCLA implemented an EMR (Electronic Medical Record) from Epic; we call it UCLA CareConnect.
      - Eliminated manual data duplication.
      - Replaced numerous legacy systems with one integrated system.
      - All data in one place.
    • Still some limitations: no way to search on Pathology cases.
  6. Why do we need Case Search?

    • Crucial business need to be able to search Pathology cases for QA/QC, College of American Pathologists (CAP) inspections, and clinical trials, and to have advanced text search.
    • The EMR did not have this advanced search capability in its pathology application.
    • The Pathology Department considered this a must-have feature.
      - This missing feature has held some institutions back from implementing that part of their EMR.
  7. What is Case Search?

    • Custom-developed application that enables advanced search of Pathology cases, powered by Elasticsearch.
    • Deeply integrated with our EMR (Epic), UCLA CareConnect.
    • Used by clinicians and lab technicians to search lab results.
  8. Why did we choose Elasticsearch?

    • High performance, distributed.
    • Provides Google-like search with advanced features: fuzzy search, ranking, suggestions, autocomplete, text highlighting.
    • Ability to search structured, semi-structured, and unstructured text/documents (document store).
    • Scalability as a platform to provide search not only for our current project, but also for other use cases.
  9. Elasticsearch Implementation

    • Elasticsearch 2.3.2, planning to upgrade to 5.2.1
    • Production:
      - Web tier: 2 web servers behind a load balancer
          - 8GB RAM / 2 vCPU / 40GB SSD SAN storage each
          - Session affinity (sticky sessions)
      - Elastic tier: 5-node cluster
          - 2 client nodes, shared with the web servers (heap size = 4GB)
          - 3 master/data nodes, each with 64GB RAM / 4 vCPU / 300GB SSD SAN storage (heap size = 30500m)
    • Non-production: Dev/Test environments similar to production, but without HA
  10. Architecture (Production) [diagram]

    Components shown: load balancers (NetScaler), web application, web services API, Elasticsearch client node, Case Search Application Server, Case Search API Server, Elasticsearch master/data nodes, Case List Search Application Server, oAuth authentication servers, Logstash, production DB, compliance audit log.
  11. Elasticsearch Configuration

    Lessons learned (thanks, Jared!):
    - Keep the heap size under about 30500m on a machine with 64GB RAM; heaps above roughly 32GB lose the JVM's compressed object pointers.
    - SSL issues: conversion from PFX to JKS. Long story short, get your certificates in the correct format to begin with; otherwise, use KeyStore Explorer.
    - Separate your installation directories from your data.
    - Keep your installation generic, and feed in your configuration through the command line on startup.
    - Use NSSM (the Non-Sucking Service Manager) to run Elasticsearch and Kibana as Windows services.
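To make the heap guidance concrete, here is a minimal Python sketch of the sizing arithmetic. It assumes the commonly cited Elastic rule of thumb (give the heap at most half of physical RAM, leaving the rest to the operating system's file-system cache) combined with the 30500m ceiling from the slide; the function name `recommended_heap_mb` is hypothetical.

```python
def recommended_heap_mb(ram_gb: int, ceiling_mb: int = 30500) -> int:
    """Suggest an Elasticsearch JVM heap size in MB (rule-of-thumb sketch).

    Assumptions: use at most half of physical RAM, leaving the rest to the
    OS file-system cache, and never exceed ceiling_mb, which keeps the heap
    under the ~32 GB point where the JVM can no longer use compressed
    object pointers.
    """
    half_ram_mb = ram_gb * 1024 // 2
    return min(half_ram_mb, ceiling_mb)


if __name__ == "__main__":
    for ram_gb in (8, 64):
        print(f"{ram_gb} GB RAM -> heap {recommended_heap_mb(ram_gb)}m")
```

Under these assumptions a 64GB node gets 30500m and an 8GB machine gets 4096m, which lines up with the 4GB client-node heap on the 8GB web servers described on slide 9.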
  12. Pros and Cons of Parent-Child Mapping

    Pros:
    • The parent document can be updated without re-indexing the children.
    • Good for frequent updates to child documents without affecting the parent document.
    • Child documents can be returned as the results of a search request.
    • Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.
    Cons:
    • The parent document and all of its children must live on the same shard, which leads to hot spots.
    • To include parent information, you always have to use inner_hits, and traversing the resulting hits is time-consuming and inefficient.
    • Updating a parent document causes a lot of disk read/write I/O, because the existing document is deleted and a new one is created.
    • Parent-child queries can be 5 to 10 times slower than the equivalent nested query.
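The hot-spot drawback follows from how documents are routed: Elasticsearch picks a shard as hash(routing) % number_of_primary_shards, and a child document is routed by its parent's ID, so a parent and all of its children always land on the same shard. The sketch below assumes a hypothetical 5-shard index and a made-up workload, and uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the specific shard numbers are illustrative only.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5  # hypothetical shard count for illustration


def shard_for(routing_value: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(routing) % num_shards.

    Elasticsearch hashes the routing value with Murmur3; zlib.crc32 is a
    stand-in here so the sketch runs with the standard library only.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def place_children(children_per_parent: dict) -> Counter:
    """Count documents per shard when every child is routed by its parent's ID."""
    load = Counter()
    for parent_id, n_children in children_per_parent.items():
        shard = shard_for(parent_id)
        load[shard] += 1 + n_children  # the parent and all its children share one shard
    return load


if __name__ == "__main__":
    # Hypothetical workload: one very large parent and a few small ones.
    workload = {"branch-A": 10_000, "branch-B": 10, "branch-C": 12, "branch-D": 8}
    for shard, docs in sorted(place_children(workload).items()):
        print(f"shard {shard}: {docs} docs")
```

Whatever hash is used, the shard holding "branch-A" carries at least 10,001 documents, because routing by parent ID gives the index no way to spread one family across shards.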
  13. Sample: Child Document (Employee)

    Company (top-level parent) --- Branch (second-level parent) --- Employee (child)

    PUT /company
    {
      "mappings": {
        "branch": {},
        "employee": {
          "_parent": { "type": "branch" }
        }
      }
    }
  14. Pros and Cons of De-normalization

    Pros:
    • The advantage of data de-normalization is speed. Because each document contains all of the information required to determine whether it matches the query, there is no need for expensive joins.
    Cons:
    • The most visible disadvantage is that the index will be bigger and there are more indexed fields. This usually isn't a huge problem: the data written to disk is highly compressed, disk space is cheap, and Elasticsearch can happily cope with the extra data.
    • The more important issue is that if the parent or child record needs to be updated, you need to re-index the whole document.
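The re-indexing cost of de-normalization can be sketched in a few lines of Python. The branch/employee records reuse the names from the parent-child sample but are hypothetical, and `denormalize` and `update_parent` are illustrative helpers, not Elasticsearch APIs: each employee document carries its own copy of the branch, so one change to the branch touches every copy.

```python
from copy import deepcopy


def denormalize(parent: dict, children: list, parent_key: str) -> list:
    """Return one flat document per child, each embedding its own copy of the parent."""
    docs = []
    for child in children:
        doc = deepcopy(child)
        doc[parent_key] = deepcopy(parent)
        docs.append(doc)
    return docs


def update_parent(docs: list, parent_key: str, parent_id: str, changes: dict) -> int:
    """Apply a parent change to every embedded copy; return how many documents changed.

    In Elasticsearch, each changed document would have to be re-indexed in full.
    """
    touched = 0
    for doc in docs:
        if doc[parent_key]["id"] == parent_id:
            doc[parent_key].update(changes)
            touched += 1
    return touched


if __name__ == "__main__":
    # Hypothetical records, named after the company/branch/employee sample.
    branch = {"id": "london", "city": "London"}
    employees = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}]
    docs = denormalize(branch, employees, "branch")
    print(update_parent(docs, "branch", "london", {"city": "Greater London"}))  # 3
```

One logical change to the branch becomes three document rewrites here; with thousands of children per parent, that multiplier is the cost the slide warns about.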
  15. Sample: Object Array Type (BillingSummary)

    PUT /patientcaseindex
    {
      "mappings": {
        "patientcase": {
          "properties": {
            …
            "BillingSummary": {
              "properties": {
                "ChargeCodeDescription": { "type": "string", "index": "not_analyzed" },
                "ChargeCodeId": { "type": "string", "index": "not_analyzed" },
                "Quantity": { "type": "integer" }
                …
  16. The Need for Nested Objects

    • When we use object arrays, the JSON documents in Elasticsearch indexes are flattened into a key-value format that loses the correlation between fields of the same object. This leads to false-positive results when you are looking for a unique set of data. For example:
        "comments.name": [ alice, john, smith, white ],
        "comments.comment": [ article, great, like, more, please, this ],
    • Nested objects are designed to solve this problem: each nested object is stored as a hidden, separate document.
    • Because each nested document is indexed separately, the fields within it maintain their relationships, so a query matches only when all of its conditions are satisfied within the same nested object.
    • Reference from Elastic: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
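The false-positive problem can be reproduced without a cluster. This Python sketch models an object array as one bag of terms per field path, the way the flattened comments.name / comments.comment arrays look, and a nested object as a unit that must satisfy every condition on its own. The two comments are hypothetical, chosen to produce exactly the flattened terms listed on the slide, and lowercase-plus-whitespace splitting stands in for a real analyzer.

```python
def tokens(text: str) -> set:
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def flatten(objects: list) -> dict:
    """Model how an object array is indexed: one bag of terms per field path."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, set()).update(tokens(value))
    return flat


def matches_flattened(objects: list, conditions: dict) -> bool:
    """Object-array semantics: each condition may be satisfied by a different object."""
    flat = flatten(objects)
    return all(term in flat.get(field, set()) for field, term in conditions.items())


def matches_nested(objects: list, conditions: dict) -> bool:
    """Nested semantics: all conditions must hold within one object."""
    return any(
        all(term in tokens(obj.get(field, "")) for field, term in conditions.items())
        for obj in objects
    )


if __name__ == "__main__":
    comments = [
        {"name": "John Smith", "comment": "Great article"},
        {"name": "Alice White", "comment": "More like this please"},
    ]
    query = {"name": "alice", "comment": "great"}
    print(matches_flattened(comments, query))  # True: the false positive
    print(matches_nested(comments, query))     # False: no single comment matches
```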
  17. Pros and Cons of Nested Objects

    Pros:
    • The nested type is a specialized version of the object array data type that allows arrays of objects to be indexed and queried independently of each other.
    • Nested queries and filters provide fast query-time joins.
    Cons:
    • Nested objects are stored as separate hidden documents, so we can't query them directly; we have to use a nested query to access them.
    • To add, change, or delete a nested document, the whole document must be re-indexed. This becomes more costly the more nested documents there are.
    • Search requests return the whole document, not just the matching nested documents.
    • Nested objects cannot be used when you need complete separation between the main document and its associated entities.
  18. Sample: Nested Type Mapping (CaseResults)

    PUT /patientcaseindex
    {
      "mappings": {
        "patientcase": {
          "properties": {
            …
            "CaseResults": {
              "type": "nested",
              "properties": {
                "ResultSection": { "type": "string", "index": "not_analyzed" },
                "ResultText": { "type": "string", "term_vector": "with_positions_offsets" }
                …
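A nested query against the CaseResults mapping might look like the following, built as a Python dict and serialized with `json`. This is a sketch under assumptions: the section name "Final Diagnosis" and the search text are hypothetical values, and the query shape follows the nested-query syntax of the 2.x releases the slides describe. Putting both conditions inside one `nested` clause is what restricts matches to a single CaseResults object.

```python
import json


def case_results_query(section: str, text: str) -> dict:
    """Build a nested query body for the CaseResults mapping (2.x-style syntax).

    Both clauses sit inside a single nested query, so they must be satisfied
    by the same CaseResults object. ResultSection is not_analyzed, so it gets
    an exact term query; ResultText is analyzed, so it gets a match query.
    """
    return {
        "query": {
            "nested": {
                "path": "CaseResults",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"CaseResults.ResultSection": section}},
                            {"match": {"CaseResults.ResultText": text}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical values; real section names depend on the pathology data.
    body = case_results_query("Final Diagnosis", "carcinoma")
    print(json.dumps(body, indent=2))  # body for GET /patientcaseindex/_search
```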
  19. Synonym Analyzer

    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "name_synonyms": {
              "type": "synonym",
              "synonyms_path": "path//synonyms.txt"
            }
          },
          "analyzer": {
            "name_synonym": {
              "filter": ["name_synonyms", "standard", "lowercase"],
              "type": "custom",
              "tokenizer": "standard"
            }
            …
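The name_synonym chain can be modeled in a few lines of Python to show why filter order matters. This is a simplified sketch, not Lucene's implementation: the tokenizer is approximated with a regular expression, `SYNONYM_GROUPS` stands in for a hypothetical synonyms.txt (the slide shows only a placeholder path), expansions are emitted as extra tokens rather than at the same position, and the standard token filter is left out because in Elasticsearch 2.x it does nothing. Since name_synonyms runs before lowercase, the lookup in this model is case-sensitive: "bob" expands but "Bob" does not. The synonym filter's `ignore_case` option, or placing lowercase ahead of it, would change that; it is worth verifying against the version in use.

```python
import re

# Hypothetical synonyms.txt content: comma-separated groups of equivalent names.
SYNONYM_GROUPS = [["bob", "robert"], ["bill", "william"]]


def standard_tokenize(text: str) -> list:
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def synonym_filter(tokens: list) -> list:
    """Expand each token to its whole synonym group (case-sensitive lookup)."""
    out = []
    for tok in tokens:
        group = next((g for g in SYNONYM_GROUPS if tok in g), None)
        out.extend(group if group else [tok])
    return out


def lowercase_filter(tokens: list) -> list:
    return [t.lower() for t in tokens]


def name_synonym(text: str) -> list:
    """Slide's order: synonyms, standard (a no-op, omitted), then lowercase."""
    return lowercase_filter(synonym_filter(standard_tokenize(text)))


if __name__ == "__main__":
    print(name_synonym("bob smith"))  # ['bob', 'robert', 'smith']
    print(name_synonym("Bob Smith"))  # ['bob', 'smith']: lookup ran before lowercasing
```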
  20. Fuzzy Search

    {
      "query": {
        "match": {
          "PatientLastName": {
            "query": "smth",
            "fuzziness": 1,
            "operator": "and"
          }
        }
      }
    }
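The `fuzziness: 1` setting allows one edit between a query term and an indexed term, which is why "smth" can match "smith". The sketch below computes plain Levenshtein distance (insertions, deletions, substitutions) in Python. It is a simplified model: Lucene evaluates fuzziness with Levenshtein automata over the terms in the index, and depending on query settings a transposition of two adjacent characters may also count as a single edit. `fuzzy_match` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_match(query_term: str, indexed_term: str, fuzziness: int = 1) -> bool:
    """True if the terms are within `fuzziness` edits of each other."""
    return levenshtein(query_term, indexed_term) <= fuzziness


if __name__ == "__main__":
    for candidate in ("smith", "smyth", "smithers"):
        print(candidate, levenshtein("smth", candidate), fuzzy_match("smth", candidate))
```

With `"operator": "and"`, every analyzed term in the query string must find such a match in the same document.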
  21. Autocomplete Tokenizer

    "settings": {
      "index": {
        "analysis": {
          "tokenizer": {
            "ngram_tokenizer": {
              "token_chars": ["letter", "digit"],
              "min_gram": "2",
              "type": "ngram",
              "max_gram": "15"
            }
          },
          "analyzer": {
            "autocomplete": {
              "filter": "lowercase",
              "tokenizer": "ngram_tokenizer"
            }
          }
        }
      }
    }
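To see what the autocomplete analyzer puts in the index, here is a Python approximation of the ngram_tokenizer (min_gram 2, max_gram 15, token_chars letter and digit) followed by the lowercase filter. Assumptions: any character that is not a letter or digit splits the input, grams are listed by start offset and then by length, and `ngram_tokenize` is a hypothetical helper rather than an Elasticsearch API.

```python
import re


def ngram_tokenize(text: str, min_gram: int = 2, max_gram: int = 15) -> list:
    """Approximate the slide's ngram_tokenizer plus the lowercase filter.

    token_chars = [letter, digit]: any other character splits the input.
    Grams are emitted by start offset, shortest first; words shorter than
    min_gram produce nothing.
    """
    grams = []
    for word in re.findall(r"[^\W_]+", text):  # runs of letters/digits
        word = word.lower()
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break
                grams.append(word[start:start + size])
    return grams


if __name__ == "__main__":
    print(ngram_tokenize("Smith"))
    print(ngram_tokenize("818-555"))
```

Because every 2-to-15-character substring is indexed, a user who has typed only "smi" already shares a term with "Smith". The trade-off is a larger index, and the same analyzer would also n-gram the query text unless a separate search analyzer is configured.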
  22. Multi-field Search

    {
      "query": {
        "query_string": {
          "query": "818*",
          "fields": ["Fax.raw", "Phone.raw"]
        }
      }
    }
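The multi-field wildcard query can be approximated in memory to show what it matches. Assuming the .raw sub-fields are not_analyzed, each holds the whole original value as a single term, so "818*" behaves as a prefix test on the complete number, and a document matches if any listed field passes. The contact records and the helper name are hypothetical, and the numbers are assumed to be stored without punctuation.

```python
def wildcard_prefix_search(docs: dict, prefix: str, fields: list) -> list:
    """Return IDs of docs whose whole raw value in any listed field starts with prefix.

    Mimics query_string "818*" over not_analyzed fields: each raw field is one
    untokenized term, so the trailing * acts as a prefix match on that term.
    """
    return [
        doc_id
        for doc_id, doc in docs.items()
        if any(str(doc.get(field, "")).startswith(prefix) for field in fields)
    ]


if __name__ == "__main__":
    # Hypothetical contact records keyed by document ID.
    contacts = {
        1: {"Phone.raw": "8185550100", "Fax.raw": "3105550199"},
        2: {"Phone.raw": "3105550123", "Fax.raw": "8185550142"},
        3: {"Phone.raw": "2135550111", "Fax.raw": "2135550112"},
    }
    print(wildcard_prefix_search(contacts, "818", ["Fax.raw", "Phone.raw"]))  # [1, 2]
```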
  23. Other Query Types We Used

    • Range
    • Wildcard
    • Term
    • Terms
    • Match
    • Phrase Match
  24. Questions?

    Shehzad Sheikh, Manager, Analytical Solutions; Vivek Katakwar, Analytics Application Developer; Paul Tung, Solutions Architect