Thomson Reuters, Research, Journalism, Finance, & Elastic

Elastic Co
February 17, 2016

From log analysis to ensure systems are functioning, to a search platform for insight into financial trends and journalism assets – the Elastic Stack is an integral part of Thomson Reuters. Learn about how they customize it to fit their needs through tooling, custom plugins, predictive monitoring, and more.

Transcript

  1. The TR Elasticsearch Ecosystem

    Vsu Subramanian, VP Platform Engineering
    Jonathan Wentz, Lead Software Engineer
    February 17th, 2016
  2. THOMSON REUTERS

    •  Thomson Reuters is the world’s leading source of intelligent information for businesses and professionals
    •  We combine industry expertise and innovative technology to deliver critical information to leading decision makers
    •  We are powered by the world’s most trusted news organization
    •  We serve professionals in the financial and risk, legal, tax and accounting, intellectual property and science and media markets
  3. TR BUSINESSES

    •  Tax & Accounting: Thomson Reuters Tax & Accounting is the leading global provider of integrated tax compliance and accounting information, software and services for professionals in accounting firms, corporations, law firms and government.
    •  Intellectual Property & Science: Thomson Reuters Intellectual Property & Science is the leading provider of comprehensive intellectual property (IP) and scientific information, decision support tools, and services that enable governments, academia, publishers, corporations and law firms to discover, develop and deliver innovations.
    •  Legal: Thomson Reuters Legal is the leading provider of critical information, decision support tools, software and services to legal, investigation, business and government professionals around the world. We offer a broad range of online services that utilize our databases of legal, regulatory, news and business information.
    •  Financial & Risk: Thomson Reuters Financial & Risk is the leading provider of regulatory and operational risk management solutions. These solutions deliver critical news, information and analytics, enable transactions, and bring together communities that allow trading, investing, financial and corporate professionals to connect.
    •  Reuters News: Powered by more than 2,600 journalists reporting in 20 languages from bureaus around the world, Reuters is the world’s largest international news organization.
  4. THE SEARCH INFORMATION PLATFORM

    •  Content Production & Editorial: responsible for creating and managing definitive content and preparing it for product use
    •  Content Management Systems
    •  Published Documents
    •  Content shown: News, Cases, Statutes, Codes, Tax Laws, Filings, Patents, Briefs
    •  Search Information Platform (search platform): responsible for scalable and performant shared access to content via services including search, metadata access and navigation services. Enables content co-mingling & sharing
    •  Services shown: Search, Document Retrieval, Metadata & Navigation Services
  5. SEARCH IS VERY IMPORTANT TO US

    •  Natural Language / Boolean
    •  Fielded Searching
    •  Ranked Search Results
    •  Date/Time Searching
    •  Phrase Searching
    •  Wildcard
    •  Metadata Searching
    •  Non-English Searching
        –  20+ languages
    •  Duplicate Document Detection
    •  Search Faceting/Aggregations
    •  Snippets/Highlighting
    •  Phonetic Searching
    •  Subset Searching
    •  Numeric Searching
    •  Alerting – notification when documents of interest have changed

    Complex full-text query semantics tuned for research-based use cases across the full commingled set of documents
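Several of the features listed above (phrase searching, date/time searching, faceting/aggregations and highlighting) can be combined in a single Elasticsearch request body. The following is a minimal sketch, not taken from the deck: the field names `body`, `published` and `doc_type`, the aggregation name `by_facet`, and the helper `build_research_query` are all hypothetical, and the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later.

```python
import json


def build_research_query(phrase, since, text_field="body",
                         date_field="published", facet_field="doc_type"):
    """Build a search request body (all field names are hypothetical).

    Combines a phrase match, a date-range filter, a terms aggregation
    for faceting, and highlighting of the matched text field.
    """
    return {
        "query": {
            "bool": {
                # Phrase searching: the terms must appear in this order.
                "must": [{"match_phrase": {text_field: phrase}}],
                # Date/time searching: a non-scoring range filter.
                "filter": [{"range": {date_field: {"gte": since}}}],
            }
        },
        # Faceting: count the matching documents per facet value.
        "aggs": {"by_facet": {"terms": {"field": facet_field}}},
        # Snippets/highlighting on the searched text field.
        "highlight": {"fields": {text_field: {}}},
    }


if __name__ == "__main__":
    body = build_research_query("breach of contract", "2015-01-01")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a `_search` request; the sketch only builds the body, so it runs without a cluster.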
  6. ABOUT OUR TEAM

    •  Long history & experience with search technologies
    •  We build, manage & support a mission-critical information platform
        –  Petabytes of content
        –  For example, in Legal: 0.5 billion searches per day
        –  99.99% SLA
    •  Search expertise
        –  Proprietary & Elasticsearch

    [Diagram labels: Elasticsearch, Proprietary Search Solution, Product A, Product B, Product C, Product D, Product X, Product Y]
  7. ELASTIC – USE CASES

    •  Full Text Searching of unstructured data with complex semantics
    •  Lookups and aggregations of structured data
    •  Real-time Search
    •  Non-English Searching
    •  Log aggregations & analysis (ELK)
    •  Products/Projects across all business units
        –  1,500+ nodes, 70+ clusters
  8. CHALLENGES – HOW DO WE?

    •  Size & provision Elastic clusters quickly
    •  Manage & administer all the clusters efficiently
    •  Manage capacity proactively
    •  Provide cross-site redundancy
    •  Apply granular security
    •  Prevent or mitigate “bad stuff” effectively
    •  Operate at scale with a small team
    •  Enable faster time to market
  9. GOAL – SEARCH AS A SERVICE

    •  Build a search platform that
        –  is easy to use
        –  is reliable, scalable & fully operationalized
        –  enables rapid go-to-market
    •  Build the ecosystem around search to build an intelligent information platform

    Provide an enterprise search platform built on Elasticsearch to accelerate time to market & enable our products to grow and thrive.
  10. THE PROBLEM

    •  Long story short, we manage Elasticsearch…
        –  Directly responsible for
            •  16 partners/projects
            •  78 clusters
            •  1,100 nodes
        –  Total TR body count tally is
            •  21 partners/projects
            •  107 clusters
            •  1,747 nodes
    •  We’ve seen a lot of Elasticsearch
  11. EXAMPLE USE CASE

    •  Clusters vary drastically depending on business case
    •  Example:
        –  3 master nodes
        –  3 client nodes
        –  5 data servers, each with 4 Elasticsearch instances
        –  Can service over 340,000 index requests in a minute across over 1,800 indices
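To put the example cluster's figures in per-second terms, the arithmetic can be sketched as below. The slide gives "over 340,000" requests and "over 1,800" indices, so these numbers are approximations; the even spread of load across servers, instances and indices is an assumption made purely for illustration, and the function name `breakdown` is hypothetical.

```python
# Figures from the example use case slide (both are stated as "over").
INDEX_REQUESTS_PER_MINUTE = 340_000
DATA_SERVERS = 5
INSTANCES_PER_SERVER = 4
INDICES = 1_800


def breakdown(requests_per_minute=INDEX_REQUESTS_PER_MINUTE,
              servers=DATA_SERVERS,
              instances_per_server=INSTANCES_PER_SERVER,
              indices=INDICES):
    """Split the cluster-wide index rate, assuming perfectly even load."""
    per_second = requests_per_minute / 60
    total_instances = servers * instances_per_server
    return {
        "per_second": per_second,
        "per_server_per_second": per_second / servers,
        "per_instance_per_second": per_second / total_instances,
        "per_index_per_minute": requests_per_minute / indices,
    }


if __name__ == "__main__":
    for name, value in breakdown().items():
        print("{}: {:.0f}".format(name, value))
```

Under that even-load assumption, the cluster handles roughly 5,667 index requests per second, or about 1,133 per data server and 283 per Elasticsearch instance, with each index receiving around 189 requests per minute.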
  12. BACKGROUND

    •  Began in late 2013
    •  Arose from monitoring needs (alerting)
    •  Transitioned to security needs in 2014 due to the sensitivity of the data
    •  Needed support for internal authentication, monitoring systems, etc.
  13. AUTOMATED SNAPSHOT/RESTORE

    •  CRON scripts that automatically perform snapshots
        –  Also clean up old snapshots
    •  Notify the operations team upon snapshot errors
    •  Include logic to automatically restore to a secondary site (disaster recovery)
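The deck does not show the scripts themselves. As a minimal sketch of the snapshot naming and clean-up logic such a cron job might contain, the code below assumes a hypothetical naming scheme (`snap-` plus a timestamp), a hypothetical 7-day retention period and a hypothetical repository name. It only computes names and REST paths; the paths follow Elasticsearch's snapshot API shape, `/_snapshot/<repository>/<snapshot>`, where a `PUT` creates a snapshot and a `DELETE` removes one.

```python
from datetime import datetime, timedelta

# Hypothetical naming scheme; snapshot names must be lowercase.
SNAPSHOT_PREFIX = "snap-"
TIMESTAMP_FORMAT = "%Y%m%d-%H%M%S"


def snapshot_name(now):
    """Name for a snapshot taken at the given time."""
    return SNAPSHOT_PREFIX + now.strftime(TIMESTAMP_FORMAT)


def snapshot_path(repository, name):
    """REST path used to create (PUT) or delete (DELETE) a snapshot."""
    return "/_snapshot/{}/{}".format(repository, name)


def expired_snapshots(names, now, keep_days=7):
    """Return the snapshot names older than the retention window.

    Names that do not follow the scheme above are skipped, so manually
    created snapshots are never selected for deletion.
    """
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(SNAPSHOT_PREFIX):
            continue
        try:
            taken = datetime.strptime(name[len(SNAPSHOT_PREFIX):],
                                      TIMESTAMP_FORMAT)
        except ValueError:
            continue  # prefix matched, but the timestamp did not parse
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    now = datetime(2016, 2, 17, 1, 0, 0)
    existing = ["snap-20160201-010000", "snap-20160216-010000", "manual-backup"]
    print("create:", snapshot_path("backups", snapshot_name(now)))
    for old in expired_snapshots(existing, now):
        print("delete:", snapshot_path("backups", old))
```

Keeping the retention decision in a pure function makes it testable without a cluster; error notification and the restore to a secondary site are outside the scope of this sketch.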
  14. WHAT’S NEXT?

    •  Finish 2.X rollout
    •  HA Replication
    •  Integration with Database of Record
    •  Natural Language Search
    •  Intelligent Federated Search
    •  Dynamic Scaling in the cloud
    •  Additional security features
    •  Manage Ingest throughput