Ian Lee
March 13, 2024

Pass On What You Have Learned: Deploying to Production

A presentation on LLNL's experience deploying Elastic into production over the past year, covering both the good things we've seen and a variety of warts we've found along the way.

Originally presented at: https://elasticpublicsectorsummit.upgather.com/


Transcript

  1. Pass On What You Have Learned: Deploying to Production

    Elastic Public Sector Summit 2024
    Ian Lee, HPC Security Architect, Security Operations Team Lead
    2024-03-13

    LLNL-PRES-861410
    This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
  2. El Capitan vs the Rest

    • Each bubble represents a cluster
    • Size reflects theoretical peak performance, color indicates computing zone
    • Only large user-facing compute clusters are represented

    El Capitan ~ 2 EF (~ 2,000,000 TF)
  3. Performance is Noticeably Better

    • Significantly better performance → opens new doors for analysis of log data
    • Splunk (fast mode)
      - index=lc | stats count by source | sort -count
    • Elastic (ESQL)
      - from logs-* | stats count = count() by log.file.path | limit 10000 | sort count desc

    Lookback      # documents   Splunk (fast mode)   Elastic (ESQL)
    60 minutes    50 M          133 sec              ~ 2 sec
    24 hours      1.8 B         2,294 sec            ~ 10 sec
    7 days        14 B          10,440 sec           ~ 20 sec
  4. Monitoring Vision Going Forward

    • Explore other offerings
      - ML / Anomaly Detection
      - Enterprise Search (unified search across web + Confluence + GitLab)
      - Elastic Defend
    • More automated alerts / processes
  5. “Difficult to see; always in motion is the future” - Yoda

    • We’ve taken a significant hit to capabilities we’ve enjoyed for years.
    • Operational monitoring of our HPC systems is not as good today as it was with Splunk.
      - User Experience in particular is not as polished.
      - Enrich/Transform/Watcher system has been a significant pain point.
    • Looking forward to partnering with the Elastic and Federal communities further
      - https://github.com/LLNL/elastic-stacker
      - Upcoming Continuous Monitoring Dashboard repository
      - ESQL
  6. Ongoing Work We’re Excited About

    • Kibana
      - [Dashboard][Research] Refactor Grid and Layout Systems #88710 **
      - Filter only the relevant panels in a dashboard #170395 **
      - Sparklines #3395
      - Allow dynamic naming of file attachments to watcher emails #169891
      - [Fleet] Allow KQL queries with no field specified in fleet endpoints #171425
    • Elastic Agent
      - Reusable integration policies #2227 **
      - Fleet Server configuration does not contain all the hosts available in the Elasticsearch cluster #2784
    • Elasticsearch
      - GPU accelerated Machine learning #61690
    • Integrations
      - Gitlab #1741
  7. Product Roadmaps?

    • LLNL product decisions and timelines are decades long
      - We are expected to deliver on our timelines and roadmaps
    • Will issues that we care about receive meaningful attention?

    https://github.com/elastic/kibana/issues/17888