Hydrolix - IT Press Tour #58 Oct. 2024

The IT Press Tour

October 09, 2024

Transcript

  1. Collect faster | Retain more | Query everything HYDROLIX OVERVIEW

    FOR IT PRESS TOUR MARTY KAGAN, CEO » WIFI: Hyatt_Meeting » Pwd: Hydrolix
  2. Agenda » Introductions » Hydrolix Overview ◦ History and technology

    ◦ User overviews » Featured Users and News ◦ Akamai ◦ New Balance ◦ Quesma (with demo) » Q&A
  3. Hydrolix At a Glance » Streaming data lake » 287

    customers, 12x YoY growth » 59 in EMEA » Global team of 100+ » ~15% in EMEA including key leadership » Key leadership in EMEA » David Sztykman, Chief Strategy Officer, FR » Jason Turner, VP of Global Services, UK » Alexandra Keith, Director of Sales, EMEA, UK » Marty Kagan, CEO/Co-founder » Akamai’s Director of Technology, International, based in Paris (2004-07)
  4. A Global Customer Base ➢ The largest worldwide brands in

    ◦ media ◦ consumer packaged goods ◦ gaming ◦ finance ◦ automotive ◦ software ◦ sportswear ◦ security ➢ Sold direct and through partners like Akamai
  5. From the pioneers in Distributed Observability | The Genesis of

    Hydrolix » Firsthand experience handling massive data sets at Cedexis and Splunk » Processed real-time logs – multiple CDNs and client telemetry » Cost of data storage and processing grew to be second only to headcount » Before Hydrolix, companies were simply throwing the data away
  6. What is a streaming data lake? All the properties of

    a traditional data lake » Flexible schema » Raw storage » Decoupled storage » Independently scalable query and ingest compute
  7. What is a streaming data lake? All the properties of

    a traditional data lake » Flexible schema » Raw storage » Decoupled storage » Independently scalable query and ingest compute. With specific optimizations for real-time log streams: » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Especially for high-cardinality/dimensionality data
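
To make "ETL on ingest" and "combine multiple log sources into a single table" concrete, here is a minimal sketch of the general idea in Python. It is not Hydrolix's transform format or API; the vendor names (`cdn_a`, `cdn_b`), the field names, and the shared schema are all hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical per-vendor field mappings (vendor field -> shared column).
# Real CDN log schemas differ; these names are illustrative assumptions.
FIELD_MAPS = {
    "cdn_a": {"ts": "timestamp", "cip": "client_ip", "sc": "status", "b": "bytes_sent"},
    "cdn_b": {"time": "timestamp", "clientAddress": "client_ip",
              "httpStatus": "status", "size": "bytes_sent"},
}


def transform(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename vendor-specific fields to the shared schema and tag the source."""
    mapping = FIELD_MAPS[source]
    row = {common: record[vendor] for vendor, common in mapping.items() if vendor in record}
    # Normalize types so both vendors land in identically typed columns.
    row["status"] = int(row["status"])
    row["bytes_sent"] = int(row["bytes_sent"])
    row["source"] = source
    return row


# Two differently shaped records end up as rows of one logical table.
rows = [
    transform("cdn_a", {"ts": "2024-10-09T12:00:00Z", "cip": "203.0.113.7",
                        "sc": "200", "b": "5120"}),
    transform("cdn_b", {"time": "2024-10-09T12:00:01Z", "clientAddress": "198.51.100.9",
                        "httpStatus": 503, "size": 0}),
]

for row in rows:
    print(row)
```

Because every row carries the same columns plus a `source` tag, a single query can compare vendors side by side, which is the benefit the slide describes.
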
  8. INGEST | STORE | QUERY

    [Architecture diagram] » Ingest (stream | batch): LOG, RUM, CDN, SIEM, APP, NET, DS2, MultiCDN, CMCD, DNS, mPulse » Intake peers » Store: 20x-50x compression, S3-compatible object store » Query peers » APIs: SQL, REST, JDBC, Spark » Dashboards: Splunk, Databricks, Looker, Grafana, Superset, etc. » Deploy on: Akamai LKE | Google GKE | Amazon EKS | Azure AKS
  9. Notable Architectural Features » Kubernetes-native » Decoupled storage plus advanced

    compression (20x-50x) » Ingest, query compute scale independently » Apply transforms to data on ingest » Query isolation and scale-up/down query pools » SQL, Spark query interfaces
  10. How the Streaming Data Lake Benefits Users

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  11. SUPER SCALABILITY Paramount » Peak ingestion rate of 10.8 million

    rows/sec » 53 billion records collected » 41 TiB of data transformed into 5.76 TiB of compressed data stored » Hydrolix’s merge service continues to compress the data over time. At peak, across all clients: » 20 million rows/sec » 100 billion log lines
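
The Paramount figures on this slide imply a compression ratio that can be worked out directly. The short calculation below uses only the numbers quoted on the slide; the average record size additionally assumes that the 53 billion records and the 41 TiB describe the same data set, which the slide does not state explicitly.

```python
# Figures quoted on the slide.
raw_tib = 41.0        # data before compression
stored_tib = 5.76     # compressed data stored
records = 53e9        # records collected (assumed to be the same data set)

compression_ratio = raw_tib / stored_tib
space_saved = 1 - stored_tib / raw_tib

# 1 TiB = 2**40 bytes.
avg_record_bytes = raw_tib * 2**40 / records

print(f"compression ratio: {compression_ratio:.1f}x")
print(f"space saved: {space_saved:.0%}")
print(f"average raw record size: {avg_record_bytes:.0f} bytes")
```

This works out to roughly 7x at the time of measurement; the slide notes that the merge service continues to compress the data over time.
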
  12. “It’s been great working with the Hydrolix team overhauling

    two of our data platforms. We’ve addressed cost, performance, and scale challenges and as a result we have a much more robust platform.” – Sean McCarthy, Director of Product Management, Paramount » MULTI-CDN OBSERVABILITY AT SCALE
  13. How the Streaming Data Lake Benefits Users

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  14. Note three (3) second ingest latency at ~10M events/sec

    Customer dashboard with real-time ingest with ETL
  15. How the Streaming Data Lake Benefits Users

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  16. “Previously, managing our CDN log files from various vendors

    was a manual, cumbersome process involving decompression and searching. Monitoring, controlling, and configuring this setup was challenging. Hydrolix lets us see all our CDN logs together, in one dashboard.” – Simon LaRoque, Project Manager, TF1 » MULTIPLE VENDORS, ONE DASHBOARD
  17. How the Streaming Data Lake Benefits Users

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  18. “The entire [DDoS] event from spotting to stopping the

    attack was instant. No sites went down and our customers experienced zero impact. We have a lot of seasonality in our traffic and just being able to scale on-demand infrastructure just from an ingest and query standpoint, that gives us a ton of flexibility.” – Jonas Petersson, Team Lead for eCommerce » ELKJØP PREVENTS BLACK FRIDAY DDOS ATTACK
  19. How the Streaming Data Lake Benefits Users

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  20. Most companies don’t keep data past 90 days because of cost

    Cost to store a TB goes down over time. [Chart: TCO over time, Hydrolix vs. other DBs, at retention periods of 3 days, 7 days, 15 days, 30 days, 90 days, 6 months and 12 months]
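
The chart's underlying data is not recoverable from this transcript, but one driver of retention cost, the stored footprint, is easy to sketch. The calculation below assumes a hypothetical 100 TB/day of raw logs and uses the 20x-50x compression range quoted elsewhere in the deck; it models storage volume only, not pricing, and the 456-day figure is an approximation of 15 months.

```python
def stored_tb(daily_raw_tb: float, retention_days: int, compression_ratio: float) -> float:
    """Compressed footprint (TB) of keeping `retention_days` of raw logs."""
    return daily_raw_tb * retention_days / compression_ratio


DAILY_RAW_TB = 100.0  # hypothetical ingest volume, not a figure from the deck

for label, days in (("90 days", 90), ("~15 months", 456)):
    for ratio in (20, 50):
        footprint = stored_tb(DAILY_RAW_TB, days, ratio)
        print(f"{label:>10} at {ratio}x compression: {footprint:,.0f} TB stored")
```

Under these assumptions, 15 months at 50x occupies about twice the space of 90 days at 20x, which illustrates why higher compression makes long retention more affordable.
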
  21. Key Benefits Hydrolix

    » Combine multiple log sources into a single table » SQL and Spark Query on ingest » Multi-year hot storage » Raw logs mean greater visibility, better outcomes » Streaming ingest that scales to hundreds of TB/day » ETL for real-time log stream processing
  22. “In my nearly three years at [a global media and

    entertainment company], this is the first event where every single impacting issue was proactively identified and mitigated through our TrafficPeak observability work - not a single issue remained hidden, unknown, or unresolved.” - Sr. Dir, Cloud Architecture at a global media company » 2024 SUMMER OLYMPICS
  23. FOR REAL-TIME AND HISTORICAL BIG DATA ANALYTICS Use Cases »

    Platform and network observability » Compliance (long-term, hot data retention) » SIEM (cybersecurity) » Multi-CDN observability and traffic steering » Real user monitoring (media, SaaS, e-commerce) » ML/AI – anomaly detection, feature storage » Bot, piracy, and fraudulent user detection » Anomaly detection » APT/Retro hunts
  24. Data Ecosystem Connectors to leading data platforms » Splunk »

    Spark » Kibana (ELK) » BigQuery » Snowflake » Complementary: store raw logs, export summary views and query results
  25. Comparing Hydrolix and Snowflake

    » Cost model: Hydrolix: core-hour; Snowflake: ingest + compute + data transfer + concurrent warehouses » Architecture: Hydrolix: decoupled, stateless; Snowflake: decoupled, stateless » Default retention: Hydrolix: 15 months+ (user option); Snowflake: 12 months » ETL & streaming ingest: Hydrolix: yes; Snowflake: no (micro-batching with ELT) » Compression: Hydrolix: 20-50x; Snowflake: 4x » Delivery models: Hydrolix: on-prem, managed, SaaS; Snowflake: SaaS
  26. Comparing Hydrolix and BigQuery

    » Cost model: Hydrolix: core-hour; BigQuery: ingest + per query + storage » Architecture: Hydrolix: decoupled, stateless; BigQuery: decoupled, stateless » Default retention: Hydrolix: 12 months or more (user option); BigQuery: 60 days » Streaming ingest: Hydrolix: yes; BigQuery: yes (1 GB/sec limit) » Compression: Hydrolix: 20-50x; BigQuery: 6-12x » Delivery models: Hydrolix: on-prem, managed (AWS, GCP, Azure, Akamai); BigQuery: SaaS (GCP only)
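
To put the "1 GB/sec limit" next to the deck's "hundreds of TB/day" claim, the two units can be converted into each other. This is plain arithmetic assuming decimal units (1 TB = 1000 GB); the 1 GB/sec figure is the deck's own claim and is not verified here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gb_per_sec_to_tb_per_day(gb_per_sec: float) -> float:
    """Sustained GB/sec expressed as TB/day (decimal units)."""
    return gb_per_sec * SECONDS_PER_DAY / 1000


def tb_per_day_to_gb_per_sec(tb_per_day: float) -> float:
    """TB/day expressed as the sustained GB/sec needed to achieve it."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


print(f"1 GB/sec sustained = {gb_per_sec_to_tb_per_day(1.0):.1f} TB/day")
print(f"100 TB/day needs {tb_per_day_to_gb_per_sec(100.0):.2f} GB/sec sustained")
```

A sustained 1 GB/sec is about 86.4 TB/day, so ingesting 100 TB/day or more requires an average rate above that figure.
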
  27. Power and protect life online. Dan Lawrence, Vice President, Global

    Enterprise Cloud Sales + New Balance customer perspective
  28. What is Akamai Connected Cloud? Akamai Connected Cloud is a

    massively distributed edge and cloud platform. • Brings together core cloud computing and edge computing, along with industry-leading security and content delivery • Makes it easy for businesses to develop and run applications and workloads, while putting experiences closer to users and keeping threats farther away. TrafficPeak is the observability product built on Akamai Connected Cloud.
  29. What is TrafficPeak? TrafficPeak, on Akamai Connected Cloud, is an

    observability platform that enables you to ingest, monitor, query, store and analyze massive amounts of data in real time, at 75% less cost than other providers. With its visualization dashboards, you can uncover and address security issues preemptively, safeguarding your brand’s trust and desirability.
  30. Problems TrafficPeak Solves • Colossal amounts of data are expensive

    to collect & retain. • Current SIEM technology struggles to scale to collect and report on so much data. • Discarding data for budgetary reasons rather than risk considerations. Without historical data it’s tough to find the root cause of events, which elevates the risk of repeat attacks. Historical data also helps uncover how long an attack has been occurring. • Alert fatigue. Can’t find the needle in the haystack. • Real-time visibility into imminent threats is hard, especially when data is displayed on multiple dashboards. • The bigger the data, the slower and more expensive the query process, which lengthens time to mitigation.
  31. What you can achieve with TrafficPeak • Reduce observability costs

    by 75%. Repurpose your resources on other priority projects. • Ingest as much data as you want and retain it for 15 months or longer. • Prevent incidents. Use historical data to reduce the risk of repeat attacks and discover how long attacks have been happening. • Uncover incidents in real time by viewing alerts, trends and anomalies in your data, and mitigate them before they impact your users. • Find the data you need in seconds, no matter its age. Keep all data hot. • Ingest and combine all data types — DS2, SIEM, mPulse, DNS and CMCD — into one unified dashboard view of the data that matters most. • Integrate Akamai SIEM and other data feeds with one click. No extra provisioning needed.
  32. What New Balance achieved with TrafficPeak • Who is New

    Balance • Main online goals • Problem • How they solved it with TP • Value of Akamai Connected Cloud
  33. Mission statement: Help customers to innovate faster by re-shaping the

    way applications are architected and connected to their DBs.
  34. Who are we? - Founded Sep 2023 - Early stage

    startup with $2.5M seed funding - Main investors: Heartcore (Neo4j) & Inovo (Spacelift) - Core engineering team in Warsaw, PL Jacek Migdal, co-founder & CEO Ex Facebook, Nvidia, Sumo Logic Pawel Brzoska, co-founder & CPO Ex Dynatrace, Sumo Logic + Team of 5 seasoned engineers Ex Elastic, ScyllaDB, OpenTelemetry, Dynatrace, Sumo Logic
  35. The problem CTOs and their teams are unable to modernise

    their database stack due to tight client-to-DB coupling and complex application architectures. This slows innovation, degrades customer experience and strains IT budgets.
  36. The solution Route your data through an intelligent gateway that

    breaks the dependency between the app and DB layers. Cut costs, streamline DevOps and improve performance by introducing elasticity via a DB connection fabric.
  37. Solve the big problem by splitting it into two

    [Diagram] » Hard to overhaul as a single step: application stack + database stack » With a gateway between them: application stack (modernise gradually, step-by-step) and database stack (modernise first, easy to go back)
  38. The problem The inverted index, while designed for and working great

    for document search, tends to be costly and slow for the OLAP use cases required in log management for Security or Observability
  39. The solution Use Quesma to connect your Kibana/OSD and Beats/Logstash/Prepper

    to a modern platform of your choice (e.g. Hydrolix)
  40. ES queries / Kibana OpenSearch Dashboards Beats/Logstash DataPrepper bulk insert

    BENEFIT: Save money, time and the precious resources of your engineering team by: - using the DB platform designed for the use case you need it for - keeping full backward compatibility of your Security/Observability Elastic stack
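
The gateway idea, accepting Elasticsearch-style queries and forwarding them to a SQL platform, can be illustrated with a toy translator. This is only a sketch of the general technique and not Quesma's implementation: it handles just `term` and `range` clauses inside a `bool.filter`, and the table name `cdn_logs`, the column names, and the `?` placeholder style are assumptions made for the example.

```python
RANGE_OPS = {"gte": ">=", "gt": ">", "lte": "<=", "lt": "<"}


def clause_to_sql(clause: dict) -> tuple[str, list]:
    """Translate one Elasticsearch filter clause into a SQL predicate and its parameters."""
    if "term" in clause:
        (field, value), = clause["term"].items()
        if isinstance(value, dict):  # long form: {"term": {"field": {"value": ...}}}
            value = value["value"]
        return f'"{field}" = ?', [value]
    if "range" in clause:
        (field, bounds), = clause["range"].items()
        parts, params = [], []
        for key, op in RANGE_OPS.items():
            if key in bounds:
                parts.append(f'"{field}" {op} ?')
                params.append(bounds[key])
        return " AND ".join(parts), params
    raise ValueError(f"unsupported clause: {list(clause)}")


def es_query_to_sql(table: str, query: dict) -> tuple[str, list]:
    """Translate a bool/filter query into a parameterized SELECT statement."""
    predicates, params = [], []
    for clause in query["bool"]["filter"]:
        sql, clause_params = clause_to_sql(clause)
        predicates.append(sql)
        params.extend(clause_params)
    return f'SELECT * FROM "{table}" WHERE ' + " AND ".join(predicates), params


es_query = {
    "bool": {
        "filter": [
            {"term": {"status": 503}},
            {"range": {"@timestamp": {"gte": "2024-10-09T00:00:00Z",
                                      "lt": "2024-10-10T00:00:00Z"}}},
        ]
    }
}

sql, params = es_query_to_sql("cdn_logs", es_query)
print(sql)
print(params)
```

Values are passed as parameters rather than interpolated into the SQL string; a real gateway would also need to validate field names, cover aggregations and the rest of the query DSL, and map results back to Elasticsearch response shapes.
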
  41. News for ITPressTour: Quesma and Hydrolix Partnership Making database migrations

    easier with Kibana » The partnership adds Kibana to the list of analytics tools teams can use with the Hydrolix platform. » The integration enables ELK stack users to offload their log data to Hydrolix’s cheaper and faster platform and keep using Kibana and Logstash/Beats. » This can dramatically reduce the total cost of ownership (TCO) for ELK use cases that involve log data, including observability.
  42. News for ITPressTour: Quesma and Hydrolix Partnership Special Capabilities of

    Hydrolix with the ELK Stack » Hydrolix can replace Elastic for use cases that involve log data at petabyte scale and can complement an existing ELK stack. » Hydrolix can ingest data from both Logstash and Beats; just point existing agents and pipelines towards Hydrolix. » Hydrolix works with Elastic Common Schema (ECS), transforming logs in real time when they're ingested, providing tremendous flexibility in terms of mapping data. » Hydrolix enables ingestion of multiple log sources into a single table where logs can easily be compared.
  43. News for ITPressTour: Quesma and Hydrolix Partnership News release being

    shared via email; it will be published tomorrow morning.