Keynote: Kishore Gopalakrishna, StarTree - The Rise of Real-Time Analytics | RTA Summit 2023

In recent years, real-time streaming has revolutionized transactional data. Legacy data warehouse processes have been replaced by data lakes and cloud-native solutions, but traditional batch stacks are no longer adequate to meet the needs of today’s fast-paced world, where everyone is a decision maker.

Leading companies like LinkedIn, Uber, Stripe, and others worldwide have realized that analytics must keep up with the transformation that application architectures have undergone. Merely generating reports and dashboards for internal decision-makers is no longer sufficient. In today’s competitive environment, businesses require real-time, actionable insights that can guide their interactions with users on websites, mobile apps, and services. The rise of real-time analytics is imperative to empower businesses to make data-driven decisions in the moment, enabling them to stay ahead in the ever-evolving world where every user is a decision maker. Embracing this shift towards real-time analytics is essential to deliver exceptional user experiences and meet the dynamic demands of the modern business landscape.

StarTree

May 23, 2023

Transcript

  1. The Rise of
    Real-Time Analytics
    Kishore Gopalakrishna

  2. Java Database Real-Time

  3. Decisions, Decisions, Decisions…
    35K
    Per day
    1 Billion
    In a lifetime

  5. But the real question is:
    HOW are the decisions being made?

  6. Evolution of Decision-Making
    Everyone
    Intuition Based
    Executives
    Data-Driven
    (Internal)
    Operators Engineers
    Data-Driven
    (Executives, CXOs)

  7. What about our users and customers?
    and data freshness?

  8. User-Facing Analytics
    Customers and users
    Everyone
    (billions)
    Everyone in
    the company
    (millions)
    Executives
    (thousands)
    Real-Time Analytics
    World is constantly changing

  9. Use Case: User-Facing + Real-Time Analytics
    Analysts
    Data Science
    Workbench
    Internal Users
    Operational
    Workbench
    Operators
    External Users
    Eater
    Rider Restaurant
    Deliver Order
    Pickup

  10. Real-Time Analytics
    Maximize Value
    Accuracy
    Agility
    Real-Time
    Batch
    Value
    Time
    Milliseconds Seconds Minutes Days Months

  11. Defining Real-Time?
    Value
    Time
    Maximize the Value
    Based on Use Case
    Impact Point

  12. REAL-TIME
    INTERNAL
    EXTERNAL
    BATCH
    User-Facing + Real-Time Analytics

  13. Real-Time + User-Facing
    Batch + Internal

  14. Real-Time != Micro-Batching
    Walk Bike Car Flight Rocket

  15. Internal Users
    Batch
    Existing Batch Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Events Insights

  16. Internal Users
    Batch
    Rise of Real-Time Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Real-Time
    Event Source
    Events Insights

  17. Internal Users
    Batch
    Rise of Real-Time Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Real-Time
    Event Source
    Real-Time
    Processing
    Events Insights

  18. Internal Users
    Batch
    Rise of Real-Time Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Batch
    Real-Time
    Event Source
    Real-Time
    Processing
    Events Insights

  19. Internal Users
    Batch
    Rise of Real-Time Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Real-Time
    Event Source
    Real-Time
    Processing
    Batch
    Real-Time
    Database
    Missing piece
    Milliseconds/secs
    Events Insights

  20. Internal Users
    Batch
    Rise of Real-Time Architecture
    Database ETL
    Datalake /
    DWH
    Hours/Days
    Real-Time
    Event Source
    Real-Time
    Processing
    Batch
    Real-Time
    Database
    Missing piece
    External Users
    Milliseconds/secs
    Events Insights

  21. Which Database?
    IBM Db2
    mSQL
    Rocket M204

  22. 1PB+
    DATA SIZE
    200K+
    QUERIES/SEC
    < 100ms
    QUERY LATENCY

  23. Apache Pinot Impact
    1,000 Nodes
    75
    Nodes
    45X Improvement in Efficiency
    ● 5,000 queries/sec
    ● ~5ms average latency
    ● <100ms 95th percentile
    After
    Before
    Before Pinot
    After Pinot
    1,000 queries/sec
    5,000 queries/sec

  24. Powered by Apache Pinot
    Retail FinTech/Banking Food/Logistics Media/Comms
    Cloud Native/
    SaaS
    Other industries

  25. Community
    Contributors
    300+
    Slack Members
    4,600+
    Docker Downloads
    5.6M+

  26. Dimensions of Real-Time Analytics

  27. Dimensions for Real-Time Database
    Freshness Minutes Seconds
    Minutes Seconds Milliseconds
    1 User 10’s 100’s - Millions
    Days
    Latency
    Concurrency
    Data Warehouse Real-Time Database

  28. The Power of Indexes
    Other databases try to do the same work faster; Pinot works differently
    Indexes: Star-tree, Inverted, Sorted, JSON, GEO
    Users can run a lot more queries with the same resources

  29. Multiple Use Cases, Single System
    Time
    Value
    Raw Data
    Decreasing value over
    time for single event
    Aggregated Data
    Increasing value over time for
    aggregated event

  30. Data age
    Query frequency
    Local Storage
    Cloud Storage
    Ultra-low latency but
    tightly coupled
    Slight latency trade-off in
    decoupled
    StarTree: Cost, Performance Trade-Off
    30 days and older

  31. Data age
    Query frequency
    Local Storage
    Cloud Storage
    StarTree: Cost, Performance Trade-Off

  32. Apache Pinot™ as a Service

  34. Public SaaS
    Customer Network StarTree Network
    Control Plane
    Data Plane
    End Users
    Apps
    Systems
    End Users
    Private SaaS
    (Bring Your
    Own Cloud)
    Data
    Plane
    Apps
    Systems
    Control Plane
    StarTree Flexible Deployment

  35. StarTree Applications

  36. StarTree Customers
    “Pinot enables us to execute sub-second petabyte-scale
    aggregation queries over fresh financial events in our internal
    ledger. We choose Pinot because of its rich feature set and
    scalability, which has enabled better performance than our
    previous solution – at a lower cost”
    Stripe
    “StarTree Cloud made it easy to get started with Pinot and
    real-time applications. We were able to ingest batch data and
    use real-time apps to significantly reduce Mean Time to Detect
    and Mean Time to Respond for key business metrics”
    Just Eat Takeaway

  37. Make the leap to real-time. We are here to help.
    Everyone
    Random /
    Intuition Based
    Executives
    Data-Driven
    (Internal)
    Data-Driven
    (Users/Customers)
    Operators Engineers
    Everyone
    Data-Driven
    (Executives, CXOs)

  38. And prepare for the future.
    Everyone
    Random /
    Intuition Based
    Executives
    Data-Driven
    (Internal)
    Data-Driven
    (Users/Customers)
    Operators Engineers
    Everyone
    Data-Driven
    (Executives, CXOs)
    Data-Driven
    (Machines)
    Machines

  39. Thank you.

  40. The power of indexes
    Indexes: Star-tree, Inverted, Sorted, JSON, GEO
    Users can run a lot more queries with the same resources
    Other databases try to do the same work faster; Pinot works differently

  41. Dimensions for Real-Time Database
    Freshness Minutes Seconds
    Minutes Seconds Milliseconds
    1 User 10’s 100’s - Millions
    Days
    Latency
    Concurrency
    Data
    Warehouse
    Real-Time
    Database

  42. Customer Slides
    Retail FinTech/Banking Food/Logistics Media/Comms
    Cloud Native/
    SaaS
    Other industries

  43. The Power of Indexes
