Stackable - IT Press Tour #63 Sep 2025

The IT Press Tour

September 10, 2025

Transcript

  1. Your data, your platform: the modular open source data platform.
 Popular data apps. Kubernetes-native. Easy to deploy and run.
 IT Press Tour #63 @Amsterdam

  2. Agenda
 1. History
 2. Why Stackable?
 3. What pains do we ease?
 4. Who are our competitors?
 5. How does it work?
 6. Next steps
 7. Case studies
 8. Where to from here?

  3. A bit of context
 • The GFS paper
 • The rise of Hadoop
 • “Data is the new oil”
 • Cloudera, Hortonworks, MapR, IBM, Intel, …
 • Clouds appear on the horizon…
 • The fall of the Hadoop distros…

  4. Stackable in a nutshell
 Our team: 20 experts, international, based in Germany and across Europe
 Founded: 2020
 Our offer:
 > Product support
 > Big data consulting
 > Training
 Stackable Data Platform:
 > Open source
 > Infrastructure as code
 > Cloud-native (Kubernetes)
 > On-premises, cloud, hybrid
 Our customers
 Our network

  5. Stackable has a stable foundation
 • Deep technological expertise in the (big) data ecosystem
 • Large customer network around the world
 • Network of experts and partners from the data and streaming world
 • Many years of expert knowledge of the data & analytics market
 • Access to 250+ customers in DACH in all industries
 • Strong market presence and network
 • Support of Stackable through the shared services organization of b.telligent
 • Strategic focus on open source software
 • Capital and developers for the further development of Stackable
 • Data centers in Europe and the USA
 • C5 certificate, BSI IT basic protection & ISO 27001 certified
 • Consortium partner in GAIA-X funding projects

  6. Stackable management team
 Sebastian Amtage, CEO: founder and managing director of b.telligent, founder of and investor in several companies in the data-driven B2B sector
 Jim Halfpenny, MD Stackable UK: expert in big data solutions, highly experienced in designing big data architectures and in open source projects
 Fabian Jasinski, CRO: expert with many years of experience in technology management, digital transformation, product management, and data science
 Sönke Liebau, CPO: co-founder of Stackable and OpenCore, long-time practitioner of (big) data open source software, speaker, and contributor to various projects
 Lars Francke, CTO: co-founder of Stackable and OpenCore, committer on open source projects, working in the (big) data sector since 2008
 Dr. Stefan Igel, COO: big data expert with many years of experience in IT projects, agile leadership, and team and organizational development

  7. A data platform like many others?
 Stackable Data Platform: cloud and/or data center
 Sources: business systems, financial applications, machine sensors, smart meter sensors, social media, eCommerce/retail, pictures/videos, weather, satellites
 Consumers: reporting/BI, dashboards, control panels, data science, machine learning, AI, APIs, applications, data products
 Collect → store/process → provide
 Any amount of data; batch processing & event streaming in real time; various data formats and structures

  8. Data sovereignty: challenges without your own data platform
 • Lock-in with the (cloud) provider
 • Inadequate service & support
 • High license and operating costs
 • Increasing complexity of data management
 • Limited expandability
 • No state-of-the-art platform security
 • Lack of support for compliance & regulatory requirements

  9. Why Stackable? Your platform.
 Everything from one source: all the open source tools for your modern data platform from a single source; no patchwork of vendors, and a single point of contact.
 Flexible, modular, expandable: choose the ideal combination of data tools in different versions and move away from monolithic distributions.
 Easy integration: uniform interfaces for monitoring, alerting, and log management simplify integration into your corporate IT.
 Made in Europe: developed in Europe according to European security standards, including vulnerability management, VEX declarations, consideration of the CRA, etc.
 Popular data apps: a curated selection of the best open source data apps, such as Apache Kafka®, Apache Druid, Trino, and Apache Spark™.
 Kubernetes-native: based on Kubernetes, the platform runs everywhere, in your own data center or in the cloud.
 Easy to set up and operate: all apps work seamlessly together and can be added or removed to create unique data architectures.
  10. Flexible answers for next-generation data platforms
 Option 1: Stackable Data Platform for on-premises open-source data platforms
 Option 2: Stackable Data Platform for open-source data services with an IaaS/K8s provider
 Option 3: Managed Stackable Data Platform as a PaaS service in the IONOS Cloud
 The future: GAIA-X compliant, sovereign data services powered by Stackable

  11. Questions
 1. Have you ever installed a project from the Apache Software Foundation?
 2. Do you know what Kerberos is?
 3. Have you ever encountered a “No valid path to certificate root” exception?
 4. Ever wondered whether the setting “timeout.duration” takes seconds, milliseconds, or minutes?
 5. Many, many, many more examples…

  12. Popular data apps. Kubernetes-native. Easy to deploy and run.
 100% open source; modular and flexible; runs in any cloud and in your own data center; also available as a managed service with IONOS Cloud.
 Layers: Storage, Visualization, Analytics & AI, Data Processing, Infrastructure, Orchestration, Data Ingestion, Security, Monitoring, Logging

  13. SDP: Infrastructure & Orchestration
 Apache Airflow
 Streamlining data workflows with precision and scalability.
 • Advanced Workflow Orchestration: Apache Airflow provides comprehensive workflow planning and management, enabling precise control of data processing tasks within the Stackable Data Platform.
 • Dynamic Pipeline Creation: Easily define, schedule, and monitor complex data pipelines using Airflow’s intuitive UI and powerful programming framework. Customize your workflows to match your data processing needs perfectly.
 • Scalable and Reliable: Airflow can be easily scaled. Simultaneous workflows can be handled effortlessly so that data tasks are executed reliably regardless of volume.
 • Efficient monitoring and logging: Airflow’s monitoring functions enable quick identification and resolution of problems and ensure smooth data operation. 
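Airflow's core idea, a directed acyclic graph (DAG) of tasks executed in dependency order, can be sketched in a few lines of plain Python. This is a conceptual illustration using the standard library, not Airflow's actual API; the task names are hypothetical:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# A toy pipeline: each task maps to the set of tasks it depends on,
# mirroring how an Airflow DAG wires operators together.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_pipeline(tasks):
    """Execute tasks in an order that respects every dependency."""
    order = list(TopologicalSorter(tasks).static_order())
    for task in order:
        print(f"running {task}")
    return order

order = run_pipeline(pipeline)
# Every task runs only after all of its dependencies have run.
assert order.index("extract") < order.index("transform") < order.index("load")
```

In real Airflow the same dependency graph would be declared with operators and `>>` chaining, and the scheduler adds retries, backfills, and parallel execution on top of this ordering idea.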

  14. SDP: Data Ingestion
 Apache NiFi
 Facilitates ingestion, data flow management, and automated data exchange between systems.
 • Easy Data Routing and Transformation: Offers a user-friendly interface for data flow management, supporting rapid design and deployment of processing pipelines.
 • System Integration: Connects to a variety of data sources and sinks, facilitating data ingestion from disparate systems.
 • Data Lineage: Tracks data flow from source to destination, enhancing auditing and compliance.
 • Flexibility: Customizable processors and the ability to handle various data formats and sizes. 
 Apache Kafka
 Distributed event streaming system, providing robust, scalable messaging and stream processing.
 • High Throughput: Capable of handling millions of messages per second, making it ideal for large-scale message processing tasks.
 • Scalability: Easily scales out with minimal downtime, supporting growing data needs.
 • Stability and Reliability: Ensures data is not lost and can withstand failures, maintaining data integrity.
 • Versatility: Supports a wide range of use cases. 
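Kafka's throughput and reliability rest on modelling each topic partition as an append-only log that consumers read by offset. A minimal in-memory sketch of that idea (conceptual only, not Kafka's client API) looks like this:

```python
class PartitionLog:
    """Toy append-only log: each record keeps its offset forever,
    so independent consumers can each track their own position."""

    def __init__(self):
        self.records = []

    def append(self, value):
        offset = len(self.records)
        self.records.append(value)
        return offset

    def read(self, offset, max_records=10):
        # Reading never removes records, unlike a traditional queue.
        return self.records[offset:offset + max_records]

log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two consumers at different offsets see different slices of the same log.
assert log.read(0) == ["signup", "login", "purchase"]
assert log.read(2) == ["purchase"]
```

Because reads are just offset lookups and never mutate the log, many consumer groups can replay the same stream independently, which is what makes Kafka suitable for the wide range of use cases the slide mentions.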

  15. SDP: Storage
 Apache Hadoop Distributed File System
 The scalable Hadoop Distributed File System.
 • Fault Tolerance: HDFS is built to handle hardware failures gracefully. It achieves this by data replication and automated recovery.
 • Scalability: HDFS is designed to handle vast amounts of data and can scale out horizontally by adding more commodity hardware nodes.
 • High Throughput Access: HDFS is optimized for streaming large files rather than random access. Its design focuses on maximizing data bandwidth, making it ideal for processing large datasets in batch jobs. 
 Apache HBase
A distributed, scalable NoSQL database designed for real-time read and write access to big data.
 • Scalability: HBase scales horizontally by adding more nodes to handle increased data and throughput without a significant drop in performance.
• High Performance: It is optimized for fast random, real-time read/write operations on massive datasets, making it suitable for applications requiring low-latency access to specific rows or columns.
 • Schema Flexibility: HBase stores data in a column-family format, allowing for sparse datasets and efficient access to specific data subsets. It doesn't enforce a fixed schema, enabling flexibility to add or modify columns on the fly without downtime. 
 Apache Druid
 Provides real-time analytics and OLAP querying capabilities, ideal for insights on streaming data.
 • Real-Time Analytics: Designed for sub-second query response times, making it ideal for interactive applications.
 • Scalability: Handles massive volumes of data and concurrent users without compromising performance.
 • High Availability: Distributed architecture ensures that the system is always on and can serve queries even during partial failures. 

  16. SDP: Data Processing
 Trino
 Enables (virtualized) data access across different data sources and improves the flexibility and speed of queries.
 • Flexible Table Formats: Supports Apache Iceberg and Delta Lake.
 • Fast Query Processing: Engineered for high-speed data querying across distributed data sources.
 • Federated Queries: Allows querying data from multiple sources, simplifying analytics across disparate data stores (data federation).
 • Scalable and Flexible: Easily scales to accommodate large datasets and complex queries.
 Apache Spark
 Offers powerful data processing capabilities, enabling performant complex analytics and machine learning on big data.
 • In-Memory Computing: Accelerates processing speeds by keeping data in RAM, significantly faster than disk-based alternatives.
 • Advanced Analytics: Supports complex algorithms for machine learning, graph processing, and more.
 • Fault Tolerance: Resilient distributed datasets (RDDs) provide fault tolerance through lineage information.
 • Language Support: Offers APIs in Python, Java, Scala, and R, broadening its accessibility and usability.
 Apache Hive Metastore
 A centralized repository for storing metadata about Hive tables, databases, and other schema-related objects.
 • Centralized Metadata Management: Maintains metadata for all Hive tables, e.g. schema information, table locations, partitioning, and more. It enables seamless integration with tools like Apache Spark, Trino, and other query engines that rely on Hive for schema discovery.
 • Partition and Schema Handling: Partition awareness significantly improves query performance for large datasets through partition pruning.
 • Schema Evolution: Supports schema changes, such as adding or modifying columns, without requiring migration of existing data.
  17. SDP: Data Visualization
 Apache Superset
 A modern, open-source data exploration and visualization platform.
 • Interactive Data Exploration: It provides an intuitive interface for querying and exploring data. Users can write SQL queries or use drag-and-drop features to slice, filter, and group data interactively.
 • Custom SQL Lab: A powerful SQL IDE for running queries against databases and visualizing results in real time.
 • Advanced Data Visualization: It supports a wide variety of rich, interactive charts and dashboards, such as bar charts, heatmaps, time series, maps, and more.
 • Customizable Dashboards: Users can create and organize visualizations into shareable dashboards with drag-and-drop ease.
 • Integration with Popular Databases: Superset connects seamlessly to a broad range of databases via SQLAlchemy, including Trino, Druid, MySQL, PostgreSQL and others. 

  18. Questions
 1. Have you ever received an Excel sheet from the compliance department listing 367 vulnerabilities found in your application, with the request to “please comment on these”?
 2. Have you ever tried to find someone actually knowledgeable in the support department of one of your large vendors?
 3. Have you ever tried buying support for an open source product?

  19. Stackable Data Platform: license and subscription model
 Permissive license (AL 2.0):
 • SDP data apps: Apache Kafka, Apache NiFi, Apache Spark, Trino, Apache Druid, Apache Superset, Apache ZooKeeper, Apache HBase, Apache Hive MS, Apache Hadoop DFS
 • SDP operators: operators for managing data apps for standardized day-2 ops, incl. settings, monitoring, logging, …
 SDP subscription, copyleft license (OSL 3.0):
 • Config checks, dashboards, metrics, dual license, VEX, …
 • Support: everything that makes customer life easier

  20. Stackable Data Platform: subscription and support services
 Feature | Community Edition | Basic Subscription | Business Subscription
 SDP software management framework (operator & configuration framework) | ✔ | ✔ | ✔
 SDP operators | ✔ | ✔ | ✔
 Telemetry (monitoring / alerting / logging) | ✔ | ✔ | ✔
 Security (encryption, authentication, authorization) | ✔ | ✔ | ✔
 Software vulnerability management (incl. SBOM) | ✔ | ✔ | ✔
 Community support | ✔ | ✔ | ✔
 Configuration templates | | ✔ | ✔
 Preconfigured monitoring dashboards | | ✔ | ✔
 Vulnerability Exploitability eXchange (VEX) for open source data apps | | ✔ | ✔
 Support ticket system (Jira) | | ✔ | ✔
 Individual license agreement | | | ✔
 Service level agreement (9/5)* | | | ✔
 Support for SDP on Red Hat® OpenShift® | | | ✔
 *Extended service times on individual request

  21. Advanced supply chain security
 [Diagram: the build process compiles source code in the CI pipeline, pushes and signs container images together with their SBOMs, and attests to them in the container registry. Vulnerability scanners run daily against the registry and report scan results; Stackable applies patches, creates new versions, and publishes analysis results as VEX advisories. The customer verifies signatures before running images in the runtime environment, can inspect the cryptographically attested SBOMs, and acts on the VEX advisories (e.g. updates a container image to the latest version).]
 + Our commitment to secure software
 • Correct and signed SBOMs 
 • Signed Docker images 
 • Effective vulnerability management for open-source products 
 • Machine-readable VEX statements in CSAF format
 • CVE Numbering Authority
 • Participation in standardisation committees
 ◦ ECMA International
 ◦ DIN
 ◦ EU CRA Expert Group
 • SBOM = Software Bill of Materials
 • VEX = Vulnerability Exploitability Exchange
 • CSAF = Common Security Advisory Framework 
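SBOMs and VEX statements are machine-readable by design. The sketch below cross-checks a heavily simplified CycloneDX-style SBOM against hypothetical advisory data using only the standard library; real SBOMs and CSAF documents carry many more fields (hashes, licenses, signatures, exploitability status), so this only illustrates the principle:

```python
import json

# A heavily simplified CycloneDX-style SBOM; real documents carry
# many more fields (hashes, licenses, pedigree, signatures, ...).
sbom_json = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "kafka", "version": "3.7.0"},
    {"name": "log4j-core", "version": "2.14.1"}
  ]
}
"""

# Hypothetical advisory data: which component versions are affected
# by which CVE (a VEX statement would add exploitability context).
advisories = {("log4j-core", "2.14.1"): ["CVE-2021-44228"]}

def affected_components(sbom_text, advisories):
    """List (component, CVE) pairs where an SBOM component matches an advisory."""
    sbom = json.loads(sbom_text)
    hits = []
    for comp in sbom["components"]:
        key = (comp["name"], comp["version"])
        for cve in advisories.get(key, []):
            hits.append((comp["name"], cve))
    return hits

print(affected_components(sbom_json, advisories))
```

This kind of automated matching is what makes signed, correct SBOMs plus machine-readable VEX statements (e.g. in CSAF format) more useful than a vulnerability list in a spreadsheet.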

  22. Professional service & training by our experts
 Razvan (big data engineer), Andrew (data architect), Lukas (software engineer), Benedikt (software engineer), Xenia (site reliability engineer), Nick (Rust developer), Sebastian (software engineer), Maximilian (software engineer)
 Big Data Consulting
 • Big data architecture consulting
 • Data Platform workshops
 • Migration support
 • Big data team advisor
 • Workshops on the open source data products
 Big Data Training
 • Stackable Data Platform basics
 • Big data ecosystem overview
 • Big data security
 • High availability
 • Monitoring and optimization of clusters

  23. Referral and reseller partner model
 Powerful, open platform
 • Big data, BI, ML & AI capable
 • Open source for maximum flexibility
 
 Financial advantages
 
 • Referral fee of 5 % of turnover
 • Reseller discount of up to 20 % for independent support
• Extra discounts for multi-year contracts (up to 6 %)
 • Full price transparency
 
 Partner Support
 
 
 • Training & certifications
 • Marketing materials
 • Product support (demos, sales strategies)
 • Possibility to act as a subcontractor for Stackable on projects.
 
 Flexible design possibilities
 
 • Exclusive lead registration
 • Stackable and partners work together on marketing activities.
 
 High quality technical support
 
• 1st & 2nd level support with Stackable as backup
 • Customization of support levels

  24. AWS

  25. An enterprise-ready data platform
 Security
 • TLS encryption of communication
 • Authentication via various protocols (e.g. LDAP or Kerberos)
 • Centralized certificate management
 • Role/rule-based rights management
 • Single sign-on via OpenID Connect (OIDC) 
 Logging
 • Standardized interfaces for all SDP components 
 • Integrated log aggregation for all SDP components 
• Example integration with OpenSearch & OpenSearch Dashboards
 Monitoring
 • Standardized interface for all SDP components 
• Example connection to Prometheus and Grafana
 • Predefined Grafana dashboards 
 Management
 • Versioned configurations (Infrastructure as Code)
 • Start / Stop / Restart
 • Service discovery 

  26. “Data Platform as Code” approach
 “Data Platform as Code” (based on Infrastructure as Code, IaC): Stackable data platforms are defined declaratively, specifying configurations such as size, components, and use cases. This enables automated, standardized, and efficient deployment via GitOps.
 
Advantages:
 • Consistency: standardized environments, fewer errors
 • Speed: faster deployment and updates
 • Version control: comprehensible changes, simple rollbacks
 • Collaboration: clearly defined configurations promote teamwork
 • Security: integrated compliance and security checks
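Concretely, a declarative platform definition is a set of Kubernetes custom resources. The sketch below shows the general shape of a Stackable ZooKeeper cluster definition; the `apiVersion` and field names are reproduced from memory and should be treated as illustrative, so check the official Stackable operator documentation for the exact schema:

```yaml
# Illustrative only: verify apiVersion and fields against the
# Stackable operator documentation before use.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: "3.9.2"
  servers:
    roleGroups:
      default:
        replicas: 3
```

Because the whole platform is described in files like this, it can live in a Git repository and be applied through GitOps tooling, which is exactly what yields the consistency, versioning, and rollback advantages listed above.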

  27. “Data Platform as Code” approach
 Stackable Data Platform: software components that install, manage, monitor, update, and secure the latest open source products using a 100% infrastructure-as-code approach.
 [Diagram: define it once in a Git repository, then deploy it again and again; the same NiFi, Kafka, and ZooKeeper cluster definitions are deployed and controlled on-premises, in the cloud, everywhere.]
  28. Streaming & big data infrastructure as code on Kubernetes
 [Diagram: from the Stackable repositories (Helm repository, data platform definitions repository, container registry), the Stackable control plane is installed, i.e. the operators for Apache NiFi, Apache Kafka, and Apache ZooKeeper. Deploying the Kafka, ZooKeeper, and NiFi CRDs then lets the operators create and control the corresponding Kafka, ZooKeeper, and NiFi clusters.]
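The operators in the control plane follow the standard Kubernetes reconciliation pattern: compare the declared state from the custom resources with the observed state of the cluster and act on the difference. A stripped-down conceptual sketch of that loop (plain Python for illustration, not the actual Rust operator code) is:

```python
def reconcile(desired, observed):
    """Return the actions needed to move observed state toward desired state,
    the core idea behind a Kubernetes operator's control loop."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

# Hypothetical example: the CRDs declare two services, but only a
# smaller Kafka deployment currently exists in the cluster.
desired = {"kafka-broker": {"replicas": 3}, "zookeeper": {"replicas": 3}}
observed = {"kafka-broker": {"replicas": 1}}

print(reconcile(desired, observed))
```

Running this loop repeatedly is what lets the same definition be "deployed again and again": the operator keeps converging the running clusters toward whatever the Git-versioned definition declares.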

  29. Deployment in different environments: different clouds
 [Diagram: from the Stackable repositories (Helm repository, data platform definitions repository, container registry), the same CRDs are deployed to different clouds, e.g. IONOS Managed Kubernetes (IMK) and Azure Kubernetes Service (AKS), each running its own Kafka, NiFi, and ZooKeeper clusters.]

  30. Deployment in different environments: different stages
 [Diagram: from the Stackable repositories (Helm repository, data platform definitions repository, container registry), the Stackable control plane (operators) deploys a Dev CRD into the development environment and a Prod CRD into the production environment, each with its own NiFi, Kafka, and ZooKeeper clusters.]

  31. Just get started… with only one command*:
 stackablectl operator install trino druid superset commons secret opa
 *Prerequisites: Docker and Kubernetes must be installed (e.g. a kind cluster).
  32. Try it for yourself: demos and tutorials
 Real-time display of water levels. Low water or risk of flooding: the water levels of our rivers have become a matter of public interest in times of climate change.
 Our Stackable Data Platform demo shows the water levels of rivers almost in real time, based on data from Pegel Online.
 Several components of the Stackable Data Platform work together without requiring a great deal of configuration:
 Apache NiFi and Apache Kafka retrieve water level measurements from measuring stations distributed throughout Germany via the Pegel Online API and store them in Apache Druid.
 Druid is a scalable real-time database that can be queried using SQL. The demo uses this to query the water levels via Apache Superset and visualize them in a dashboard. For permanent storage, Druid requires a so-called “deep storage”, which in our example is realized via MinIO as S3-compatible object storage, since that is available in most public and private cloud environments.
 More demos
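In the demo, Superset talks to Druid over Druid's SQL API, which accepts a SQL statement as a JSON payload POSTed to `/druid/v2/sql` on the broker. A minimal stdlib sketch of building such a request (the host name and the `waterlevels` table are hypothetical stand-ins for the demo's actual names) is:

```python
import json
from urllib import request

def build_druid_sql_request(broker_url, sql):
    """Build a POST request for Druid's SQL endpoint (/druid/v2/sql)."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return request.Request(
        url=f"{broker_url}/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_druid_sql_request(
    "http://druid-broker:8082",
    "SELECT station, MAX(level) FROM waterlevels GROUP BY station",
)
print(req.full_url)
# Actually sending the request requires a running Druid broker:
# rows = json.load(request.urlopen(req))
```

Superset hides all of this behind its SQLAlchemy connector, but the same endpoint is handy for scripted checks against the demo's Druid instance.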

  33. Stackable service offer
 Stackable's commercial service offering covers all data strategy requirements.
 SDP QuickStart (starting at 10,000 EUR): experienced Stackable consultants support the customer in live sessions. After an introduction to Kubernetes, initial installation options are discussed and hands-on installation support is offered. Finally, Stackable supports the customer with the next steps. The customer is then able to use the SDP for an initial deployment and/or further assessments.
 SDP QuickCheck (starting at 16,000 EUR): workshops are used to record, evaluate, and analyze the current situation. Use cases are then collected and evaluated. The current situation and the use cases form the basis for an initial outline of the target architecture and implementation recommendations. The discussed solution approaches are delivered as a rough concept, including a functional reference architecture and a possible roadmap.
 SDP PoC (starting at 11,000 EUR): depending on the customer situation, a longer-term professional proof of concept is developed with the customer. Initial use cases are implemented with the help of Stackable consultants. The customer receives a detailed picture of Stackable as a new data platform.
 SDP Subscription (starting at 1,600 EUR): an SDP subscription entitles the customer to Stackable support throughout the entire lifecycle of use. Different support models offer specific options to meet the customer's need for longer support: whether by informal email, 9/5, or even 24/7 helpdesk, all with or without SLAs, Stackable can cover most situations, even as a layer within a customer-owned multi-level support organization.

  34. Customized solutions for every industry: Urban Data Platforms
 Urban data platforms for smart cities. Further information on the case: https://stackable.tech/en/open-source-data-platform/
 Highlights: data integration & aggregation; data catalogue & metadata; open source & cloud-first
  35. Customized solutions for every industry: Financial Services
 Real-time fraud detection. Further information on the case: https://stackable.tech/en/financialservices/
 Highlights: security by design; open source & support; private & public cloud support
  36. Customized solutions for every industry: Manufacturing
 Performance monitoring for modern waste-to-energy solutions. Further information on the case: https://stackable.tech/en/data-platform-for-manufacturing-industry/
 Highlights: (real-)time streaming processing & monitoring; batch processing & reporting; self-service analytics
  37. Customized solutions for every industry: Data Spaces
 Gaia-X Data Spaces. Further information on the case: https://stackable.tech/en/gaia-x-dataspaces/
 Highlights: identity and trust; data exchange; orchestration of services
  38. Customized solutions for every industry: Data Spaces
 Gaia-X Industrial Data Spaces. Further information on the case: https://stackable.tech/en/gaia-x-dataspaces/
 Highlights: trustworthy supply chains; digital twins; platform-based maintenance
  39. Customized solutions for every industry: Open Data
 FAIR and Open Data. Further information on the case: https://stackable.tech/en/opendata/
 Highlights: data access and interoperability; (meta)data standardization; data governance
  40. Roadmap
 01 OpenSearch Operator
 02 Security Abstraction Layer
 03 Integration with Metadata Tools
 04 Higher Level Functionality

  41. Did we talk about everything?
 1. Genesis of the company, background, financials
 2. Mission and vision
 3. Pains and challenges you address
 4. Why Stackable, and the value proposition aligned with these challenges
 5. Product pitch and details (how it works), and how you solve these pains
 6. Demo, if you consider it adds value to the pitch
 7. Use cases
 8. Case studies (references, even anonymous)
 9. Competition and differentiators, why you win
 10. Roadmap
 11. Go to market, partner ecosystem, OEM, distribution/resellers
 12. Pricing model (do not write the prices down, but mention them)
 13. A bit of future direction for the company and product, what's next
 14. Questions, questions, questions