Stackable - IT Press Tour #63 Sep 2025

your data, your platform The modular open source data platform
Popular data apps. Kubernetes-native. Easy to deploy and run.  IT Press Tour 63 @Amsterdam 

Agenda
1. History
2. Why Stackable?
3. What Pains do we Ease?
4. Who are our competitors?
5. How does it work?
6. Next steps
7. Case Studies
8. Where to from here?

History

A bit of context
• GFS Paper
• The rise of Hadoop
• “Data is the new oil”
• Cloudera, Hortonworks, MapR, IBM, Intel, …
• Clouds appear on the horizon…
• The fall of Hadoop distros…

There can only be one

And then … Corona

Stackable in a nutshell
Our Team
20 experts
International in Germany and Europe
Founded
2020
Our Offer
> Product Support
> Big Data Consulting
> Training
Stackable Data Platform
> Open Source
> Infrastructure as Code
> Cloud-native (Kubernetes)
> On-Premises, Cloud, Hybrid
Our Customers
Our Network

Stackable has a stable foundation
• Deep technological expertise in the (big) data ecosystem
• Large customer network around the world
• Network of experts and partners from the data and streaming world
• Many years of expert knowledge of the data & analytics market
• Access to 250+ customers in DACH in all industries
• Strong market presence and network
• Support of Stackable through the shared services organization of b.telligent
• Strategic focus on open source software
• Capital and developers for the further development of Stackable
• Data centers in Europe and the USA
• C5 certificate, BSI IT basic protection & ISO 27001 certified
• Consortium partner in GAIA-X funding projects

Ownership Structure of Stackable

Stackable Management Team
Sebastian Amtage, CEO: Founder and Managing Director of b.telligent, founder and investor of several companies in the data-driven B2B sector
Jim Halfpenny, MD Stackable UK: Expert for Big Data solutions, very experienced in the design of big data architectures and open source projects
Fabian Jasinski, CRO: Expert with many years of experience in technology management, digital transformation, product management, and data science
Sönke Liebau, CPO: Co-founder of Stackable and OpenCore, works with (big) data open source software; speaker and contributor to various projects
Lars Francke, CTO: Co-founder of Stackable and OpenCore, committer to open source projects, has been working in the (big) data sector since 2008
Dr. Stefan Igel, COO: Big data expert with many years of experience in IT projects, agile leadership, team and organizational development

A satisfied community & happy customers

Strong Partners

Why Stackable?

A data platform like many others?
[Diagram: Stackable Data Platform, running in the cloud and/or data center]
Collect: Business systems, Financial applications, Machine sensors, Smart meter sensors, Social Media, eCommerce / retail, Pictures / videos, Weather, Satellites
Save/process: Any amount of data; batch processing & event streaming in real time; various data formats and structures
Provide: Reporting/BI, Dashboards, Control Panel, Data Science, Machine Learning, AI, APIs, Applications, Data products

Data sovereignty: challenges without your own data platform
• Lock-in with the (cloud) provider
• Inadequate service & support
• High license and operating costs
• Increasing complexity of data management
• Limited expandability
• No state-of-the-art platform security
• Lack of support for compliance & regulatory requirements

your data, your platform

Why Stackable? your platform
Everything from one source: All the open source tools for your modern data platform from a single source - no patchwork of vendors and a single point of contact.
Flexible, modular, expandable: Choose the ideal combination of data tools in different versions and move away from monolithic distributions.
Easy integration: Uniform interfaces for monitoring, alerting and log management simplify individual integration into your corporate IT.
Made in Europe: Developed in Europe according to European security standards - including vulnerability management, VEX declarations, consideration of CRA etc.
Popular Data Apps: Offers a curated selection of the best open source data apps such as Apache Kafka®, Apache Druid, Trino and Apache Spark™.
Kubernetes-native: Based on Kubernetes, the platform runs everywhere - in your own data center or in the cloud.
Easy to set up and operate: All apps work seamlessly together and can be added or removed to create unique data architectures.

Flexible answers for next-generation data platforms
Option 1: Stackable Data Platform for on-premises open-source data platforms
Option 2: Stackable Data Platform for open-source data services with an IaaS / K8S provider
Option 3: Managed Stackable Data Platform as PaaS service in the IONOS Cloud
The Future: GAIA-X compliant, sovereign data services powered by Stackable

What Pains do we Ease?

We solve different pains for different people..

Questions
1. Have you ever installed a project from the Apache Software Foundation?
2. Do you know what Kerberos is?
3. Have you ever encountered a “No valid path to certificate root” exception?
4. Ever wonder if the setting “timeout.duration” takes seconds, milliseconds or minutes?
5. Many many many more examples…

Popular data apps. Kubernetes-native. Easy to deploy and run.
100% open source
Modular and flexible
In any cloud and in your own data center
Also as Managed Service with IONOS Cloud
[Diagram: platform layers] Storage, Visualization, Analytics & AI, Data Processing, Infrastructure Orchestration, Data Ingestion, Security, Monitoring, Logging

SDP: Infrastructure Orchestration
Apache Airflow
Streamlining data workflows with precision and scalability.
• Advanced Workflow Orchestration: Apache Airflow provides comprehensive workflow planning and management that enables precise control of data processing tasks within the Stackable Data Platform.
• Dynamic Pipeline Creation: Easily define, schedule, and monitor complex data pipelines using Airflow’s intuitive UI and powerful programming framework. Customize your workflows to match your data processing needs.
• Scalable and Reliable: Airflow scales easily and handles simultaneous workflows, so that data tasks are executed reliably regardless of volume.
• Efficient monitoring and logging: Airflow’s monitoring functions enable quick identification and resolution of problems and ensure smooth data operation.

SDP: Data Ingestion
Apache NiFi
Facilitates ingestion, data flow management and automated data exchange between systems.
• Easy Data Routing and Transformation: Offers a user-friendly interface for data flow management, supporting rapid design and deployment of processing pipelines.
• System Integration: Connects to a variety of data sources and sinks, facilitating data ingestion from disparate systems.
• Data Lineage: Tracks data flow from source to destination, enhancing auditing and compliance.
• Flexibility: Customizable processors and the ability to handle various data formats and sizes.
Apache Kafka
Distributed event streaming system, providing robust, scalable messaging and stream processing.
• High Throughput: Capable of handling millions of messages per second, making it ideal for large-scale message processing tasks.
• Scalability: Easily scales out with minimal downtime, supporting growing data needs.
• Stability and Reliability: Ensures data is not lost and can withstand failures, maintaining data integrity.
• Versatility: Supports a wide range of use cases.

SDP: Storage
Apache Hadoop Distributed File System
The scalable Hadoop Distributed File System (HDFS).
• Fault Tolerance: HDFS is built to handle hardware failures gracefully. It achieves this through data replication and automated recovery.
• Scalability: HDFS is designed to handle vast amounts of data and can scale out horizontally by adding more commodity hardware nodes.
• High Throughput Access: HDFS is optimized for streaming large files rather than random access. Its design focuses on maximizing data bandwidth, making it ideal for processing large datasets in batch jobs.
Apache HBase
A distributed, scalable NoSQL database designed for real-time read and write access to big data.
• Scalability: HBase scales horizontally by adding more nodes to handle increased data and throughput without a significant drop in performance.
• High Performance: It is optimized for fast, random, real-time read/write operations on massive datasets, making it suitable for applications requiring low-latency access to specific rows or columns.
• Schema Flexibility: HBase stores data in a column-family format, allowing for sparse datasets and efficient access to specific data subsets. It doesn't enforce a fixed schema, so columns can be added or modified on the fly without downtime.
Apache Druid
Provides real-time analytics and OLAP querying capabilities, ideal for insights on streaming data.
• Real-Time Analytics: Designed for sub-second query response times, making it ideal for interactive applications.
• Scalability: Handles massive volumes of data and concurrent users without compromising performance.
• High Availability: The distributed architecture lets the system keep serving queries even during partial failures.

SDP: Data Processing
Trino
Enables (virtualized) data access across different data sources and improves the flexibility and speed of queries.
• Flexible Table Formats: Supporting Apache Iceberg and Delta Lake.
• Fast Query Processing: Engineered for high-speed data querying across distributed data sources.
• Federated Queries: Allows querying data from multiple sources, simplifying analytics across disparate data stores (data federation).
• Scalable and Flexible: Easily scales to accommodate large datasets and complex queries.
Apache Spark
Offers powerful data processing capabilities, enabling performant complex analytics and machine learning on big data.
• In-Memory Computing: Accelerates processing speeds by keeping data in RAM, significantly faster than disk-based alternatives.
• Advanced Analytics: Supports complex algorithms for machine learning, graph processing, and more.
• Fault Tolerance: Resilient distributed datasets (RDDs) provide fault tolerance through lineage information.
• Language Support: Offers APIs in Python, Java, Scala, and R, broadening its accessibility and usability.
Apache Hive Metastore
A centralized repository for storing metadata about Hive tables, databases, and other schema-related objects.
• Centralized Metadata Management: Maintains metadata for all Hive tables, e.g. schema information, table locations, partitioning, and more. It enables seamless integration with tools like Apache Spark, Trino, and other query engines that rely on Hive for schema discovery.
• Partition and Schema Handling: Partition awareness significantly improves query performance for large datasets through partition pruning.
• Schema Evolution: It supports schema changes, such as adding or modifying columns, without requiring migration of existing data.
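The partition pruning mentioned for the Hive Metastore can be illustrated with a minimal sketch. It is not the Metastore's API: the `Partition` class, the `prune` helper, the partition keys and the storage paths are all hypothetical, chosen only to show how a query engine can skip partitions whose key values cannot match a filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One partition of a table: its key values and where its files live."""
    values: dict      # e.g. {"year": 2025, "month": 9}
    location: str     # hypothetical storage path


def prune(partitions, **wanted):
    """Keep only partitions whose key values match every requested filter."""
    return [
        p for p in partitions
        if all(p.values.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical catalog entries for a table partitioned by year and month.
catalog = [
    Partition({"year": 2024, "month": 12}, "s3a://demo/levels/year=2024/month=12"),
    Partition({"year": 2025, "month": 8}, "s3a://demo/levels/year=2025/month=8"),
    Partition({"year": 2025, "month": 9}, "s3a://demo/levels/year=2025/month=9"),
]

# A filter on year=2025 only needs to read two of the three partitions.
to_scan = prune(catalog, year=2025)
```

Only the partition metadata is consulted here; the data files of the skipped partition are never opened, which is where the performance gain comes from.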

SDP: Data Visualization
Apache Superset
A modern, open-source data exploration and visualization platform.
• Interactive Data Exploration: It provides an intuitive interface for querying and exploring data. Users can write SQL queries or use drag-and-drop features to slice, filter, and group data interactively.
• Custom SQL Lab: A powerful SQL IDE for running queries against databases and visualizing results in real time.
• Advanced Data Visualization: It supports a wide variety of rich, interactive charts and dashboards, such as bar charts, heatmaps, time series, maps, and more.
• Customizable Dashboards: Users can create and organize visualizations into shareable dashboards with drag-and-drop ease.
• Integration with Popular Databases: Superset connects to a broad range of databases via SQLAlchemy, including Trino, Druid, MySQL, PostgreSQL and others.

Putting it all together

An Example of What our Product Does

Questions
1. Have you ever gotten an Excel sheet from the Compliance department with 367 vulnerabilities that were found in your application, with the request to “comment on these please”?
2. Have you ever tried to find someone actually knowledgeable in the support department of one of your large vendors?
3. Have you ever tried buying support for an open source product?

Stackable Data Platform - License and Subscription model
SDP Data Apps, Permissive License (AL 2.0): Apache Kafka, Apache NiFi, Apache Spark, Trino, Apache Druid, Apache Superset, Apache ZooKeeper, Apache HBase, Apache Hive MS, Apache Hadoop DFS
SDP Operators, Copyleft License (OSL 3.0): Operators for managing data apps for standardized day-2 operations incl. settings, monitoring, logging, ...
SDP Subscription: Config Checks, Dashboards, Metrics, Dual License, VEX…
Support: Everything that makes customer life easier

Stackable Data Platform - Subscription and Support Services
Feature | Community Edition | Basic Subscription | Business Subscription
SDP Software Management Framework (Operator & Configuration Framework) | ✔ | ✔ | ✔
SDP Operators | ✔ | ✔ | ✔
Telemetry (Monitoring / Alerting / Logging) | ✔ | ✔ | ✔
Security (Encryption, Authentication, Authorization) | ✔ | ✔ | ✔
Software vulnerability management (incl. SBOM) | ✔ | ✔ | ✔
Community Support | ✔ | ✔ | ✔
Configuration templates | - | ✔ | ✔
Preconfigured monitoring dashboards | - | ✔ | ✔
Vulnerability Exploitability eXchange (VEX) for Open Source Data Apps | - | ✔ | ✔
Support Ticket System (Jira) | - | ✔ | ✔
Individual license agreement | - | - | ✔
Service level agreement (9/5)* | - | - | ✔
Support for SDP on Red Hat® OpenShift® | - | - | ✔
*Extended service times on individual request

Advanced supply chain security
[Diagram: supply chain from source code to the customer's runtime environment. Build Process (CI Pipeline): source code is built into a container image with an SBOM; the image is pushed to the Container Registry and signed, and the SBOM is pushed and cryptographically attested to the container image. Vulnerability scanners: pull from the registry, run daily scans and report scan results. Stackable: apply patches and create a new version; publish analysis results (VEX). Customer: verify signature and run; can inspect the SBOM; act based on the VEX advisories (e.g. update container image to latest version).]
Our commitment to secure software
• Correct and signed SBOMs
• Signed Docker images
• Effective vulnerability management for open-source products
• Machine-readable VEX statements in CSAF format
• CVE Numbering Authority
• Participation in standardisation committees
  ◦ ECMA International
  ◦ DIN
  ◦ EU CRA Expert Group
SBOM = Software Bill of Materials
VEX = Vulnerability Exploitability eXchange
CSAF = Common Security Advisory Framework
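How a customer might "act based on the VEX advisories" can be sketched in a few lines. This is a simplified illustration, not the CSAF schema and not Stackable tooling: the `triage` function, the flat `{"cve": ..., "status": ...}` records and the placeholder CVE identifiers are assumptions made for the example.

```python
def triage(findings, vex_statements):
    """Split scanner findings into those needing action and those VEX rules out."""
    status_by_cve = {s["cve"]: s["status"] for s in vex_statements}
    actionable, dismissed = [], []
    for cve in findings:
        # A finding without any statement is treated as still open.
        status = status_by_cve.get(cve, "under_investigation")
        if status in ("not_affected", "fixed"):
            dismissed.append(cve)
        else:
            actionable.append(cve)
    return actionable, dismissed


# Placeholder identifiers, not real CVEs.
scanner_findings = ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"]
vex = [
    {"cve": "CVE-0000-0001", "status": "not_affected"},
    {"cve": "CVE-0000-0002", "status": "affected"},
]

actionable, dismissed = triage(scanner_findings, vex)
```

With machine-readable statements, a long scanner report shrinks to the findings that actually need a decision, which is the situation the "367 vulnerabilities" question describes.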

Software Bill of Materials

CRA Expert Group

Members of the Expert Group

Standardization for (Open Source) Developers

Our Commitment to Security

Professional service & training by our experts
Razvan, Big Data Engineer
Andrew, Data Architect
Lukas, Software Engineer
Benedikt, Software Engineer
Xenia, Site Reliability Engineer
Nick, Rust Developer
Sebastian, Software Engineer
Maximilian, Software Engineer
Big Data Consulting
• Big data architecture consulting
• Data Platform workshops
• Migration support
• Big data team advisor
• Workshops on the open source data products
Big Data Training
• Stackable Data Platform basics
• Big data ecosystem overview
• Big data security
• High availability
• Monitoring and optimization of clusters

Referral and reseller partner model
Powerful, open platform
• Big Data, BI, ML & AI capable
• Open source for maximum flexibility
Financial advantages
• Referral fee of 5 % of turnover
• Reseller discount of up to 20 % for independent support
• Extra discounts for multi-year contracts (up to 6 %)
• Full price transparency
Partner Support
• Training & certifications
• Marketing materials
• Product support (demos, sales strategies)
• Possibility to act as a subcontractor for Stackable on projects
Flexible design possibilities
• Exclusive lead registration
• Stackable and partners work together on marketing activities
High quality technical support
• 1st & 2nd level support with Stackable backup
• Customization of support levels

Who are our Competitors?

But There Are More ..

Data infrastructure on AWS  Source: https://docs.aws.amazon.com/prescriptive-guidance/latest/strategy-aws-data/aws-architecture.html

Data infrastructure on Azure  Source: https://azuremarketplace.microsoft.com/de-de/marketplace/consulting-services/inovexgmbh1597328899616.sol-62190-jqf?filters=automotive&country=DE&page=1

Data infrastructure on Google  Source: https://www.unifieddatascience.com/data-lake-design-patterns-on-google-cloud

How Stackable can Integrate with Clouds
[Diagram labels: Workload; Storage, Routing, IAM, …]

An Example Architecture
[Diagram labels: ETL, Query, Storage, Visualization]

An Example Architecture
[Diagram labels: Business logic, Platform]

Shared Functionality
[Diagram labels: IAM & Access Control, Metrics, Logging]

The architecture - Azure

Azure - Shared Functionality
[Diagram labels: IAM & Access Control, Metrics, Logging]

AWS - Shared Functionality
[Diagram labels: IAM & Access Control, Metrics, Logging]

How does it work?

An enterprise-ready data platform
Security
• TLS encryption of communication
• Authentication via various protocols (e.g. LDAP or Kerberos)
• Centralized certificate management
• Role/rule-based rights management
• Single sign-on via OpenID Connect (OIDC)
Logging
• Standardized interfaces for all SDP components
• Integrated log aggregation for all SDP components
• Example integration with OpenSearch & OpenSearch Dashboards
Monitoring
• Standardized interface for all SDP components
• Example connection to Prometheus and Grafana
• Predefined Grafana dashboards
Management
• Versioned configurations (Infrastructure as Code)
• Start / Stop / Restart
• Service discovery

“Data Platform as Code” approach
"Data Platform as Code" (based on Infrastructure-as-Code (IaC))
Stackable data platforms are defined declaratively: the definition specifies configuration such as size, components and use cases. This enables automated, standardized and efficient deployment via GitOps.
Advantages:
Consistency: Standardized environments, fewer errors
Speed: Faster deployment and updates
Version control: Comprehensible changes, simple rollbacks
Collaboration: Clearly defined configurations promote teamwork
Security: Integrated compliance and security checks
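The declarative idea behind "Data Platform as Code" can be sketched as a comparison between a desired state and the state actually running. This is a minimal illustration under stated assumptions, not how the Stackable operators are implemented: the `plan` function, the action names and the replica counts are hypothetical.

```python
def plan(desired, actual):
    """Compare desired and actual replica counts and list the changes needed."""
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, replicas))
        elif actual[name] != replicas:
            actions.append(("scale", name, replicas))
    # Anything running that is no longer declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, 0))
    return actions


# Desired state as it might be declared in a Git repository (hypothetical values).
desired = {"kafka": 3, "nifi": 2, "zookeeper": 3}
# State currently found in the cluster (hypothetical values).
actual = {"kafka": 3, "zookeeper": 1, "druid": 1}

changes = plan(desired, actual)
```

Because the definition describes the target rather than the steps, applying it a second time against an already matching cluster yields no changes, which is what makes repeated deployments and rollbacks predictable.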

“Data Platform as Code” approach
Stackable Data Platform: Software components that
• install,
• manage,
• monitor,
• update,
• secure
the latest open source products using a 100% infrastructure-as-code approach
[Diagram: Define it once… (Git Repository) → Deploy again and again… → …everywhere: the same NiFi Cluster, Kafka Cluster and ZooKeeper Cluster are deployed and controlled on-premises and in the cloud]

Streaming & big data infrastructure as code on Kubernetes
[Diagram: from the Stackable Repositories (Stackable Helm Repository, Data Platform Definitions Repository, Stackable Container Registry), the Stackable Control Plane (Operators) is installed: Stackable Operator for Apache NiFi, Stackable Operator for Apache Kafka, Stackable Operator for Apache ZooKeeper. For the Data Platform Deployment, a Kafka CRD, a ZK CRD and a NiFi CRD are deployed; each operator controls its cluster: Kafka Cluster, ZooKeeper Cluster, NiFi Cluster]

Deployment in different environments - different clouds
[Diagram: the Stackable Repositories (Stackable Helm Repository, Data Platform Definitions Repository, Stackable Container Registry) deploy an IMK CRD to IONOS Managed Kubernetes (IMK) and an AKS CRD to Azure Managed Kubernetes (AKS); each environment runs a Kafka Cluster, a NiFi Cluster and a ZooKeeper Cluster]

Deployment in different environments - different stages
[Diagram: the Stackable Repositories (Stackable Helm Repository, Data Platform Definitions Repository, Stackable Container Registry) deploy a Dev CRD to the Development Environment and a Prod CRD to the Production Environment; in each environment the Stackable Control Plane (Operators) runs a NiFi Cluster, a Kafka Cluster and a ZooKeeper Cluster]
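Deploying the same platform to different stages usually means one shared definition plus small per-stage overrides. The sketch below illustrates that pattern only; the `merge` helper, the setting names and the values are hypothetical and do not reflect the actual Stackable CRD fields.

```python
import copy


def merge(base, override):
    """Return a new dict: base with override applied recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Shared definition (hypothetical settings).
base = {
    "kafka": {"replicas": 1, "tls": True},
    "nifi": {"replicas": 1},
    "zookeeper": {"replicas": 1},
}

# Per-stage overrides: dev uses the base as-is, prod scales up.
stages = {
    "dev": {},
    "prod": {"kafka": {"replicas": 3}, "zookeeper": {"replicas": 3}},
}

rendered = {stage: merge(base, override) for stage, override in stages.items()}
```

Keeping the overrides small makes the difference between development and production visible at a glance, and the shared base is never modified by rendering a stage.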

Next steps

Just get started…
… with only one command*:

stackablectl operator install trino druid superset commons secret opa

*Prerequisites: These tools must be installed: Docker / Kubernetes
[Diagram label: Kind Cluster]

Stackablectl

Try it for yourself - demos and tutorials
Real-time display of water levels
Low water or risk of flooding - the water levels of our rivers have become a matter of public interest in times of climate change. Our Stackable Data Platform demo shows the water levels of rivers almost in real time based on data from Pegel Online.
Several components of the Stackable Data Platform work together without requiring a great deal of configuration:
Apache NiFi and Apache Kafka are used to retrieve water level measurements from measuring stations distributed throughout Germany via an API from Pegel Online and store them in Apache Druid.
Druid is a scalable real-time database that can be queried using SQL. The demo uses SQL to query the water levels via Apache Superset and visualize them in a dashboard. For permanent storage, Druid requires so-called “deep storage”, which in our example is provided by MinIO, an S3-compatible object store of the kind available in most public and private cloud environments.
More Demos
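The SQL access to Druid described in the water-level demo can be sketched as an HTTP request against Druid's SQL endpoint (`/druid/v2/sql`). The sketch only builds the request and does not send it. The host and port, the table name `water_levels` and the columns `station` and `level` are assumptions for illustration; the demo's real schema may differ.

```python
import json
from urllib.request import Request

# Hypothetical table and column names; __time is Druid's timestamp column.
SQL = (
    'SELECT station, MAX(level) AS max_level '
    'FROM "water_levels" '
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY "
    'GROUP BY station'
)


def build_druid_sql_request(base_url, sql):
    """Build (but do not send) a POST request for Druid's SQL API."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address of a locally reachable Druid router.
request = build_druid_sql_request("http://localhost:8888/", SQL)
```

Passing the request to `urllib.request.urlopen` would execute the query against a running cluster; Superset issues comparable SQL through its own database connection.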

Stackable service offer
Stackable's commercial service offering covers all data strategy requirements.
SDP QuickStart
Experienced Stackable consultants support the customer in live sessions. After an introduction to Kubernetes, initial installation options are discussed and practical installation support is offered. Finally, Stackable supports the customer with the next steps. The customer is then able to use the SDP for initial deployment and/or further assessments.
SDP QuickCheck
Workshops are used to record, evaluate and analyze the current situation. Use cases are then collected and evaluated. The current situation and the use cases form the basis for an initial outline of the target architecture and implementation recommendations. As a result, the discussed solution approaches are provided as a rough concept including a functional reference architecture and a possible roadmap.
SDP PoC
Depending on the customer situation, a longer-term professional proof of concept is developed with the customer. Initial use cases are implemented with the help of Stackable consultants. The customer receives a detailed picture of Stackable as a new data platform.
SDP Subscription
An SDP subscription entitles the customer to Stackable support throughout the entire lifecycle of use. Different support models offer specific options to meet the customer's need for longer support: whether by informal email, 5/9 or even 24/7 helpdesk, all with or without SLAs, Stackable can cover most situations - even as a layer within a customer-owned multi-level support organization.
starting at 10.000 EUR  starting at 16.000 EUR  starting at 11.000 EUR  starting at 1.600 EUR

Case Studies

Customized solutions for every industry: Urban Data Platforms
Urban data platforms for smart cities
Highlights: Data catalogue & metadata; Data integration & aggregation; Open Source & cloud-first
Further information on the case can be found at: https://stackable.tech/en/open-source-data-platform/

Customized solutions for every industry: Financial Services
Real-time Fraud Detection
Highlights: Security By Design; Open Source & Support; Private & Public Cloud Support
Further information on the case can be found at: https://stackable.tech/en/financialservices/

Customized solutions for every industry: Manufacturing
Performance monitoring for modern waste-to-energy solutions
Highlights: Batch Processing & Reporting; (Real-)Time Streaming Processing & Monitoring; Self-Service-Analytics
Further information on the case can be found at: https://stackable.tech/en/data-platform-for-manufacturing-industry/

Customized solutions for every industry: Data Spaces
Gaia-X Data Spaces
Highlights: Identity and trust; Data exchange; Orchestration of services
Further information on the case can be found at: https://stackable.tech/en/gaia-x-dataspaces/

Customized solutions for every industry: Data Spaces
Gaia-X Industrial Data Spaces
Highlights: Trustworthy supply chains; Digital twins; Platform-based maintenance
Further information on the case can be found at: https://stackable.tech/en/gaia-x-dataspaces/

Customized solutions for every industry: Open Data
FAIR and Open Data
Highlights: Data access and interoperability; (Meta)data standardization; Data Governance
Further information on the case can be found at: https://stackable.tech/en/opendata/

Where to from here?

Roadmap
01  OpenSearch Operator
02  Security Abstraction Layer
03  Integration with Metadata Tools
04  Higher Level Functionality

Did We Talk About Everything?
1. Genesis of the company, background, financials
2. Mission and Vision
3. Pains and challenges you address
4. Why Stackable and value prop. aligned on these challenges
5. Products pitch and details (how it works), and how do you solve these pains
6. Demo if you consider it adds value to the pitch
7. Use cases
8. Case studies (references even anonymous)
9. Competition and differentiators, why you win
10. Roadmap
11. Go to market, partner ecosystem, OEM, distribution/reseller
12. Pricing model (do not write the prices down, but say them)
13. A bit of future directions for the company and product, what's next
14. Questions, questions, questions

Thank you! Contact:  Sönke Liebau  [email protected]  +49 4103 926 3100 
