
Domain Data Platform for Scalable Data Management

by Weera Kasetsin, CPO at LINE Thailand

LINE Developers Thailand

November 01, 2024

Transcript

  1. Have you ever experienced these problems?
     > I want to ingest my application (domain) data into IU, but I have to wait for a data engineer to do it
     > A data engineer has to learn every schema
     > When IU resources are insufficient, you cannot use any BI reports, even operational ones
  2. What if…
     > You (and your team members) had more flexibility to develop and use BI reports or operational reports, with less dependence on the central platform
     > There were no need for many data engineers to handle data ingestion
     > Engineers could handle data integration with less effort (more focus on creating data facility tools)
     > We could fully utilize IU computing resources rather than spending them on data ingestion tasks
  3. What we believed about a Centralized Data Analytic Platform
     > Improved collaboration (maximize data utilization across the company)
     > Fewer resources required
     > Helps streamline processes
     > Improved security
     > Better data quality
  4. Centralized Data Analytic Platform (CDAP): what we know about it
     > Monolithic designs
     > Centralistic operating models
     > Always complex due to varied requirements
     > Insufficient resource utilization for analysis
  5. The Downside of CDAP
     > Raw data is complex, so use cases always require reworking the data
     > Data quality problems must be sorted out, transformations applied, and other data enriched to bring the data into context
     > When data is repeatedly copied and scattered throughout the organization, it becomes harder to find its origin and judge its quality
  6. The Downside of CDAP
     > Requires you to develop a single logical view of the same data managed in different locations
     > Extensive data distribution makes controlling the data much more difficult, because data can be spread even further
  7. Data Strategy: what is it?
     A data strategy is a long-term plan that defines the technology, processes, people, and rules required to manage an organization's information assets.
  8. Data Strategy best practice: a data strategy has two parts
     > Operational: transactional processing
     > Analytical: data warehousing and big data processing
  9. Data Strategy
     > Focus on your business goals and strategy
     > Determine the correct balance between “defensive” and “offensive”
     > Is full control a top priority, or flexibility for innovation?
     > How does regulation impact your strategy?
     > These considerations will influence your initial design and the pace of federating certain responsibilities
  10. Data Strategy
     > Operational analytics focuses on predicting and improving existing operational processes
     > The analytical results need to be integrated back into the operational system’s core so that insights become relevant in the operational context
  11. Data Strategy
     > Regulations, such as the new EU laws on data governance and artificial intelligence, force large companies to be transparent about what data is collected and purchased, what data is combined, how data is used within analytical models, and what data is distributed (sold)
  12. Future of Data Strategy: the new LCT Data Strategy and Platform Architecture
     Balance the centralized and decentralized data strategies, covering customer-focused business functions, legal, finance, compliance, and company-wide data governance.
  13. Implementing the new Data Strategy (1): generating refined data assets within the domain data platform
     > Empowering the domain to self-manage its data and enabling self-served domain analytics
     > Cost-effective utilization of on-demand computation resources
     > Reduced time to consume data
  14. Implementing the new Data Strategy (2): publishing curated data assets to a central data platform for cross-domain analysis
     > Mitigating spoiled data assets, which were prematurely ingested into the centralized data platform from domain raw data assets
     > Minimizing premature data governance efforts
  15. Domain-Data Concept
     > The (domain) context of the business problem influences the design of the application and finds its way into the data
     > Unique business problems require unique thinking, unique data, and optimized technology to provide the best solution
  16. Data Mesh Architecture
     > An (exciting new) methodology for managing data at large
     > The concept foresees an architecture in which data is highly distributed, and a future in which scalability is achieved by federating responsibilities
     > It puts an emphasis on the human factor and addresses the challenges of managing the increasing complexity of data architectures
  17. Principles for Distributed and Domain-Oriented Data Management
     1. Avoid data silos
     2. Only capture and modify data at the golden source
     3. Respect the rules of data ownership
  18. Domain Ownership Responsibilities
     > Taking ownership of data pipelines, such as ingesting, cleaning, and transforming data, to serve as many data customers’ needs as possible
     > Improving data quality and respecting service-level agreements (SLAs) and quality measures set by data consumers
     > Encapsulating metadata or using reserved column names for fine-grained row/column-level filtering and dynamic data masking
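The reserved-column-name approach to dynamic data masking mentioned above can be sketched roughly as follows. The `_pii` column suffix and the `pii_reader` role are illustrative assumptions for this sketch, not conventions stated in the deck:

```python
# Hypothetical sketch: dynamic data masking driven by reserved column names.
# Columns ending in "_pii" (an assumed naming convention) are masked unless
# the caller holds the (assumed) "pii_reader" role.

def mask_value(value: str) -> str:
    """Replace all but the last two characters with asterisks."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]

def apply_column_masking(rows, caller_roles):
    """Return rows with *_pii columns masked for unprivileged callers."""
    if "pii_reader" in caller_roles:
        return rows  # privileged callers see raw values
    return [
        {
            col: mask_value(str(val)) if col.endswith("_pii") else val
            for col, val in row.items()
        }
        for row in rows
    ]

rows = [{"user_id": 42, "email_pii": "alice@example.com"}]
print(apply_column_masking(rows, caller_roles={"analyst"}))
```

In a real platform this check would sit in the query layer (e.g. view definitions or a policy engine) rather than in application code, but the principle is the same: the column name itself carries the masking policy, so the domain does not have to register each sensitive column separately.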
  19. Domain Ownership Responsibilities: adhering to metadata management standards
     > Application and source-system schema registration
     > Providing metadata for improved discoverability
     > Observing versioning rules
     > Linking data attributes and business terms
     > Ensuring the integrity of metadata information to allow better integration between domains
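A minimal sketch of what schema registration with versioning rules and business-term links might look like. The `MetadataRegistry` class and its field names are hypothetical illustrations, not the actual platform API:

```python
# Hypothetical sketch of a domain metadata registry covering three of the
# responsibilities above: schema registration, monotonic versioning, and
# linking data attributes to business glossary terms.

class MetadataRegistry:
    def __init__(self):
        self._schemas = {}  # (domain, name) -> list of versioned entries

    def register(self, domain, name, fields, business_terms):
        """Register a new schema version; versions increase monotonically."""
        versions = self._schemas.setdefault((domain, name), [])
        entry = {
            "version": len(versions) + 1,
            "fields": fields,                # attribute -> data type
            "business_terms": business_terms,  # attribute -> glossary term
        }
        versions.append(entry)
        return entry["version"]

    def latest(self, domain, name):
        """Return the most recent schema version for a registered dataset."""
        return self._schemas[(domain, name)][-1]

registry = MetadataRegistry()
v = registry.register(
    domain="orders",
    name="order_events",
    fields={"order_id": "string", "amount": "decimal"},
    business_terms={"amount": "Gross Merchandise Value"},
)
print(v)  # → 1
```

Keeping the business-term links next to the schema is what makes cross-domain discoverability work: a consumer searching the glossary term finds the attribute without knowing the owning domain's column names.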
  20. Domain Ownership Responsibilities
     > Adhering to data interoperability standards, including protocols, data formats, and data types
     > Providing lineage, either manually or by linking source systems and integration services to scanners
     > Completing data-sharing tasks, including identity and access management reviews and data contract creation
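Data contract creation, the last task above, could be sketched as a small validation step like the one below. The required fields (`owner`, `sla_hours`, `access_review`, and the allowed formats) are illustrative assumptions, not a standard named in the deck:

```python
# Hypothetical sketch: validating a minimal data contract before a dataset
# is shared across domains. Field names and allowed values are assumptions.

REQUIRED_KEYS = {"dataset", "owner", "format", "sla_hours", "access_review"}

def validate_contract(contract: dict) -> list:
    """Return a list of problems; an empty list means the contract is valid."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - contract.keys())]
    if contract.get("format") not in {"parquet", "avro", "json"}:
        problems.append("format must be one of: parquet, avro, json")
    if not isinstance(contract.get("sla_hours"), (int, float)):
        problems.append("sla_hours must be numeric")
    return problems

contract = {
    "dataset": "orders.order_events",
    "owner": "orders-domain-team",
    "format": "parquet",
    "sla_hours": 24,             # data freshness SLA agreed with consumers
    "access_review": "2024-Q4",  # identity and access management review tag
}
print(validate_contract(contract))  # → []
```

The point of the contract is that the consuming domain can rely on the declared format and SLA without talking to the producing team, which is what makes federated ownership scale.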