Diagram labels: Trino (Presto) Raw Data Data Mart DSI Persona DS.DATASET Data API In-house Tool DSI Job Server DS.DATASET Job Server Cassandra UI Service Computing Data DS.API Data Mart DB
Same context
- with or without a space
- word order
- spelling inconsistencies
- proper nouns
- product names
- professional jargon such as technical terms
- Realization of summarization by context

Before summarization:

keywords                        metric1  metric2
“yahoo! DataSolution”              1200      650
“データソリューション Yahoo!”       600      400
“yahoo!DataSolution”                400      300

(“データソリューション” is the Japanese spelling of “DataSolution”.)

After summarization:

keywords                        metric1  metric2
“yahoo! DataSolution”              2200     1350

Which one to choose?
- “Yahoo!DataSolution”
- “Yahoo! DataSolution”
- “Yahoo!” & “DataSolution”
- “Yahoo!” & “Data” & “Solution”
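The keyword table above can be reproduced with a small sketch. It is only an illustration, not the method used in the talk: the `ALIASES` mapping, the function names, and the choice to split into “Yahoo!” and “DataSolution” are all assumptions made for the example. Variants are reduced to one canonical key (Unicode and case normalization, alias replacement, order-insensitive tokens), and the metrics are summed per key.

```python
import unicodedata
from collections import defaultdict

# Hypothetical alias table: known surface forms mapped to one canonical token.
ALIASES = {"データソリューション": "datasolution"}

# Rows taken from the slide: (keyword, metric1, metric2).
ROWS = [
    ("yahoo! DataSolution", 1200, 650),
    ("データソリューション Yahoo!", 600, 400),
    ("yahoo!DataSolution", 400, 300),
]


def canonical_key(keyword: str) -> str:
    """Map spacing, word-order and spelling variants to one key."""
    # NFKC folds full-width/half-width variants; lower() folds case.
    text = unicodedata.normalize("NFKC", keyword).lower()
    for surface, canonical in ALIASES.items():
        surface = unicodedata.normalize("NFKC", surface).lower()
        text = text.replace(surface, f" {canonical} ")
    # Assumption: "!" ends a token, so "yahoo!datasolution" splits in two.
    text = text.replace("!", "! ")
    # Sorting the tokens makes the key independent of word order.
    return " ".join(sorted(text.split()))


def aggregate(rows):
    """Sum metric1 and metric2 for all keywords sharing a canonical key."""
    totals = defaultdict(lambda: [0, 0])
    for keyword, metric1, metric2 in rows:
        key = canonical_key(keyword)
        totals[key][0] += metric1
        totals[key][1] += metric2
    return {key: tuple(values) for key, values in totals.items()}


if __name__ == "__main__":
    print(aggregate(ROWS))
```

With the three rows from the slide, all variants collapse into a single key whose totals are 2200 and 1350, matching the summarized row. Choosing a different tokenization (for example splitting “Data” and “Solution”) would only change `canonical_key`.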
date      region  label1   group 1      metrics
20221118  tokyo   “price”  “Product A”     1200
20221118  tokyo   “price”  “Product B”      600
20221118  osaka   “price”  “Product A”      800
20221118  nagoya  “price”  “Product A”      600

Diagram labels: virtual table, label, group
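One possible reading of the label and group columns is that they let the same rows be summarized along different axes without pre-processing. The sketch below illustrates that reading only; the column name `date` is an assumption (the first column has no header in the slide), `sum_by` is a hypothetical helper, and the label values are English renderings of “価格” and “商品A” / “商品B”.

```python
from collections import defaultdict

# Column positions; "date" is an assumed name for the unlabeled first column.
COLUMNS = {"date": 0, "region": 1, "label1": 2, "group1": 3, "metrics": 4}

# Rows from the slide, with "価格" rendered as "price" and "商品A"/"商品B"
# rendered as "Product A"/"Product B".
ROWS = [
    ("20221118", "tokyo", "price", "Product A", 1200),
    ("20221118", "tokyo", "price", "Product B", 600),
    ("20221118", "osaka", "price", "Product A", 800),
    ("20221118", "nagoya", "price", "Product A", 600),
]


def sum_by(rows, *columns):
    """Sum the metrics column, grouped by the given column names."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[COLUMNS[name]] for name in columns)
        totals[key] += row[COLUMNS["metrics"]]
    return dict(totals)


if __name__ == "__main__":
    print(sum_by(ROWS, "label1", "group1"))
    print(sum_by(ROWS, "region"))
```

Grouping by `label1` and `group1` merges the three regional “Product A” rows into one total, while grouping by `region` merges the two tokyo rows; the underlying table is unchanged in both cases.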
It is necessary to modify the configuration depending on the application, either generic or dedicated.
- Ensure that the master data is in one place
- Eliminate the need for pre-processing
- Create the necessary rules
- Design the table structure deliberately so that it can be used for analysis