locations, -> For simple, unrepeated queries : Use Athena for short interactive queries -> For complex or repeated analysis on the same data schema : Relationalize (Transform) and load onto Redshift.
Version a. MSCK REPAIR TABLE bs_go_general 2. Manual Version a. ALTER TABLE bs_go_general ADD IF NOT EXISTS PARTITION (year = {year}, month = {month}, day = {day}) LOCATION 's3://buzzvil-log/prod/buzzscreen/buzzscreen-go-general/{year}/{month}/{day}/'; Table Partitioning
aggregate, window, and array functions a. https://prestodb.io/docs/0.172/functions.html 2. In memory SQL engine a. We don’t need to manage this 3. Optimized for star schema joins (1 large Fact table and many smaller dimension tables)
changes in data schema from S3 to data sink 3. Manage Athena, S3, and Redshift at the same time 4. Relationalize data in S3, keep track of schema, sync necessary data to redshift or use Spectrum if applicable
into more efficiently queryable formats ( Parquet, Relationalized JSON, etc ) 2. Decouple ETL process from the rest of the service 3. Easily monitor data loading procedures (CloudWatch) a. https://docs.aws.amazon.com/glue/latest/dg/monitor-glue.html