Upgrade to Pro — share decks privately, control downloads, hide ads and more …

GCPUGOsaka_11.pdf

Youkey
June 06, 2022

 GCPUGOsaka_11.pdf

For GCPUG Osaka #11 ( With Shiga/Nara)
https://gcpug-osaka.connpass.com/event/248592/

Youkey

June 06, 2022
Tweet

More Decks by Youkey

Other Decks in Technology

Transcript

  1. Agenda 14:50 - 15:00 プレオープン 15:00 - 15:20 GCPUG *の紹介、今日のやること

    15:20 - 16:50 もくもく 16:50 - 17:00 クロージング 17:00 - 何かあれば追加の質問タイム
  2. ようこそ GCPUG Nara へ‼ メンバー 約130名 オーガナイザー: Hironobu NAKANAGA Akira

    TSURUTA 新型コロナ禍の休眠から再始動!! 奈良でゆるりと活動しますー.
  3. ようこそ GCPUG Osaka へ! GCPUG とは、Google Cloud Platform User Group

    といいます。 Googleが提供するクラウドサービスのユーザグループ会で、 Osakaは大阪近隣のユーザが 参加する800人超のブランチです。オフライン復活も視野 オーガナイザー Yuki Nakata tw:(@unclear2st) / fb:(https://www.facebook.com/yuki.nakata.7739) フェンリル株式会社 データエンジニア兼エンジニアリングマネージャー、他 Google Cloud Skills Boost プロフィール https://www.cloudskillsboost.google/public_profiles/1d756237-2a07-4352-9136-c6d749929c71
  4. 気になったところ データセットの登録方法がわかりづらいぞ 新しいデータセットを含むプロジェクトは data-to-insights です。 BigQuery の一般公開データセットは、デフォルトでは BigQuery のウェブ UI

    には表示されません。 data-to-insights は一般公開データセットのプロジェク トであるため、リソースツリーに固定する必要があります。 1. 新しいブラウザ ウィンドウで、一般公開データセットのプロジェクト (https://console.cloud.google.com/bigquery?p=data-to-insights&page=ecommerce)を開きます。 2. 左側のパネルの [リソース] で、[data-to-insights] をクリックします。右側のパネルで、 [プロジェクトを固定 (めっちゃ右側にある )] をクリックしま す。 BigQuery での JSON、配列、構造体の 操作 (チューター用だけなら無視) “SQLで配列を操作する ”セクション データセットで [fruit_details] をクリックします。 じゃなくて、 データセットで [fruit_store]の[新しいテーブルを作成 ]をクリックします Challenge Lab. Task1 (#直った。数値前後が逆だった) 日付分割バージョンの有効期限を 180日に設定するように求められる。だが、 oxford_policy_trackerのデータには2020/1/1から2021/9/10までしか存在しない。バージョニングの有効期限 を設定すると今日現在だと 2021/9/10より180日以上過ぎているためレコードが0になる Challenge Lab. Task5 (6/6直った) bigquery-public-data.covid19_google_mobility.mobility_reportがNot found
  5. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 1: 日付でテーブルをパーティション分割する 1. データセット作成 (covid_***)

    2. Publicテーブルから無駄データ (GBRとか)を弾き出しつつ、日付分割テーブルを作成 (oxford_policy_tracker_***) 3. 以下、無茶なクエリ CREATE OR REPLACE TABLE `qwiklabs-gcp-04-0ba24a4a122b.covid_YOURDATASET.oxford_policy_tracker_YOURTABLE` ( country_name STRING OPTIONS(description="Name of the country"), alpha_3_code STRING OPTIONS(description="3-letter alpha code abbreviation of the country/region. See `bigquery-public-data.utility_us.country_code_iso` for more details"), region_name STRING OPTIONS(description="Name of the region within the country"), region_code STRING OPTIONS(description= "Code of the region within the country"), date DATE OPTIONS(description="Date of the measured policy action status"), school_closing STRING OPTIONS(description="C1 - Ordinal scale record closings of schools and universities; 0 - No measures 1 - recommend closing 2 - Require closing (only some levels or categories eg just high school or just public schools) 3 - Require closing all levels No data - blank"), school_closing_flag STRING OPTIONS(description="Are C1 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), school_closing_notes STRING OPTIONS(description="Additional details about C1 policy actions"), workplace_closing STRING OPTIONS(description="C2 - Ordinal scale record closings of workplace; 0 - No measures 1 - recommend closing (or work from home) 2 - require closing (or work from home) for some sectors or categories of workers 3 - require closing (or work from home) all-but-essential workplaces (eg grocery stores doctors) No data - blank"), workplace_closing_flag STRING OPTIONS(description="Are C2 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), workplace_closing_notes STRING OPTIONS(description="Additional details about C2 policy actions"), cancel_public_events STRING OPTIONS(description="C3 - Ordinal scale record cancellations of public events;0- No measures 1 - Recommend cancelling 2 - Require cancelling No data - blank"), cancel_public_events_flag STRING OPTIONS(description="Are C3 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), cancel_public_events_notes STRING OPTIONS(description="Additional details about C3 policy actions"), restrictions_on_gatherings STRING OPTIONS(description="C4 - Ordinal scale to record the cut-off size for bans on private gatherings; 0 - No restrictions 1 - Restrictions on very large gatherings (the limit is above 1000 people) 2 - Restrictions on gatherings between 100-1000 people 3 - Restrictions on gatherings between 10-100 people 4 - Restrictions on gatherings of less than 10 people No data - blank"), restrictions_on_gatherings_flag STRING OPTIONS(description="Are C4 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), restrictions_on_gatherings_notes STRING OPTIONS(description="Additional details about C4 policy actions"), close_public_transit STRING OPTIONS(description="C5 - Ordinal scale to record closing of public transportation; 0 - No measures 1 - Recommend closing (or significantly reduce volume/route/means of transport available) 2 - Require closing (or prohibit most citizens from using it)"), close_public_transit_flag STRING OPTIONS(description="Are C5 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), close_public_transit_notes STRING OPTIONS(description="Additional details about C5 policy actions"), stay_at_home_requirements STRING OPTIONS(description="C6 - Ordinal scale record of orders to \u201cshelter-in- place\u201d and otherwise confine to home."), stay_at_home_requirements_flag STRING OPTIONS(description="Are C6 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank\"\\"), stay_at_home_requirements_notes STRING OPTIONS(description="Additional details about C6 policy actions"), restrictions_on_internal_movement STRING OPTIONS (description="C7 - Ordinal scale of restrictions on internal movement; 0 - No measures 1 - Recommend closing (or significantly reduce volume/route/means of transport) 2 - Require closing (or prohibit most people from using it)"), restrictions_on_internal_movement_flag STRING OPTIONS(description="Are C7 actions targeted at specific areas or general:0 - Targeted 1- General No data - blank"), restrictions_on_internal_movement_notes STRING OPTIONS(description="Additional details about C7 policy actions"), international_travel_controls STRING OPTIONS(description="C8 - Ordinal scale record of restrictions on international travel; 0 - No measures 1 - Screening 2 - Quarantine arrivals from high-risk regions 3 - Ban on high-risk regions 4 - Total border closure No data - blank"), international_travel_controls_notes STRING OPTIONS(description="Additional details about C8 policy actions"),
  6. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 2: テーブルに新しい列を追加する 1. Schema edit

    でテキストで書いちゃう [ { "name": "population", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "country_area", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "mobility", "type": "RECORD", "mode": "NULLABLE", "fields": [ { "name": "avg_retail", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "avg_grocery", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "avg_parks", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "avg_transit", "type": "FLOAT", "mode": "NULLABLE"
  7. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 3: 国の人口データを population 列に追加する 1.

    Updateする UPDATE covid_YOURDATASET.oxford_policy_tracker_YOURTABLE t0 SET t0.population = t2.pop_data_2019 FROM (SELECT DISTINCT country_territory_code, pop_data_2019 FROM `bigquery-public-data.covid19_ecdc.covid_19_geographic_distribution_worldwide`) AS t2 WHERE t0.alpha_3_code = t2.country_territory_code;
  8. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 4: country_area 列に国の地域データを追加する 1. Updateしちゃう

    UPDATE covid_YOURDATASET.oxford_policy_tracker_YOURTABLE t0 SET t0.country_area = t2.country_area FROM ( SELECT DISTINCT country_name,country_area FROM `bigquery-public-data.census_bureau_international.country_names_area` ) AS t2 WHERE t0.country_name = t2.country_name;
  9. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 5: モビリティ レコード データを入力する 1.

    Updateしちゃえ UPDATE covid_YOURDATASET.oxford_policy_tracker_YOURTABLE t0 SET t0.mobility.avg_retail = t2.avg_retail, t0.mobility.avg_grocery = t2.avg_grocery, t0.mobility.avg_parks = t2.avg_parks, t0.mobility.avg_transit = t2.avg_transit, t0.mobility.avg_workplace = t2.avg_workplace, t0.mobility.avg_residential = t2.avg_residential FROM ( SELECT country_region, date, AVG(retail_and_recreation_percent_change_from_baseline) as avg_retail, AVG(grocery_and_pharmacy_percent_change_from_baseline) as avg_grocery, AVG(parks_percent_change_from_baseline) as avg_parks, AVG(transit_stations_percent_change_from_baseline) as avg_transit, AVG( workplaces_percent_change_from_baseline ) as avg_workplace, AVG( residential_percent_change_from_baseline) as avg_residential FROM `bigquery-public-data.covid19_google_mobility.mobility_report` GROUP BY country_region, date ) AS t2 WHERE t0.country_name = t2.country_region AND t0.date = t2.date;
  10. BigQuery によるデータ ウェアハウスの構築と最適化: チャレンジラボ タスク 6: population 列と country_area 列の不足データを問い合

    わせる 1. UNION が大事 SELECT alpha_3_code ,date FROM `qwiklabs-gcp-04-0ba24a4a122b.covid_YOURDATASET.oxford_policy_tracker_YOURTABLE` AS t0 WHERE t0.population IS NULL UNION ALL SELECT alpha_3_code ,date FROM `qwiklabs-gcp-04-0ba24a4a122b.covid_YOURDATASET.oxford_policy_tracker_YOURTABLE` AS t0 WHERE t0.country_area IS NULL